Building Scalable SaaS: Architecture Lessons from Real Projects
Published: February 15, 2026
Author: Vaelix Team
Category: Engineering
Read Time: 8 min
Introduction
Building a SaaS platform that scales from 10 to 10,000 users requires careful architectural decisions from day one. Over the past year, we've helped multiple clients navigate this journey — from early-stage startups to established enterprises. Here are the key lessons we've learned.
The Multi-Tenancy Decision
One of the first architectural decisions you'll face is how to handle multi-tenancy. We've implemented three different approaches across our projects:
Database-per-Tenant (Highest Isolation)
Used in our Second Opinion healthcare platform for HIPAA compliance:
// Tenant-specific database connection
const getTenantDB = (tenantId) => {
  return new Database({
    host: process.env.DB_HOST,
    database: `tenant_${tenantId}`,
    // ... connection config
  });
};
Pros:
- Maximum data isolation
- Easy to backup/restore individual tenants
- Compliance-friendly (HIPAA, SOC 2)
Cons:
- Higher infrastructure costs
- Complex migrations across all databases
- Harder to implement cross-tenant analytics
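The migration pain can be tamed with a small runner that applies each migration to every tenant database in turn and records per-tenant results, so one broken tenant doesn't silently block the rest. A minimal sketch; `runMigration` is an injected, app-specific function (hypothetical, not from the platform):

```javascript
// Apply one migration across all tenant databases, collecting per-tenant
// results instead of aborting on the first failure.
// `runMigration` is whatever your migration tool exposes (hypothetical).
const migrateAllTenants = async (tenantIds, runMigration) => {
  const results = [];
  for (const id of tenantIds) {
    try {
      await runMigration(`tenant_${id}`);
      results.push({ id, ok: true });
    } catch (err) {
      results.push({ id, ok: false, error: err.message });
    }
  }
  return results;
};
```

Surviving failures matters here: with hundreds of tenant databases, a mid-run abort leaves you guessing which tenants are on which schema version.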
Schema-per-Tenant (Balanced Approach)
Implemented in our NovaPay fintech platform:
-- Each tenant gets their own schema
CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.users (...);
CREATE TABLE tenant_acme.transactions (...);
Pros:
- Good isolation with shared infrastructure
- Easier migrations than database-per-tenant
- Cost-effective for mid-market SaaS
Cons:
- Still requires tenant-aware queries
- Schema proliferation can complicate monitoring
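Tenant-aware queries in this model qualify table names with the tenant's schema. Schema names cannot be bound as SQL parameters, so the tenant id must be validated before interpolation; a minimal sketch (the `tenantTable` helper is illustrative, not from NovaPay):

```javascript
// Build a schema-qualified table reference for a tenant.
// Identifiers can't be parameterized, so validate the id strictly
// before interpolating it into SQL.
const tenantTable = (tenantId, table) => {
  if (!/^[a-z0-9_]+$/.test(tenantId)) {
    throw new Error(`invalid tenant id: ${tenantId}`);
  }
  return `tenant_${tenantId}.${table}`;
};
```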
Shared Schema with Tenant ID (Most Scalable)
Our approach for HelpDesk Pro AI support platform:
// Every table has a tenant_id column
const getTickets = async (tenantId, userId) => {
  return db.tickets.findMany({
    where: {
      tenantId,
      assignedTo: userId,
    },
  });
};
Pros:
- Simplest to implement and maintain
- Best for horizontal scaling
- Easy cross-tenant analytics
Cons:
- Requires careful query filtering
- Risk of data leakage if not implemented correctly
- Row-level security is critical
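The "careful query filtering" requirement is easiest to enforce in one place: a small guard that every data-access call goes through, which injects the tenant filter and refuses to run without one. A minimal sketch; `scopedWhere` is our name for the helper, not part of the platform:

```javascript
// Enforce a tenantId filter on every shared-schema query.
// Throws if no tenant context is present, and overrides any
// caller-supplied tenantId to prevent cross-tenant reads.
const scopedWhere = (tenantId, where = {}) => {
  if (!tenantId) {
    throw new Error('tenantId is required for shared-schema queries');
  }
  return { ...where, tenantId };
};
```

Usage: `db.tickets.findMany({ where: scopedWhere(ctx.tenantId, { assignedTo: userId }) })`.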
Billing & Subscription Management
We've integrated Stripe for subscription billing across multiple projects. Here's our battle-tested approach:
Usage-Based Billing Architecture
From our Quantum Supply Chain optimizer:
// Track usage events in real-time
const trackUsage = async (tenantId, metric, quantity) => {
  // Resolve the Stripe subscription item for this tenant/metric pair
  // (app-specific lookup, stored alongside the subscription)
  const subscriptionItemId = await getSubscriptionItemId(tenantId, metric);
  await stripe.subscriptionItems.createUsageRecord(
    subscriptionItemId,
    {
      quantity,
      timestamp: Math.floor(Date.now() / 1000),
      action: 'increment',
    }
  );
  // Also store in your database for analytics
  await db.usageEvents.create({
    tenantId,
    metric,
    quantity,
    timestamp: new Date(),
  });
};
Key Lessons:
- Always store usage data locally — Don't rely solely on Stripe for analytics
- Implement soft limits — Warn users before hard cutoffs
- Handle failed payments gracefully — Give users time to update payment methods
- Prorate everything — Users expect fair billing when upgrading/downgrading
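The soft-limit lesson boils down to a small classifier that billing jobs and API middleware can share; a sketch, with an assumed 80% warning threshold (our choice for illustration):

```javascript
// Classify tenant usage against its plan limit:
// 'ok' below the warning threshold, 'warn' past the soft limit,
// 'block' only at the hard cap.
const checkUsage = (used, limit, warnRatio = 0.8) => {
  if (used >= limit) return 'block';
  if (used >= limit * warnRatio) return 'warn';
  return 'ok';
};
```

On `'warn'`, notify the user; only `'block'` should actually cut off service.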
Observability: The Non-Negotiable
Every SaaS platform we build includes comprehensive observability from day one.
Our Standard Stack
# docker-compose.yml excerpt
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
  loki:
    image: grafana/loki
    ports:
      - "3100:3100"
Critical Metrics to Track
From our ShopSphere e-commerce migration:
1. Business Metrics
   - Active users per tenant
   - Revenue per tenant
   - Feature adoption rates
   - Churn indicators
2. Technical Metrics
   - API response times (p50, p95, p99)
   - Error rates by endpoint
   - Database query performance
   - Cache hit rates
3. Infrastructure Metrics
   - CPU/Memory per service
   - Database connection pool usage
   - Queue depths
   - CDN cache hit rates
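The p50/p95/p99 latencies above are normally computed by your metrics backend, but the underlying nearest-rank definition is worth having on hand when you need to sanity-check a dashboard against raw samples; a minimal sketch:

```javascript
// Nearest-rank percentile over raw latency samples (e.g. ms).
// percentile(samples, 95) returns the value at or above which
// only the slowest ~5% of samples fall.
const percentile = (samples, p) => {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
};
```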
Structured Logging
// Use structured logging for easy querying
logger.info('user_action', {
  tenantId: tenant.id,
  userId: user.id,
  action: 'subscription_upgraded',
  plan: 'enterprise',
  mrr_change: 500,
  timestamp: new Date().toISOString(),
});
API Design for Scale
Rate Limiting by Tenant
// Redis-based rate limiting (fixed 1-minute window per tenant/endpoint)
const checkRateLimit = async (tenantId, endpoint) => {
  const key = `ratelimit:${tenantId}:${endpoint}`;
  const current = await redis.incr(key);
  if (current === 1) {
    await redis.expire(key, 60); // 1 minute window
  }
  const limit = await getTenantLimit(tenantId);
  return current <= limit;
};
Versioning Strategy
We use URL versioning for clarity:
/api/v1/tickets
/api/v2/tickets
Key principles:
- Support at least 2 versions simultaneously
- Give 6-12 months deprecation notice
- Use feature flags for gradual rollouts
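Under URL versioning, version negotiation is just a prefix parse plus a supported-version check before routing; a minimal sketch (the helper name is ours, not a framework API):

```javascript
// Extract and validate the API version from a URL-versioned path.
// Returns the version string, or null for unknown/unversioned paths
// so the router can respond 404 or 410 Gone.
const SUPPORTED_VERSIONS = new Set(['v1', 'v2']);

const parseApiVersion = (path) => {
  const match = path.match(/^\/api\/(v\d+)(\/|$)/);
  if (!match || !SUPPORTED_VERSIONS.has(match[1])) return null;
  return match[1];
};
```

Keeping the supported set explicit makes deprecation a one-line change: remove `'v1'` when its notice period ends.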
Database Optimization
Connection Pooling
From our NovaPay platform (200K+ users):
// App-side pool connecting through PgBouncer (listening on 6432)
const pool = new Pool({
  host: 'pgbouncer',
  port: 6432,
  max: 20, // Max connections per instance
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
Read Replicas
// Route reads to replicas
const getUser = async (id) => {
  return replicaDB.users.findUnique({ where: { id } });
};

// Route writes to the primary
const updateUser = async (id, data) => {
  return primaryDB.users.update({ where: { id }, data });
};
Caching Strategy
Three-tier caching from HelpDesk Pro:
// 1. In-memory cache (fastest)
const userCache = new Map();

// 2. Redis cache (shared across instances)
const getCachedUser = async (id) => {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
  const user = await db.users.findUnique({ where: { id } });
  await redis.setex(`user:${id}`, 300, JSON.stringify(user));
  return user;
};

// 3. CDN cache (for static assets)
// Configured in next.config.js
Background Jobs & Queues
Bull Queue Setup
// worker.js
const Queue = require('bull');
const emailQueue = new Queue('email', process.env.REDIS_URL);

emailQueue.process(async (job) => {
  const { to, subject, body } = job.data;
  await sendEmail(to, subject, body);
});

// api.js
const sendWelcomeEmail = async (user) => {
  await emailQueue.add({
    to: user.email,
    subject: 'Welcome to our platform',
    body: renderTemplate('welcome', user),
  }, {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000,
    },
  });
};
Security Best Practices
Row-Level Security (RLS)
PostgreSQL RLS policy from Second Opinion:
-- Enable RLS
ALTER TABLE tickets ENABLE ROW LEVEL SECURITY;
-- Policy: Users can only see their tenant's data
CREATE POLICY tenant_isolation ON tickets
  USING (tenant_id = current_setting('app.current_tenant')::uuid);
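For the policy to take effect, each request has to set `app.current_tenant` on its database session before querying. With node-postgres, the statement can be built as a parameterized config object; a sketch (`set_config` with its third argument `true` scopes the setting to the current transaction):

```javascript
// Parameterized statement that sets the RLS tenant context.
// Run it inside the same transaction as the tenant's queries,
// so the setting can't leak to another pooled connection.
const tenantContextQuery = (tenantId) => ({
  text: "SELECT set_config('app.current_tenant', $1, true)",
  values: [tenantId],
});
```

Usage: `await client.query(tenantContextQuery(tenant.id));` before the tenant's queries in the same transaction.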
API Authentication
// JWT-based auth with tenant context
const verifyToken = async (token) => {
  const decoded = jwt.verify(token, process.env.JWT_SECRET);
  // Attach tenant context to request
  return {
    userId: decoded.sub,
    tenantId: decoded.tenant_id,
    role: decoded.role,
  };
};
Cost Optimization
Auto-Scaling Configuration
From ShopSphere (99.99% uptime):
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Database Cost Optimization
- Use connection pooling — Reduced RDS costs by 40%
- Archive old data — Move to S3 after 90 days
- Optimize indexes — Regular EXPLAIN ANALYZE reviews
- Use read replicas — Offload reporting queries
Deployment Strategy
Blue-Green Deployments
# Deploy new version (green) alongside the running blue deployment
kubectl apply -f deployment-v2.yaml
# Switch traffic to green by updating the service selector
kubectl patch service api-server -p '{"spec":{"selector":{"version":"v2"}}}'
# Monitor for 30 minutes; roll back by pointing the selector at blue again
kubectl patch service api-server -p '{"spec":{"selector":{"version":"v1"}}}'
Feature Flags
// LaunchDarkly integration
const showNewDashboard = await ldClient.variation(
'new-dashboard',
{ key: user.id, tenantId: tenant.id },
false
);
Real-World Results
Applying these patterns across our projects:
- NovaPay: Scaled from 50K to 200K users with zero downtime
- ShopSphere: Handled 10x traffic during flash sales (99.99% uptime)
- HelpDesk Pro: Reduced infrastructure costs by 35% while improving performance
Key Takeaways
- Choose multi-tenancy model based on compliance needs — Not all approaches fit all use cases
- Invest in observability early — You can't optimize what you can't measure
- Design APIs for versioning — Breaking changes are inevitable
- Implement caching at multiple layers — The database is usually your bottleneck
- Use managed services — Focus on your product, not infrastructure
- Plan for failure — Circuit breakers, retries, and graceful degradation
- Automate everything — CI/CD, testing, deployments, monitoring
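"Plan for failure" usually starts with a circuit breaker in front of flaky dependencies: stop calling a service that keeps failing, then probe it again after a cooldown. A minimal sketch; the thresholds and the injectable clock are our choices for illustration:

```javascript
// Minimal circuit breaker: opens after `threshold` consecutive failures,
// then allows a trial request once `cooldownMs` has elapsed.
// The clock is injectable so the behavior is testable.
class CircuitBreaker {
  constructor({ threshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  canRequest() {
    if (this.openedAt === null) return true;
    // Half-open: allow a trial call once the cooldown has passed
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

Wrap outbound calls with `canRequest()` and fall back to a cached or degraded response while the circuit is open.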
Next Steps
Want to build a scalable SaaS platform? We've helped dozens of companies architect for growth. Get in touch to discuss your project.
Related Case Studies:
- Fintech Platform Redesign — NovaPay
- E-Commerce Scale-Up — ShopSphere
- AI-Powered Customer Support — HelpDesk Pro
Tags: #SaaS #Architecture #Scalability #Multi-tenancy #DevOps