# Horizontal Scaling
Scale your application horizontally by running multiple container instances behind a load balancer.
## Basic Scaling

Set the number of replicas in your configuration:
```yaml
name: "my-scalable-app"
image: "my-org/my-app:1.2.0"
replicas: 3 # Run 3 instances
domains:
  - domain: "api.example.com"
```
Haloy automatically:
- Starts the specified number of containers
- Configures load balancing in the built-in reverse proxy
- Distributes traffic across all healthy instances
- Waits for any configured readiness stabilization window before routing to new replicas
- Monitors health checks
## Load Balancing
The built-in reverse proxy distributes traffic using round-robin by default, sending requests to each healthy container in turn.
### Traffic Flow

```
Internet → haloyd (80/443) → Container 1 (8080)
                           → Container 2 (8080)
                           → Container 3 (8080)
```
All containers receive approximately equal traffic.
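Round-robin selection is conceptually simple. As a minimal sketch in Python (the backend addresses are placeholders, not haloyd internals):

```python
from itertools import cycle

# Hypothetical backend addresses; haloyd tracks these internally.
backends = ["container-1:8080", "container-2:8080", "container-3:8080"]

def make_round_robin(targets):
    """Return a function that yields the next backend in turn."""
    pool = cycle(targets)
    return lambda: next(pool)

next_backend = make_round_robin(backends)
# Six requests visit each of the three containers twice, in order.
picks = [next_backend() for _ in range(6)]
```

With three healthy backends, traffic cycles 1 → 2 → 3 → 1 → …, which is why each container receives roughly equal load.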
## Scaling Strategies

### Start Small, Scale Up

Begin with fewer replicas and increase as needed:
```yaml
# Initial deployment
name: "my-app"
replicas: 1

# After load testing
replicas: 3

# Under heavy load
replicas: 10
```
### Environment-Based Scaling
Scale differently per environment:
```yaml
name: "my-app"
replicas: 1 # Default

targets:
  production:
    server: prod.haloy.com
    replicas: 10 # High capacity
    domains:
      - domain: "my-app.com"

  staging:
    server: staging.haloy.com
    replicas: 2 # Moderate capacity
    domains:
      - domain: "staging.my-app.com"

  development:
    server: dev.haloy.com
    replicas: 1 # Minimal resources
    domains:
      - domain: "dev.my-app.com"
```
## High Availability
For mission-critical applications, run at least 3 replicas:
```yaml
name: "critical-app"
image: "my-org/critical-app:v2.0.0"
replicas: 3 # Minimum for HA
deployment_strategy: "rolling" # Zero-downtime updates
health_check_path: "/health"
domains:
  - domain: "critical-app.com"
```
### Why 3 Replicas?
- Fault tolerance: Can lose 1 container and maintain service
- Rolling deployments: Can update without downtime
- Load distribution: Better traffic handling
- Redundancy: Protection against failures
## Resource Considerations
Each replica consumes:
- CPU
- Memory
- Disk I/O
- Network bandwidth
### Example Resource Planning

```yaml
# If each container uses:
# - 512MB RAM
# - 0.5 CPU cores
#
# For 5 replicas you need:
# - 2.5GB RAM
# - 2.5 CPU cores (plus overhead)
```
Plan server resources accordingly:
```yaml
name: "resource-intensive-app"
replicas: 5
# Ensure your server has sufficient resources:
# - At least 4GB RAM available
# - At least 4 CPU cores available
```
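The capacity arithmetic above is easy to script when planning servers. A small helper, using the example's figures (these are illustrative numbers, not measured values):

```python
def required_resources(replicas, ram_mb_per_replica, cpu_per_replica):
    """Total RAM (MB) and CPU cores needed for a given replica count."""
    return replicas * ram_mb_per_replica, replicas * cpu_per_replica

# 5 replicas at 512MB / 0.5 cores each -> 2560MB RAM, 2.5 cores,
# before accounting for OS and haloyd overhead.
ram_mb, cpu = required_resources(5, 512, 0.5)
```

Add headroom on top of the computed totals: the host also runs the OS, Docker, and haloyd itself.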
## Health Checks and Scaling
Health checks are critical for scaling:
```yaml
name: "my-app"
replicas: 5
health_check_path: "/health"
min_ready_seconds: 10
port: "8080"

# haloyd only routes to healthy containers
# Unhealthy containers are automatically excluded
```
### Stabilizing New Replicas

If new replicas sometimes pass health checks and then crash shortly afterward, add `min_ready_seconds`:
```yaml
name: "my-app"
replicas: 5
deployment_strategy: "rolling"
health_check_path: "/health"
min_ready_seconds: 10
```
Haloy only adds a new replica to rotation after it is healthy and has remained up for the configured stabilization window. This is useful for catching late database failures, startup race conditions, or short crash loops during rolling deploys.
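The stabilization check can be pictured as follows. This is an illustrative sketch, not haloyd's actual implementation:

```python
import time

def is_stable(healthy_since, min_ready_seconds, now=None):
    """A replica enters rotation only after staying healthy for the window.

    healthy_since: timestamp of the first passing health check, or None
    if the replica is currently unhealthy (a crash resets the clock).
    """
    if healthy_since is None:
        return False
    now = time.time() if now is None else now
    return now - healthy_since >= min_ready_seconds

# Healthy for only 4s with a 10s window -> not yet routable.
assert not is_stable(healthy_since=100.0, min_ready_seconds=10, now=104.0)
# Healthy for 12s -> added to rotation.
assert is_stable(healthy_since=100.0, min_ready_seconds=10, now=112.0)
```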
### Health Check Best Practices
- Check dependencies: Verify database, cache, etc.
- Fast response: Return within 1-2 seconds
- Accurate status: Only return 200 when truly ready
- Log failures: Help debug health issues
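A handler following these practices might look like the sketch below; the dependency checks are placeholders you would replace with real database and cache clients:

```python
def health_status(check_db, check_cache):
    """Return (http_status, body) for a /health endpoint.

    check_db / check_cache are callables returning True when the
    dependency is reachable; report 200 only when all of them pass.
    """
    checks = [("db", check_db), ("cache", check_cache)]
    failed = [name for name, check in checks if not check()]
    if failed:
        # Log the failing dependencies to help debug health issues.
        print(f"health check failed: {', '.join(failed)}")
        return 503, {"status": "unhealthy", "failed": failed}
    return 200, {"status": "ok"}

status, body = health_status(lambda: True, lambda: True)
```

Keep the individual checks fast (short timeouts) so the endpoint answers within the 1-2 second budget above.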
## Deploying with Replicas

### Rolling Deployment (Recommended)
Updates one replica at a time:
```yaml
name: "my-app"
replicas: 5
deployment_strategy: "rolling"

# Deployment process:
# 1. Start new container 1, wait for health check
# 2. Stop old container 1
# 3. Repeat for containers 2-5
#
# Always have 4-5 containers serving traffic
```
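The numbered steps above amount to a one-at-a-time replacement loop. A sketch with stubbed start/stop/health functions (illustrative only, not haloyd's code):

```python
def rolling_update(old, new, start, stop, wait_healthy):
    """Replace old containers one at a time, keeping the rest serving.

    start/stop/wait_healthy are callables supplied by the caller;
    wait_healthy should block until the new container passes checks.
    """
    serving = list(old)
    for old_c, new_c in zip(old, new):
        start(new_c)
        wait_healthy(new_c)    # proceed only once the replacement is ready
        serving.append(new_c)
        stop(old_c)
        serving.remove(old_c)  # at no point do fewer than len(old) serve
    return serving

events = []
result = rolling_update(
    ["old-1", "old-2"], ["new-1", "new-2"],
    start=lambda c: events.append(("start", c)),
    stop=lambda c: events.append(("stop", c)),
    wait_healthy=lambda c: None,
)
```

The key property is the ordering: each new container is started and verified healthy before its predecessor is stopped.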
### Replace Deployment
Replaces all containers at once:
```yaml
name: "my-app"
replicas: 5
deployment_strategy: "replace"

# Deployment process:
# 1. Stop all old containers
# 2. Start all new containers
# 3. Brief service interruption
```
## Monitoring Scaled Applications
Check status of all replicas:
```shell
# View all containers
haloy status

# View logs from all replicas
haloy logs --all-containers
```
## Geographic Scaling
Scale across multiple regions:
```yaml
name: "global-api"
image: "my-org/api:v2.0.0"

targets:
  us-east:
    server: us-east.haloy.com
    replicas: 5
    domains:
      - domain: "us.api.example.com"
    env:
      - name: "REGION"
        value: "us-east-1"

  eu-west:
    server: eu-west.haloy.com
    replicas: 5
    domains:
      - domain: "eu.api.example.com"
    env:
      - name: "REGION"
        value: "eu-west-1"

  asia-pacific:
    server: ap-southeast.haloy.com
    replicas: 5
    domains:
      - domain: "ap.api.example.com"
    env:
      - name: "REGION"
        value: "ap-southeast-1"
```
## Stateless Applications
For best scaling results, design stateless applications:
**Good - Stateless:**
- Session data in Redis/database
- Files in object storage (S3, etc.)
- No local caching (or distributed cache)
- Request data self-contained
**Bad - Stateful:**
- Session data in memory
- Files on local disk
- Local in-memory cache
- Sticky sessions required
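The difference shows up directly in handler code. In this sketch a plain dict stands in for an external store such as Redis; because state lives outside the process, any replica can serve any request:

```python
# External key-value store (a dict here as a stand-in for Redis).
# State lives outside the container, so replicas are interchangeable.
session_store = {}

def handle_request(session_id, store=session_store):
    """Count visits per session without any instance-local state."""
    count = store.get(session_id, 0) + 1
    store[session_id] = count
    return {"session": session_id, "visits": count}

# Two requests hitting different replicas still see consistent state,
# because both replicas read and write the same shared store.
r1 = handle_request("abc")
r2 = handle_request("abc")
```

Had the counter been a module-level variable inside each container, the two requests could land on different replicas and disagree, which is exactly what sticky sessions try to paper over.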
### Example Stateless App
```yaml
name: "stateless-api"
replicas: 10
env:
  # External session store
  - name: "REDIS_URL"
    value: "redis://redis-server:6379"
  # External file storage
  - name: "S3_BUCKET"
    value: "my-app-uploads"
  # External cache
  - name: "MEMCACHED_SERVERS"
    value: "memcached-1:11211,memcached-2:11211"

# No volumes needed - fully stateless
```
## Stopping Scaled Applications
Stop all replicas:
```shell
haloy stop

# Remove containers after stopping
haloy stop --remove-containers
```
## Best Practices
- Start with 1 replica: Test before scaling
- Scale based on metrics: Monitor CPU, memory, request latency
- Use odd numbers: 3, 5, 7 for better consensus/distribution
- Design for stateless: Enables unlimited horizontal scaling
- Implement health checks: Critical for load balancing
- Use `min_ready_seconds` for unstable startups: Prevents a replica from entering rotation too early
- Monitor all replicas: Ensure even load distribution
- Plan resources: Ensure server can handle all replicas
- Use rolling deployments: Maintain availability during updates
## Troubleshooting

### Uneven Load Distribution
If some containers get more traffic:
- Check all containers are healthy: `haloy status`
- Verify health check responses are fast
- Review application logs for slow requests
### Out of Resources
If deployment fails due to resources:
```yaml
# Reduce replica count
replicas: 3 # Down from 10

# Or upgrade server resources:
# - Add more RAM
# - Add more CPU cores
```
### Container Crash Loops
If containers keep restarting:
```shell
# Check logs from all replicas
haloy logs --all-containers

# Common issues:
# - Missing environment variables
# - Database connection failures
# - Port conflicts
# - Resource limits (OOM)
```