Horizontal Scaling

Scale your application horizontally by running multiple container instances behind a load balancer.

Basic Scaling

Set the number of replicas in your configuration:

    name: "my-scalable-app"
    image:
      repository: "my-org/my-app"
      tag: "1.2.0"
    replicas: 3  # Run 3 instances
    domains:
      - domain: "api.example.com"
        acme_email: "you@email.com"

Haloy automatically:

  • Starts the specified number of containers
  • Configures HAProxy load balancing
  • Distributes traffic across all healthy instances
  • Monitors health checks

Load Balancing

HAProxy distributes traffic using round-robin by default, sending requests to each healthy container in turn.

Traffic Flow

    Internet
        │
        ▼
    HAProxy (80/443)
        │
        ├─→ Container 1 (8080)
        ├─→ Container 2 (8080)
        └─→ Container 3 (8080)

All containers receive approximately equal traffic.
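The round-robin policy itself is simple enough to sketch in a few lines. This is only an illustration of the balancing behavior, not Haloy's or HAProxy's internal code; the `RoundRobin` class and backend names are invented for the example:

```python
from itertools import cycle

class RoundRobin:
    """Illustrative round-robin picker over a fixed list of healthy backends."""

    def __init__(self, backends):
        self._cycle = cycle(backends)

    def next_backend(self):
        # Each call hands back the next backend in turn, wrapping around,
        # so every backend receives an equal share of requests.
        return next(self._cycle)

lb = RoundRobin(["container-1:8080", "container-2:8080", "container-3:8080"])
print([lb.next_backend() for _ in range(6)])
# Each container appears exactly twice in those six picks.
```

In the real setup, HAProxy additionally skips any backend whose health check is failing, so the rotation only ever covers healthy containers.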

Scaling Strategies

Start Small, Scale Up

Begin with fewer replicas and increase as needed:

    # Initial deployment
    name: "my-app"
    replicas: 1

    # After load testing
    replicas: 3

    # Under heavy load
    replicas: 10

Environment-Based Scaling

Scale differently per environment:

    name: "my-app"
    replicas: 1  # Default
    targets:
      production:
        server: prod.haloy.com
        replicas: 10  # High capacity
        domains:
          - domain: "my-app.com"
      staging:
        server: staging.haloy.com
        replicas: 2  # Moderate capacity
        domains:
          - domain: "staging.my-app.com"
      development:
        server: dev.haloy.com
        replicas: 1  # Minimal resources
        domains:
          - domain: "dev.my-app.com"

High Availability

For mission-critical applications, run at least 3 replicas:

    name: "critical-app"
    replicas: 3  # Minimum for HA
    deployment_strategy: "rolling"  # Zero-downtime updates
    health_check_path: "/health"
    image:
      repository: "my-org/critical-app"
      tag: "v2.0.0"
    domains:
      - domain: "critical-app.com"
        acme_email: "admin@critical-app.com"

Why 3 Replicas?

  • Fault tolerance: Can lose 1 container and maintain service
  • Rolling deployments: Can update without downtime
  • Load distribution: Better traffic handling
  • Redundancy: Protection against failures

Resource Considerations

Each replica consumes:

  • CPU
  • Memory
  • Disk I/O
  • Network bandwidth

Example Resource Planning

    # If each container uses:
    # - 512MB RAM
    # - 0.5 CPU cores

    # For 5 replicas you need:
    # - 2.5GB RAM
    # - 2.5 CPU cores (plus overhead)

Plan server resources accordingly:

    name: "resource-intensive-app"
    replicas: 5

    # Ensure your server has sufficient:
    # - At least 4GB RAM available
    # - At least 4 CPU cores available
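The arithmetic above generalizes to any replica count. A throwaway helper, assuming the per-container figures from the example (the `overhead` factor is an illustrative safety margin, not a Haloy setting):

```python
def plan_resources(replicas, ram_mb_per_replica, cpu_per_replica, overhead=1.25):
    """Return (RAM in MB, CPU cores) needed for N replicas, with headroom.

    `overhead` adds a safety margin for the OS, the proxy, and spikes.
    """
    ram_mb = replicas * ram_mb_per_replica * overhead
    cpu = replicas * cpu_per_replica * overhead
    return ram_mb, cpu

# 5 replicas at 512 MB / 0.5 cores each, plus 25% headroom
ram_mb, cpu = plan_resources(5, 512, 0.5)
print(f"{ram_mb / 1024:.2f} GB RAM, {cpu:.2f} cores")
```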

Health Checks and Scaling

Health checks are critical for scaling:

    name: "my-app"
    replicas: 5
    health_check_path: "/health"
    port: "8080"

    # HAProxy only routes to healthy containers
    # Unhealthy containers are automatically excluded

Health Check Best Practices

  1. Check dependencies: Verify database, cache, etc.
  2. Fast response: Return within 1-2 seconds
  3. Accurate status: Only return 200 when truly ready
  4. Log failures: Help debug health issues

Deploying with Replicas

Rolling Deployment

A rolling deployment updates one replica at a time:

    name: "my-app"
    replicas: 5
    deployment_strategy: "rolling"

    # Deployment process:
    # 1. Start new container 1, wait for health check
    # 2. Stop old container 1
    # 3. Repeat for containers 2-5
    #
    # Always have 4-5 containers serving traffic
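Conceptually, the rolling process is a loop over the replica set. The sketch below mirrors the steps above; `start_new`, `wait_healthy`, and `stop_old` are hypothetical stand-ins, not Haloy APIs:

```python
def rolling_deploy(replicas, start_new, wait_healthy, stop_old):
    """Replace `replicas` containers one at a time, keeping the rest serving."""
    for i in range(1, replicas + 1):
        new_id = start_new(i)   # 1. start the replacement container
        wait_healthy(new_id)    # 2. wait for its health check to pass
        stop_old(i)             # 3. only then stop the old container
        # At every point, at least `replicas - 1` containers serve traffic.

# Dry run with logging stand-ins (the `or` trick makes the lambda
# both record the step and return a container id):
log = []
rolling_deploy(
    3,
    start_new=lambda i: log.append(f"start new-{i}") or f"new-{i}",
    wait_healthy=lambda cid: log.append(f"healthy {cid}"),
    stop_old=lambda i: log.append(f"stop old-{i}"),
)
print(log)
```

The key property is the ordering inside the loop: the old container is only stopped after its replacement passes its health check, which is what keeps the deployment zero-downtime.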

Replace Deployment

Replaces all containers at once:

    name: "my-app"
    replicas: 5
    deployment_strategy: "replace"

    # Deployment process:
    # 1. Stop all old containers
    # 2. Start all new containers
    # 3. Brief service interruption

Monitoring Scaled Applications

Check status of all replicas:

    # View all containers
    haloy status

    # View logs from all replicas
    haloy logs

Geographic Scaling

Scale across multiple regions:

    name: "global-api"
    image:
      repository: "my-org/api"
      tag: "v2.0.0"
    targets:
      us-east:
        server: us-east.haloy.com
        replicas: 5
        domains:
          - domain: "us.api.example.com"
        env:
          - name: "REGION"
            value: "us-east-1"
      eu-west:
        server: eu-west.haloy.com
        replicas: 5
        domains:
          - domain: "eu.api.example.com"
        env:
          - name: "REGION"
            value: "eu-west-1"
      asia-pacific:
        server: ap-southeast.haloy.com
        replicas: 5
        domains:
          - domain: "ap.api.example.com"
        env:
          - name: "REGION"
            value: "ap-southeast-1"

Stateless Applications

For best scaling results, design stateless applications:

Good - Stateless:

  • Session data in Redis/database
  • Files in object storage (S3, etc.)
  • No local caching (or distributed cache)
  • Request data self-contained

Bad - Stateful:

  • Session data in memory
  • Files on local disk
  • Local in-memory cache
  • Sticky sessions required
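The difference shows up directly in code: a stateful handler keeps sessions in a process-local dict, so a second replica can never see them, while a stateless handler delegates to a store every replica can reach. A toy illustration, where the `SessionStore` class is invented for the example and plays the role of Redis:

```python
class SessionStore:
    """Shared store every replica can reach (think Redis); here a plain dict."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

class StatelessApp:
    """Each replica holds no session state of its own."""
    def __init__(self, store):
        self.store = store  # shared across all replicas

    def login(self, user):
        self.store.set(f"session:{user}", {"user": user})

    def whoami(self, user):
        session = self.store.get(f"session:{user}")
        return session["user"] if session else None

# Two "replicas" sharing one store: a login handled by replica_a
# is immediately visible to replica_b, so any replica can serve any request.
store = SessionStore()
replica_a, replica_b = StatelessApp(store), StatelessApp(store)
replica_a.login("alice")
print(replica_b.whoami("alice"))
```

If `StatelessApp` instead kept sessions in `self._data`, only the replica that handled the login could answer `whoami`, and you would need sticky sessions to route the user back to it.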

Example Stateless App

    name: "stateless-api"
    replicas: 10
    env:
      # External session store
      - name: "REDIS_URL"
        value: "redis://redis-server:6379"
      # External file storage
      - name: "S3_BUCKET"
        value: "my-app-uploads"
      # External cache
      - name: "MEMCACHED_SERVERS"
        value: "memcached-1:11211,memcached-2:11211"

    # No volumes needed - fully stateless

Stopping Scaled Applications

Stop all replicas:

    haloy stop

    # Remove containers after stopping
    haloy stop --remove-containers

Best Practices

  1. Start with 1 replica: Test before scaling
  2. Scale based on metrics: Monitor CPU, memory, request latency
  3. Use odd numbers: 3, 5, 7 for better consensus/distribution
  4. Design for stateless: Enables unlimited horizontal scaling
  5. Implement health checks: Critical for load balancing
  6. Monitor all replicas: Ensure even load distribution
  7. Plan resources: Ensure server can handle all replicas
  8. Use rolling deployments: Maintain availability during updates
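"Scale based on metrics" can be made concrete with the usual target-utilization formula (the same idea Kubernetes' horizontal autoscaler uses): desired = ceil(current_replicas × current_utilization / target_utilization). The helper below is an illustrative sketch, not a Haloy feature; the bounds are example values:

```python
import math

def desired_replicas(current, cpu_util, target_util, lo=1, hi=10):
    """Target-utilization scaling: grow or shrink the replica count so that
    per-replica utilization moves toward the target."""
    desired = math.ceil(current * cpu_util / target_util)
    return max(lo, min(hi, desired))  # clamp to configured bounds

# 3 replicas running at 90% CPU against a 60% target -> scale up
print(desired_replicas(3, 0.90, 0.60))
```

The clamp matters in practice: a lower bound preserves fault tolerance during quiet periods, and an upper bound keeps the replica count within what the server's RAM and CPU can actually host.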

Troubleshooting

Uneven Load Distribution

If some containers get more traffic:

  1. Check all containers are healthy:

    haloy status
  2. Verify health check responses are fast

  3. Review application logs for slow requests

Out of Resources

If deployment fails due to resources:

    # Reduce replica count
    replicas: 3  # Down from 10

    # Or upgrade server resources:
    # - Add more RAM
    # - Add more CPU cores

Container Crash Loops

If containers keep restarting:

    # Check logs
    haloy logs

    # Common issues:
    # - Missing environment variables
    # - Database connection failures
    # - Port conflicts
    # - Resource limits (OOM)

Next Steps