Horizontal Scaling

Scale your application horizontally by running multiple container instances behind a load balancer.

Basic Scaling

Set the number of replicas in your configuration:

```yaml
name: "my-scalable-app"
image: "my-org/my-app:1.2.0"
replicas: 3 # Run 3 instances
domains:
  - domain: "api.example.com"
```

Haloy automatically:

  • Starts the specified number of containers
  • Configures load balancing in the built-in reverse proxy
  • Distributes traffic across all healthy instances
  • Waits for any configured readiness stabilization window before routing to new replicas
  • Monitors health checks

Load Balancing

The built-in reverse proxy distributes traffic using round-robin by default, sending requests to each healthy container in turn.
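Round-robin itself is easy to picture. Here is a minimal sketch of the idea in Python (an illustration of the algorithm, not haloyd's actual implementation; the backend addresses are hypothetical):

```python
from itertools import cycle

# Hypothetical healthy backends; haloyd discovers these automatically.
backends = ["container-1:8080", "container-2:8080", "container-3:8080"]

def round_robin(backends):
    """Yield backends in turn, wrapping around forever."""
    yield from cycle(backends)

picker = round_robin(backends)
first_six = [next(picker) for _ in range(6)]
# With 3 backends, each one receives every third request.
```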

Traffic Flow

```
Internet → haloyd (80/443) → Container 1 (8080)
                           → Container 2 (8080)
                           → Container 3 (8080)
```

All containers receive approximately equal traffic.

Scaling Strategies

Start Small, Scale Up

Begin with fewer replicas and increase as needed:

```yaml
# Initial deployment
name: "my-app"
replicas: 1

# After load testing
replicas: 3

# Under heavy load
replicas: 10
```

Environment-Based Scaling

Scale differently per environment:

```yaml
name: "my-app"
replicas: 1 # Default
targets:
  production:
    server: prod.haloy.com
    replicas: 10 # High capacity
    domains:
      - domain: "my-app.com"
  staging:
    server: staging.haloy.com
    replicas: 2 # Moderate capacity
    domains:
      - domain: "staging.my-app.com"
  development:
    server: dev.haloy.com
    replicas: 1 # Minimal resources
    domains:
      - domain: "dev.my-app.com"
```

High Availability

For mission-critical applications, run at least 3 replicas:

```yaml
name: "critical-app"
replicas: 3 # Minimum for HA
deployment_strategy: "rolling" # Zero-downtime updates
health_check_path: "/health"
image: "my-org/critical-app:v2.0.0"
domains:
  - domain: "critical-app.com"
```

Why 3 Replicas?

  • Fault tolerance: Can lose 1 container and maintain service
  • Rolling deployments: Can update without downtime
  • Load distribution: Better traffic handling
  • Redundancy: Protection against failures
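The fault-tolerance argument can be quantified with a back-of-the-envelope calculation. Assuming each replica fails independently with the same probability (a simplification, since real failures are often correlated), the service is fully down only when every replica is down at once:

```python
def downtime_probability(p_single: float, replicas: int) -> float:
    """Probability that all replicas are down simultaneously,
    assuming independent failures (a simplifying assumption)."""
    return p_single ** replicas

# With a 1% chance of any single container being unavailable:
p1 = downtime_probability(0.01, 1)  # 0.01 (1 in 100)
p3 = downtime_probability(0.01, 3)  # ~1e-06 (1 in a million)
```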

Resource Considerations

Each replica consumes:

  • CPU
  • Memory
  • Disk I/O
  • Network bandwidth

Example Resource Planning

```yaml
# If each container uses:
# - 512MB RAM
# - 0.5 CPU cores

# For 5 replicas you need:
# - 2.5GB RAM
# - 2.5 CPU cores (plus overhead)
```
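The arithmetic above generalizes to any replica count. A small helper for sizing (illustrative only; the per-container figures are whatever your own profiling shows):

```python
def plan_resources(replicas: int, ram_mb_each: int, cpu_cores_each: float):
    """Return (total_ram_gb, total_cpu_cores) needed for N replicas,
    before accounting for OS and haloyd overhead."""
    total_ram_gb = replicas * ram_mb_each / 1024
    total_cpu_cores = replicas * cpu_cores_each
    return total_ram_gb, total_cpu_cores

# The example from above: 5 replicas at 512MB / 0.5 cores each.
ram_gb, cores = plan_resources(replicas=5, ram_mb_each=512, cpu_cores_each=0.5)
# ram_gb == 2.5, cores == 2.5 — provision headroom on top of this.
```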

Plan server resources accordingly:

```yaml
name: "resource-intensive-app"
replicas: 5

# Ensure your server has sufficient:
# - At least 4GB RAM available
# - At least 4 CPU cores available
```

Health Checks and Scaling

Health checks are critical for scaling:

```yaml
name: "my-app"
replicas: 5
health_check_path: "/health"
min_ready_seconds: 10
port: "8080"

# haloyd only routes to healthy containers
# Unhealthy containers are automatically excluded
```

Stabilizing New Replicas

If new replicas sometimes pass health checks and then crash shortly afterward, add min_ready_seconds:

```yaml
name: "my-app"
replicas: 5
deployment_strategy: "rolling"
health_check_path: "/health"
min_ready_seconds: 10
```

Haloy only adds a new replica to rotation after it is healthy and has remained up for the configured stabilization window. This is useful for catching late database failures, startup race conditions, or short crash loops during rolling deploys.
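The gating logic amounts to a simple check. A sketch of the concept in Python (not haloyd's actual code; `healthy_since` is a hypothetical timestamp recorded when the replica first passed its health check):

```python
def ready_for_rotation(healthy_since, min_ready_seconds, now):
    """A replica enters rotation only after it has been continuously
    healthy for at least min_ready_seconds.

    healthy_since: timestamp of the first passing health check in the
    current healthy streak, or None if the replica is failing checks
    (a crash or failed check would reset it to None).
    """
    if healthy_since is None:
        return False
    return now - healthy_since >= min_ready_seconds

# Healthy for only 4s with min_ready_seconds=10: still out of rotation.
# ready_for_rotation(100.0, 10, 104.0) -> False
# ready_for_rotation(100.0, 10, 111.0) -> True
```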

Health Check Best Practices

  1. Check dependencies: Verify database, cache, etc.
  2. Fast response: Return within 1-2 seconds
  3. Accurate status: Only return 200 when truly ready
  4. Log failures: Help debug health issues

Deploying with Replicas

Rolling Deployment

A rolling deployment updates one replica at a time:

```yaml
name: "my-app"
replicas: 5
deployment_strategy: "rolling"

# Deployment process:
# 1. Start new container 1, wait for health check
# 2. Stop old container 1
# 3. Repeat for containers 2-5
#
# Always have 4-5 containers serving traffic
```

Replace Deployment

Replaces all containers at once:

```yaml
name: "my-app"
replicas: 5
deployment_strategy: "replace"

# Deployment process:
# 1. Stop all old containers
# 2. Start all new containers
# 3. Brief service interruption
```

Monitoring Scaled Applications

Check status of all replicas:

```shell
# View all containers
haloy status

# View logs from all replicas
haloy logs --all-containers
```

Geographic Scaling

Scale across multiple regions:

```yaml
name: "global-api"
image: "my-org/api:v2.0.0"
targets:
  us-east:
    server: us-east.haloy.com
    replicas: 5
    domains:
      - domain: "us.api.example.com"
    env:
      - name: "REGION"
        value: "us-east-1"
  eu-west:
    server: eu-west.haloy.com
    replicas: 5
    domains:
      - domain: "eu.api.example.com"
    env:
      - name: "REGION"
        value: "eu-west-1"
  asia-pacific:
    server: ap-southeast.haloy.com
    replicas: 5
    domains:
      - domain: "ap.api.example.com"
    env:
      - name: "REGION"
        value: "ap-southeast-1"
```

Stateless Applications

For best scaling results, design stateless applications:

Good - Stateless:

  • Session data in Redis/database
  • Files in object storage (S3, etc.)
  • No local caching (or distributed cache)
  • Request data self-contained

Bad - Stateful:

  • Session data in memory
  • Files on local disk
  • Local in-memory cache
  • Sticky sessions required

Example Stateless App

```yaml
name: "stateless-api"
replicas: 10
env:
  # External session store
  - name: "REDIS_URL"
    value: "redis://redis-server:6379"
  # External file storage
  - name: "S3_BUCKET"
    value: "my-app-uploads"
  # External cache
  - name: "MEMCACHED_SERVERS"
    value: "memcached-1:11211,memcached-2:11211"

# No volumes needed - fully stateless
```

Stopping Scaled Applications

Stop all replicas:

```shell
haloy stop

# Remove containers after stopping
haloy stop --remove-containers
```

Best Practices

  1. Start with 1 replica: Test before scaling
  2. Scale based on metrics: Monitor CPU, memory, request latency
  3. Use odd numbers: 3, 5, 7 for better consensus/distribution
  4. Design for stateless: Enables unlimited horizontal scaling
  5. Implement health checks: Critical for load balancing
  6. Use min_ready_seconds for unstable startups: Prevents a replica from entering rotation too early
  7. Monitor all replicas: Ensure even load distribution
  8. Plan resources: Ensure server can handle all replicas
  9. Use rolling deployments: Maintain availability during updates

Troubleshooting

Uneven Load Distribution

If some containers get more traffic:

  1. Check all containers are healthy:

     ```shell
     haloy status
     ```

  2. Verify health check responses are fast.

  3. Review application logs for slow requests.

Out of Resources

If deployment fails due to resources:

```yaml
# Reduce replica count
replicas: 3 # Down from 10

# Or upgrade server resources:
# - Add more RAM
# - Add more CPU cores
```

Container Crash Loops

If containers keep restarting:

```shell
# Check logs from all replicas
haloy logs --all-containers

# Common issues:
# - Missing environment variables
# - Database connection failures
# - Port conflicts
# - Resource limits (OOM)
```
