Telecom & VoIP

How We Handle a Million Calls a Day: Kamailio Architecture

Temkin Mengistu
Snapwre Engineering
March 10, 2026
15 min read

When we were tasked with building a VoIP platform for Infonas in Bahrain, the requirement was clear: handle millions of daily calls with carrier-grade reliability. Here's how we did it.

The Challenge

Telecom-grade requirements:

  • 1M+ calls per day
  • 99.99% uptime
  • Sub-100ms call setup time
  • IP-based carrier integration
  • Real-time billing
  • Regulatory compliance

Architecture Overview

[SIP Carriers]
    ↓
[Kamailio Load Balancer] (Dispatcher + Rate Limiting)
    ↓
[FreeSWITCH Cluster] (Media Processing + ESL)
    ↓
[PostgreSQL] (CDR + Billing)
    ↓
[Prometheus + Grafana] (Monitoring)

Kamailio: The SIP Router

Kamailio handles all SIP routing logic. No media processing - that's FreeSWITCH's job.

Basic Configuration

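# kamailio.cfg
listen=udp:0.0.0.0:5060
listen=tcp:0.0.0.0:5060

# sl and tm provide the replies and transaction handling
# (sl_send_reply, t_relay, t_on_failure) used by the routes in this post
loadmodule "sl.so"
loadmodule "tm.so"
loadmodule "dispatcher.so"
loadmodule "rtpengine.so"

# Dispatcher configuration
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")
# flags=2 enables failover support, which ds_next_dst() relies on
modparam("dispatcher", "flags", 2)
# dst_avp exists in older Kamailio releases; newer ones replace it with xavp_dst
modparam("dispatcher", "dst_avp", "$avp(dst)")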

IP-Based Carrier Authentication

# Whitelist carriers by IP
route[AUTH] {
    if (!($si == "203.0.113.10" ||
          $si == "203.0.113.11")) {
        sl_send_reply("403", "Forbidden");
        exit;
    }
}
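
The same allowlist idea can be sketched outside Kamailio. The Go helper below is hypothetical (carrierAllowed and the prefix list are illustrative names, not part of the platform) and assumes carriers are identified by source address or CIDR range. Inside Kamailio, longer lists are usually better kept in the permissions module than in hand-written conditions.

```go
package main

import "net/netip"

// carrierPrefixes lists the networks trusted to send traffic.
// The two /32 entries mirror the addresses in the config above.
var carrierPrefixes = []netip.Prefix{
    netip.MustParsePrefix("203.0.113.10/32"),
    netip.MustParsePrefix("203.0.113.11/32"),
}

// carrierAllowed reports whether sourceIP belongs to a trusted carrier.
// Addresses that fail to parse are rejected.
func carrierAllowed(sourceIP string) bool {
    addr, err := netip.ParseAddr(sourceIP)
    if err != nil {
        return false
    }
    for _, p := range carrierPrefixes {
        if p.Contains(addr) {
            return true
        }
    }
    return false
}
```

Using prefixes rather than string equality means a carrier that announces a whole range can be added as a single entry.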

Load Balancing Logic

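route[LOAD_BALANCE] {
    # Algorithm 4: round-robin (failover comes from dispatcher flags=2)
    if (!ds_select_dst("1", "4")) {
        send_reply("503", "No FreeSWITCH Available");
        exit;
    }

    # Set failover route
    t_on_failure("FAILOVER");

    # Relay statefully: failure routes only run for tm transactions,
    # so a stateless forward() would never reach FAILOVER
    if (!t_relay()) {
        sl_reply_error();
    }
    exit;
}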

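failure_route[FAILOVER] {
    # Do not retry a call the caller has already cancelled
    if (t_is_canceled()) {
        exit;
    }
    if (t_check_status("(408)|(5[0-9][0-9])")) {
        # Try next FreeSWITCH and re-arm failover for that attempt
        if (ds_next_dst()) {
            t_on_failure("FAILOVER");
            t_relay();
            exit;
        }
    }
}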
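
To make the dispatcher behaviour concrete, here is a simplified Go model of round-robin selection with failover. It is a sketch of the idea, not Kamailio's implementation: the Dispatcher type, its fields, and the try callback are hypothetical names introduced for illustration.

```go
package main

import "errors"

// Dispatcher models a dispatcher set: an ordered list of
// destinations and a cursor that advances on every new call.
type Dispatcher struct {
    Destinations []string
    next         int
}

// ErrNoDestination is returned when every destination has failed.
var ErrNoDestination = errors.New("no destination available")

// Route picks the next destination round-robin, then walks the rest of
// the set in order until try succeeds. It returns the destination used.
func (d *Dispatcher) Route(try func(dst string) bool) (string, error) {
    n := len(d.Destinations)
    if n == 0 {
        return "", ErrNoDestination
    }
    start := d.next
    d.next = (d.next + 1) % n // the following call starts one further on
    for i := 0; i < n; i++ {
        dst := d.Destinations[(start+i)%n]
        if try(dst) {
            return dst, nil
        }
    }
    return "", ErrNoDestination
}
```

With three destinations and the second one down, successive calls land on the first, third, and third node: the failed attempt falls through to its neighbour instead of being rejected. The model is not safe for concurrent use; a real router would guard the cursor.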

Rate Limiting

Critical for preventing abuse:

loadmodule "pike.so"

# Allow at most 30 requests per source IP in each 2-second sampling window
modparam("pike", "sampling_time_unit", 2)
modparam("pike", "reqs_density_per_unit", 30)

route[RATE_LIMIT] {
    # Per-IP rate limiting
    if (!pike_check_req()) {
        xlog("L_ALERT", "RATE LIMIT: $si");
        send_reply("503", "Too Many Requests");
        exit;
    }
}

FreeSWITCH: Media Processing

FreeSWITCH handles:

  • RTP media
  • Transcoding
  • Recording
  • IVR
  • Conferencing

ESL (Event Socket Layer)

We use ESL for programmatic control:

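package main

import (
    "log"

    "github.com/fiorix/go-eventsocket/eventsocket"
)

func main() {
    c, err := eventsocket.Dial("localhost:8021", "ClueCon")
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    // Subscribe to events first so no hangup is missed
    if _, err := c.Send("event plain CHANNEL_HANGUP"); err != nil {
        log.Fatal(err)
    }

    // Originate call
    if _, err := c.Send("api originate {origination_caller_id_number=1234}user/1000 &park"); err != nil {
        log.Fatal(err)
    }

    for {
        ev, err := c.ReadEvent()
        if err != nil {
            break
        }
        handleEvent(ev)
    }
}

// handleEvent dumps each hangup event; real handling would go here.
func handleEvent(ev *eventsocket.Event) {
    ev.PrettyPrint()
}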

Outbound Campaigns

Batch calling with rate control:

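// Needs: context, fmt, log, golang.org/x/time/rate, and the eventsocket package
func runCampaign(contacts []Contact) {
    limiter := rate.NewLimiter(30, 1) // 30 calls/sec

    for _, contact := range contacts {
        // Block until the limiter releases the next call slot
        if err := limiter.Wait(context.Background()); err != nil {
            log.Printf("rate limiter: %v", err)
            return
        }
        go func(c Contact) {
            conn, err := eventsocket.Dial("localhost:8021", "ClueCon")
            if err != nil {
                log.Printf("ESL dial failed for %s: %v", c.Number, err)
                return
            }
            defer conn.Close()
            cmd := fmt.Sprintf(
                "api originate {origination_caller_id_number=%s}sofia/gateway/carrier/%s &playback(/path/to/message.wav)",
                c.CallerID, c.Number,
            )
            if _, err := conn.Send(cmd); err != nil {
                log.Printf("originate failed for %s: %v", c.Number, err)
            }
        }(contact)
    }
}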

PostgreSQL: The Brain

Centralized database for everything:

-- CDR Table
CREATE TABLE cdr (
    id BIGSERIAL PRIMARY KEY,
    call_id VARCHAR(255) UNIQUE,
    caller_id VARCHAR(50),
    destination VARCHAR(50),
    start_time TIMESTAMP,
    answer_time TIMESTAMP,
    end_time TIMESTAMP,
    duration INTEGER,
    billsec INTEGER,
    disposition VARCHAR(20),
    carrier_id INTEGER,
    rate DECIMAL(10,4),
    cost DECIMAL(10,4)
);

# Auto-backup with pg_dump cron (crontab entry, not SQL)
0 2 * * * pg_dump voip_db | gzip > /backup/voip_$(date +\%Y\%m\%d).sql.gz

Real-time Billing

-- Trigger for automatic cost calculation
CREATE OR REPLACE FUNCTION calculate_cost()
RETURNS TRIGGER AS $$
BEGIN
    NEW.cost := NEW.billsec * (
        SELECT rate FROM rate_table
        WHERE prefix = get_prefix(NEW.destination)
    ) / 60;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER before_cdr_insert
BEFORE INSERT ON cdr
FOR EACH ROW EXECUTE FUNCTION calculate_cost();
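
As a worked example of the trigger's arithmetic, the Go sketch below computes cost = billsec × rate / 60 with a per-minute rate. The post does not show get_prefix or rate_table, so the longest-prefix matching and the sample rates here are assumptions for illustration only. Amounts are held as integers in ten-thousandths of a currency unit to match DECIMAL(10,4) without floating-point drift.

```go
package main

// ratePerMinute maps a dialled prefix to a per-minute rate in
// ten-thousandths of a currency unit (hypothetical sample data).
var ratePerMinute = map[string]int64{
    "973":  200, // 0.0200 per minute
    "9733": 350, // 0.0350 per minute
}

// longestPrefix returns the longest key of ratePerMinute that
// destination starts with, or "" when nothing matches.
func longestPrefix(destination string) string {
    for n := len(destination); n > 0; n-- {
        if _, ok := ratePerMinute[destination[:n]]; ok {
            return destination[:n]
        }
    }
    return ""
}

// callCost applies cost = billsec * rate / 60. The bool is false
// when no rate matches the destination.
func callCost(destination string, billsec int64) (int64, bool) {
    prefix := longestPrefix(destination)
    if prefix == "" {
        return 0, false
    }
    return billsec * ratePerMinute[prefix] / 60, true
}
```

A 90-second call to a number matching only the "973" prefix costs 90 × 200 / 60 = 300, that is 0.0300 currency units. Integer division truncates any remainder, so a production system would need an explicit rounding rule.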

Monitoring with Prometheus

Kamailio Metrics

# kamailio_exporter metrics
kamailio_core_requests_total
kamailio_core_replies_total
kamailio_tm_current_transactions
kamailio_shmem_used_bytes

Custom Metrics

var (
    callsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "voip_calls_total",
            Help: "Total calls processed",
        },
        []string{"direction", "disposition"},
    )

    callDuration = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name:    "voip_call_duration_seconds",
            Help:    "Call duration distribution",
            Buckets: []float64{10, 30, 60, 120, 300, 600},
        },
    )
)

Grafana Dashboard

Key panels:

  • Active calls (real-time)
  • Calls per second
  • Average call duration
  • Answer seizure ratio (ASR)
  • Post-dial delay (PDD)
  • System resource usage

Deployment Architecture

High Availability

2x Kamailio (Active-Active with Keepalived)
4x FreeSWITCH (Load balanced)
2x PostgreSQL (Primary-Replica with Patroni)

Docker Compose

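version: "3.8"

services:
  kamailio:
    image: kamailio/kamailio:latest
    network_mode: host
    volumes:
      - ./kamailio.cfg:/etc/kamailio/kamailio.cfg

  freeswitch:
    image: signalwire/freeswitch:latest
    ports:
      # Kamailio (host networking) already listens on host port 5060,
      # so this mapping needs a different host port when both share a machine
      - "5060:5060/udp"
      - "8021:8021"
      - "16384-16394:16384-16394/udp"
    volumes:
      - ./freeswitch:/etc/freeswitch

  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: voip_db
      # The official postgres image will not initialize without a password
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data

# Named volumes must be declared at the top level
volumes:
  pgdata: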

Performance Tuning

Kernel Parameters

# /etc/sysctl.conf
net.ipv4.ip_local_port_range = 10000 65535
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

FreeSWITCH Tuning

<!-- switch.conf.xml -->
<param name="max-sessions" value="10000"/>
<param name="sessions-per-second" value="100"/>
<param name="rtp-start-port" value="16384"/>
<param name="rtp-end-port" value="32768"/>

Results

After 2 years in production:

  • 1.5M+ calls/day processed
  • 99.98% uptime achieved
  • <50ms average call setup
  • $0 in downtime costs
  • 25% cost reduction vs previous system

Lessons Learned

  1. Kamailio for routing, FreeSWITCH for media - Never mix concerns
  2. IP authentication is faster than digest for carriers
  3. PostgreSQL is sufficient - No need for complex time-series DBs
  4. Monitor everything - Prometheus + Grafana is a lifesaver
  5. Cron jobs for billing - Automating the monthly billing run is critical

Conclusion

Building carrier-grade VoIP isn't magic - it's careful architecture, monitoring, and operational discipline. This system has processed 500M+ calls with minimal intervention.

Want to discuss your VoIP architecture? Get in touch.

Tags

Kamailio, FreeSWITCH, Asterisk, VoIP, SIP, Telecom