How We Handle a Million Calls a Day: Kamailio Architecture
When we were tasked with building a VoIP platform for Infonas in Bahrain, the requirement was clear: handle millions of daily calls with carrier-grade reliability. Here's how we did it.
The Challenge
Telecom-grade requirements:
- 1M+ calls per day
- 99.99% uptime
- Sub-100ms call setup time
- IP-based carrier integration
- Real-time billing
- Regulatory compliance
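To put these targets in perspective, a quick back-of-the-envelope calculation helps. The sketch below converts the daily call volume into an average calls-per-second figure and the uptime target into a yearly downtime budget. The function names are made up for illustration, and the peak factor of 3 is an assumption, not a measurement from this project.

```go
package main

import "fmt"

// averageCPS converts a daily call volume into average call attempts per second.
func averageCPS(callsPerDay float64) float64 {
	const secondsPerDay = 24 * 60 * 60
	return callsPerDay / secondsPerDay
}

// downtimeBudgetMinutes returns the downtime allowed per 365-day year
// for an availability target given in percent (for example 99.99).
func downtimeBudgetMinutes(availabilityPct float64) float64 {
	const minutesPerYear = 365 * 24 * 60
	return (1 - availabilityPct/100) * minutesPerYear
}

func main() {
	// peakFactor is an illustrative assumption: busy-hour traffic is
	// taken to be three times the daily average.
	const peakFactor = 3.0

	avg := averageCPS(1_000_000)
	fmt.Printf("average load: %.1f calls/sec\n", avg)
	fmt.Printf("assumed peak: %.1f calls/sec\n", avg*peakFactor)
	fmt.Printf("99.99%% uptime budget: %.1f min/year\n", downtimeBudgetMinutes(99.99))
}
```

One million calls a day averages out to roughly 11.6 call attempts per second, and 99.99% uptime leaves about 52.6 minutes of downtime per year.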
Architecture Overview
[SIP Carriers]
↓
[Kamailio Load Balancer] (Dispatcher + Rate Limiting)
↓
[FreeSWITCH Cluster] (Media Processing + ESL)
↓
[PostgreSQL] (CDR + Billing)
↓
[Prometheus + Grafana] (Monitoring)
Kamailio: The SIP Router
Kamailio handles all SIP routing logic. No media processing - that's FreeSWITCH's job.
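That routing role comes down to picking a FreeSWITCH node for each call and moving on to the next node when one fails. Here is a minimal Go sketch of round-robin selection with failover, meant only as a mental model. The `dispatcher` type, the `route` method, and the node names are hypothetical; Kamailio's dispatcher module does this natively inside the proxy.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable stands in for a SIP 408 or 5xx response from a node.
var errUnavailable = errors.New("node unavailable")

// dispatcher is a hypothetical destination set with a round-robin cursor.
// It is not safe for concurrent use.
type dispatcher struct {
	nodes []string
	next  int
}

// route offers the call to nodes in round-robin order, starting at the
// cursor, and fails over to the following node whenever try returns an error.
func (d *dispatcher) route(try func(node string) error) (string, error) {
	n := len(d.nodes)
	if n == 0 {
		return "", errUnavailable
	}
	start := d.next
	d.next = (d.next + 1) % n // advance the cursor for the next call
	for i := 0; i < n; i++ {
		node := d.nodes[(start+i)%n]
		if err := try(node); err == nil {
			return node, nil
		}
	}
	return "", errUnavailable
}

func main() {
	d := &dispatcher{nodes: []string{"fs1", "fs2", "fs3"}}
	// Simulate fs1 being down: every call offered to it fails.
	try := func(node string) error {
		if node == "fs1" {
			return errUnavailable
		}
		return nil
	}
	for call := 1; call <= 4; call++ {
		node, err := d.route(try)
		fmt.Printf("call %d -> %q (err: %v)\n", call, node, err)
	}
}
```

Calls whose turn falls on the failed node are absorbed by the next node in the set, which is the behavior the dispatcher and failure route provide in the Kamailio configuration.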
Basic Configuration
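# kamailio.cfg
listen=udp:0.0.0.0:5060
listen=tcp:0.0.0.0:5060
# Modules used by the route blocks in this article
loadmodule "tm.so"    # stateful relay: t_relay(), t_on_failure()
loadmodule "sl.so"    # stateless replies: sl_send_reply(), send_reply()
loadmodule "pv.so"    # pseudo-variables such as $si
loadmodule "xlog.so"  # xlog()
loadmodule "dispatcher.so"
loadmodule "rtpengine.so"
# Dispatcher configuration
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")
modparam("dispatcher", "flags", 2)  # 2 = failover support
# AVP-based destination storage (Kamailio 4.x); 5.x releases use xavp_dst instead
modparam("dispatcher", "dst_avp", "$avp(dst)")
IP-Based Carrier Authentication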
# Whitelist carriers by IP
route[AUTH] {
    if (!($si == "203.0.113.10" ||
          $si == "203.0.113.11")) {
        sl_send_reply("403", "Forbidden");
        exit;
    }
}
Load Balancing Logic
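route[LOAD_BALANCE] {
    # Algorithm 4: round-robin; failover relies on dispatcher flags=2
    if (!ds_select_dst("1", "4")) {
        send_reply("503", "No FreeSWITCH Available");
        exit;
    }
    # Set failover route, then relay statefully so the failure route can run
    # (a stateless forward() would never trigger it)
    t_on_failure("FAILOVER");
    if (!t_relay()) {
        sl_reply_error();
    }
    exit;
}
failure_route[FAILOVER] {
    # Do not retry calls the caller has already cancelled
    if (t_is_canceled()) {
        exit;
    }
    if (t_check_status("(408)|(5[0-9][0-9])")) {
        # Try next FreeSWITCH and re-arm failover for that attempt
        if (ds_next_dst()) {
            t_on_failure("FAILOVER");
            t_relay();
            exit;
        }
    }
}
Rate Limiting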
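Rate limiting is critical for preventing abuse. We use the pike module, which tracks request counts per source IP and flags any address that exceeds roughly 30 requests within a 2-second sampling window.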
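As a rough mental model of per-IP limiting, the Go sketch below counts requests per source address in fixed windows and rejects anything over the threshold. It is a simplification under stated assumptions: the `ipLimiter` type and its methods are hypothetical, and pike's real implementation tracks addresses differently and does not block at exactly the configured threshold.

```go
package main

import (
	"fmt"
	"time"
)

// Thresholds copied from the pike modparams in this article.
const (
	samplingUnit = 2 * time.Second
	maxRequests  = 30
)

// simStart is an arbitrary fixed instant used to drive the simulation.
var simStart = time.Unix(0, 0)

// window counts requests from one IP since the window opened.
type window struct {
	opened time.Time
	count  int
}

// ipLimiter is a hypothetical fixed-window limiter keyed by source IP.
// It is not safe for concurrent use.
type ipLimiter struct {
	unit  time.Duration
	limit int
	seen  map[string]*window
}

func newIPLimiter(unit time.Duration, limit int) *ipLimiter {
	return &ipLimiter{unit: unit, limit: limit, seen: make(map[string]*window)}
}

// allow records one request from ip at time now and reports whether the
// IP is still within its limit for the current window.
func (l *ipLimiter) allow(ip string, now time.Time) bool {
	w, ok := l.seen[ip]
	if !ok || now.Sub(w.opened) >= l.unit {
		// First request from this IP, or the previous window has expired.
		w = &window{opened: now}
		l.seen[ip] = w
	}
	w.count++
	return w.count <= l.limit
}

func main() {
	l := newIPLimiter(samplingUnit, maxRequests)
	allowed := 0
	for i := 0; i < 35; i++ {
		if l.allow("203.0.113.10", simStart) {
			allowed++
		}
	}
	fmt.Printf("allowed %d of 35 requests in one window\n", allowed)
	fmt.Println("allowed again after the window:", l.allow("203.0.113.10", simStart.Add(samplingUnit)))
}
```

In Kamailio itself none of this is hand-written; the pike configuration that follows applies the same thresholds.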
loadmodule "pike.so"
# Sample request counts per source IP every 2 seconds
modparam("pike", "sampling_time_unit", 2)
# Requests allowed per IP per sampling unit before the IP is flagged
modparam("pike", "reqs_density_per_unit", 30)
route[RATE_LIMIT] {
    # Per-IP rate limiting
    if (!pike_check_req()) {
        xlog("L_ALERT", "RATE LIMIT: $si\n");
        send_reply("503", "Too Many Requests");
        exit;
    }
}
FreeSWITCH: Media Processing
FreeSWITCH handles:
- RTP media
- Transcoding
- Recording
- IVR
- Conferencing
ESL (Event Socket Layer)
We use ESL for programmatic control:
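package main

import (
	"log"

	"github.com/fiorix/go-eventsocket/eventsocket"
)

func main() {
	// "ClueCon" is the FreeSWITCH default ESL password; change it in production.
	c, err := eventsocket.Dial("localhost:8021", "ClueCon")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	// Subscribe to events first so no hangup is missed
	if _, err := c.Send("event plain CHANNEL_HANGUP"); err != nil {
		log.Printf("subscribe failed: %v", err)
		return
	}
	// Originate call
	if _, err := c.Send("api originate {origination_caller_id_number=1234}user/1000 &park"); err != nil {
		log.Printf("originate failed: %v", err)
		return
	}
	for {
		ev, err := c.ReadEvent()
		if err != nil {
			log.Printf("event stream closed: %v", err)
			break
		}
		handleEvent(ev)
	}
}

// handleEvent is a placeholder that logs the hangup cause of each event.
func handleEvent(ev *eventsocket.Event) {
	log.Printf("hangup cause: %s", ev.Get("Hangup-Cause"))
}
Outbound Campaigns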
Batch calling with rate control:
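// Needs imports: context, fmt, log, golang.org/x/time/rate, and the eventsocket package.
func runCampaign(contacts []Contact) {
	limiter := rate.NewLimiter(30, 1) // 30 calls/sec, burst of 1
	for _, contact := range contacts {
		if err := limiter.Wait(context.Background()); err != nil {
			log.Printf("rate limiter: %v", err)
			return
		}
		go func(c Contact) {
			// "api originate" blocks on its connection, so each call gets its own.
			conn, err := eventsocket.Dial("localhost:8021", "ClueCon")
			if err != nil {
				log.Printf("dial ESL: %v", err)
				return
			}
			defer conn.Close() // release the socket once the originate returns
			cmd := fmt.Sprintf(
				"api originate {origination_caller_id_number=%s}sofia/gateway/carrier/%s &playback(/path/to/message.wav)",
				c.CallerID, c.Number,
			)
			if _, err := conn.Send(cmd); err != nil {
				log.Printf("originate %s: %v", c.Number, err)
			}
		}(contact)
	}
}
PostgreSQL: The Brain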
Centralized database for everything:
-- CDR Table
CREATE TABLE cdr (
    id BIGSERIAL PRIMARY KEY,
    call_id VARCHAR(255) UNIQUE,
    caller_id VARCHAR(50),
    destination VARCHAR(50),
    start_time TIMESTAMP,
    answer_time TIMESTAMP,
    end_time TIMESTAMP,
    duration INTEGER,
    billsec INTEGER,
    disposition VARCHAR(20),
    carrier_id INTEGER,
    rate DECIMAL(10,4),
    cost DECIMAL(10,4)
);
# crontab entry: nightly pg_dump backup at 02:00 (% must be escaped as \% in crontab)
0 2 * * * pg_dump voip_db | gzip > /backup/voip_$(date +\%Y\%m\%d).sql.gz
Real-time Billing
-- Trigger for automatic cost calculation
-- Assumes rate_table(prefix, rate) holds per-minute rates and get_prefix() is defined elsewhere
CREATE OR REPLACE FUNCTION calculate_cost()
RETURNS TRIGGER AS $$
BEGIN
    -- cost = billed seconds * per-minute rate / 60; stays NULL if no rate matches
    NEW.cost := NEW.billsec * (
        SELECT rate FROM rate_table
        WHERE prefix = get_prefix(NEW.destination)
    ) / 60;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_cdr_insert
    BEFORE INSERT ON cdr
    FOR EACH ROW
    EXECUTE FUNCTION calculate_cost();
Monitoring with Prometheus
Kamailio Metrics
# kamailio_exporter metrics
kamailio_core_requests_total
kamailio_core_replies_total
kamailio_tm_current_transactions
kamailio_shmem_used_bytes
Custom Metrics
var (
	callsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "voip_calls_total",
			Help: "Total calls processed",
		},
		[]string{"direction", "disposition"},
	)
	callDuration = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Name:    "voip_call_duration_seconds",
			Help:    "Call duration distribution",
			Buckets: []float64{10, 30, 60, 120, 300, 600},
		},
	)
)
Grafana Dashboard
Key panels:
- Active calls (real-time)
- Calls per second
- Average call duration
- Answer seizure ratio (ASR)
- Post-dial delay (PDD)
- System resource usage
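Two of these panels are simple ratios over call records. The sketch below shows one way to compute ASR (answered calls as a percentage of all attempts), average post-dial delay, and average duration of answered calls. The `callRecord` type is hypothetical: `Disposition` and `Billsec` mirror the cdr columns, the `"ANSWERED"` value is an assumed convention, and `PDDMillis` is assumed to be measured elsewhere, since the cdr table does not store it.

```go
package main

import "fmt"

// callRecord is a hypothetical per-call sample. Disposition and Billsec
// mirror the cdr columns; PDDMillis is assumed to be captured separately.
type callRecord struct {
	Disposition string
	Billsec     int
	PDDMillis   int
}

// asr returns answered calls as a percentage of all call attempts.
func asr(calls []callRecord) float64 {
	if len(calls) == 0 {
		return 0
	}
	answered := 0
	for _, c := range calls {
		if c.Disposition == "ANSWERED" {
			answered++
		}
	}
	return 100 * float64(answered) / float64(len(calls))
}

// averagePDDMillis returns the mean post-dial delay across all attempts.
func averagePDDMillis(calls []callRecord) float64 {
	if len(calls) == 0 {
		return 0
	}
	total := 0
	for _, c := range calls {
		total += c.PDDMillis
	}
	return float64(total) / float64(len(calls))
}

// averageDurationSeconds returns the mean billed duration of answered calls.
func averageDurationSeconds(calls []callRecord) float64 {
	total, answered := 0, 0
	for _, c := range calls {
		if c.Disposition == "ANSWERED" {
			total += c.Billsec
			answered++
		}
	}
	if answered == 0 {
		return 0
	}
	return float64(total) / float64(answered)
}

func main() {
	calls := []callRecord{
		{"ANSWERED", 60, 100},
		{"ANSWERED", 120, 200},
		{"BUSY", 0, 300},
		{"ANSWERED", 30, 400},
	}
	fmt.Printf("ASR: %.1f%%\n", asr(calls))
	fmt.Printf("avg PDD: %.0f ms\n", averagePDDMillis(calls))
	fmt.Printf("avg duration: %.0f s\n", averageDurationSeconds(calls))
}
```

In a dashboard the same ratios would normally be computed from counters over a time window rather than from individual records.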
Deployment Architecture
High Availability
- 2x Kamailio (Active-Active with Keepalived)
- 4x FreeSWITCH (Load balanced)
- 2x PostgreSQL (Primary-Replica with Patroni)
Docker Compose
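version: "3.8"
services:
  kamailio:
    image: kamailio/kamailio:latest
    network_mode: host  # binds port 5060 directly on the host
    volumes:
      - ./kamailio.cfg:/etc/kamailio/kamailio.cfg
  freeswitch:
    image: signalwire/freeswitch:latest
    ports:
      # Host port 5060 is also bound by Kamailio; change the host side if both share a machine
      - "5060:5060/udp"
      - "8021:8021"
      # The published RTP range must cover rtp-start-port..rtp-end-port in switch.conf.xml
      - "16384-16394:16384-16394/udp"
    volumes:
      - ./freeswitch:/etc/freeswitch
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: voip_db
      # The official image will not initialize without a superuser password
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
# Named volumes must be declared at the top level
volumes:
  pgdata:
Performance Tuning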
Kernel Parameters
# /etc/sysctl.conf
net.ipv4.ip_local_port_range = 10000 65535
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
FreeSWITCH Tuning
<!-- switch.conf.xml -->
<param name="max-sessions" value="10000"/>
<param name="sessions-per-second" value="100"/>
<param name="rtp-start-port" value="16384"/>
<param name="rtp-end-port" value="32768"/>
Results
After 2 years in production:
- 1.5M+ calls/day processed
- 99.98% uptime achieved
- <50ms average call setup
- $0 in downtime costs
- 25% cost reduction vs previous system
Lessons Learned
- Kamailio for routing, FreeSWITCH for media: never mix the two concerns.
- IP authentication is faster than digest authentication for carriers, since it skips the challenge round trip.
- PostgreSQL is sufficient: there is no need for a complex time-series database.
- Monitor everything: Prometheus + Grafana is a lifesaver.
- Cron jobs for billing: automating the monthly billing run is critical.
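To make the billing lesson concrete, here is a small Go sketch of what a monthly roll-up could look like, reusing the formula from the billing trigger (billsec * rate / 60). Everything here is illustrative: the types, function names, and rates are made up, and longest-prefix matching is only an assumption about what `get_prefix` does. A production job would also use decimal arithmetic, as the DECIMAL columns in the cdr table do, rather than float64.

```go
package main

import (
	"fmt"
	"sort"
)

// billedCall is a hypothetical subset of a cdr row.
type billedCall struct {
	CarrierID   int
	Destination string
	Billsec     int
}

// lookupRate returns the per-minute rate of the longest prefix in rates
// that matches destination, and false if no prefix matches.
func lookupRate(rates map[string]float64, destination string) (float64, bool) {
	for n := len(destination); n > 0; n-- {
		if r, ok := rates[destination[:n]]; ok {
			return r, true
		}
	}
	return 0, false
}

// callCost applies the trigger's formula: billed seconds * per-minute rate / 60.
func callCost(billsec int, ratePerMinute float64) float64 {
	return float64(billsec) * ratePerMinute / 60
}

// monthlyTotals sums call costs per carrier. Calls without a matching
// rate are left out of the totals and counted in unrated.
func monthlyTotals(calls []billedCall, rates map[string]float64) (totals map[int]float64, unrated int) {
	totals = make(map[int]float64)
	for _, c := range calls {
		r, ok := lookupRate(rates, c.Destination)
		if !ok {
			unrated++
			continue
		}
		totals[c.CarrierID] += callCost(c.Billsec, r)
	}
	return totals, unrated
}

func main() {
	// Made-up per-minute rates keyed by dialing prefix.
	rates := map[string]float64{"973": 0.05, "9733": 0.08}
	calls := []billedCall{
		{1, "97333001234", 90},
		{1, "97317000000", 60},
		{2, "442071234567", 45},
	}
	totals, unrated := monthlyTotals(calls, rates)

	// Sort carrier IDs so the report order is stable.
	ids := make([]int, 0, len(totals))
	for id := range totals {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		fmt.Printf("carrier %d: %.4f\n", id, totals[id])
	}
	fmt.Println("unrated calls:", unrated)
}
```

Counting unrated calls separately keeps a missing rate from silently turning into zero revenue.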
Conclusion
Building carrier-grade VoIP isn't magic - it's careful architecture, monitoring, and operational discipline. This system has processed 500M+ calls with minimal intervention.
Want to discuss your VoIP architecture? Get in touch.

