This article is available at: https://www.ebpf.top/post/xdp_lb_demo

Author: Qiu Kang

With the progress of eBPF, we can now deploy eBPF/XDP programs directly on regular servers to achieve load balancing, eliminating the need for dedicated LVS machines.

The previous article showed how to replace LVS with XDP/eBPF for SLB. That version, 0.1, deployed SLB on dedicated machines, loaded the XDP program with bpftool, and used a hardcoded configuration.

Version 0.2 switched to programmatic loading based on the BPF skeleton while keeping the overall deployment mode of 0.1. To try that workflow, check out https://github.com/MageekChiu/xdp4slb/tree/dev-0.2

Version 0.3 added support for dynamically loading SLB configurations in the form of configuration files and command-line parameters.

This article covers version 0.4, which supports mixed deployment of SLB and application, eliminating the need for dedicated SLB machines. In mixed deployment, ordinary machines perform load balancing directly without impacting the applications (which can be demonstrated in offload mode), which is cost-effective. In addition, when a request is routed to the local backend, one routing hop is saved, so overall performance is better.

Creating the Network Environment

# Commands vary for different distributions
systemctl start docker

docker network create south --subnet 172.19.0.0/16 --gateway 172.19.0.1

# Check
docker network inspect south
# or
ip link

# Get the bridge of the newly created network using ifconfig
# Subsequently, you can capture all IP packets of this network on the host machine
tcpdump -i br-3512959a6150 ip
# You can also get the veth of a specific container and capture all the packets going in and out of that container
tcpdump -i vethf01d241  ip

Analysis of Principles

SLB Cluster Routing

For high availability, SLB is usually deployed as a cluster. How are requests routed to each SLB instance? Generally, (dynamic) routing protocols such as OSPF and BGP are used to achieve ECMP, so that each SLB instance receives an even share of the traffic from routers/switches. Since configuring dynamic routing protocols is complex and beyond the scope of this article, a simple script is used here to simulate ECMP.

#!/bin/bash

dst="172.19.0.10"
rs1="172.19.0.2"
rs2="172.19.0.3"
ip route del $dst/32
ip route add $dst/32 nexthop via $rs1 dev eth0 weight 1
while true; do
    nexthop=$(ip route show $dst/32 | awk '{print $3}')
    # nexthop=$(ip route show "$dst" | grep -oP "nexthop \K\S+")
    echo "to ${dst} via ${nexthop} now!"
    sleep 3
    
    # note: bash is strict about whitespace inside the [ ] test
    if [ "$nexthop" = "$rs1" ]; then
        new_nexthop="$rs2"
    else
        new_nexthop="$rs1"
    fi
    ip route del $dst/32
    ip route add $dst/32 nexthop via $new_nexthop dev eth0 weight 1
done

This script periodically switches the next hop toward the VIP back and forth between the mix hosts (each of which runs both SLB and the app).

NAT Mode

Versions 0.1~0.3 all used full NAT mode, which no longer works in the mixed deployment because it can make packets loop: without marking packets, the XDP program cannot tell whether a packet came from a client or from another SLB. We therefore adopt DR mode, which avoids the looping problem and also performs better (a minimal sketch follows the list), because:

  1. The return packet takes one hop fewer
  2. Less of the packet needs to be modified, and there is no need to recalculate the IP and TCP checksums
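
To make the difference concrete, here is a minimal DR-mode rewrite sketch in the style of an XDP program (not the project's actual code; local_mac and rs_mac are assumed to be known, e.g. from the configuration): only the Ethernet addresses change, so the IP and TCP headers and their checksums stay valid.

/* A minimal DR-mode rewrite sketch (not the project's actual code):
 * only the L2 addresses change, so IP/TCP headers and checksums
 * remain valid and need no recalculation. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

static __always_inline int redirect_to_rs(struct xdp_md *ctx,
                                          const unsigned char *local_mac,
                                          const unsigned char *rs_mac)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    if ((void *)(eth + 1) > data_end)      /* bounds check required by the verifier */
        return XDP_DROP;

    __builtin_memcpy(eth->h_source, local_mac, ETH_ALEN); /* this mix becomes the L2 source */
    __builtin_memcpy(eth->h_dest,   rs_mac,    ETH_ALEN); /* chosen RS becomes the L2 destination */

    return XDP_TX;                         /* send the frame back out of the same NIC */
}

Because nothing above layer 2 is touched, the real server still sees the client's original source IP and can answer it directly, which is exactly the DR return path in the architecture below.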

The architecture is shown below, simplified for illustration. In reality there is a router/switch between the client and the mix hosts, but the simulation script above builds that routing function directly into the client.

(Figure: architecture of the mixed SLB + application deployment)

Dark blue represents requests, light blue represents responses. The VIP is reached via ECMP, so a request is routed to exactly one mix; the SLB on that mix may forward it to the local app (Nginx in this article) or to another mix, but the response always returns directly from the mix that served it and never passes through another mix.

Load Balancing Algorithms

Currently, the following algorithms are supported:

  • random
  • round_robin
  • hash

This article does not synchronize session state across the SLB cluster, so only the hash algorithm can be used: no matter which SLB instance a request is routed to, it is forwarded to the same backend app.
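
For example, if the RS is chosen by hashing only the client address and port, every SLB instance independently computes the same choice without any shared state. A minimal sketch, with illustrative names (rs_addr, select_rs_by_hash and rs_list are assumptions, not the project's identifiers):

/* Hash-based RS selection sketch: the same (client IP, client port)
 * always maps to the same backend index, on every SLB instance,
 * with no shared state. rs_list/rs_num are assumed to come from slb.conf. */
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

struct rs_addr {
    __u32 ip;                  /* network byte order */
    unsigned char mac[6];
};

static __always_inline const struct rs_addr *
select_rs_by_hash(__u32 client_ip, __u16 client_port,
                  const struct rs_addr *rs_list, __u32 rs_num)
{
    if (rs_num == 0)
        return 0;
    /* simple multiplicative hash over the client tuple */
    __u32 h = client_ip ^ (((__u32)client_port << 16) | client_port);
    h *= 2654435761u;          /* Knuth's multiplicative constant */
    return &rs_list[h % rs_num];
}

Note that a plain modulo remaps some clients whenever the backend set changes; consistent hashing would reduce that, but it is out of scope here.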

SLB Routing Pseudocode

if (dest_ip == local_ip) {
    // Hand the packet straight to the local protocol stack
    return
}
if (dest_ip == vip && dest_port == vport) {
    Select an RS with the load-balancing algorithm
    If the RS is the local machine, hand the packet to the local protocol stack and return

    Otherwise, set the MAC of the RS as the destination of the new packet,
    and save the bidirectional mapping between the client and the RS
    so that subsequent packets can be routed directly

    Set the MAC of the local machine as the source of the new packet

    Send the new packet out
} else {
    Error, drop the packet
}
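
A C-flavored sketch of this flow, reusing struct rs_addr, select_rs_by_hash() and redirect_to_rs() from the sketches above. The configuration globals (vip, vport, local_ip, local_mac, rs_list, rs_num) are assumed to be filled in by the user-space loader from slb.conf, the session map and some verifier-required checks are omitted, and the identifiers are illustrative rather than the project's actual ones:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* illustrative configuration, assumed to be filled from slb.conf by the loader */
#define MAX_RS 16
const __u32 vip;
const __u16 vport;                         /* network byte order */
const __u32 local_ip;
const unsigned char local_mac[ETH_ALEN];
const struct rs_addr rs_list[MAX_RS];
const __u32 rs_num;

SEC("xdp")
int xdp_lb(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                   /* non-IP traffic is left alone */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
        return XDP_PASS;

    struct tcphdr *tcp = (void *)ip + sizeof(*ip);   /* assumes no IP options */
    if ((void *)(tcp + 1) > data_end)
        return XDP_PASS;

    if (ip->daddr == local_ip)
        return XDP_PASS;                   /* addressed to the local app: hand to the stack */

    if (ip->daddr != vip || tcp->dest != vport)
        return XDP_DROP;                   /* neither local nor the VIP service: drop */

    const struct rs_addr *rs =
        select_rs_by_hash(ip->saddr, bpf_ntohs(tcp->source), rs_list, rs_num);
    if (!rs)
        return XDP_DROP;
    if (rs->ip == local_ip)
        return XDP_PASS;                   /* this mix is the chosen RS itself */

    /* a real implementation also records the client<->RS mapping here */
    return redirect_to_rs(ctx, local_mac, rs->mac);
}

char LICENSE[] SEC("license") = "GPL";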

Configuring SLB and Applications

The Dockerfile for Mix is as follows

FROM debian:bookworm
RUN apt-get update -y && apt-get upgrade -y \
    && apt install -y nginx procps bpftool iproute2 net-tools telnet kmod curl tcpdump

WORKDIR /tmp/
COPY src/slb /tmp/
COPY slb.conf /tmp/

The base image is Bookworm rather than Bullseye because the glibc on my Fedora 37 host is 2.36, while Debian Bullseye ships glibc 2.31 and therefore cannot directly run executables compiled on the host.

Build the image and run the app (Nginx here):

docker build -t mageek/mix:0.1 .

# In case you want to run a brand new container
docker rm mix1 mix2 -f

docker run -itd --name mix1 --hostname mix1 --privileged=true \
	--net south -p 8888:80 --ip 172.19.0.2 --mac-address="02:42:ac:13:00:02" \
	-v "$(pwd)"/rs1.html:/var/www/html/index.html:ro mageek/mix:0.1 nginx -g "daemon off;"

docker run -itd --name mix2 --hostname mix2 --privileged=true \
	--net south -p 9999:80 --ip 172.19.0.3 --mac-address="02:42:ac:13:00:03" \
	-v "$(pwd)"/rs2.html:/var/www/html/index.html:ro mageek/mix:0.1 nginx -g "daemon off;"

# Check on the host
docker ps
curl 127.0.0.1:8888
curl 127.0.0.1:9999

Enter each container and configure the VIP. After configuring the VIP on a mix, disable ARP for it so that it does not interfere with packet routing for the client:

docker exec -it mix1 bash
docker exec -it mix2 bash

ifconfig lo:0 172.19.0.10/32 up
echo "1">/proc/sys/net/ipv4/conf/all/arp_ignore
echo "1">/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2">/proc/sys/net/ipv4/conf/all/arp_announce
echo "2">/proc/sys/net/ipv4/conf/lo/arp_announce

Then run SLB

# Start SLB and specify the network card and configuration file
./slb -i eth0 -c ./slb.conf

# In another terminal
bpftool prog list
# bpftool prog show name xdp_lb  --pretty

# Check global variables
# bpftool map list
# bpftool map dump name slb_bpf.rodata

# Check the XDP attachment with
ip link

View the logs directly on the host machine (there is only one trace pipe for the whole machine); do not open multiple terminals, or the logs may be incomplete:

bpftool prog tracelog
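
For reference, these trace lines come from bpf_printk() calls inside the eBPF program and all go to the kernel's single shared trace pipe, which is why only one reader per machine should consume them. An illustrative call, placed inside a program such as the xdp_lb sketch above (the actual messages printed by slb may differ):

/* visible via `bpftool prog tracelog`, which reads the shared trace_pipe;
 * message wording and variable names are illustrative */
bpf_printk("slb: client 0x%x:%u -> rs 0x%x",
           bpf_ntohl(ip->saddr), bpf_ntohs(tcp->source), bpf_ntohl(rs->ip));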

During the testing phase, after compiling the executable on the host machine, copy it into the containers (assuming the containers and the network above have already been created):


docker start mix1 mix2 client

docker cp src/slb mix1:/tmp/ && \
docker cp slb.conf mix1:/tmp/ && \
docker cp src/slb mix2:/tmp/ && \
docker cp slb.conf mix2:/tmp/ && \
docker cp routing.sh client:/tmp/ 

Testing

Start a new client container

docker run -itd --name client --hostname client --privileged=true \
	--net south -p 10000:80 --ip 172.19.0.9 --mac-address="02:42:ac:13:00:09" \
	-v "$(pwd)"/routing.sh:/tmp/routing.sh mageek/mix:0.1 nginx -g "daemon off;"

Enter the client and run the routing script:

docker exec -it client bash

sh routing.sh

Open another client terminal for request testing

docker exec -it client bash

# Visit rs first
curl 172.19.0.2:80
curl 172.19.0.3:80

# Visit slb 
curl 172.19.0.10:80
rs-1
curl 172.19.0.10:80
rs-2

We can do some load testing from the client, but do not run routing.sh while the load test is running: there is a brief window in which the old route has just been deleted and the new one is not yet installed, which causes request failures under concurrency.

apt-get install apache2-utils

# Concurrent 50, total requests 5000
ab -c 50 -n 5000 http://172.19.0.10:80/

The load testing results are as follows, showing that all requests were successful.

Server Software:        nginx/1.22.1
Server Hostname:        172.19.0.10
Server Port:            80

Document Path:          /
Document Length:        5 bytes

Concurrency Level:      50
Time taken for tests:   3.141 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1170000 bytes
HTML transferred:       25000 bytes
Requests per second:    1591.81 [#/sec] (mean)
Time per request:       31.411 [ms] (mean)
Time per request:       0.628 [ms] (mean, across all concurrent requests)
Transfer rate:          363.75 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   15   3.9     15      31
Processing:     5   16   4.7     16      48
Waiting:        0   11   4.4     10      34
Total:         17   31   3.6     30      60

Percentage of the requests served within a certain time (ms)
  50%     30
  66%     32
  75%     32
  80%     33
  90%     35
  95%     37
  98%     40
  99%     47
 100%     60 (longest request)

You can increase the concurrency of the test. The theoretical maximum concurrency is the capacity of back_map, where we store the conntrack entries. Exceeding it may cause remapping (except under the hash algorithm), which can lead to TCP resets.
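
For reference, back_map is the BPF map that holds these per-connection entries; a hedged sketch of such a map definition (the map type, key/value layout and max_entries figure are assumptions, not necessarily what slb uses):

#include <linux/bpf.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

/* key: the client side of the flow (illustrative layout) */
struct flow_key {
    __u32 client_ip;            /* network byte order */
    __u16 client_port;          /* network byte order */
    __u16 pad;
};

/* value: the RS chosen for that client (illustrative layout) */
struct rs_backend {
    __u32 ip;
    unsigned char mac[6];
    __u16 pad;
};

/* client -> chosen RS; max_entries bounds how many concurrent
 * connections can be tracked before entries are evicted and remapped */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);   /* assumed map type */
    __uint(max_entries, 4096);             /* illustrative capacity */
    __type(key, struct flow_key);
    __type(value, struct rs_backend);
} back_map SEC(".maps");

Under the hash algorithm the backend is recomputed deterministically from the client tuple anyway, which is why exceeding the capacity only affects the other algorithms.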

What's Next

To build a complete SLB there is still a lot of work to be done, such as using kernel facilities to resolve MAC addresses automatically, adding the many missing boundary checks, and so on. These will come in later versions, and everyone is welcome to participate at https://github.com/MageekChiu/xdp4slb/.