Article address: https://www.ebpf.top/post/network_and_bpf_2024

2024 Network and eBPF Forecast

In early 2024, Nico Vibert, Senior Sales Engineer at Isovalent, made some predictions about networking and eBPF. Here, we briefly outline the most important ones, covering eBPF, Cilium, cloud native, networking, observability, and security.

1. eBPF

1.1 Exponential Growth of eBPF

In 2023, eBPF-based networking and security projects grew rapidly, and the total number of eBPF-based projects is expected to easily pass three digits next year. The success of Liz Rice’s book Learning eBPF, the eBPF Labs, the eBPF Summit, and the eBPF documentary shows how quickly interest in eBPF is growing.

The eBPF documentary has received over 50,000 views since its release.

1.2 eBPF Application Market

The number of projects listed on ebpf.io has increased from 9 two years ago to 41. These projects or tools cover various areas such as CNI/high-performance load balancing/cloud-native runtime security/observability. It is expected that there will be an eBPF application market in the future, where we can not only download and install eBPF programs but also detect conflicts between programs. The concept of eBPF Store is expected to develop in the next few years.

1.3 Wider Application of eBPF in Mobile Devices

Currently, eBPF is used by millions of Android users every day. The initial use of eBPF in Android was for measuring network traffic statistics, but now every network packet in your phone may be processed through eBPF programs. In the future, the author predicts that there will be more eBPF applications on mobile devices, not limited to Android devices. In the coming years, the Cilium project may also run on mobile devices.

The main uses of eBPF on Android include:

  • Data usage statistics and billing.
  • Firewall/network restrictions for energy saving when entering power-saving mode.
  • High-speed packet processing, such as tethering (sharing the phone’s network connection) or 464xlat (a way of providing IPv4 connectivity over an IPv6-only network, which the Linux kernel does not support natively).

This video from the IETF 116 conference in Japan provides a fascinating and concise demonstration of eBPF on Android.

1.4 Risks of eBPF Abuse

As eBPF becomes more popular, abuse is inevitable. The eBPF verifier is designed to ensure that eBPF programs run safely and to prevent dangerous behavior, but as the technology is applied more widely, some vulnerabilities are likely to be exploited. This may raise concerns about eBPF itself, and the use of eBPF-based applications to control and monitor other eBPF programs is predicted to grow exponentially.

2. Observability

According to statistics from this blog article, observability is undoubtedly the most popular topic at KubeCon.

Although many organizations will invest time and effort in establishing their internal development platforms (platform engineering may become the most popular topic at KubeCon next year), the author predicts that many organizations will initially focus on improving their observability capabilities. Large organizations will need to govern clusters and deployed pods.

2.2 Reducing Observability Overhead

Observing everything that happens in a cluster generates a huge amount of data, which significantly increases the cost of observability, especially in environments with hundreds of Kubernetes clusters and hundreds of thousands of pods. Because numerous daemons run in each cluster for logging, monitoring, and security, in some scenarios these tools may consume more resources than the applications actually running on the cluster. With growing attention on the FinOps framework, engineers are expected to start optimizing resource-hungry observability tools. This is also one of the reasons for the soaring popularity of Tetragon, which provides low-overhead eBPF event monitoring.

2.3 Context-Aware Kubernetes Workloads

Context-awareness is not a new concept (the author was working on it more than 10 years ago at Cisco), but it will receive more attention in the coming years. The many layers of abstraction introduced by containerization cause context to be lost along the way. For example, a pod typically contains multiple containers (sometimes even dozens) that share an IP address. Because security policies are often set at the pod level, securing one containerized application often means granting more permissions than necessary to the other containers in the pod. Integrating Tetragon’s context with Cilium network policies may be a transformative initiative.

2.4 AI-Assisted Network Troubleshooting

Network tools will start to integrate natively with LLMs, providing a chat-like experience for interacting with the network. For example, you could log into a Grafana dashboard, ask why traffic dropped between 10pm and 11pm, and be told that the most likely cause is a misconfigured network policy. A chatbot could respond to simple instructions about building an architecture and provide the corresponding Terraform configuration. Eventually, the cause of a routing loop could be identified through a chat conversation.

3. Networking

3.1 Container Networking Performance Matching Host Networking Performance

At KubeCon in Chicago, Daniel Borkmann gave a talk titled “Turning up Performance to 11: Cilium, NetKit Devices, and Going Big with TCP”. After co-creating one of the most transformative network technologies (eBPF) over the past decade with his collaborators, Daniel set himself another challenge: to improve container networking performance to match that of host networking.

With the recent release of netkit (as well as the ability to run eBPF programs at the container level), we can expect to achieve comparable levels of network performance. While these features may take some time to become available on commonly deployed kernels, the networking capabilities are indeed exciting.

The Linux network stack contains so many code paths that Daniel needed to work with the developers behind technologies such as XDP (Express Data Path) and BIG TCP to find shortcuts and further optimizations.

3.2 Transformation in the Networking Industry

The networking industry is poised to undergo one of its most significant transformations next year, driven by several converging trends.

  • Broadcom’s acquisition of VMware will lead some organizations to reevaluate their usage of the VMware technology stack. This, in turn, will have an impact on the adoption of NSX as a virtual networking technology.
  • Open-source networking technologies have never been this popular. 2023 saw the inaugural Network Automation and NetDevOps conference - a trend primarily supported by open-source projects. The popularity of the Awesome Network Automation GitHub repository and many highlighted tools, such as GoBGP and containerlab, underscore the transformation the networking industry has undergone over the past decade.
  • Open-source networking has never been this powerful. In the past, open-source networking lagged behind proprietary solutions in functionality and performance. That gap has narrowed considerably thanks to the performance gains achievable with technologies like eBPF and XDP. The improvements in this use case are remarkable (who wouldn’t want a 72-fold improvement in CPU utilization?).

Expect fundamental changes in the networking industry in the coming years; perhaps the most significant transformation since the advent of software-defined networking.

3.3 Cilium in Home Environments

In a recent episode of eCHO, we explored how and why Cilium is finding its way into home environments. With Cilium, you can protect applications using network policies and expose them externally through built-in BGP and Gateway API support. Cilium is expected to gain popularity for home use this year.

Cilium’s adoption in edge environments has been growing in recent years. Many organizations in industries such as retail and healthcare are deploying cloud-native edge computing platforms and using Cilium to bring the network and firewall closer to the workloads.

3.4 Network Operators Seeking LLM Help - Not All Roses

Network engineers have been using ChatGPT to help troubleshoot network problems and generate network configurations. However, even with frequent use of ChatGPT, we still cannot entrust decision-making to LLMs. Some hiccups are inevitable, but AI and LLMs will still have a transformative impact on networking.

4. Cloud Native

4.1 Kubernetes Users Pushing Back on Complexity

The cloud-native ecosystem is becoming increasingly complex. Complaints about the complexity of operating cloud-native platform stacks can often be seen on social media. The proliferation of numerous cloud-native projects and tools has added more fatigue for platform and DevOps engineers. What do platform and DevOps engineers want in 2024? Simplicity.

The author expects many platform and DevOps engineers to invest time in simplifying their cloud-native toolsets in 2024. As a result, the cloud-native ecosystem is expected to start shrinking.

4.2 IPv6-Only Kubernetes Clusters Becoming More Common

The percentage of Google users using IPv6 has increased from 31% three years ago to 45%, and at the current rate, IPv6 is expected to be the primary protocol seen by Google users worldwide by the end of 2024 (it already exceeds 70% in countries such as France and India). But does the adoption of IPv6 extend to the Kubernetes platform? Partially. Although managed Kubernetes services (AKS, GKE, and EKS) have made significant improvements, IPv6 is still a secondary option and typically lacks proper documentation and recommendations. Until recently, many fundamental cloud-native tools had gaps in their IPv6 support.
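As a rough sanity check of the "current rate" reasoning, the two figures quoted above can be extended with a simple linear projection. This is only a sketch: linear growth is an assumption made here for illustration (real adoption need not be linear), and the function name is hypothetical.

```python
def linear_projection(start_pct: float, end_pct: float,
                      years_elapsed: float, years_ahead: float) -> float:
    """Extend the average yearly change between two data points.

    Assumes linear growth, which is an illustrative simplification.
    """
    yearly_rate = (end_pct - start_pct) / years_elapsed
    return end_pct + yearly_rate * years_ahead


# Figures quoted in the text: 31% three years ago, 45% now.
projected = linear_projection(31.0, 45.0, years_elapsed=3, years_ahead=1)
print(f"Projected share one year out: {projected:.1f}%")
```

Under this assumption the share lands close to the 50% mark one year out, so the timing of the milestone is sensitive to how the growth rate evolves.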

On the positive side, Cilium has introduced native support for NAT46/NAT64 to provide interoperability between IPv6-only clusters and the IPv4 world.
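To illustrate the address mapping that NAT64 relies on, here is a minimal sketch of RFC 6052 address synthesis, in which an IPv4 address is embedded in the low 32 bits of a /96 IPv6 prefix. It uses the well-known prefix 64:ff9b::/96 from RFC 6052; the function names are hypothetical, and this is not Cilium's implementation, only the general technique.

```python
import ipaddress

# Well-known NAT64 prefix defined in RFC 6052; deployments may use their own.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4: str,
                     prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX
                     ) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6: str,
                 prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX
                 ) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from a synthesized IPv6 address."""
    addr = ipaddress.IPv6Address(ipv6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError("address is not inside the given /96 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(synthesize_nat64("192.0.2.33"))     # 64:ff9b::c000:221
print(extract_ipv4("64:ff9b::c000:221"))  # 192.0.2.33
```

An IPv6-only client can use the synthesized address to reach an IPv4-only service through a NAT64 gateway, which translates the packets and recovers the original IPv4 destination from the low 32 bits.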

GitHub and Docker Hub are also actively driving the adoption of IPv6. Furthermore, several other factors will accelerate the adoption of IPv6 in Kubernetes:

  • Cost: The recent price increase for AWS public IPv4 addresses will also lead users to switch to IPv6.

  • Ease of operation: SRv6, which telecommunication companies already use and which will soon be available in Kubernetes, could greatly simplify network connectivity and programmability. The author expects telcos to be highly interested in using SRv6 in their clusters.

  • Performance: High-performance networking features like BIG TCP will give users more reasons to switch to IPv6.

While scalability and IPv4 address exhaustion are the commonly cited reasons for promoting IPv6, customers will more readily accept IPv6 if it is cheaper and faster.

4.3 Rapid Growth of Wasm

We often say that “eBPF is to the kernel what JavaScript is to the browser.” Interestingly, a similar comparison is made for another popular cloud-native technology, Wasm. Wasm provides an abstraction that allows programs written in C/C++, C#, and Rust to run in web browsers. With containerd’s native support for Wasm, Wasm is expected to soar even higher in 2024. Moreover, by leveraging the capabilities of eBPF, Cilium can seamlessly protect and connect these applications.

4.4 The Not-to-Be-Forgotten Heterogeneous Networks

Apart from the Kubernetes platform, there are still a significant number of service instances deployed on virtual machines and bare metal servers. According to VMware/Broadcom statistics, at least 85 million virtual machines are running on vSphere, and there are likely a comparable number of EC2 instances running on AWS. Several sessions at CiliumCon (an event focused on Cilium held in Chicago before KubeCon) discussed technologies beyond Kubernetes, such as OpenStack and Nomad, and how Cilium is used for network connectivity and security in these environments.

In 2024, we will see projects like Cilium Mesh aiming to bridge the gap between Kubernetes, VMs, serverless, and other heterogeneous workloads, striving to connect and protect them.

4.5 The Challenges of Platform Engineering and Network Growth

According to Gartner’s definition, “platform engineering improves developer experience and productivity by providing self-service capabilities with automated infrastructure operations.”

The intersection of platform engineering and networking holds promise, but some platform engineers may be hesitant to give developers self-service network capabilities.

How can organizations give users of internal developer portals (IDPs) independence and flexibility without allowing them to make poor network decisions that could jeopardize the applications they are developing? Finding the right balance between autonomy and control is crucial.

While building developer portals will still be a focus for many in 2024, there will be growing pains in terms of how much network independence to provide.