BPF Programs vs. Kernel Modules in Linux Networking
Optimizing Networking Control and Performance through Different Attachment Points
BPF (Berkeley Packet Filter) programs have brought about a transformative impact on Linux networking, introducing a potent mechanism to attach custom programs at various stages within the kernel. These programs empower network administrators and developers with unprecedented control and visibility over network events, allowing them to optimize performance, enhance security, and gain valuable insights.
In this article, we will delve into the different stages of the Linux networking stack where BPF programs can be attached, exploring why BPF is favored over kernel modules. While the choice of attachment point may appear inconsequential initially, it carries significant implications for the program’s access to information and entails crucial trade-offs. By comprehending the advantages and disadvantages of each stage, network practitioners can make informed decisions that align with their specific requirements.
To fully grasp eBPF (extended Berkeley Packet Filter), it is essential to establish a solid understanding of the differentiation between the kernel and user space in the Linux operating system.
The Linux Kernel
The Linux kernel serves as the intermediary software layer between applications and the underlying hardware on which they run. Applications operate within an unprivileged layer known as user space, which does not have direct access to hardware. Instead, applications make requests through the system call (syscall) interface to communicate their needs to the kernel. These hardware-related requests can involve activities such as reading and writing files, transmitting or receiving network traffic, or accessing memory. Additionally, the kernel is responsible for managing concurrent processes, allowing multiple applications to run simultaneously. This relationship is illustrated in the image below.
The Linux kernel is a highly intricate system, comprising approximately 30 million lines of code as of the time of this writing. Introducing modifications to such a vast codebase requires a certain level of familiarity with the existing code. Unless you are already a kernel developer, this can pose a significant challenge.
Moreover, if your intention is to contribute your changes upstream, you will encounter a challenge that extends beyond the purely technical realm. Linux is a versatile operating system employed in various environments and scenarios. Consequently, for your modifications to become part of an official Linux release, it is not sufficient to write functional code alone. The code must be embraced by the community and, more specifically, gain the acceptance of Linus Torvalds, the creator and principal developer of Linux. It must be viewed as a change that will benefit the collective good. However, such acceptance is not guaranteed, as only approximately one-third of submitted kernel patches are ultimately incorporated.
Adding New Functionality to the Kernel
If you’re seeking a faster alternative to waiting for your change to be incorporated into the kernel, there are a few other options available.
The Linux kernel was designed to support kernel modules, which can be dynamically loaded and unloaded as needed. If you intend to modify or expand the kernel’s behavior, developing a module is a viable approach. A kernel module can be distributed independently of the official Linux kernel release, allowing others to utilize it without the need for acceptance into the main upstream codebase.
However, it’s important to acknowledge that creating a kernel module still involves extensive kernel programming. Historically, users have been cautious about utilizing kernel modules due to a fundamental concern: if the kernel code within a module crashes, it can lead to system failure and disrupt all running processes. This raises a crucial question:
How can users have confidence in the safety and reliability of a kernel module?
Being deemed “safe to run” entails more than just avoiding crashes — users also want assurance that a kernel module is secure. They need to know if it contains vulnerabilities that could be exploited by attackers or if the module’s authors can be trusted not to insert malicious code. Given that the kernel operates with privileged access, granting it control over all data on the machine, the presence of malicious code in the kernel or its modules raises serious concerns.
Ensuring the safety and security of the kernel is a primary reason why Linux distributions undergo an extensive process before incorporating new releases. By allowing others to run a kernel version in various scenarios for extended periods, potential issues can be uncovered and addressed. Distribution maintainers can then have reasonable confidence that the kernel they ship to their users or customers has been thoroughly hardened and is indeed safe to run.
eBPF offers a very different approach to safety: the eBPF verifier, which ensures that an eBPF program is loaded only if it’s safe to run — it won’t crash the machine or lock it up in a hard loop, and it won’t allow data to be compromised. We won’t cover the verification process here, but I may do so in a future post.
eBPF programs offer the flexibility of dynamic loading and unloading into the kernel. Once attached to an event, these programs are triggered whenever that event occurs, regardless of the underlying cause. For instance, if you attach a program to the system call responsible for file openings, it will be triggered every time any process attempts to open a file. The program will even respond to file-opening events from processes that were already running before the program was loaded. This capability provides a significant advantage over upgrading the kernel and requiring a system reboot to access new functionalities.
This attribute leads to one of the remarkable strengths of observability and security tools utilizing eBPF — they provide instantaneous visibility into all activities taking place on the machine. In containerized environments, this visibility encompasses not only the processes running within the containers but also those operating on the host machine. Thus, eBPF-based tools offer comprehensive insights into the entire system’s operation.
I trust that this rather long introduction has provided you with valuable insights into the tremendous power of the eBPF platform. It empowers us to modify the kernel’s behavior, offering the flexibility to develop tailored tools or customized policies. With eBPF-based tools, we can effectively monitor and analyze any event occurring throughout the kernel, spanning all applications running on a (virtual) machine, irrespective of whether they are containerized or not.
As mentioned before, eBPF programs can be attached to many different types of events — kprobes, fentry/fexit, tracepoints, LSM hooks, and so on. Where an eBPF program is attached also determines the type of context information it receives. For example, a tracepoint program receives a pointer to tracepoint-specific data, and the format of that data depends on the particular tracepoint. Network data likewise looks different at different layers: at the bottom of the stack, data is held in the form of Layer 2 network packets, which are essentially a series of bytes that have been, or are ready to be, transmitted “on the wire.” At the top of the stack, applications use sockets, and the kernel creates socket buffers to handle the data being sent and received through those sockets.
There are currently around 30 program types enumerated in uapi/linux/bpf.h, and more than 40 attachment types. The attachment type defines more specifically where the program gets attached — for lots of program types, the attachment type can be inferred from the program type, but some program types can be attached to multiple different points in the kernel, so an attachment type has to be specified as well.
Among these are XDP BPF programs, which can be attached to events that handle incoming network packets — and that is what we will focus on in the next section.
The eXpress Data Path (XDP) is a framework that works by defining a limited execution environment in the form of a virtual machine running eBPF code. This environment executes custom programs directly in the kernel context before the kernel itself touches the network packet data, which enables custom processing (including redirection) at the earliest possible point after a packet is received at the networking hardware. XDP programs can be attached to specific interfaces (or virtual interfaces), meaning you can have multiple XDP programs attached to different interfaces simultaneously.
XDP Hook Points
As mentioned earlier, the eBPF program type itself does not determine the specific attachment point of the program. Instead, it provides the flexibility to be attached at various stages within the networking stack. In the context of XDP (eXpress Data Path) BPF programs, the attachment points are commonly referred to as XDP hooks or hook points. These hook points can be categorized into three distinct types or locations within the networking stack.
Generic XDP (xdpgeneric)
The Generic XDP hook is invoked from netif_receive_skb(). However, it is called after the packet’s DMA (Direct Memory Access) transfer and sk_buff (socket buffer) allocation have already been completed. Consequently, attaching an XDP program at this hook point forfeits many of XDP’s performance benefits.
Driver Native (xdpdrv)
The Native XDP hook allows you to attach an eBPF program to a lower-level hook within the ingress traffic processing function. Typically, this hook is located within the NAPI poll() method, which is executed before an sk_buff (socket buffer) is allocated for the current packet. Attaching an XDP program at this hook point provides the opportunity to process packets at an earlier stage, resulting in better performance compared to the Generic hook.
Offloaded (xdpoffload)
The Offloaded XDP hook enables eBPF programs to be loaded directly onto the network hardware itself. This hook point depends on device offloading capabilities and delivers the highest performance of all hook types. However, Offloaded mode is device-specific and only applicable when using hardware that supports XDP offloading.
Assuming you do not have a SmartNIC (a programmable Network Interface Card), the optimal place to run your XDP program is the Native hook. Attaching it there lets you benefit significantly from the performance gains of early packet processing.
To sum up: BPF programs have revolutionized Linux networking by enabling custom programs to be attached at various stages within the kernel, offering unprecedented control, visibility, and flexibility over network events. By understanding the advantages and trade-offs of the different attachment points in the Linux networking stack, network practitioners can make informed decisions that meet their specific requirements — and in many cases favor BPF over traditional kernel modules for its dynamic capabilities, safety guarantees, and performance.
Thanks for reading! 😎 If you enjoyed this article, hit that clap button below 👏
If you liked my post you can buy me a Hot dog 🌭
Check out the rest of my content on Teodor J. Podobnik, @dorkamotorka and follow me for more, cheers!