Tracepoints, Kprobes, or Fprobes: Which One Should You Choose?
The art of writing eBPF Tracing programs
It is safe to say that almost all eBPF programs can extract and send kernel event data to user space applications.
However, tracing programs like kprobes, fprobes, and tracepoints are often preferred because they hook onto kernel events with access to rich, actionable data for tasks like performance monitoring or syscall argument tracing.
But their overlapping functionality can make choosing the right one confusing.
Today’s newsletter covers how to use each one and why you might prefer one over another.
Tracepoint
Tracepoints are predefined hook points in the Linux kernel, and eBPF programs can be attached to these tracepoints to execute custom logic whenever the kernel reaches those points.
For example, the sys_enter_execve
tracepoint captures the entry of the execve
system call, providing information about the program being executed and its arguments, which makes it valuable for auditing security events or analyzing Linux user activity.
You can list all events that eBPF tracepoints can hook onto by reading the /sys/kernel/debug/tracing/available_events file.
The output format is in the form <category>:<name>
.
You can view the input arguments for a tracepoint by checking the contents of /sys/kernel/debug/tracing/events/<category>/<name>/format
.
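The mapping from an event name to its format file can be sketched in plain user-space C. This is only an illustration: the helper names tracepoint_format_path and tracepoint_section_name are hypothetical and not part of libbpf or any other library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: takes "<category>:<name>" and writes the path of the
 * matching format file into out. Returns 0 on success, -1 on malformed input
 * or when the buffer is too small. */
static int tracepoint_format_path(const char *event, char *out, size_t out_len)
{
    const char *colon = strchr(event, ':');
    if (colon == NULL || colon == event || colon[1] == '\0')
        return -1;

    int category_len = (int)(colon - event);
    int written = snprintf(out, out_len,
                           "/sys/kernel/debug/tracing/events/%.*s/%s/format",
                           category_len, event, colon + 1);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}

/* Hypothetical helper: builds the "tp/<category>/<name>" string that a
 * tracepoint program would pass to SEC(). */
static int tracepoint_section_name(const char *event, char *out, size_t out_len)
{
    const char *colon = strchr(event, ':');
    if (colon == NULL || colon == event || colon[1] == '\0')
        return -1;

    int category_len = (int)(colon - event);
    int written = snprintf(out, out_len, "tp/%.*s/%s",
                           category_len, event, colon + 1);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

For syscalls:sys_enter_execve, the first helper yields the path of the format file described above, and the second yields tp/syscalls/sys_enter_execve.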
The first four fields are not accessible from eBPF code. This is a design choice that dates back to the original inclusion of this code.
💡 See the explanation in commit 98b5c2c65c29.
However, the remaining fields can generally be accessed from our eBPF program, such as those shown in the print fmt line at the bottom of the file.
Using that we can write our eBPF Tracepoint program.
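As a rough sketch of the context such a program receives, the struct below mirrors the fields that the sys_enter_execve format file typically lists. The offsets are an assumption based on a 64-bit x86 kernel, so check them against your own format file. Pointer fields are stored as 64-bit integers here so the layout stays the same when compiled in user space, and the helper execve_ctx_layout_ok is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the sys_enter_execve tracepoint context (x86-64). */
struct trace_sys_enter_execve {
    /* Common header, 8 bytes in total: not readable from eBPF code. */
    uint16_t common_type;          /* offset 0,  size 2 */
    uint8_t  common_flags;         /* offset 2,  size 1 */
    uint8_t  common_preempt_count; /* offset 3,  size 1 */
    int32_t  common_pid;           /* offset 4,  size 4 */

    int32_t  syscall_nr;           /* offset 8,  size 4 */
    uint32_t pad;                  /* offset 12, size 4 (alignment padding) */
    uint64_t filename;             /* offset 16, size 8 (const char *) */
    uint64_t argv;                 /* offset 24, size 8 (const char *const *) */
    uint64_t envp;                 /* offset 32, size 8 (const char *const *) */
};

/* Hypothetical sanity check: returns 1 when the struct matches the offsets
 * assumed above, 0 otherwise. */
static int execve_ctx_layout_ok(void)
{
    return offsetof(struct trace_sys_enter_execve, syscall_nr) == 8 &&
           offsetof(struct trace_sys_enter_execve, filename) == 16 &&
           offsetof(struct trace_sys_enter_execve, argv) == 24 &&
           offsetof(struct trace_sys_enter_execve, envp) == 32;
}
```

In a real eBPF program, a struct like this would be the type of the ctx argument, and fields such as filename would be read from it after the inaccessible common header.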
💡
SEC("tp/xx/yy")
and SEC("tracepoint/xx/yy")
are equivalent, and you can use either one according to personal preference.
But there are two downsides to this:
- Tracepoints only exist in places where kernel developers have put them. If you need to trace something that isn’t covered, you need another technique.
- Additionally, you need to make sure the tracepoint you are attaching to is available in your kernel version.
Making a tracepoint program portable across kernel versions is not particularly challenging. Tracepoints generally remain stable across kernel versions, and if they do change, we can use the BPF_CORE_READ()
family of helpers for CO-RE relocatable reads.
Additionally, we must ensure that the type of the input context exists in the kernel where the program is loaded. For instance, our custom struct trace_sys_enter_execve
has no corresponding type in the kernel's BTF. This prevents CO-RE from adjusting instructions to read fields at the correct offsets if they differ across kernel versions.
Therefore, we need to use struct trace_event_raw_sys_enter
defined in vmlinux.h
.
💡 vmlinux.h
is a header file generated from the kernel's BTF, providing access to kernel structures and definitions for eBPF programs. You can generate it with: bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
Due to a newsletter length limit, I will write more extensively about building portable eBPF programs and the usage of BPF_CORE_READ()
and BPF_PROBE_READ()
family of helpers in next week’s post.
Raw Tracepoint
A raw tracepoint may not seem much different from a regular tracepoint. Both can attach to the events listed in the /sys/kernel/debug/tracing/available_events
file.
But the main difference is that a raw tracepoint does not pass a pre-built input context to the eBPF program as regular tracepoints do; that is, the kernel skips constructing the appropriate parameter fields. Instead, the raw tracepoint eBPF program accesses the raw arguments of the event using struct bpf_raw_tracepoint_args
.
Therefore, raw tracepoints usually perform slightly better than regular tracepoints.
Another (rather large) difference is that, in the kernel, there are actually no statically defined tracepoints for individual syscalls, only the generic sys_enter
/sys_exit
.
💡 sys_enter
hooks trigger on every syscall entry, while sys_exit
hooks trigger on its return, capturing the return value of the syscall.
Therefore, if we want to act on a specific syscall, we need to “filter” by syscall ID inside our Raw Tracepoint eBPF program.
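The filtering logic can be modeled in user-space C as follows. This is only a sketch, not an eBPF program: raw_tp_args_model is a hypothetical stand-in for the kernel's struct bpf_raw_tracepoint_args, and the syscall number 59 for execve is an x86-64 assumption.

```c
#include <stdint.h>

/* Assumption: execve is syscall 59 on x86-64; other architectures differ. */
#define EXECVE_SYSCALL_ID 59

/* Hypothetical stand-in for struct bpf_raw_tracepoint_args, which exposes
 * the raw tracepoint arguments as an array of 64-bit values. */
struct raw_tp_args_model {
    uint64_t args[2];
};

/* For sys_enter, args[0] carries a pointer to the saved registers and
 * args[1] carries the syscall ID. Returns 1 for an execve entry, else 0. */
static int is_execve_entry(const struct raw_tp_args_model *ctx)
{
    return ctx->args[1] == EXECVE_SYSCALL_ID;
}
```

A raw tracepoint program attached to sys_enter would perform the same comparison first and return early for every syscall it is not interested in.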
This differs from regular tracepoints, which rely on perf events that allow them to attach directly to a specific kernel event such as tp/syscalls/sys_enter_execve
as showcased above.
💡 Perf events are a kernel feature for monitoring and profiling Linux systems, capturing hardware events (e.g., cache misses), software events (e.g., context switches), and kernel tracepoints.
Notice also that we are reading the arguments of the syscall by extracting them from the CPU registers. The System V ABI specifies which arguments should be present in which CPU registers.
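To illustrate the register convention, here is a small user-space model of how syscall arguments map to registers on x86-64 (rdi, rsi, rdx, r10, r8, r9). The struct regs_model and the helper syscall_arg are hypothetical and only mimic the relevant fields of the kernel's struct pt_regs.

```c
#include <stdint.h>

/* Hypothetical model of the x86-64 registers that carry syscall arguments. */
struct regs_model {
    uint64_t di, si, dx, r10, r8, r9;
};

/* Copies the n-th syscall argument (1-based) into out.
 * Returns 0 on success, -1 if n is outside 1..6. */
static int syscall_arg(const struct regs_model *regs, unsigned int n,
                       uint64_t *out)
{
    switch (n) {
    case 1: *out = regs->di;  return 0; /* e.g. filename for execve */
    case 2: *out = regs->si;  return 0; /* e.g. argv for execve */
    case 3: *out = regs->dx;  return 0; /* e.g. envp for execve */
    case 4: *out = regs->r10; return 0; /* syscalls use r10 here, not rcx */
    case 5: *out = regs->r8;  return 0;
    case 6: *out = regs->r9;  return 0;
    default: return -1;
    }
}
```

This is why the first argument of execve, the filename pointer, is found in the di register on x86-64.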
Since we rely on CPU registers, we need to target our binary for specific system architectures. One way to achieve this is to provide a --target
flag if you are using clang
.
💡 For the example above, I intentionally read the register value using
&regs->di
, while the rest of the examples will utilize PT_REGS_PARM*
macros, which should be preferred.
Kernel Probe (kprobe)
Regular and raw eBPF tracepoints might in fact be sufficient for your use case, but their main drawback is that they are limited to a set of predefined hook points…
Read the full post, on my Substack Newsletter: https://ebpfchirp.substack.com/p/tracepoints-kprobes-or-fprobes-which