README
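How eBPF works
BPF basics
eBPF has eleven 64-bit registers: r0 holds the return value, r1-r5 carry the arguments of BPF calls, r6-r9 are callee-saved registers for use inside the called function, and r10 is the read-only frame pointer. When a BPF program starts executing, r1 holds the context; the content of the context depends on the program type.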
bpf helper function
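XDP generic and XDP native
Generic XDP runs after the NIC has received the packet, at the point where the skb is created (it has to be enabled with XDP_FLAGS_SKB_MODE), so it works at the Linux kernel level. Native XDP runs directly in the NIC driver. The former can process the packet freely once it has it, but the latter, when redirecting, can only redirect to another device that also supports native XDP.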
eBPF map
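Maps can be per-CPU, and multiple program instances share the same map. The per-CPU form avoids synchronization between the caches of multiple CPUs.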
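To make the per-CPU idea concrete, here is a minimal sketch in plain Python. It only models the behavior: each CPU writes to its own slot and the reader sums the slots. The class name PerCpuCounter and the CPU count are hypothetical and are not part of the kernel or bcc API.

```python
class PerCpuCounter:
    """Toy model of a per-CPU map value: one private slot per CPU."""

    def __init__(self, num_cpus):
        self.slots = [0] * num_cpus

    def increment(self, cpu, amount=1):
        # Each CPU touches only its own slot, so no lock or atomic is needed.
        self.slots[cpu] += amount

    def total(self):
        # The reader (user space) aggregates all slots when it reads the value.
        return sum(self.slots)


counter = PerCpuCounter(num_cpus=4)
for cpu, packets in [(0, 3), (1, 5), (0, 2), (3, 1)]:
    counter.increment(cpu, packets)

print(counter.slots)    # per-CPU values
print(counter.total())  # aggregated value
```

The cost of this design is that a read has to visit every slot, which is usually acceptable because reads from user space are far rarer than updates on the hot path.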
tail call
Multiple eBPF programs can be chained: one program uses a tail call to invoke another eBPF program.
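The chaining can be pictured with a small Python model. This is a sketch under assumptions, not kernel code: the program array is a dict, each "program" returns its own verdict plus an optional tail-call target, and MAX_TAIL_CALLS is an illustrative limit rather than the kernel's value. A failed tail call (empty slot or limit reached) simply leaves the current program's verdict in place.

```python
MAX_TAIL_CALLS = 4  # illustrative limit, not the kernel's actual value


def run(prog_array, start, ctx):
    """Run prog_array[start] and follow tail calls; return (verdict, trace)."""
    index, trace, tail_calls = start, [], 0
    while True:
        trace.append(index)
        verdict, target = prog_array[index](ctx)
        # A tail call never returns to the caller; if it cannot be made,
        # the current program's own verdict stands.
        if target is None or target not in prog_array or tail_calls >= MAX_TAIL_CALLS:
            return verdict, trace
        tail_calls += 1
        index = target


def parse_eth(ctx):
    # Hand IPv4 frames to the program in slot 1, pass everything else.
    return "pass", (1 if ctx["ethertype"] == 0x0800 else None)


def parse_ipv4(ctx):
    # Drop UDP (protocol 17), pass other IPv4 traffic.
    return ("drop" if ctx["proto"] == 17 else "pass"), None


prog_array = {0: parse_eth, 1: parse_ipv4}
print(run(prog_array, 0, {"ethertype": 0x0800, "proto": 17}))
print(run(prog_array, 0, {"ethertype": 0x86DD, "proto": 17}))
```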
eBPF workflow
- Create an eBPF program
- Call the new bpf() syscall to inject the program into the kernel and obtain a reference to it
- Attach the program to a socket (with the new SO_ATTACH_BPF setsockopt() option):
- setsockopt(socket, SOL_SOCKET, SO_ATTACH_BPF, &fd, sizeof(fd));
- where “socket” represents the socket of interest, and “fd” holds the file descriptor for the loaded eBPF program
- Once the program is loaded, it will be fired on each packet that shows up on the given socket
- limitation: programs cannot do anything to influence the delivery or contents of the packet
- These programs are not actually “filters”; all they can do is store information in eBPF “maps” for consumption by user space
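The attach step above could be sketched from Python as follows. This assumes the program has already been loaded through bpf() and that prog_fd is the resulting file descriptor; the helper name attach_bpf is hypothetical, and the fallback value 50 for SO_ATTACH_BPF is the one used on most Linux architectures (a few define a different number).

```python
import socket
import struct

# Python's socket module may not export this constant; 50 is the value from
# asm-generic/socket.h and is an assumption for other architectures.
SO_ATTACH_BPF = getattr(socket, "SO_ATTACH_BPF", 50)


def attach_bpf(sock, prog_fd):
    """Attach an already loaded eBPF socket program to sock.

    Equivalent to: setsockopt(socket, SOL_SOCKET, SO_ATTACH_BPF, &fd, sizeof(fd))
    """
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_BPF, struct.pack("i", prog_fd))
```

Calling attach_bpf(sock, prog_fd) with a descriptor that is not a loaded BPF program fails with OSError, so the load step has to succeed first.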
USDT(Userland Statically Defined Tracepoints)
DTRACE_PROBE defines a user-space tracepoint. It requires the systemtap-sdt-dev package and the header #include <sys/sdt.h>.
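BPF verifier
After a program is loaded, eBPF runs a depth-first search over its control flow graph (CFG); if it finds unreachable instructions, the program is not allowed to run.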
- Providing a verdict for kernel whether safe to run
- Simulation of execution of all paths of the program
- Steps involved (extract):
- Checking control flow graph for loops
- Detecting out of range jumps, unreachable instructions
- Tracking context access, initialized memory, stack spill/fills
- Checking unprivileged pointer leaks
- Verifying helper function call arguments
- Value and alignment tracking for data access (pkt pointer, map access)
- Register liveness analysis for pruning
- State pruning for reducing verification complexity
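The control-flow part of these checks can be illustrated with a toy depth-first search in Python. It is only a sketch: the instruction encoding ("op", "jmp", "jcond", "exit") and the function name check_cfg are made up for this example, and the real verifier does far more (register state tracking, pruning, and so on).

```python
def check_cfg(insns):
    """Return a list of control-flow problems found in a toy instruction list."""
    n = len(insns)

    def successors(i):
        kind = insns[i][0]
        if kind == "exit":
            return []
        if kind == "jmp":                 # unconditional jump
            return [insns[i][1]]
        if kind == "jcond":               # conditional: fall through or jump
            return [i + 1, insns[i][1]]
        return [i + 1]                    # ordinary instruction

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on DFS stack, finished
    state = [WHITE] * n
    problems = []

    def visit(i):
        state[i] = GREY
        for s in successors(i):
            if not 0 <= s < n:
                problems.append(f"insn {i}: target {s} out of range")
            elif state[s] == GREY:
                problems.append(f"insn {i}: back-edge to {s} (loop)")
            elif state[s] == WHITE:
                visit(s)
        state[i] = BLACK

    if n:
        visit(0)
    problems += [f"insn {i}: unreachable" for i in range(n) if state[i] == WHITE]
    return problems


ok = [("op",), ("jcond", 3), ("op",), ("exit",)]
bad = [("op",), ("jmp", 3), ("op",), ("jcond", 0), ("exit",)]
print(check_cfg(ok))
print(check_cfg(bad))
```

An edge to a node that is still on the DFS stack is a back-edge, which is how a loop shows up; any instruction never visited from instruction 0 is unreachable.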
bcc
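- bpf_ktime_get_ns: returns the current time in nanoseconds
- BPF_HASH(last): creates an associative array named last; if no extra arguments are given, both key and value are of type u64
- last.lookup(&key): looks up key in the hash; returns NULL if it is not present
- last.delete(&key): deletes key from the hash
- last.update(&key, &value): updates the stored value
- bpf_trace_printk: prints output from the BPF program (to the kernel trace pipe)
- Every function in the BPF program that is meant to be run must take struct pt_regs* as its first parameter
- bpf_get_current_pid_tgid: returns a u64 with the process ID (TGID) in the upper 32 bits and the thread ID (the kernel's PID) in the lower 32 bits
- bpf_get_current_comm: gets the name of the current program
- BPF_PERF_OUTPUT(events): defines the name of the output channel
- events.perf_submit(): submits an event to user space
- b["events"].open_perf_buffer(print_event): associates the output function with the output channel
- b.perf_buffer_poll(): blocks waiting for events
- BPF_HISTOGRAM: defines a BPF map object that is a histogram (dist.increment() increments a bucket)
- bpf_log2l: returns the result of a log2 calculation
- b["dist"].print_log2_hist("kbytes"): prints the data in the histogram dist, keyed by log2, with kbytes as the header
- attach_kretprobe: attaches to the return point of a kernel function
- BPF(src_file = "vfsreadlat.c"): reads the BPF program from a source file
- attach_uprobe: attaches to a uprobe
- PT_REGS_PARM1: gets the first argument of the function being traced
- bpf_usdt_readarg(6, ctx, &addr): reads the sixth argument of the USDT probe into the variable addr
- bpf_probe_read(&path, sizeof(path), (void *)addr): reads the content at addr into the variable path; it can be thought of as a safe version of memcpy
- USDT(pid=int(pid)): enables USDT tracing for the given PID
- enable_probe(probe="http__server__request", fn_name="do_trace"): attaches the function do_trace to the Node.js USDT probe http__server__request
- BPF(text=bpf_text, usdt_contexts=[u]): passes the USDT object u to the BPF object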
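The log2 histogram entries above can be modelled in a few lines of plain Python. This is a sketch of the bucketing idea only: log2_bucket is a hypothetical helper that takes the floor of log2, the sample values are made up, and the exact index returned by bcc's bpf_log2l may differ by an offset.

```python
from collections import Counter


def log2_bucket(value):
    """Floor of log2(value) for value >= 1; 0 lands in bucket 0."""
    return max(value.bit_length() - 1, 0)


dist = Counter()
for kbytes in [1, 3, 4, 7, 8, 100, 120, 1000]:
    dist[log2_bucket(kbytes)] += 1  # plays the role of dist.increment(...)

# Print one row per power-of-two range, similar in spirit to print_log2_hist.
for bucket in sorted(dist):
    low, high = 2 ** bucket, 2 ** (bucket + 1) - 1
    print(f"{low:>6} -> {high:<6} : {'*' * dist[bucket]}")
```

Bucketing by powers of two keeps the map small and fixed in size no matter how wide the range of observed values is.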
uprobe
The core is the call b.attach_uretprobe(name=name, sym="readline", fn_name="printret"): its arguments are the path of the shared library or binary, the name of the symbol to probe, and the function to trigger.
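A minimal sketch of how that call might sit inside a bcc script is shown below. It assumes bcc is installed and the script runs as root; the struct str_t, the channel name events, the callback print_event and the default path /bin/bash are illustrative choices, and on systems where bash links readline dynamically the name argument would have to point at the readline library instead.

```python
BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

struct str_t {
    u32 pid;
    char str[80];
};

BPF_PERF_OUTPUT(events);

int printret(struct pt_regs *ctx) {
    struct str_t data = {};
    if (!PT_REGS_RC(ctx))              /* readline() returned NULL */
        return 0;
    data.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_probe_read(&data.str, sizeof(data.str), (void *)PT_REGS_RC(ctx));
    events.perf_submit(ctx, &data, sizeof(data));
    return 0;
}
"""


def main(name="/bin/bash"):
    from bcc import BPF  # imported here so the file still loads without bcc

    b = BPF(text=BPF_TEXT)
    # Run printret every time readline() in the given binary returns.
    b.attach_uretprobe(name=name, sym="readline", fn_name="printret")

    def print_event(cpu, data, size):
        event = b["events"].event(data)
        print(event.pid, event.str.decode("utf-8", "replace"))

    b["events"].open_perf_buffer(print_event)
    while True:
        try:
            b.perf_buffer_poll()
        except KeyboardInterrupt:
            break
```

Calling main() starts tracing and prints the PID and the line each time a traced bash returns from readline.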
tracepoint