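Setting Up a Cilium Development Environment

This article describes how to set up a local Cilium development environment so that you can debug the Cilium code.

Downloading the Cilium source code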

git clone https://github.com/cilium/cilium

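Although the latest stable Cilium release at the time of writing is 1.14.5, the kind-image-fast target does not work properly when the development environment is built from 1.14.5. See the issue I filed for details: Build development setup quickly seems not work in 1.14.5

This article therefore uses the main branch directly (latest commit f8a7616).

Checking the development environment

Run make dev-doctor to check the development environment. Depending on what you plan to work on, not every dependency has to be satisfied.

Installing kind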

[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64

Move kind into a directory that is on $PATH; here it goes into /usr/local/sbin:

mv kind /usr/local/sbin/
chmod a+x /usr/local/sbin/kind
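
The download command above only covers x86_64 hosts. As a minimal sketch, the helper below maps the output of uname -m to the architecture suffix used in kind release file names. The function names kind_arch and kind_url are hypothetical (they are not part of kind), and the version passed in the usage line is the v0.20.0 used in this article.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to the suffix used by kind binaries.
# kind_arch is a hypothetical helper name, not something shipped with kind.
kind_arch() {
    case "$1" in
        x86_64)        echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build the download URL for a given kind version ($1) and machine name ($2).
kind_url() {
    arch=$(kind_arch "$2") || return 1
    echo "https://kind.sigs.k8s.io/dl/$1/kind-linux-$arch"
}

# Usage (commented out so that sourcing this file has no side effects):
# curl -Lo ./kind "$(kind_url v0.20.0 "$(uname -m)")"
```

Failing early on an unknown architecture avoids silently downloading nothing, which is what the one-line `[ ... ] && curl` form does on a non-x86_64 host.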

Installing kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

Move kubectl into a directory that is on $PATH; here it goes into /usr/local/sbin:

mv kubectl /usr/local/sbin/
chmod a+x /usr/local/sbin/kubectl

Creating the kind cluster

First, configure the cluster to use two worker nodes:

# export WORKERS=2

From the root of the Cilium source tree, run the following command to create the kind cluster:

make kind

The kind target is backed by the following setup script:

./contrib/scripts/kind.sh

Verify that the cluster was created successfully:

# kubectl cluster-info --context kind-kind
Kubernetes control plane is running at https://127.0.0.1:42127
CoreDNS is running at https://127.0.0.1:42127/api/v1/namespaces/kube-system/services/kube-dns:dns/pro

Then list the nodes:

# kubectl get nodes --context kind-kind
NAME STATUS ROLES AGE VERSION
kind-control-plane NotReady control-plane 69s v1.27.3
kind-worker NotReady <none> 38s v1.27.3
kind-worker2 NotReady <none> 37s v1.27.3

The nodes report NotReady at this stage because no CNI plugin is installed yet; they become Ready once Cilium is installed in the steps that follow.

Installing the Cilium CLI locally

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

Confirm that the Cilium CLI was installed successfully:

cilium version --client
cilium-cli: v0.15.18 compiled with go1.21.5 on linux/amd64
cilium image (default): v1.14.4
cilium image (stable): v1.14.5

Installing Cilium in the kind cluster

From the Cilium source directory, run the following command to install Cilium:

make kind-install-cilium-fast

kind-install-cilium-fast uses the Cilium CLI to install Cilium into the kind cluster, which is why the Cilium CLI was installed in the previous step.

Change the Cilium configuration to enable debug output:

# cilium config set debug true
# cilium config set debug-verbose datapath

Confirm that the configuration change took effect:

# cilium config view | grep debug
debug true
debug-verbose datapath
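
Because grep debug matches every line that contains the string, a script that needs the value of one specific key may prefer an exact match on the first column. The sketch below assumes the two-column "key value" layout shown in the output above; config_value is a hypothetical helper name, and it only returns the first whitespace-separated word of the value.

```shell
#!/bin/sh
# Print the value of one key from `cilium config view` output read on stdin.
# Exits non-zero when the key is not present.
# config_value is a hypothetical helper, not part of the Cilium CLI.
config_value() {
    awk -v key="$1" '$1 == key { print $2; found = 1 } END { exit !found }'
}

# Usage (commented out; requires a running cluster):
# cilium config view | config_value debug
```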

Modifying the local Cilium code

Next, modify the local Cilium code. First, edit daemon/cmd/daemon_main.go and add a log line to initEnv:

func initEnv() {
	......
	log.Infof("Cilium %s", version.Version)
	// Newly added log line
	log.Info("It's self compiled daemon version")
}

Next, edit bpf/bpf_lxc.c and add a print statement at the entry of cil_from_container:

__section_entry
int cil_from_container(struct __ctx_buff *ctx)
......
	// Added print statement
	printk("It's self compiled datapath version");

	bpf_clear_meta(ctx);
......

Building the Cilium code and loading it into the kind cluster

make kind-image-fast

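kind-image-fast builds artifacts such as cilium-agent and the cilium CLI and installs them into the kind cluster. It also copies the bpf code from the source tree to the corresponding path in the kind cluster (/cilium-binaries/var/lib/cilium/bpf/), so that the bpf programs can be regenerated and reloaded.

Checking the Cilium status

Looking at the k8s cluster now, the nodes should all be Ready: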

# kubectl get nodes --context=kind-kind
NAME STATUS ROLES AGE VERSION
kind-control-plane Ready control-plane 48m v1.27.3
kind-worker Ready <none> 47m v1.27.3
kind-worker2 Ready <none> 47m v1.27.3

Then check the Pods in the kube-system namespace:

# kubectl get pods --context=kind-kind -n kube-system
NAME READY STATUS RESTARTS AGE
cilium-8zn67 1/1 Running 0 55s
cilium-fqdnq 1/1 Running 0 57s
cilium-j8tqr 1/1 Running 0 56s
cilium-operator-5f55bb7b45-kjxvs 1/1 Running 0 25s
coredns-5d78c9869d-9fqb4 1/1 Running 0 42h
coredns-5d78c9869d-qvpdq 1/1 Running 0 42h
etcd-kind-control-plane 1/1 Running 0 42h
kube-apiserver-kind-control-plane 1/1 Running 0 42h
kube-controller-manager-kind-control-plane 1/1 Running 1 (16h ago) 42h
kube-proxy-bpz9q 1/1 Running 0 42h
kube-proxy-lbpwf 1/1 Running 0 42h
kube-proxy-n227w 1/1 Running 0 42h
kube-scheduler-kind-control-plane 1/1 Running 2 (16h ago) 42h
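
The node check above can also be scripted instead of read by eye. The following is a minimal sketch that assumes the column layout shown in the kubectl get nodes output (name first, status second); not_ready_nodes is a hypothetical helper name. Note that it treats any status other than exactly "Ready" (for example "Ready,SchedulingDisabled") as not ready.

```shell
#!/bin/sh
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# not_ready_nodes is a hypothetical helper, not part of kubectl.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage (commented out; requires a running cluster):
# kubectl get nodes --no-headers --context kind-kind | not_ready_nodes
```

An empty result means every node is Ready. For simply blocking until that is the case, kubectl wait --for=condition=Ready nodes --all is an alternative.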

cilium status also reports a healthy state:

# cilium status
/¯¯\
/¯¯\__/¯¯\ Cilium: OK
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: disabled (using embedded mode)
\__/¯¯\__/ Hubble Relay: disabled
\__/ ClusterMesh: disabled

DaemonSet cilium Desired: 3, Ready: 3/3, Available: 3/3
Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium Running: 3
cilium-operator Running: 1
Cluster Pods: 7/7 managed by Cilium
Helm chart version: 1.16.0-dev
Image versions cilium quay.io/cilium/cilium-ci:latest: 3
cilium-operator quay.io/cilium/operator-generic-ci:latest: 1

Verifying that the code changes took effect

The logs of one of the cilium-agent Pods show that our change is in effect:

# kubectl logs cilium-fqdnq  -n kube-system | grep self
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
level=info msg="It's self compiled daemon version" subsys=daemon

Check the debug messages printed by the datapath:

# cat /sys/kernel/debug/tracing/trace_pipe
......
coredns-2156337 [000] d.s1. 8541189.106506: bpf_trace_printk: It's self compiled datapath version
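
With debug output enabled, trace_pipe can carry a lot of unrelated lines. As a small sketch, the helper below keeps only the lines that contain a given marker string; trace_filter is a hypothetical name, and --line-buffered is assumed to be available (it is a GNU grep option) so that matches appear immediately when reading from the live pipe.

```shell
#!/bin/sh
# Keep only the trace lines that contain the given marker text ($1).
# Reads trace output on stdin; -F treats the marker as a fixed string, not a regex.
# trace_filter is a hypothetical helper name.
trace_filter() {
    grep --line-buffered -F -- "$1"
}

# Usage (commented out; requires root and a mounted debugfs):
# trace_filter "self compiled datapath" < /sys/kernel/debug/tracing/trace_pipe
```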

Reference