diff --git a/docs/kubernetes/README.md b/docs/kubernetes/README.md
new file mode 100644
index 0000000..b53b992
--- /dev/null
+++ b/docs/kubernetes/README.md
@@ -0,0 +1 @@
+# Kubernetes (k8s) notes
\ No newline at end of file
diff --git a/docs/kubernetes/install.md b/docs/kubernetes/install.md
new file mode 100644
index 0000000..515ef0c
--- /dev/null
+++ b/docs/kubernetes/install.md
@@ -0,0 +1,605 @@
# Installing K8s (Debian 12 + Kubernetes 1.28 + containerd + Calico: a 1-master + 2-node cluster built online with apt-get)

> wandoubaba / 2024-10-21

As of this writing, the latest Kubernetes release is `v1.31`, while the newest stable release that has received its full run of patches is `v1.28`. This guide uses the latter.

## Preparation

### Environment

|Item|Value|
|---|---|
|Operating system|Debian 12 (bookworm)|
|Kernel|6.1.0-23-amd64|
|Container runtime|containerd (CRI)|

### Host inventory

|IP (example)|Hostname|CPU|Memory|
|---|---|---|---|
|172.31.0.11|k8s-master01|8c|8G|
|172.31.0.14|k8s-node01|8c|16G|
|172.31.0.15|k8s-node02|8c|16G|

## Procedure

### Confirm basic host information (every host)

```sh
# Show the IP address and confirm it is configured as a static address
ip addr | awk '/inet /{split($2, ip, "/"); print ip[1]}'
# Show the MAC address and make sure it is unique on every host
ip link | awk '/state UP/ {getline; print $2}'
# Show the host UUID and make sure product_uuid is unique
sudo cat /sys/class/dmi/id/product_uuid
# Kernel version
uname -r
# OS release information
cat /etc/os-release
# Number of CPU cores
lscpu -p | grep -v "^#" | wc -l
# Memory size
free -h | awk '/Mem/{print $2}'
# Disk space
lsblk
```

### Set hostnames and update /etc/hosts (every host)

#### Set the hostname

```sh
# On the primary control-plane node (k8s-master01):
sudo hostnamectl set-hostname k8s-master01
# On the worker nodes (k8s-node01, k8s-node02):
sudo hostnamectl set-hostname k8s-node01
sudo hostnamectl set-hostname k8s-node02
```

After setting the hostname, either `exit` the terminal and reconnect, or simply run `bash`, to see the change take effect.

#### Update /etc/hosts

Replace the addresses below with your real IPs.

```sh
sudo bash -c 'cat <<EOF >> /etc/hosts
172.31.0.11 k8s-master01
172.31.0.14 k8s-node01
172.31.0.15 k8s-node02
EOF'
```

### Set the time zone and install a time service (every host)

```sh
sudo timedatectl set-timezone Asia/Shanghai
sudo apt-get update && sudo apt-get install -y chrony
```

#### Use an Aliyun NTP server (optional)

```conf
pool ntp1.aliyun.com iburst maxsources 4
```
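The edit this pool line implies—comment out the distribution's own `pool` entries and append the Aliyun one—can be scripted. A sketch, shown on a temporary copy so it is safe to try (point `conf` at `/etc/chrony/chrony.conf` and run with sudo on a real host):

```shell
# Sketch: apply the chrony pool change non-interactively on a copy.
conf=$(mktemp)
printf '%s\n' 'pool 2.debian.pool.ntp.org iburst' > "$conf"  # stand-in for the stock Debian config
sed -i 's/^pool /#pool /' "$conf"                            # comment out existing pool lines
echo 'pool ntp1.aliyun.com iburst maxsources 4' >> "$conf"   # add the Aliyun server
cat "$conf"
```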
Tip: add the line above to `/etc/chrony/chrony.conf` and comment out the other lines that start with `pool`.

Restart chrony and verify:

```sh
sudo systemctl restart chrony
sudo systemctl status chrony
sudo chronyc sources
```

### Disable swap (every host)

```sh
sudo swapoff -a
```

Also comment out the swap mount line in `/etc/fstab`.

### Disable the firewall (every host)

```sh
sudo ufw disable
sudo apt-get remove ufw
```

### Tune kernel parameters and load kernel modules (every host)

Note: the original sysctl block in this commit was garbled; the settings below are the standard ones required by Kubernetes (bridged traffic visible to iptables, IPv4 forwarding enabled).

```sh
sudo bash -c 'cat > /etc/sysctl.d/kubernetes.conf <<EOF
# Let iptables see bridged traffic
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# Enable IPv4 forwarding
net.ipv4.ip_forward = 1
EOF'
```

```sh
sudo bash -c 'cat > /etc/modules-load.d/kubernetes.conf << EOF
# /etc/modules-load.d/kubernetes.conf

# Linux bridge support
br_netfilter

# IPVS load balancer
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh

# IPv4 connection tracking (named nf_conntrack_ipv4 only on kernels before 4.19)
nf_conntrack

# iptables support
ip_tables
EOF'

# Load the modules now (they will also be loaded automatically on boot)
sudo systemctl restart systemd-modules-load.service
# Apply the sysctl settings (br_netfilter must be loaded first)
sudo sysctl --system
```

### Disable the security policy service (every host)

```sh
# Stop the AppArmor service
sudo systemctl stop apparmor.service

# Disable the AppArmor service
sudo systemctl disable apparmor.service
```

### Install the container runtime (every host)

#### Download

Check the latest release at https://github.com/containerd/containerd/releases, then download the matching `cri-containerd-x.x.x-linux-platform` archive:

```sh
curl -L -O https://github.com/containerd/containerd/releases/download/v1.7.23/cri-containerd-1.7.23-linux-amd64.tar.gz
```

#### Install

```sh
sudo tar xf cri-containerd-1.7.23-linux-amd64.tar.gz -C /
```

#### Configure

```sh
sudo mkdir /etc/containerd
sudo bash -c 'containerd config default > /etc/containerd/config.toml'
# Bump the pause sandbox image from 3.8 to 3.9
sudo sed -i '/sandbox_image/s/3.8/3.9/' /etc/containerd/config.toml
# Use the systemd cgroup driver, matching kubelet
sudo sed -i '/SystemdCgroup/s/false/true/' /etc/containerd/config.toml
```

#### Start

```sh
# Enable and immediately start the containerd service
sudo systemctl enable --now containerd.service
# Check the current status of the containerd service
sudo systemctl status containerd.service
```

#### Verify

```sh
# Check the containerd version
containerd --version
# CLI for talking to CRI (Container Runtime Interface) compatible runtimes
crictl --version
# Runs containers that conform to the OCI (Open Container Initiative) spec
sudo runc --version
```

### Install Docker (every host; optional for k8s, used only for building images)

```sh
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli docker-buildx-plugin docker-compose-plugin
```

### Install the k8s components (every host)

That is, install `kubelet`, `kubeadm`, and `kubectl`.

```sh
sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package

sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# If the directory `/etc/apt/keyrings` does not exist, it should be created before the curl command, read the note below.

# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
```

#### Configure kubelet

```sh
sudo bash -c 'cat > /etc/default/kubelet << EOF
# This makes kubelet use systemd as its cgroup driver
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF'
# Make sure kubelet starts on boot
sudo systemctl enable kubelet
```

### Initialize the master01 host

#### List the k8s images (optional)

```sh
sudo kubeadm config images list
```

You should see output similar to:

```sh
registry.k8s.io/kube-apiserver:v1.31.1
registry.k8s.io/kube-controller-manager:v1.31.1
registry.k8s.io/kube-scheduler:v1.31.1
registry.k8s.io/kube-proxy:v1.31.1
registry.k8s.io/coredns/coredns:v1.11.3
registry.k8s.io/pause:3.10
registry.k8s.io/etcd:3.5.15-0
```

> If you see a message like `remote version is much newer: v1.31.1; falling back to: stable-1.28`, it just means a newer version exists upstream; it is safe to ignore.

> The k8s images come from Google's registry by default, which may require a proxy to reach. If you have no proxy, the Aliyun mirror works as well: pass `--image-repository="registry.aliyuncs.com/google_containers"` to deploy the cluster from the Aliyun mirror.

#### Pull the images (optional)

```sh
sudo kubeadm config images pull
```

#### Create the k8s cluster

##### Initialize the master01 node

- Replace the `--apiserver-advertise-address` value below with the real IP of the `k8s-master01` host.
- `--pod-network-cidr` is the subnet that pods in this cluster will use.

```sh
sudo kubeadm init --control-plane-endpoint=k8s-master01 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=172.31.0.11 --cri-socket unix:///run/containerd/containerd.sock
```

If everything goes well, you should see output like the following (the exact tokens differ every run):

```sh
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
  --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca \
  --control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
  --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
```

Next, set up the kubectl config file:

```sh
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

Now you can check the node status:

```sh
kubectl get nodes -o wide
```

You should see something like this:

```sh
# Check node status
kubectl get nodes -o wide
# Output similar to
NAME           STATUS     ROLES           AGE   VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION         CONTAINER-RUNTIME
k8s-master01   NotReady   control-plane   25m   v1.28.14   172.31.0.11   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
# Check cluster info
kubectl cluster-info
# Output similar to
Kubernetes control plane is running at https://k8s-master01:6443
CoreDNS is running at https://k8s-master01:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# List all CRI containers
sudo crictl ps -a
# Output similar to (the STATE column should be Running for every container)
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
6177ae20a68e6       6a89d0ef825cb       29 minutes ago      Running             kube-proxy                0                   e8d15cb2bcd1a       kube-proxy-jlndc
a1a43a29df5c2       6cbf215f8d44e       30 minutes ago      Running             kube-scheduler            1                   3163922b00a0e       kube-scheduler-k8s-master01
19dfb26520340       7abec2d806048       30 minutes ago      Running             kube-controller-manager   1                   f6df8f333fcf0       kube-controller-manager-k8s-master01
b4c7a5f9c967f       3438637c2f3ae       30 minutes ago      Running             kube-apiserver            0                   b05316fac4cad       kube-apiserver-k8s-master01
8a4c587d9b8d9       2e96e5913fc06       30 minutes ago      Running             etcd                      0                   9a8c10ea30b80       etcd-k8s-master01
```

### Add the worker nodes (join k8s-node01 and k8s-node02 to the cluster)

First, generate the join command on `k8s-master01` (run it there once before joining each of `k8s-node01` and `k8s-node02`):

```sh
sudo kubeadm token create --print-join-command
# Output similar to the following (the token is different every time); copy it to run on the worker nodes
kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
```

The following steps are run on `k8s-node01` and `k8s-node02` respectively:

```sh
# Install nmap to verify, from the worker, that the api-server port on the master is reachable
sudo apt-get install nmap -y
# Replace the IP below with the real master node IP
nmap -p 6443 -Pn 172.31.0.11
# Output
Starting Nmap 7.93 ( https://nmap.org ) at 2024-10-21 18:50 CST
Nmap scan report for k8s-master01 (172.31.0.11)
Host is up (0.00081s latency).

PORT     STATE SERVICE
6443/tcp open  sun-sr-https
 
Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds
# Paste and run the join command obtained on the master (as a non-root user, prefix it with sudo)
sudo kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
# Output similar to
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
```

Then verify on `k8s-master01`:

```sh
kubectl get nodes
# Output similar to
NAME           STATUS     ROLES           AGE    VERSION
k8s-master01   NotReady   control-plane   41m    v1.28.14
k8s-node01     NotReady   <none>          2m6s   v1.28.14
```

Then run `sudo kubeadm token create --print-join-command` on the master once more and repeat the steps above on `k8s-node02`.

Finally, check the node information on `k8s-master01`:

```sh
kubectl get nodes
# Output similar to
NAME           STATUS     ROLES           AGE     VERSION
k8s-master01   NotReady   control-plane   43m     v1.28.14
k8s-node01     NotReady   <none>          4m26s   v1.28.14
k8s-node02     NotReady   <none>          40s     v1.28.14
# Check the k8s system pods
kubectl get pods -n kube-system
# Output similar to
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-5dd5756b68-4btx5               0/1     Pending   0          45m
coredns-5dd5756b68-8v2z8               0/1     Pending   0          45m
etcd-k8s-master01                      1/1     Running   0          45m
kube-apiserver-k8s-master01            1/1     Running   0          45m
kube-controller-manager-k8s-master01   1/1     Running   1          45m
kube-proxy-5tqw2                       1/1     Running   0          6m33s
kube-proxy-864zg                       1/1     Running   0          2m47s
kube-proxy-jlndc                       1/1     Running   0          45m
kube-scheduler-k8s-master01            1/1     Running   1          45m
```

Note that every node's `STATUS` is `NotReady`. This is because no network plugin has been installed and configured yet, so pod-to-pod communication does not work.

### Install the Calico network plugin (master node)

Reference: the official Calico installation documentation at docs.tigera.io.

#### Install the Tigera Calico operator

```sh
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
# Output similar to
serviceaccount/tigera-operator created
clusterrole.rbac.authorization.k8s.io/tigera-operator created
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
deployment.apps/tigera-operator created
# Check the cluster namespaces
kubectl get ns
# Output similar to
NAME              STATUS   AGE
default           Active   63m
kube-node-lease   Active   63m
kube-public       Active   63m
kube-system       Active   63m
tigera-operator   Active   13s
# Check the pods under tigera-operator
kubectl get pods -n tigera-operator
# Output
NAME                               READY   STATUS    RESTARTS   AGE
tigera-operator-5cfff76b77-tdswm   1/1     Running   0          3m46s
```

#### Install Calico

Note: the manifest version should match the operator installed above (v3.28.2; the original text mistakenly used v3.26.3 here).

```sh
curl -L -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/custom-resources.yaml
# Change the IP pool; it must match the --pod-network-cidr used at kubeadm init
sed -i 's/192.168.0.0/10.244.0.0/' custom-resources.yaml
# Install Calico
kubectl create -f custom-resources.yaml
# Output
installation.operator.tigera.io/default created
apiserver.operator.tigera.io/default created
```

Then run `watch` until every pod's `STATUS` is `Running`:

```sh
watch kubectl get pods -n calico-system
# Output
Every 2.0s: kubectl get pods -n calico-system                    k8s-master01: Mon Oct 21 19:22:54 2024

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5846f6d55d-87n88   1/1     Running   0          85s
calico-node-4mhxj                          1/1     Running   0          85s
calico-node-6c64k                          1/1     Running   0          85s
calico-node-sbzwz                          1/1     Running   0          85s
calico-typha-6c76968df6-lcjm6              1/1     Running   0          84s
calico-typha-6c76968df6-xbnk5              1/1     Running   0          85s
csi-node-driver-2vrg7                      2/2     Running   0          85s
csi-node-driver-gmb7m                      2/2     Running   0          85s
csi-node-driver-mnqvx                      2/2     Running   0          85s
```

Press `ctrl+c` to leave `watch`, then check the k8s system pods:

```sh
kubectl get pods -n kube-system -o wide
# Output similar to
NAME                                   READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
coredns-5dd5756b68-4btx5               1/1     Running   0          72m   10.244.58.196   k8s-node02     <none>           <none>
coredns-5dd5756b68-8v2z8               1/1     Running   0          72m   10.244.58.193   k8s-node02     <none>           <none>
etcd-k8s-master01                      1/1     Running   0          72m   172.31.0.11     k8s-master01   <none>           <none>
kube-apiserver-k8s-master01            1/1     Running   0          72m   172.31.0.11     k8s-master01   <none>           <none>
kube-controller-manager-k8s-master01   1/1     Running   1          72m   172.31.0.11     k8s-master01   <none>           <none>
kube-proxy-5tqw2                       1/1     Running   0          33m   172.31.0.14     k8s-node01     <none>           <none>
kube-proxy-864zg                       1/1     Running   0          29m   172.31.0.15     k8s-node02     <none>           <none>
kube-proxy-jlndc                       1/1     Running   0          72m   172.31.0.11     k8s-master01   <none>           <none>
kube-scheduler-k8s-master01            1/1     Running   1          72m   172.31.0.11     k8s-master01   <none>           <none>
```

Clear the control-plane taint (so that ordinary pods can also be scheduled on the master):

```sh
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```

Confirm the cluster nodes once more:

```sh
kubectl get nodes -o wide
# Output similar to
NAME           STATUS   ROLES           AGE   VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION         CONTAINER-RUNTIME
k8s-master01   Ready    control-plane   75m   v1.28.14   172.31.0.11   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
k8s-node01     Ready    <none>          35m   v1.28.14   172.31.0.14   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
k8s-node02     Ready    <none>          32m   v1.28.14   172.31.0.15   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
```
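
This final readiness check can also be automated. A sketch (the `check_ready` helper is illustrative, not part of the guide): it reads `kubectl get nodes` output and exits non-zero if any node is not `Ready`. It is demonstrated here against the sample output captured above so it runs without a cluster:

```shell
# check_ready: read `kubectl get nodes` output on stdin; print any node whose
# STATUS column is not "Ready" and exit non-zero in that case.
check_ready() {
  awk 'NR>1 && $2!="Ready" {bad=1; print $1" is "$2} END {exit bad}'
}

# Demonstrate against the captured sample output
printf '%s\n' \
  'NAME           STATUS   ROLES           AGE   VERSION' \
  'k8s-master01   Ready    control-plane   75m   v1.28.14' \
  'k8s-node01     Ready    <none>          35m   v1.28.14' \
  'k8s-node02     Ready    <none>          32m   v1.28.14' | check_ready \
  && echo "all nodes Ready"

# On the real cluster: kubectl get nodes | check_ready
```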