K8s Installation Walkthrough (Debian 12 + Kubernetes 1.28 + containerd + Calico: building a 1-master, 2-node K8s cluster online with apt-get)
wandoubaba / 2024-10-21
As of this writing, the latest Kubernetes release is v1.31, while the most recent fully patched stable series is v1.28; this article is based on the latter.
Installation preparation
Resources
Item | Value |
---|---|
Operating system | Debian 12 (bookworm) |
Kernel | 6.1.0-23-amd64 |
Container runtime | containerd (CRI) |
Host inventory
IP (example) | Hostname | CPU | Memory |
---|---|---|---|
172.31.0.11 | k8s-master01 | 8c | 8G |
172.31.0.14 | k8s-node01 | 8c | 16G |
172.31.0.15 | k8s-node02 | 8c | 16G |
Procedure
Confirm basic host information (on every host)
# Check the IP address and confirm it is configured as a static address
ip addr | awk '/inet /{split($2, ip, "/"); print ip[1]}'
# Check the MAC address and make sure it is unique across hosts
ip link | awk '/state UP/ {getline; print $2}'
# Check the host UUID and make sure product_uuid is unique across hosts
sudo cat /sys/class/dmi/id/product_uuid
# Check the kernel version
uname -r
# Check the OS release information
cat /etc/os-release
# Confirm the number of CPU cores
lscpu -p | grep -v "^#" | wc -l
# Confirm the amount of memory
free -h | awk '/Mem/{print $2}'
# Confirm the disk layout and free space
lsblk
Set the hostname and update /etc/hosts (on every host)
Set the hostname
# Run on the control-plane node (k8s-master01)
sudo hostnamectl set-hostname k8s-master01
# Run on the worker nodes (k8s-node01 and k8s-node02 respectively)
sudo hostnamectl set-hostname k8s-node01
sudo hostnamectl set-hostname k8s-node02
Once the hostname is set, either exit the terminal and reconnect, or simply run bash, and you will see the new name take effect.
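For a quick sanity check, either of the following should print the new hostname:
# Print the current hostname
hostname
# Or inspect the static hostname recorded by systemd
hostnamectl status | grep -i 'static hostname'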
Update the /etc/hosts file
Adjust the entries below to match your actual IP addresses.
sudo bash -c 'cat <<EOF >> /etc/hosts
172.31.0.11 k8s-master01
172.31.0.14 k8s-node01
172.31.0.15 k8s-node02
EOF'
Set the time zone and install a time service (on every host)
sudo timedatectl set-timezone Asia/Shanghai
sudo apt-get update && sudo apt-get install -y chrony
Configure the Aliyun NTP servers (optional)
pool ntp1.aliyun.com iburst maxsources 4
Tip: add the line above to /etc/chrony/chrony.conf and comment out the other pool entries.
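One way to apply that edit, assuming the stock Debian 12 chrony.conf declares its upstream servers with "pool ..." lines:
# Comment out the existing pool entries
sudo sed -i 's/^pool /#pool /' /etc/chrony/chrony.conf
# Append the Aliyun NTP pool
echo 'pool ntp1.aliyun.com iburst maxsources 4' | sudo tee -a /etc/chrony/chrony.conf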
Restart chrony and verify
sudo systemctl restart chrony
sudo systemctl status chrony
sudo chronyc sources
Disable swap (on every host)
sudo swapoff -a
You also need to comment out the swap mount line in /etc/fstab so that swap stays off after a reboot.
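A sketch of doing that in place (it assumes the fstab entry contains the word "swap"; review the file afterwards):
# Comment out any fstab line that mounts swap, keeping a backup copy
sudo sed -r -i.bak '/\sswap\s/s/^#?/#/' /etc/fstab
# Verify that no swap devices remain active (output should be empty)
swapon --show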
Disable the firewall (on every host)
sudo ufw disable
sudo apt-get remove ufw
Tune kernel parameters
sudo bash -c 'cat > /etc/sysctl.d/kubernetes.conf <<EOF
# Let bridged IPv6 traffic be processed by ip6tables (has no effect if the firewall is disabled or not iptables-based)
net.bridge.bridge-nf-call-ip6tables = 1
# Let bridged IPv4 traffic be processed by iptables (has no effect if the firewall is disabled or not iptables-based)
net.bridge.bridge-nf-call-iptables = 1
# Enable IPv4 packet forwarding
net.ipv4.ip_forward = 1
# Do not send ICMP redirect messages
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
# Raise the maximum number of tracked connections
net.netfilter.nf_conntrack_max = 1000000
# Raise the timeout for established connections in the conntrack table
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# Raise the listen queue size
net.core.somaxconn = 1024
# Mitigate SYN flood attacks
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
# Raise the file descriptor limit
fs.file-max = 65536
# Set swappiness to 0 to avoid frequent swapping to disk
vm.swappiness = 0
EOF'
# Load the br_netfilter kernel module, which provides the netfilter hooks needed for bridged traffic
sudo modprobe br_netfilter
# Confirm the module is loaded
lsmod | grep br_netfilter
# Read the parameters from the file and apply them to the running system
sudo sysctl -p /etc/sysctl.d/kubernetes.conf
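To spot-check that the values kubeadm's preflight checks care about are now active:
# Both should print 1
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward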
安装ipset和ipvsadm(每台主机)
ipset
主要用于支持Service
的负载均衡和网络策略。它可以帮助实现高性能的数据包过滤和转发,以及对IP地址和端口进行快速匹配。ipvsadm
主要用于配置和管理IPVS
负载均衡器,以实现Service
的负载均衡。
sudo apt-get install -y ipset ipvsadm
# 检查是否安装
dpkg -l ipset ipvsadm
Configure kernel modules (on every host)
sudo bash -c 'cat > /etc/modules-load.d/kubernetes.conf << EOF
# /etc/modules-load.d/kubernetes.conf
# Linux bridge support
br_netfilter
# IPVS load balancer
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
# Connection tracking (on kernels >= 4.19, such as Debian 12's 6.1, nf_conntrack_ipv4 has been merged into nf_conntrack)
nf_conntrack
# iptables support
ip_tables
EOF'
# Make the file accessible (modules-load.d files do not actually need the execute bit, but this does no harm)
sudo chmod a+x /etc/modules-load.d/kubernetes.conf
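The file above is only read at boot; to load everything immediately and confirm, something like this works:
# Ask systemd to (re)load the modules listed under /etc/modules-load.d/
sudo systemctl restart systemd-modules-load.service
# The IPVS and conntrack modules should now appear
lsmod | grep -E 'ip_vs|nf_conntrack'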
Turn off the security policy service (on every host)
# Stop the AppArmor service
sudo systemctl stop apparmor.service
# Disable the AppArmor service
sudo systemctl disable apparmor.service
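A quick check that it is really off:
# Expect "inactive" and "disabled"
systemctl is-active apparmor.service
systemctl is-enabled apparmor.service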
Turn off the firewall (on every host)
This is redundant if ufw was already disabled and removed in the earlier step; keep it for hosts where ufw is still present.
# Disable ufw
sudo ufw disable
sudo systemctl stop ufw.service
sudo systemctl disable ufw.service
Install the container runtime (on every host)
Download
Check the latest version at https://github.com/containerd/containerd/releases, then download the matching cri-containerd-x.x.x-linux-platform archive:
curl -L -O https://github.com/containerd/containerd/releases/download/v1.7.23/cri-containerd-1.7.23-linux-amd64.tar.gz
Install
sudo tar xf cri-containerd-1.7.23-linux-amd64.tar.gz -C /
Configure
sudo mkdir /etc/containerd
sudo bash -c 'containerd config default > /etc/containerd/config.toml'
# Bump the sandbox (pause) image from 3.8 to 3.9
sudo sed -i '/sandbox_image/s/3.8/3.9/' /etc/containerd/config.toml
# Switch to the systemd cgroup driver, matching the kubelet configuration later on
sudo sed -i '/SystemdCgroup/s/false/true/' /etc/containerd/config.toml
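To double-check that both sed edits landed:
# Expect a pause:3.9 image reference and SystemdCgroup = true
grep -E 'sandbox_image|SystemdCgroup' /etc/containerd/config.toml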
Start
# Enable and immediately start the containerd service
sudo systemctl enable --now containerd.service
# Check the current status of the containerd service
sudo systemctl status containerd.service
Verify
# Check the containerd version
containerd --version
# crictl is the command-line tool for interacting with CRI (Container Runtime Interface) compatible runtimes
crictl --version
# runc runs containers conforming to the OCI (Open Container Initiative) spec
sudo runc --version
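crictl needs to be told which CRI endpoint to use; a minimal /etc/crictl.yaml (assuming the default containerd socket path) plus a connectivity test looks like this:
sudo bash -c 'cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
EOF'
# Should dump the runtime status and configuration as JSON
sudo crictl info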
Install docker (on every host; optional for K8s itself, only needed for building images)
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli docker-buildx-plugin docker-compose-plugin
Install the K8s components (on every host)
That is, install kubelet, kubeadm and kubectl.
sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# If the directory `/etc/apt/keyrings` does not exist, it should be created before the curl command, read the note below.
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
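Confirm the installed versions and the hold status:
kubeadm version -o short
kubectl version --client
kubelet --version
apt-mark showhold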
Configure kubelet
sudo bash -c 'cat > /etc/default/kubelet << EOF
# Tell kubelet to use systemd as its cgroup driver, matching containerd
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF'
# Make sure kubelet is enabled to start on boot
sudo systemctl enable kubelet
Initialize the master01 host
List the k8s images (optional)
sudo kubeadm config images list
It should print a list like the one below (with the v1.28 packages installed you will see v1.28.x tags for the kube-* images rather than the v1.31.1 tags shown here):
registry.k8s.io/kube-apiserver:v1.31.1
registry.k8s.io/kube-controller-manager:v1.31.1
registry.k8s.io/kube-scheduler:v1.31.1
registry.k8s.io/kube-proxy:v1.31.1
registry.k8s.io/coredns/coredns:v1.11.3
registry.k8s.io/pause:3.10
registry.k8s.io/etcd:3.5.15-0
If you see a message like "remote version is much newer: v1.31.1; falling back to: stable-1.28", it just means a newer release line exists upstream; you can safely ignore it.
The k8s images default to Google's registry, which normally requires a proxy to reach. If you have no proxy, the Aliyun mirror works just as well: pass --image-repository="registry.aliyuncs.com/google_containers" to deploy the cluster from the Aliyun mirror.
Pull the images (optional)
sudo kubeadm config images pull
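If registry.k8s.io is unreachable and you are using the Aliyun mirror mentioned above, the equivalent pull looks like this (the same flag can later be added to kubeadm init):
# Pull the control-plane images from the Aliyun mirror instead of registry.k8s.io
sudo kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers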
Create the k8s cluster
Initialize the master01 node
- Replace the --apiserver-advertise-address value with the real IP address of your k8s-master01 host. The --pod-network-cidr flag sets the network range that pods in this cluster will use.
sudo kubeadm init --control-plane-endpoint=k8s-master01 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=172.31.0.11 --cri-socket unix:///run/containerd/containerd.sock
If everything goes well, you should see output like the following (the exact tokens and hashes differ on every run):
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
--discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
--discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
Next, set up the kubectl config file:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You can now check the node status:
kubectl get nodes -o wide
You should see results similar to the following:
# Check node status
kubectl get nodes -o wide
# Output looks like
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 NotReady control-plane 25m v1.28.14 172.31.0.11 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
# Check cluster info
kubectl cluster-info
# Output looks like
Kubernetes control plane is running at https://k8s-master01:6443
CoreDNS is running at https://k8s-master01:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# List all CRI containers
sudo crictl ps -a
# Output looks like (every row in the STATE column should be Running)
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
6177ae20a68e6 6a89d0ef825cb 29 minutes ago Running kube-proxy 0 e8d15cb2bcd1a kube-proxy-jlndc
a1a43a29df5c2 6cbf215f8d44e 30 minutes ago Running kube-scheduler 1 3163922b00a0e kube-scheduler-k8s-master01
19dfb26520340 7abec2d806048 30 minutes ago Running kube-controller-manager 1 f6df8f333fcf0 kube-controller-manager-k8s-master01
b4c7a5f9c967f 3438637c2f3ae 30 minutes ago Running kube-apiserver 0 b05316fac4cad kube-apiserver-k8s-master01
8a4c587d9b8d9 2e96e5913fc06 30 minutes ago Running etcd 0 9a8c10ea30b80 etcd-k8s-master01
Add the worker nodes (join k8s-node01 and k8s-node02 to the cluster)
First obtain the join command on k8s-master01 (run this on k8s-master01 once before joining each of k8s-node01 and k8s-node02):
sudo kubeadm token create --print-join-command
# The output is similar to the following (the token differs every time); copy it for use on the worker nodes
kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
The following steps are run on k8s-node01 and k8s-node02 respectively.
# Install nmap on the worker node to verify connectivity to the api-server port on the master node
sudo apt-get install nmap -y
# Replace the IP address below with your actual master node IP
nmap -p 6443 -Pn 172.31.0.11
# Result
Starting Nmap 7.93 ( https://nmap.org ) at 2024-10-21 18:50 CST
Nmap scan report for k8s-master01 (172.31.0.11)
Host is up (0.00081s latency).
PORT STATE SERVICE
6443/tcp open sun-sr-https
Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds
# Paste and run the join command obtained on the master node (when using a non-root user, prefix it with sudo)
sudo kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
# Output looks like
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Back on k8s-master01, verify:
kubectl get nodes
# Output looks like
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 41m v1.28.14
k8s-node01 NotReady <none> 2m6s v1.28.14
Then run sudo kubeadm token create --print-join-command on the master node once more and repeat the steps above on k8s-node02.
Finally, check the node list on k8s-master01:
kubectl get nodes
# Output looks like
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 43m v1.28.14
k8s-node01 NotReady <none> 4m26s v1.28.14
k8s-node02 NotReady <none> 40s v1.28.14
# Check the k8s system pods
kubectl get pods -n kube-system
# Output looks like
NAME READY STATUS RESTARTS AGE
coredns-5dd5756b68-4btx5 0/1 Pending 0 45m
coredns-5dd5756b68-8v2z8 0/1 Pending 0 45m
etcd-k8s-master01 1/1 Running 0 45m
kube-apiserver-k8s-master01 1/1 Running 0 45m
kube-controller-manager-k8s-master01 1/1 Running 1 45m
kube-proxy-5tqw2 1/1 Running 0 6m33s
kube-proxy-864zg 1/1 Running 0 2m47s
kube-proxy-jlndc 1/1 Running 0 45m
kube-scheduler-k8s-master01 1/1 Running 1 45m
Note that every node's STATUS column shows NotReady. This is because no network plugin has been installed and configured yet, so pod-to-pod communication does not work.
Install the Calico network plugin (on the master node)
Reference: https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
Install the Tigera Calico operator
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
# Output looks like
serviceaccount/tigera-operator created
clusterrole.rbac.authorization.k8s.io/tigera-operator created
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
deployment.apps/tigera-operator created
# List the cluster namespaces
kubectl get ns
# Output looks like
NAME STATUS AGE
default Active 63m
kube-node-lease Active 63m
kube-public Active 63m
kube-system Active 63m
tigera-operator Active 13s
# Check the pod in the tigera-operator namespace
kubectl get pods -n tigera-operator
# Result
NAME READY STATUS RESTARTS AGE
tigera-operator-5cfff76b77-tdswm 1/1 Running 0 3m46s
Install Calico
curl -L -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/custom-resources.yaml
# Change the IP pool; it must match the --pod-network-cidr used at init time (the manifest version should match the v3.28.2 operator installed above)
sed -i 's/192.168.0.0/10.244.0.0/' custom-resources.yaml
# Install calico
kubectl create -f custom-resources.yaml
# Result
installation.operator.tigera.io/default created
apiserver.operator.tigera.io/default created
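To confirm the CIDR change was picked up, the Installation resource created from custom-resources.yaml can be inspected (a sketch):
# Should print 10.244.0.0/16
kubectl get installation default -o jsonpath='{.spec.calicoNetwork.ipPools[0].cidr}'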
Then run watch and wait until the STATUS of every pod becomes Running:
watch kubectl get pods -n calico-system
# Result
Every 2.0s: kubectl get pods -n calico-system k8s-master01: Mon Oct 21 19:22:54 2024
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5846f6d55d-87n88 1/1 Running 0 85s
calico-node-4mhxj 1/1 Running 0 85s
calico-node-6c64k 1/1 Running 0 85s
calico-node-sbzwz 1/1 Running 0 85s
calico-typha-6c76968df6-lcjm6 1/1 Running 0 84s
calico-typha-6c76968df6-xbnk5 1/1 Running 0 85s
csi-node-driver-2vrg7 2/2 Running 0 85s
csi-node-driver-gmb7m 2/2 Running 0 85s
csi-node-driver-mnqvx 2/2 Running 0 85s
Keep confirming the pod state until every pod shows Running:
watch kubectl get pods -n calico-system
# Output looks like
Every 2.0s: kubectl get pods -n calico-system k8s-master01: Mon Oct 21 19:23:47 2024
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5846f6d55d-87n88 1/1 Running 0 2m18s
calico-node-4mhxj 1/1 Running 0 2m18s
calico-node-6c64k 1/1 Running 0 2m18s
calico-node-sbzwz 1/1 Running 0 2m18s
calico-typha-6c76968df6-lcjm6 1/1 Running 0 2m17s
calico-typha-6c76968df6-xbnk5 1/1 Running 0 2m18s
csi-node-driver-2vrg7 2/2 Running 0 2m18s
csi-node-driver-gmb7m 2/2 Running 0 2m18s
csi-node-driver-mnqvx 2/2 Running 0 2m18s
Press ctrl+c to leave watch, then check the k8s system pods again:
kubectl get pods -n kube-system -o wide
# Output looks like
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-5dd5756b68-4btx5 1/1 Running 0 72m 10.244.58.196 k8s-node02 <none> <none>
coredns-5dd5756b68-8v2z8 1/1 Running 0 72m 10.244.58.193 k8s-node02 <none> <none>
etcd-k8s-master01 1/1 Running 0 72m 172.31.0.11 k8s-master01 <none> <none>
kube-apiserver-k8s-master01 1/1 Running 0 72m 172.31.0.11 k8s-master01 <none> <none>
kube-controller-manager-k8s-master01 1/1 Running 1 72m 172.31.0.11 k8s-master01 <none> <none>
kube-proxy-5tqw2 1/1 Running 0 33m 172.31.0.14 k8s-node01 <none> <none>
kube-proxy-864zg 1/1 Running 0 29m 172.31.0.15 k8s-node02 <none> <none>
kube-proxy-jlndc 1/1 Running 0 72m 172.31.0.11 k8s-master01 <none> <none>
kube-scheduler-k8s-master01 1/1 Running 1 72m 172.31.0.11 k8s-master01 <none> <none>
Clean up taints (optional: removing the control-plane taint allows regular workloads to be scheduled on the control-plane node)
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
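To confirm the taints are gone:
# The Taints field should read <none> for every node
kubectl describe nodes | grep -i taints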
Check the cluster nodes once more:
kubectl get nodes -o wide
# Output looks like
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 Ready control-plane 75m v1.28.14 172.31.0.11 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
k8s-node01 Ready <none> 35m v1.28.14 172.31.0.14 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
k8s-node02 Ready <none> 32m v1.28.14 172.31.0.15 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23