
K8s installation walkthrough: Debian 12 + Kubernetes 1.28 + containerd + Calico, building a 1-master + 2-node cluster online with apt-get

wandoubaba / 2024-10-21

As of the time of writing, the latest Kubernetes release is v1.31, while the newest fully patched stable series is v1.28; this article is based on the latter.

Preparation

Resources

Item              Value
Operating system  Debian 12 (bookworm)
Kernel            6.1.0-23-amd64
Container runtime containerd (CRI)

Host inventory

IP (example)   Hostname       CPU   RAM
172.31.0.11    k8s-master01   8c    8G
172.31.0.14    k8s-node01     8c    16G
172.31.0.15    k8s-node02     8c    16G

Procedure

Confirm basic host information (every host)

# Check the IP address and confirm it is configured as a static address
ip addr | awk '/inet /{split($2, ip, "/"); print ip[1]}'
# Check the MAC address and make sure it is unique on every host
ip link | awk '/state UP/ {getline; print $2}'
# Check the host UUID and make sure product_uuid is unique
sudo cat /sys/class/dmi/id/product_uuid
# Check the kernel version
uname -r
# Check the OS release information
cat /etc/os-release
# Confirm the number of CPU cores
lscpu -p | grep -v "^#" | wc -l
# Confirm the amount of memory
free -h | awk '/Mem/{print $2}'
# Confirm the available disk space
lsblk

Set the hostname and update /etc/hosts (every host)

Set the hostname

# Run on the primary control-plane node k8s-master01
sudo hostnamectl set-hostname k8s-master01
# Run on the worker nodes k8s-node01 and k8s-node02 respectively
sudo hostnamectl set-hostname k8s-node01
sudo hostnamectl set-hostname k8s-node02

After the hostname is set, exit and reconnect the terminal, or simply run bash, and you will see the new name take effect.
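
If you want to confirm the change without reconnecting, the configured hostname can also be printed directly:

hostnamectl
hostname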

Update the /etc/hosts file

Adjust the entries below to your actual IP addresses.

sudo bash -c 'cat <<EOF >> /etc/hosts
172.31.0.11 k8s-master01
172.31.0.14 k8s-node01
172.31.0.15 k8s-node02
EOF'

Set the timezone and install a time service (every host)

sudo timedatectl set-timezone Asia/Shanghai
sudo apt-get update && sudo apt-get install -y chrony

Configure the Aliyun NTP server (optional)

pool    ntp1.aliyun.com iburst maxsources 4

Tip: add the line above to /etc/chrony/chrony.conf and comment out the other entries that start with pool.
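
If you prefer to script that edit rather than open the file by hand, something along these lines should work (a rough sketch; review /etc/chrony/chrony.conf afterwards, since the Debian default config may also reference additional sources via sourcedir directives):

# comment out the existing pool entries
sudo sed -i 's/^pool /#pool /' /etc/chrony/chrony.conf
# append the Aliyun pool
echo 'pool ntp1.aliyun.com iburst maxsources 4' | sudo tee -a /etc/chrony/chrony.conf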

Restart chrony and verify

sudo systemctl restart chrony
sudo systemctl status chrony
sudo chronyc sources

Disable swap (every host)

sudo swapoff -a

Also comment out the swap mount line in /etc/fstab so the change survives a reboot.
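
A one-liner like the following can do that edit (a sketch; double-check /etc/fstab afterwards):

# comment out any fstab entry whose type field is swap
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab
# confirm that no swap is active
swapon --show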

Disable the firewall (every host)

sudo ufw disable
sudo systemctl stop ufw.service
sudo systemctl disable ufw.service
sudo apt-get remove -y ufw

Optimize kernel parameters (every host)

sudo bash -c 'cat > /etc/sysctl.d/kubernetes.conf <<EOF
# Let bridged IPv6 traffic be processed by ip6tables (has no effect when the firewall is disabled or iptables is not used)
net.bridge.bridge-nf-call-ip6tables = 1
# Let bridged IPv4 traffic be processed by iptables (has no effect when the firewall is disabled or iptables is not used)
net.bridge.bridge-nf-call-iptables = 1
# Enable IPv4 packet forwarding
net.ipv4.ip_forward = 1
# Do not send ICMP redirect messages
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
# Raise the maximum number of tracked connections
net.netfilter.nf_conntrack_max = 1000000
# Raise the timeout for established connections in the conntrack table
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# Raise the listen queue size
net.core.somaxconn = 1024
# Protect against SYN flood attacks
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
# Raise the file descriptor limit
fs.file-max = 65536
# Set swappiness to 0 to avoid frequent swapping to disk
vm.swappiness = 0
EOF'


# Load the br_netfilter kernel module, which provides the packet filtering needed for bridged network traffic
sudo modprobe br_netfilter

# Check that the module has been loaded
lsmod | grep br_netfilter

# Read the settings from this file and apply them to the running system
sudo sysctl -p /etc/sysctl.d/kubernetes.conf
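
# (optional sanity check) this should print net.ipv4.ip_forward = 1
sysctl net.ipv4.ip_forward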

Install ipset and ipvsadm (every host)

  • ipset: mainly supports Service load balancing and network policies; it enables high-performance packet filtering and forwarding with fast matching of IP addresses and ports.
  • ipvsadm: used to configure and manage the IPVS load balancer that implements Service load balancing.
sudo apt-get install -y ipset ipvsadm
# Check that they are installed
dpkg -l ipset ipvsadm
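
# (optional) list the IPVS virtual server table; it is expected to be empty for now, since kube-proxy
# only programs it when running in IPVS mode, but the command confirms the tool and kernel support work
sudo ipvsadm -Ln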

Configure kernel modules (every host)

sudo bash -c 'cat > /etc/modules-load.d/kubernetes.conf << EOF
# /etc/modules-load.d/kubernetes.conf

# Linux bridge support
br_netfilter

# IPVS load balancing
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh

# connection tracking (on kernel 4.19+ nf_conntrack_ipv4 has been merged into nf_conntrack)
nf_conntrack

# iptables support
ip_tables
EOF'

# Files under /etc/modules-load.d/ do not need to be executable; load the modules now without rebooting
sudo systemctl restart systemd-modules-load.service

Disable the security policy service (every host)

# Stop the AppArmor service
sudo systemctl stop apparmor.service

# Disable the AppArmor service
sudo systemctl disable apparmor.service

Install the container runtime (every host)

Download

Check https://github.com/containerd/containerd/releases for the latest version, then download the matching cri-containerd-x.x.x-linux-platform archive:

curl -L -O https://github.com/containerd/containerd/releases/download/v1.7.23/cri-containerd-1.7.23-linux-amd64.tar.gz

Install

sudo tar xf cri-containerd-1.7.23-linux-amd64.tar.gz -C /

Configure

sudo mkdir /etc/containerd
sudo bash -c 'containerd config default > /etc/containerd/config.toml'
sudo sed -i '/sandbox_image/s/3.8/3.9/' /etc/containerd/config.toml
sudo sed -i '/SystemdCgroup/s/false/true/' /etc/containerd/config.toml
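
# (optional) verify that both edits took effect
grep sandbox_image /etc/containerd/config.toml
grep SystemdCgroup /etc/containerd/config.toml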

Start

# Enable and immediately start the containerd service
sudo systemctl enable --now containerd.service
# Check the current status of the containerd service
sudo systemctl status containerd.service

Verify

# Check the containerd version
containerd --version
# crictl is the CLI for interacting with CRI (Container Runtime Interface) compatible runtimes
crictl --version
# runc runs containers that comply with the OCI (Open Container Initiative) spec
sudo runc --version

Install Docker (every host; optional for k8s, only needed for building images)

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli docker-buildx-plugin docker-compose-plugin
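
# (optional) quick sanity check that the Docker daemon works; hello-world is pulled from Docker Hub,
# so this step needs network access to it
sudo docker version
sudo docker run --rm hello-world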

Install the k8s components (every host)

That is, install kubelet, kubeadm, and kubectl.

sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package

sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# If the directory `/etc/apt/keyrings` does not exist, it should be created before the curl command, read the note below.

# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet

Configure kubelet

sudo bash -c 'cat > /etc/default/kubelet << EOF
# Tell kubelet to use systemd as the cgroup driver for the container runtime
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF'
# Make sure kubelet starts on boot
sudo systemctl enable kubelet

Initialize the master01 host

List the k8s images (optional)

sudo kubeadm config images list

It should print a list like this:

registry.k8s.io/kube-apiserver:v1.31.1
registry.k8s.io/kube-controller-manager:v1.31.1
registry.k8s.io/kube-scheduler:v1.31.1
registry.k8s.io/kube-proxy:v1.31.1
registry.k8s.io/coredns/coredns:v1.11.3
registry.k8s.io/pause:3.10
registry.k8s.io/etcd:3.5.15-0

If you see a message like remote version is much newer: v1.31.1; falling back to: stable-1.28, it only means a newer release exists and can be ignored.
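
If you want the listing to match the 1.28 series that will actually be installed, kubeadm also accepts an explicit version; the patch release below is just an example:

sudo kubeadm config images list --kubernetes-version=v1.28.14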

By default the k8s images are pulled from registry.k8s.io, which may require a proxy to reach from some networks. If you do not have one, the Aliyun mirror works as well: pass --image-repository="registry.aliyuncs.com/google_containers" to deploy the cluster with images from the Aliyun registry.
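
For example, the image pre-pull shown in the next step could be pointed at the Aliyun mirror like this (only needed when registry.k8s.io is unreachable; the same flag can also be appended to the kubeadm init command further below):

sudo kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers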

Pull the images (optional)

sudo kubeadm config images pull

Create the k8s cluster

Initialize the master01 node
  • Replace the --apiserver-advertise-address value below with the actual IP address of your k8s-master01 host.
  • --pod-network-cidr is the address range that pods in this cluster will use.
sudo kubeadm init --control-plane-endpoint=k8s-master01 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=172.31.0.11 --cri-socket unix:///run/containerd/containerd.sock

If everything goes well, you should see output like the following (the exact tokens will differ on every run):

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
	--discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca \
	--control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
	--discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca

Next, set up the kubectl config file:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Then check the node status:

kubectl get nodes -o wide

You should see something like the following:

# Check the node status
kubectl get nodes -o wide
# The output looks something like
NAME           STATUS     ROLES           AGE   VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION         CONTAINER-RUNTIME
k8s-master01   NotReady   control-plane   25m   v1.28.14   172.31.0.11   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
# Check the cluster info
kubectl cluster-info
# The output looks something like
Kubernetes control plane is running at https://k8s-master01:6443
CoreDNS is running at https://k8s-master01:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# List all CRI containers
sudo crictl ps -a
# The output looks something like; the STATE column should be Running for every container
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
6177ae20a68e6       6a89d0ef825cb       29 minutes ago      Running             kube-proxy                0                   e8d15cb2bcd1a       kube-proxy-jlndc
a1a43a29df5c2       6cbf215f8d44e       30 minutes ago      Running             kube-scheduler            1                   3163922b00a0e       kube-scheduler-k8s-master01
19dfb26520340       7abec2d806048       30 minutes ago      Running             kube-controller-manager   1                   f6df8f333fcf0       kube-controller-manager-k8s-master01
b4c7a5f9c967f       3438637c2f3ae       30 minutes ago      Running             kube-apiserver            0                   b05316fac4cad       kube-apiserver-k8s-master01
8a4c587d9b8d9       2e96e5913fc06       30 minutes ago      Running             etcd                      0                   9a8c10ea30b80       etcd-k8s-master01

Add the worker nodes: join k8s-node01 and k8s-node02 to the cluster

First, generate the join command on k8s-master01 (run it on k8s-master01 once before joining each of k8s-node01 and k8s-node02):

sudo kubeadm token create --print-join-command
# The output looks like the following (the token differs every time); copy it and keep it ready to run on the worker node
kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca

Run the steps below on k8s-node01 and k8s-node02 respectively

# Install nmap, used here to verify from the worker node that the api-server port on the master is reachable
sudo apt-get install nmap -y
# Replace the IP below with the actual IP of your master host
nmap -p 6443 -Pn 172.31.0.11
# Output
Starting Nmap 7.93 ( https://nmap.org ) at 2024-10-21 18:50 CST
Nmap scan report for k8s-master01 (172.31.0.11)
Host is up (0.00081s latency).

PORT     STATE SERVICE
6443/tcp open  sun-sr-https

Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds
# Paste in the join command obtained on the master node and run it (with a non-root user, prefix it with sudo)
sudo kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
# The output looks like this
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Then verify on k8s-master01

kubectl get nodes
# The output looks like
NAME           STATUS     ROLES           AGE    VERSION
k8s-master01   NotReady   control-plane   41m    v1.28.14
k8s-node01     NotReady   <none>          2m6s   v1.28.14

Then run sudo kubeadm token create --print-join-command on the master node once more and repeat the procedure above on k8s-node02.

Finally, check the node list on k8s-master01

kubectl get nodes
# The output looks like
NAME           STATUS     ROLES           AGE     VERSION
k8s-master01   NotReady   control-plane   43m     v1.28.14
k8s-node01     NotReady   <none>          4m26s   v1.28.14
k8s-node02     NotReady   <none>          40s     v1.28.14
# Check the k8s system pods
kubectl get pods -n kube-system
# The output looks like
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-5dd5756b68-4btx5               0/1     Pending   0          45m
coredns-5dd5756b68-8v2z8               0/1     Pending   0          45m
etcd-k8s-master01                      1/1     Running   0          45m
kube-apiserver-k8s-master01            1/1     Running   0          45m
kube-controller-manager-k8s-master01   1/1     Running   1          45m
kube-proxy-5tqw2                       1/1     Running   0          6m33s
kube-proxy-864zg                       1/1     Running   0          2m47s
kube-proxy-jlndc                       1/1     Running   0          45m
kube-scheduler-k8s-master01            1/1     Running   1          45m

Notice that the STATUS column of every node is NotReady. This is because the network plugin has not been installed and configured yet, so pod-to-pod communication does not work.

Install the Calico network plugin (master node)

Reference: https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart

Install the Tigera Calico operator

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
# The output looks like
serviceaccount/tigera-operator created
clusterrole.rbac.authorization.k8s.io/tigera-operator created
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
deployment.apps/tigera-operator created
# Check the cluster namespaces
kubectl get ns
# The output looks like
NAME              STATUS   AGE
default           Active   63m
kube-node-lease   Active   63m
kube-public       Active   63m
kube-system       Active   63m
tigera-operator   Active   13s
# Check the pods in the tigera-operator namespace
kubectl get pods -n tigera-operator
# Output
NAME                               READY   STATUS    RESTARTS   AGE
tigera-operator-5cfff76b77-tdswm   1/1     Running   0          3m46s

Install Calico

curl -L -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/custom-resources.yaml
# Change the IP pool; it must match the --pod-network-cidr used during kubeadm init
sed -i 's/192.168.0.0/10.244.0.0/' custom-resources.yaml
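# (optional) confirm the CIDR in custom-resources.yaml now matches the --pod-network-cidr used at init time
grep -n cidr custom-resources.yaml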
# Install Calico
kubectl create -f custom-resources.yaml
# Output
installation.operator.tigera.io/default created
apiserver.operator.tigera.io/default created

Now run watch and wait until the STATUS of every pod becomes Running

watch kubectl get pods -n calico-system
# Output
Every 2.0s: kubectl get pods -n calico-system                                                                                                                         k8s-master01: Mon Oct 21 19:22:54 2024

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5846f6d55d-87n88   1/1     Running   0          85s
calico-node-4mhxj                          1/1     Running   0          85s
calico-node-6c64k                          1/1     Running   0          85s
calico-node-sbzwz                          1/1     Running   0          85s
calico-typha-6c76968df6-lcjm6              1/1     Running   0          84s
calico-typha-6c76968df6-xbnk5              1/1     Running   0          85s
csi-node-driver-2vrg7                      2/2     Running   0          85s
csi-node-driver-gmb7m                      2/2     Running   0          85s
csi-node-driver-mnqvx                      2/2     Running   0          85s

Keep watching the pod status until every pod is Running

watch kubectl get pods -n calico-system
# The output looks like
Every 2.0s: kubectl get pods -n calico-system                                                                                                                         k8s-master01: Mon Oct 21 19:23:47 2024

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5846f6d55d-87n88   1/1     Running   0          2m18s
calico-node-4mhxj                          1/1     Running   0          2m18s
calico-node-6c64k                          1/1     Running   0          2m18s
calico-node-sbzwz                          1/1     Running   0          2m18s
calico-typha-6c76968df6-lcjm6              1/1     Running   0          2m17s
calico-typha-6c76968df6-xbnk5              1/1     Running   0          2m18s
csi-node-driver-2vrg7                      2/2     Running   0          2m18s
csi-node-driver-gmb7m                      2/2     Running   0          2m18s
csi-node-driver-mnqvx                      2/2     Running   0          2m18s

Press ctrl+c to exit watch, then check the k8s system pods again

kubectl get pods -n kube-system -o wide
# The output looks like
NAME                                   READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
coredns-5dd5756b68-4btx5               1/1     Running   0          72m   10.244.58.196   k8s-node02     <none>           <none>
coredns-5dd5756b68-8v2z8               1/1     Running   0          72m   10.244.58.193   k8s-node02     <none>           <none>
etcd-k8s-master01                      1/1     Running   0          72m   172.31.0.11   k8s-master01   <none>           <none>
kube-apiserver-k8s-master01            1/1     Running   0          72m   172.31.0.11   k8s-master01   <none>           <none>
kube-controller-manager-k8s-master01   1/1     Running   1          72m   172.31.0.11   k8s-master01   <none>           <none>
kube-proxy-5tqw2                       1/1     Running   0          33m   172.31.0.14   k8s-node01     <none>           <none>
kube-proxy-864zg                       1/1     Running   0          29m   172.31.0.15   k8s-node02     <none>           <none>
kube-proxy-jlndc                       1/1     Running   0          72m   172.31.0.11   k8s-master01   <none>           <none>
kube-scheduler-k8s-master01            1/1     Running   1          72m   172.31.0.11   k8s-master01   <none>           <none>

Remove the control-plane taint (this allows regular workloads to be scheduled on the control-plane node)

kubectl taint nodes --all node-role.kubernetes.io/control-plane-
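
# (optional) confirm the taint is gone; every node should now report Taints: <none>
kubectl describe nodes | grep Taints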

Confirm the cluster nodes again

kubectl get nodes -o wide
# The output looks like
NAME           STATUS   ROLES           AGE   VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION         CONTAINER-RUNTIME
k8s-master01   Ready    control-plane   75m   v1.28.14   172.31.0.11   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
k8s-node01     Ready    <none>          35m   v1.28.14   172.31.0.14   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23
k8s-node02     Ready    <none>          32m   v1.28.14   172.31.0.15   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-23-cloud-amd64   containerd://1.7.23