# K8s Installation: Debian 12 + Kubernetes 1.28 + containerd + Calico, Building a 1-master + 2-node Cluster Online with apt-get
> wandoubaba / 2024-10-21
As of publication, the latest Kubernetes release is `v1.31`, while the newest fully patched stable series is `v1.28`; this guide is based on the latter.
## Preparation
### Resources
|Item|Value|
|---|---|
|Operating system|Debian 12 (bookworm)|
|Kernel|6.1.0-23-amd64|
|Container runtime|containerd (CRI)|
### Host inventory
|IP (example)|Hostname|CPU|Memory|
|---|---|---|---|
|172.31.0.11|k8s-master01|8c|8G|
|172.31.0.14|k8s-node01|8c|16G|
|172.31.0.15|k8s-node02|8c|16G|
## Procedure
### Confirm basic host information (every host)
```sh
# Check the IP address and confirm it is configured as a static address
ip addr | awk '/inet /{split($2, ip, "/"); print ip[1]}'
# Check the MAC address and make sure each host's MAC is unique
ip link | awk '/state UP/ {getline; print $2}'
# Check the host UUID and make sure product_uuid is unique
sudo cat /sys/class/dmi/id/product_uuid
# Check the kernel version
uname -r
# Check the OS release information
cat /etc/os-release
# Confirm the number of CPU cores
lscpu -p | grep -v "^#" | wc -l
# Confirm the amount of memory
free -h | awk '/Mem/{print $2}'
# Confirm the disk layout and free space
lsblk
```
### Set hostnames and update /etc/hosts (every host)
#### Set hostnames
```sh
# Run on the control-plane node k8s-master01
sudo hostnamectl set-hostname k8s-master01
# Run on the worker nodes k8s-node01 and k8s-node02 respectively
sudo hostnamectl set-hostname k8s-node01
sudo hostnamectl set-hostname k8s-node02
```
Once the hostname is set, you can `exit` the terminal and reconnect, or simply run `bash`; either way the new hostname will show up in the prompt.
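If you prefer not to reconnect, a quick check like the one below also confirms the change took effect:
```sh
# Show the static hostname recorded by systemd
hostnamectl status
# Or simply print the current hostname
hostname
```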
#### Edit /etc/hosts
Adjust the entries below to match your actual IP addresses:
```sh
sudo bash -c 'cat <<EOF >> /etc/hosts
172.31.0.11 k8s-master01
172.31.0.14 k8s-node01
172.31.0.15 k8s-node02
EOF'
```
### Set the time zone and install a time service (every host)
```sh
sudo timedatectl set-timezone Asia/Shanghai
sudo apt-get update && sudo apt-get install -y chrony
```
#### Configure Alibaba Cloud NTP servers (optional)
```conf
pool ntp1.aliyun.com iburst maxsources 4
```
Tip: add the line above to `/etc/chrony/chrony.conf` and comment out the other configuration lines that start with `pool` (a sketch of this edit follows).
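A minimal sketch of that edit, assuming the stock Debian `chrony.conf` whose defaults are `pool ...debian.pool.ntp.org...` lines:
```sh
# Comment out any existing pool/server lines (assumes the stock Debian config)
sudo sed -i -E 's/^(pool|server) /#&/' /etc/chrony/chrony.conf
# Append the Alibaba Cloud NTP pool
echo 'pool ntp1.aliyun.com iburst maxsources 4' | sudo tee -a /etc/chrony/chrony.conf
```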
Restart chrony and verify:
```sh
sudo systemctl restart chrony
sudo systemctl status chrony
sudo chronyc sources
```
### Disable swap (every host)
```sh
sudo swapoff -a
```
Also comment out the swap mount line in `/etc/fstab` so that swap stays off after a reboot, for example as shown below.
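A hedged example of that edit (back up `/etc/fstab` first; the exact swap entry varies between installs):
```sh
# Keep a backup, then comment out every non-comment line whose mount type is swap
sudo cp /etc/fstab /etc/fstab.bak
sudo sed -ri 's/^([^#].*\sswap\s.*)$/# \1/' /etc/fstab
# Verify: the Swap line should show 0B total
free -h
```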
### Disable the firewall (every host)
```sh
sudo ufw disable
sudo apt-get remove ufw
```
### Tune kernel parameters (every host)
```sh
sudo bash -c 'cat > /etc/sysctl.d/kubernetes.conf <<EOF
# Let bridged IPv6 traffic be processed by ip6tables (no effect if the firewall is disabled or iptables is not used)
net.bridge.bridge-nf-call-ip6tables = 1
# Let bridged IPv4 traffic be processed by iptables (no effect if the firewall is disabled or iptables is not used)
net.bridge.bridge-nf-call-iptables = 1
# Enable IPv4 packet forwarding
net.ipv4.ip_forward = 1
# Disable sending ICMP redirect messages
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
# Raise the maximum number of tracked connections
net.netfilter.nf_conntrack_max = 1000000
# Raise the timeout for established connections in the conntrack table
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# Raise the listen backlog size
net.core.somaxconn = 1024
# Mitigate SYN flood attacks
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
# Raise the file descriptor limit
fs.file-max = 65536
# Set swappiness to 0 to minimize swapping to disk
vm.swappiness = 0
EOF'
# Load the br_netfilter kernel module, which provides the netfilter hooks required for bridged traffic
sudo modprobe br_netfilter
# Check that the module has been loaded
lsmod | grep br_netfilter
# Read the parameters from the file and apply them to the running system
sudo sysctl -p /etc/sysctl.d/kubernetes.conf
```
### Install ipset and ipvsadm (every host)
- `ipset` mainly supports `Service` load balancing and network policies; it enables high-performance packet filtering and forwarding and fast matching on IP addresses and ports.
- `ipvsadm` is used to configure and manage the `IPVS` load balancer that implements `Service` load balancing.
```sh
sudo apt-get install -y ipset ipvsadm
# Check that both packages are installed
dpkg -l ipset ipvsadm
```
### Configure kernel modules (every host)
```sh
sudo bash -c 'cat > /etc/modules-load.d/kubernetes.conf << EOF
# /etc/modules-load.d/kubernetes.conf
# Linux bridge support
br_netfilter
# IPVS load balancer
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
# Connection tracking (on kernels >= 4.19, such as Debian 12's 6.1, the module is nf_conntrack rather than nf_conntrack_ipv4)
nf_conntrack
# iptables support
ip_tables
EOF'
# Make the file readable by everyone (the executable bit is not actually required for modules-load.d files)
sudo chmod a+x /etc/modules-load.d/kubernetes.conf
```
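To apply the module list without rebooting and confirm the modules are present, something like the following should work:
```sh
# Load everything listed in modules-load.d immediately
sudo systemctl restart systemd-modules-load.service
# Confirm the IPVS and conntrack modules are loaded
lsmod | grep -e ip_vs -e nf_conntrack
```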
### Disable the security policy service (every host)
```sh
# Stop the AppArmor service
sudo systemctl stop apparmor.service
# Disable the AppArmor service
sudo systemctl disable apparmor.service
```
### Turn off the firewall (every host)
If ufw was already disabled and removed in the earlier "Disable the firewall" step, this section is redundant and can be skipped.
```sh
# Disable ufw
sudo ufw disable
sudo systemctl stop ufw.service
sudo systemctl disable ufw.service
```
### Install the container runtime (every host)
#### Download
Check the latest release at <https://github.com/containerd/containerd/releases>, then download the matching `cri-containerd-x.x.x-linux-platform` archive:
```sh
curl -L -O https://github.com/containerd/containerd/releases/download/v1.7.23/cri-containerd-1.7.23-linux-amd64.tar.gz
```
#### Install
```sh
sudo tar xf cri-containerd-1.7.23-linux-amd64.tar.gz -C /
```
#### Configure
```sh
sudo mkdir /etc/containerd
sudo bash -c 'containerd config default > /etc/containerd/config.toml'
sudo sed -i '/sandbox_image/s/3.8/3.9/' /etc/containerd/config.toml
sudo sed -i '/SystemdCgroup/s/false/true/' /etc/containerd/config.toml
```
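To confirm the two `sed` edits above took effect (the exact `pause` image tag may differ with other containerd versions):
```sh
# The sandbox image should now reference pause:3.9
grep sandbox_image /etc/containerd/config.toml
# SystemdCgroup should now be set to true
grep SystemdCgroup /etc/containerd/config.toml
```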
#### Start
```sh
# Enable and immediately start the containerd service
sudo systemctl enable --now containerd.service
# Check the current status of the containerd service
sudo systemctl status containerd.service
```
#### Verify
```sh
# Check the containerd version
containerd --version
# crictl is the CLI for interacting with CRI (Container Runtime Interface) compatible container runtimes
crictl --version
# runc runs containers that conform to the OCI (Open Container Initiative) spec
sudo runc --version
```
### Install Docker (every host; optional for k8s, only needed for building images)
```sh
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli docker-buildx-plugin docker-compose-plugin
```
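Optionally give Docker a quick smoke test; a minimal check could be:
```sh
# Show client and daemon versions
sudo docker version
# Run a throwaway test container (pulls a small image from Docker Hub)
sudo docker run --rm hello-world
```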
### Install the k8s components (every host)
That is, install `kubelet`, `kubeadm`, and `kubectl`:
```sh
sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# If the directory `/etc/apt/keyrings` does not exist, it should be created before the curl command, read the note below.
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
```
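As a quick sanity check, all three tools should report a 1.28.x release:
```sh
kubeadm version -o short
kubectl version --client
kubelet --version
```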
#### Configure kubelet
```sh
sudo bash -c 'cat > /etc/default/kubelet << EOF
# This argument tells kubelet to use systemd as the cgroup driver for the container runtime
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF'
# Also enable kubelet to start on boot
sudo systemctl enable kubelet
```
### Initialize the master01 host
#### List the k8s images (optional)
```sh
sudo kubeadm config images list
```
You should see output similar to:
```sh
registry.k8s.io/kube-apiserver:v1.31.1
registry.k8s.io/kube-controller-manager:v1.31.1
registry.k8s.io/kube-scheduler:v1.31.1
registry.k8s.io/kube-proxy:v1.31.1
registry.k8s.io/coredns/coredns:v1.11.3
registry.k8s.io/pause:3.10
registry.k8s.io/etcd:3.5.15-0
```
> If you see a message like `remote version is much newer: v1.31.1; falling back to: stable-1.28`, it only means a newer release exists upstream; you can safely ignore it.
> The k8s images come from the Google-hosted registry by default, which usually requires a proxy to reach. If you do not have a proxy, the Alibaba Cloud mirror works too: pass --image-repository="registry.aliyuncs.com/google_containers" to deploy the cluster with images from the Alibaba Cloud registry, for example as sketched below.
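For example, a sketch of using the Alibaba Cloud mirror (only needed if the default registry is unreachable from your hosts):
```sh
# Pre-pull the control-plane images from the Alibaba Cloud mirror
sudo kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers
# The same flag can be appended to the kubeadm init command used later, e.g.
# sudo kubeadm init ... --image-repository registry.aliyuncs.com/google_containers
```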
#### Pull the images (optional)
```sh
sudo kubeadm config images pull
```
#### Create the k8s cluster
##### Initialize the master01 node
- Replace the `--apiserver-advertise-address` value below with the actual IP address of your `k8s-master01` host.
- `--pod-network-cidr` is the subnet that pods in this cluster will use.
```sh
sudo kubeadm init --control-plane-endpoint=k8s-master01 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=172.31.0.11 --cri-socket unix:///run/containerd/containerd.sock
```
If everything goes smoothly, you should see output like the following (the token and hash will differ each run):
```sh
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
--discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join k8s-master01:6443 --token 1ahq7i.sv3pqgcss8v5oecj \
--discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
```
Next, set up the kubectl config file:
```sh
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
Then you can check the node status:
```sh
kubectl get nodes -o wide
```
You should see something similar to:
```sh
# Check node status
kubectl get nodes -o wide
# Output similar to:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 NotReady control-plane 25m v1.28.14 172.31.0.11 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
# Check cluster info
kubectl cluster-info
# Output similar to:
Kubernetes control plane is running at https://k8s-master01:6443
CoreDNS is running at https://k8s-master01:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# List all CRI containers
sudo crictl ps -a
# Output similar to the following; every entry in the STATE column should be Running
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
6177ae20a68e6 6a89d0ef825cb 29 minutes ago Running kube-proxy 0 e8d15cb2bcd1a kube-proxy-jlndc
a1a43a29df5c2 6cbf215f8d44e 30 minutes ago Running kube-scheduler 1 3163922b00a0e kube-scheduler-k8s-master01
19dfb26520340 7abec2d806048 30 minutes ago Running kube-controller-manager 1 f6df8f333fcf0 kube-controller-manager-k8s-master01
b4c7a5f9c967f 3438637c2f3ae 30 minutes ago Running kube-apiserver 0 b05316fac4cad kube-apiserver-k8s-master01
8a4c587d9b8d9 2e96e5913fc06 30 minutes ago Running etcd 0 9a8c10ea30b80 etcd-k8s-master01
```
### Add the worker nodes (join k8s-node01 and k8s-node02 to the cluster)
First, obtain the join command on `k8s-master01` (run it on `k8s-master01` once before joining each of `k8s-node01` and `k8s-node02`):
```sh
sudo kubeadm token create --print-join-command
# The output looks like the following (the token is different every time); copy it so it can be run on the worker nodes
kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
```
The following steps are run on `k8s-node01` and on `k8s-node02`:
```sh
# Install nmap, used on the worker node to verify connectivity to the api-server port on the master node
sudo apt-get install nmap -y
# Replace the IP below with the actual master node IP
nmap -p 6443 -Pn 172.31.0.11
# Output:
Starting Nmap 7.93 ( https://nmap.org ) at 2024-10-21 18:50 CST
Nmap scan report for k8s-master01 (172.31.0.11)
Host is up (0.00081s latency).
PORT STATE SERVICE
6443/tcp open sun-sr-https
Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds
# Paste and run the join command obtained on the master node (when running as a non-root user, prefix it with sudo)
sudo kubeadm join k8s-master01:6443 --token epvxya.fh4qmay5uwc8628a --discovery-token-ca-cert-hash sha256:8bea18bff8c86d0bc23214974d6b2045c90760448cd4731c94546a9ae836e9ca
# Output similar to:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
```
Then verify on `k8s-master01`:
```sh
kubectl get nodes
# Output similar to:
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 41m v1.28.14
k8s-node01 NotReady <none> 2m6s v1.28.14
```
Then run `sudo kubeadm token create --print-join-command` on the master node once more and repeat the procedure above on `k8s-node02`.
Finally, check the node information on `k8s-master01`:
```sh
kubectl get nodes
# Output similar to:
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 43m v1.28.14
k8s-node01 NotReady <none> 4m26s v1.28.14
k8s-node02 NotReady <none> 40s v1.28.14
# Check the k8s system pods
kubectl get pods -n kube-system
# Output similar to:
NAME READY STATUS RESTARTS AGE
coredns-5dd5756b68-4btx5 0/1 Pending 0 45m
coredns-5dd5756b68-8v2z8 0/1 Pending 0 45m
etcd-k8s-master01 1/1 Running 0 45m
kube-apiserver-k8s-master01 1/1 Running 0 45m
kube-controller-manager-k8s-master01 1/1 Running 1 45m
kube-proxy-5tqw2 1/1 Running 0 6m33s
kube-proxy-864zg 1/1 Running 0 2m47s
kube-proxy-jlndc 1/1 Running 0 45m
kube-scheduler-k8s-master01 1/1 Running 1 45m
```
Notice that the `STATUS` column of every node is `NotReady`. This is because no network plugin has been installed or configured yet, so pod-to-pod communication does not work; you can confirm the reason as shown below.
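To confirm that the missing CNI is the cause, inspect a node's conditions; the `Ready` condition's message typically says the container runtime network / CNI plugin is not initialized (exact wording varies by version):
```sh
# Show the node's conditions; the Ready row should explain why the node is NotReady
kubectl describe node k8s-node01 | grep -A 10 'Conditions:'
```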
### Install the Calico network plugin (master node)
Reference: <https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart>
#### Install the Tigera Calico operator
```sh
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
# Output similar to:
serviceaccount/tigera-operator created
clusterrole.rbac.authorization.k8s.io/tigera-operator created
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
deployment.apps/tigera-operator created
# Check the cluster namespaces
kubectl get ns
# Output similar to:
NAME STATUS AGE
default Active 63m
kube-node-lease Active 63m
kube-public Active 63m
kube-system Active 63m
tigera-operator Active 13s
# Check the pods in the tigera-operator namespace
kubectl get pods -n tigera-operator
# Output:
NAME READY STATUS RESTARTS AGE
tigera-operator-5cfff76b77-tdswm 1/1 Running 0 3m46s
```
#### Install Calico
```sh
curl -L -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/custom-resources.yaml
# Change the IP pool to match the --pod-network-cidr used during kubeadm init
sed -i 's/192.168.0.0/10.244.0.0/' custom-resources.yaml
# Install Calico
kubectl create -f custom-resources.yaml
# Output:
installation.operator.tigera.io/default created
apiserver.operator.tigera.io/default created
```
Then run `watch` and wait until the `STATUS` of every pod becomes `Running`:
```sh
watch kubectl get pods -n calico-system
# Output:
Every 2.0s: kubectl get pods -n calico-system k8s-master01: Mon Oct 21 19:22:54 2024
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5846f6d55d-87n88 1/1 Running 0 85s
calico-node-4mhxj 1/1 Running 0 85s
calico-node-6c64k 1/1 Running 0 85s
calico-node-sbzwz 1/1 Running 0 85s
calico-typha-6c76968df6-lcjm6 1/1 Running 0 84s
calico-typha-6c76968df6-xbnk5 1/1 Running 0 85s
csi-node-driver-2vrg7 2/2 Running 0 85s
csi-node-driver-gmb7m 2/2 Running 0 85s
csi-node-driver-mnqvx 2/2 Running 0 85s
```
Press `Ctrl+C` to exit `watch`, then check the pods in the k8s system namespace again:
```sh
kubectl get pods -n kube-system -o wide
# Output similar to:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-5dd5756b68-4btx5 1/1 Running 0 72m 10.244.58.196 k8s-node02 <none> <none>
coredns-5dd5756b68-8v2z8 1/1 Running 0 72m 10.244.58.193 k8s-node02 <none> <none>
etcd-k8s-master01 1/1 Running 0 72m 172.31.0.11 k8s-master01 <none> <none>
kube-apiserver-k8s-master01 1/1 Running 0 72m 172.31.0.11 k8s-master01 <none> <none>
kube-controller-manager-k8s-master01 1/1 Running 1 72m 172.31.0.11 k8s-master01 <none> <none>
kube-proxy-5tqw2 1/1 Running 0 33m 172.31.0.14 k8s-node01 <none> <none>
kube-proxy-864zg 1/1 Running 0 29m 172.31.0.15 k8s-node02 <none> <none>
kube-proxy-jlndc 1/1 Running 0 72m 172.31.0.11 k8s-master01 <none> <none>
kube-scheduler-k8s-master01 1/1 Running 1 72m 172.31.0.11 k8s-master01 <none> <none>
```
Remove the control-plane taint (this allows regular workloads to be scheduled onto the master node as well):
```sh
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```
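To confirm the taint was removed, the `Taints` field of every node should now read `<none>`:
```sh
kubectl describe nodes | grep -i taints
```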
Confirm the cluster nodes once more:
```sh
kubectl get nodes -o wide
# Output similar to:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 Ready control-plane 75m v1.28.14 172.31.0.11 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
k8s-node01 Ready <none> 35m v1.28.14 172.31.0.14 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
k8s-node02 Ready <none> 32m v1.28.14 172.31.0.15 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-23-cloud-amd64 containerd://1.7.23
```