Problem description
A user ran into a problem while trying to install a Kubernetes cluster with the Dashboard on Ubuntu 20.04 LTS. They ran the following commands:
swapoff -a
sudo apt update
sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
sudo apt install apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" >> ~/kubernetes.list
sudo mv ~/kubernetes.list /etc/apt/sources.list.d
sudo apt update
sudo apt install kubeadm kubelet kubectl kubernetes-cni
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.0/aio/deploy/recommended.yaml
kubectl proxy --address 192.168.1.133 --accept-hosts '.*'
However, when they opened http://192.168.1.133:8001/api/v1/namespaces/default/services/https:kubernetes-dashboard:https/proxy, they got the following error:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "services \"kubernetes-dashboard\" not found",
  "reason": "NotFound",
  "details": {
    "name": "kubernetes-dashboard",
    "kind": "services"
  },
  "code": 404
}
They listed the Pods and found that the kube-flannel Pod was in an Error state:
kubectl get pods --all-namespaces
Output:
NAMESPACE              NAME                                         READY   STATUS              RESTARTS       AGE
kube-flannel           kube-flannel-ds-f6bwx                        0/1     Error               11 (29s ago)   76m
kube-system            coredns-6d4b75cb6d-rk4kq                     0/1     ContainerCreating   0              77m
kube-system            coredns-6d4b75cb6d-vkpcm                     0/1     ContainerCreating   0              77m
kube-system            etcd-ubuntukubernetis1                       1/1     Running             1 (52s ago)    77m
kube-system            kube-apiserver-ubuntukubernetis1             1/1     Running             1 (52s ago)    77m
kube-system            kube-controller-manager-ubuntukubernetis1    1/1     Running             1 (52s ago)    77m
kube-system            kube-proxy-n6ldq                             1/1     Running             1 (52s ago)    77m
kube-system            kube-scheduler-ubuntukubernetis1             1/1     Running             1 (52s ago)    77m
kubernetes-dashboard   dashboard-metrics-scraper-7bfdf779ff-sdnc8   0/1     Pending             0              75m
kubernetes-dashboard   dashboard-metrics-scraper-8c47d4b5d-2sxrb    0/1     Pending             0              59m
kubernetes-dashboard   kubernetes-dashboard-5676d8b865-fws4j        0/1     Pending             0              59m
kubernetes-dashboard   kubernetes-dashboard-6cdd697d84-nmpv2        0/1     Pending             0              75m
They then looked at the logs of the kube-flannel Pod:
kubectl logs -n kube-flannel kube-flannel-ds-f6bwx -p
Output:
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0724 14:49:57.782499 1 main.go:207] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W0724 14:49:57.782676 1 client_config.go:614] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0724 14:49:57.892230 1 main.go:224] Failed to create SubnetManager: error retrieving pod spec for 'kube-flannel/kube-flannel-ds-f6bwx': pods "kube-flannel-ds-f6bwx" is forbidden: User "system:serviceaccount:kube-flannel:flannel" cannot get resource "pods" in API group "" in the namespace "kube-flannel"
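The user wants to know how to resolve this.
Solution
Note: when following the steps below, watch for version differences and make a backup before changing anything.
Step 1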
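First, the URL you queried is wrong. Judging from the last YAML file you applied and from your kubectl get pods -A output, the kubernetes-dashboard Service lives in the kubernetes-dashboard namespace, not in the default namespace.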
If you just want to connect to the Kubernetes Dashboard, you can use the following command instead of kubectl proxy, then open https://localhost:8443 in your browser:
kubectl port-forward -n kubernetes-dashboard deploy/kubernetes-dashboard 8443:8443
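If you would rather stay with kubectl proxy, the URL needs the right namespace. The sketch below only builds that URL; dashboard_proxy_url is a hypothetical helper, not part of kubectl, and it assumes the Service keeps the name and the unnamed HTTPS port that recommended.yaml gives it.

```shell
# Hypothetical helper: build the kubectl proxy URL for the Dashboard Service.
# $1 = host:port where kubectl proxy listens, $2 = namespace of the Service.
dashboard_proxy_url() {
  printf 'http://%s/api/v1/namespaces/%s/services/https:kubernetes-dashboard:/proxy/\n' "$1" "$2"
}

# The question used "default"; the manifest creates the Service in "kubernetes-dashboard".
dashboard_proxy_url 192.168.1.133:8001 kubernetes-dashboard
```

Running kubectl get svc -n kubernetes-dashboard first is a quick way to confirm the Service really exists in that namespace.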
Step 2
Second, your SDN is broken. In your kubectl get pods output, we can see that the kube-flannel Pod is in an Error state.
Check that container's logs to find out why it fails to start:
kubectl logs -n kube-flannel kube-flannel-ds-xxxx -p
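I have not set up flannel in years, but I recall that besides applying their RBAC and DaemonSet YAML files, I also had to patch the nodes to assign each of them a CIDR. For example: kubectl patch node my-node-1 -p '{ "spec": { "podCIDR": "10.32.3.0/24" } }' --type merge
(Each podCIDR must be unique: every node gets its own range for hosting Pods. If I remember correctly, each podCIDR must also be a subset of the network defined in flannel's net-conf.json; check the ConfigMap created when flannel was installed.)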
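The subset rule can be checked mechanically. The following is a sketch in plain POSIX shell; ip_to_int and cidr_in are hypothetical helper names, and they handle IPv4 only with no input validation. The example values assume flannel's manifest still carries its default Network of 10.244.0.0/16, which you should confirm in the ConfigMap. Under that assumption, a node range taken from the 192.168.0.0/16 passed to kubeadm init would fall outside flannel's network.

```shell
# Hypothetical helpers (IPv4 only, no input validation).

# Print a dotted-quad address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) if CIDR $1 lies entirely inside CIDR $2.
cidr_in() {
  inner_ip=${1%/*}; inner_len=${1#*/}
  outer_ip=${2%/*}; outer_len=${2#*/}
  # A shorter prefix means a larger range, which cannot fit inside.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (4294967295 << (32 - $outer_len)) & 4294967295 ))
  inner_net=$(( $(ip_to_int "$inner_ip") & $mask ))
  outer_net=$(( $(ip_to_int "$outer_ip") & $mask ))
  [ "$inner_net" -eq "$outer_net" ]
}

# Assumed flannel default network vs. a range from the kubeadm init flag.
flannel_net=10.244.0.0/16
for cidr in 10.244.1.0/24 192.168.0.0/24; do
  if cidr_in "$cidr" "$flannel_net"; then
    echo "$cidr is inside $flannel_net"
  else
    echo "$cidr is NOT inside $flannel_net"
  fi
done
```

The real values to feed in can be read with kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR and from the net-conf.json key of flannel's ConfigMap.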
As for your last comment, the error tells us the following:
Failed to create SubnetManager: error retrieving pod spec for 'kube-flannel/kube-flannel-ds-f6bwx': pods "kube-flannel-ds-f6bwx" is forbidden: User "system:serviceaccount:kube-flannel:flannel" cannot get resource "pods" in API group "" in the namespace "kube-flannel"
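Looking back at the files you used to set up flannel: you applied one manifest that installs everything into the kube-flannel namespace, followed by an RBAC file that creates a ClusterRoleBinding for a ServiceAccount in kube-system. To fix your SDN, you can create the following: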
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel-fix
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
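To apply and verify the binding, something like the following could work. It is only a sketch: it assumes the manifest above was saved as flannel-fix.yaml (a hypothetical filename) and that the DaemonSet is named kube-flannel-ds, as the Pod name in the question suggests. sa_user, apply_flannel_fix, and check_flannel_rbac are hypothetical helper names.

```shell
# Hypothetical helper: the username Kubernetes uses for a ServiceAccount,
# in the same form as the "forbidden" error message.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Apply the binding (assumed filename), then restart flannel so it retries.
apply_flannel_fix() {
  kubectl apply -f flannel-fix.yaml
  kubectl -n kube-flannel rollout restart daemonset/kube-flannel-ds
}

# Ask the API server whether the ServiceAccount may now read pods.
check_flannel_rbac() {
  kubectl auth can-i get pods --namespace kube-flannel \
    --as "$(sa_user kube-flannel flannel)"
}

# Show the identity that the checks above impersonate.
sa_user kube-flannel flannel
```

Call apply_flannel_fix and then check_flannel_rbac on the control-plane node; once the check answers yes, the flannel Pod should get past the SubnetManager error on its next restart.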
By the way: in your case, kube-flannel-rbac is unnecessary. It would be required if you had installed flannel from their legacy manifest (https://github.com/flannel-io/flannel/blob/master/Documentation/k8s-manifests/kube-flannel-legacy.yml). In your case, the ClusterRoleBinding we are fixing should already have been created correctly just by applying https://github.com/flannel-io/flannel/blob/master/Documentation/kube-flannel.yml.