Bug 1797908
| Summary: | kubelet service failed to start on RHEL worker | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gaoyun Pei <gpei> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.4 | CC: | jialiu, rteague, wjiang, yanyang |
| Target Milestone: | --- | Keywords: | Regression, TestBlocker |
| Target Release: | 4.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-05-04 11:33:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
--v="${KUBELET_LOG_LEVEL}"
Quotes around KUBELET_LOG_LEVEL seem to be crashing this. Removing quotes allowed the kubelet to start up.
Introduced here, https://github.com/openshift/machine-config-operator/pull/1390
Moving to MCO
Verify this bug on payload 4.4.0-0.nightly-2020-02-07-012035.
After launching a cluster using payload 4.4.0-0.nightly-2020-02-07-012035, add RHEL worker to the cluster.
RHEL worker could be added, kubelet service is running well.
[root@gpei-4-kx6l2-w-a-l-rhel-0 ~]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-default-env.conf
Active: active (running) since Fri 2020-02-07 08:47:49 UTC; 1min 59s ago
Main PID: 1928 (kubelet)
Memory: 65.8M
CGroup: /system.slice/kubelet.service
└─1928 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-...
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593479 1928 helpers.go:781] eviction manager: observations: signal=imagefs.available, available: 55124216K...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593524 1928 helpers.go:781] eviction manager: observations: signal=imagefs.inodesFree, available: 31368371...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593537 1928 helpers.go:781] eviction manager: observations: signal=pid.available, available: 4193911, capa...7.405779327
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593548 1928 helpers.go:781] eviction manager: observations: signal=memory.available, available: 14378836Ki...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593557 1928 helpers.go:781] eviction manager: observations: signal=allocatableMemory.available, available:...7.406378244
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593566 1928 helpers.go:781] eviction manager: observations: signal=nodefs.available, available: 55124216Ki...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593575 1928 helpers.go:781] eviction manager: observations: signal=nodefs.inodesFree, available: 31368371,...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593636 1928 eviction_manager.go:320] eviction manager: no resources are starved
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.777180 1928 prober.go:129] Liveness probe for "ovs-4wzcz_openshift-sdn(1d3517e7-a06b-4962-98b0-b9672bf5ff2..." succeeded
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.826080 1928 prober.go:129] Readiness probe for "ovs-4wzcz_openshift-sdn(1d3517e7-a06b-4962-98b0-b9672bf5ff..." succeeded
Hint: Some lines were ellipsized, use -l to show in full.
[root@gpei-4-kx6l2-w-a-l-rhel-0 ~]# systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
ExecStart=/usr/bin/hyperkube \
kubelet \
--config=/etc/kubernetes/kubelet.conf \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--container-runtime=remote \
--container-runtime-endpoint=/var/run/crio/crio.sock \
--node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
--minimum-container-ttl-duration=6m0s \
--volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
--cloud-provider=gce \
\
--v=${KUBELET_LOG_LEVEL}
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/kubelet.service.d/10-default-env.conf
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581 |
Description of problem: Add RHEL worker to a running 4.4 cluster, playbook failed at TASK [openshift_node : Approve node-bootstrapper CSR] TASK [openshift_node : Approve node-bootstrapper CSR] ************************** Tuesday 04 February 2020 13:43:02 +0800 (0:00:35.343) 0:09:46.254 ****** FAILED - RETRYING: Approve node-bootstrapper CSR (6 retries left). ... FAILED - RETRYING: Approve node-bootstrapper CSR (2 retries left). FAILED - RETRYING: Approve node-bootstrapper CSR (1 retries left). failed: [10.0.32.5 -> localhost] (item=10.0.32.5) => {"ansible_loop_var": "item", "attempts": 6, "changed": true, "cmd": "count=0; for csr in `oc --config=/tmp/installer-_NKK7v/auth/kubeconfig get csr --no-headers | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \" | cut -d \" \" -f1`;\ndo\n oc --config=/tmp/installer-_NKK7v/auth/kubeconfig describe csr/$csr | grep \" system:node:gpei1-02032223-w-a-l-rhel-0$\";\n if [ $? -eq 0 ];\n then\n oc --config=/tmp/installer-_NKK7v/auth/kubeconfig adm certificate approve ${csr};\n if [ $? -eq 0 ];\n then\n count=$((count+1));\n fi;\n fi;\ndone; exit $((!count));\n", "delta": "0:00:00.420416", "end": "2020-02-04 13:43:36.570825", "item": "10.0.32.5", "msg": "non-zero return code", "rc": 1, "start": "2020-02-04 13:43:36.150409", "stderr": "No resources found in default namespace.", "stderr_lines": ["No resources found in default namespace."], "stdout": "", "stdout_lines": []} Check the kubelet service on RHEL worker [root@gpei1-02032223-w-a-l-rhel-0 cloud-user]# systemctl status kubelet ● kubelet.service - Kubernetes Kubelet Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/kubelet.service.d └─10-default-env.conf Active: activating (auto-restart) (Result: exit-code) since Tue 2020-02-04 07:44:12 UTC; 6s ago Process: 14676 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=gce --v="${KUBELET_LOG_LEVEL}" (code=exited, status=255) Process: 14673 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS) Process: 14671 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS) Main PID: 14676 (code=exited, status=255) Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-cipher-suites strings Comma-separat...CDSA_WITH_A Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-min-version string Minimum TLS version supp... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-private-key-file string File containing x509 pri... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --topology-manager-policy string Topology Manager policy ... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: -v, --v Level number fo...l verbosity Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --version version[=true] Print version...on and quit Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --vmodule moduleSpec comma-separat...red logging Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --volume-plugin-dir string The full path...ume/exec/") Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --volume-stats-agg-period duration Specifies interval for k... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: F0204 07:44:12.380198 14676 server.go:156] invalid argument "\"3\"" for "-v, --v" flag: strconv.ParseInt: parsing "\"3\...alid syntax Hint: Some lines were ellipsized, use -l to show in full. [root@gpei1-02032223-w-a-l-rhel-0 ~]# systemctl cat kubelet # /etc/systemd/system/kubelet.service [Unit] Description=Kubernetes Kubelet Wants=rpc-statd.service network-online.target crio.service After=network-online.target crio.service [Service] Type=notify ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state Environment="KUBELET_LOG_LEVEL=3" EnvironmentFile=/etc/os-release EnvironmentFile=-/etc/kubernetes/kubelet-workaround EnvironmentFile=-/etc/kubernetes/kubelet-env ExecStart=/usr/bin/hyperkube \ kubelet \ --config=/etc/kubernetes/kubelet.conf \ --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/var/lib/kubelet/kubeconfig \ --container-runtime=remote \ --container-runtime-endpoint=/var/run/crio/crio.sock \ --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \ --minimum-container-ttl-duration=6m0s \ --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \ --cloud-provider=gce \ \ --v="${KUBELET_LOG_LEVEL}" The --v="3" flag couldn't be recognized by the kubelet service. Version-Release number of the following components: openshift-hyperkube-4.4.0-202001311131.git.0.8c9e9b7.el7.x86_64 Cluster version 4.4.0-0.nightly-2020-02-03-043955 How reproducible: Always Steps to Reproduce: 1. Actual results: Expected results: Additional info: