Description of problem: Add RHEL worker to a running 4.4 cluster, playbook failed at TASK [openshift_node : Approve node-bootstrapper CSR] TASK [openshift_node : Approve node-bootstrapper CSR] ************************** Tuesday 04 February 2020 13:43:02 +0800 (0:00:35.343) 0:09:46.254 ****** FAILED - RETRYING: Approve node-bootstrapper CSR (6 retries left). ... FAILED - RETRYING: Approve node-bootstrapper CSR (2 retries left). FAILED - RETRYING: Approve node-bootstrapper CSR (1 retries left). failed: [10.0.32.5 -> localhost] (item=10.0.32.5) => {"ansible_loop_var": "item", "attempts": 6, "changed": true, "cmd": "count=0; for csr in `oc --config=/tmp/installer-_NKK7v/auth/kubeconfig get csr --no-headers | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \" | cut -d \" \" -f1`;\ndo\n oc --config=/tmp/installer-_NKK7v/auth/kubeconfig describe csr/$csr | grep \" system:node:gpei1-02032223-w-a-l-rhel-0$\";\n if [ $? -eq 0 ];\n then\n oc --config=/tmp/installer-_NKK7v/auth/kubeconfig adm certificate approve ${csr};\n if [ $? -eq 0 ];\n then\n count=$((count+1));\n fi;\n fi;\ndone; exit $((!count));\n", "delta": "0:00:00.420416", "end": "2020-02-04 13:43:36.570825", "item": "10.0.32.5", "msg": "non-zero return code", "rc": 1, "start": "2020-02-04 13:43:36.150409", "stderr": "No resources found in default namespace.", "stderr_lines": ["No resources found in default namespace."], "stdout": "", "stdout_lines": []} Check the kubelet service on RHEL worker [root@gpei1-02032223-w-a-l-rhel-0 cloud-user]# systemctl status kubelet ● kubelet.service - Kubernetes Kubelet Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/kubelet.service.d └─10-default-env.conf Active: activating (auto-restart) (Result: exit-code) since Tue 2020-02-04 07:44:12 UTC; 6s ago Process: 14676 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=gce --v="${KUBELET_LOG_LEVEL}" (code=exited, status=255) Process: 14673 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS) Process: 14671 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS) Main PID: 14676 (code=exited, status=255) Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-cipher-suites strings Comma-separat...CDSA_WITH_A Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-min-version string Minimum TLS version supp... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-private-key-file string File containing x509 pri... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --topology-manager-policy string Topology Manager policy ... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: -v, --v Level number fo...l verbosity Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --version version[=true] Print version...on and quit Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --vmodule moduleSpec comma-separat...red logging Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --volume-plugin-dir string The full path...ume/exec/") Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --volume-stats-agg-period duration Specifies interval for k... Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: F0204 07:44:12.380198 14676 server.go:156] invalid argument "\"3\"" for "-v, --v" flag: strconv.ParseInt: parsing "\"3\...alid syntax Hint: Some lines were ellipsized, use -l to show in full. [root@gpei1-02032223-w-a-l-rhel-0 ~]# systemctl cat kubelet # /etc/systemd/system/kubelet.service [Unit] Description=Kubernetes Kubelet Wants=rpc-statd.service network-online.target crio.service After=network-online.target crio.service [Service] Type=notify ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state Environment="KUBELET_LOG_LEVEL=3" EnvironmentFile=/etc/os-release EnvironmentFile=-/etc/kubernetes/kubelet-workaround EnvironmentFile=-/etc/kubernetes/kubelet-env ExecStart=/usr/bin/hyperkube \ kubelet \ --config=/etc/kubernetes/kubelet.conf \ --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/var/lib/kubelet/kubeconfig \ --container-runtime=remote \ --container-runtime-endpoint=/var/run/crio/crio.sock \ --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \ --minimum-container-ttl-duration=6m0s \ --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \ --cloud-provider=gce \ \ --v="${KUBELET_LOG_LEVEL}" The --v="3" flag couldn't be recognized by the kubelet service. Version-Release number of the following components: openshift-hyperkube-4.4.0-202001311131.git.0.8c9e9b7.el7.x86_64 Cluster version 4.4.0-0.nightly-2020-02-03-043955 How reproducible: Always Steps to Reproduce: 1. Actual results: Expected results: Additional info:
--v="${KUBELET_LOG_LEVEL}" Quotes around KUBELET_LOG_LEVEL seem to be crashing this. Removing quotes allowed the kubelet to start up. Introduced here, https://github.com/openshift/machine-config-operator/pull/1390 Moving to MCO
Verify this bug on payload 4.4.0-0.nightly-2020-02-07-012035. After launching a cluster using payload 4.4.0-0.nightly-2020-02-07-012035, add RHEL worker to the cluster. RHEL worker could be added, kubelet service is running well. [root@gpei-4-kx6l2-w-a-l-rhel-0 ~]# systemctl status kubelet ● kubelet.service - Kubernetes Kubelet Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/kubelet.service.d └─10-default-env.conf Active: active (running) since Fri 2020-02-07 08:47:49 UTC; 1min 59s ago Main PID: 1928 (kubelet) Memory: 65.8M CGroup: /system.slice/kubelet.service └─1928 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-... Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593479 1928 helpers.go:781] eviction manager: observations: signal=imagefs.available, available: 55124216K...7.390078863 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593524 1928 helpers.go:781] eviction manager: observations: signal=imagefs.inodesFree, available: 31368371...7.390078863 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593537 1928 helpers.go:781] eviction manager: observations: signal=pid.available, available: 4193911, capa...7.405779327 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593548 1928 helpers.go:781] eviction manager: observations: signal=memory.available, available: 14378836Ki...7.390078863 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593557 1928 helpers.go:781] eviction manager: observations: signal=allocatableMemory.available, available:...7.406378244 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593566 1928 helpers.go:781] eviction manager: observations: signal=nodefs.available, available: 55124216Ki...7.390078863 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593575 1928 helpers.go:781] eviction manager: observations: signal=nodefs.inodesFree, available: 31368371,...7.390078863 Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593636 1928 eviction_manager.go:320] eviction manager: no resources are starved Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.777180 1928 prober.go:129] Liveness probe for "ovs-4wzcz_openshift-sdn(1d3517e7-a06b-4962-98b0-b9672bf5ff2..." succeeded Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.826080 1928 prober.go:129] Readiness probe for "ovs-4wzcz_openshift-sdn(1d3517e7-a06b-4962-98b0-b9672bf5ff..." succeeded Hint: Some lines were ellipsized, use -l to show in full. [root@gpei-4-kx6l2-w-a-l-rhel-0 ~]# systemctl cat kubelet # /etc/systemd/system/kubelet.service [Unit] Description=Kubernetes Kubelet Wants=rpc-statd.service network-online.target crio.service After=network-online.target crio.service [Service] Type=notify ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state Environment="KUBELET_LOG_LEVEL=3" EnvironmentFile=/etc/os-release EnvironmentFile=-/etc/kubernetes/kubelet-workaround EnvironmentFile=-/etc/kubernetes/kubelet-env ExecStart=/usr/bin/hyperkube \ kubelet \ --config=/etc/kubernetes/kubelet.conf \ --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/var/lib/kubelet/kubeconfig \ --container-runtime=remote \ --container-runtime-endpoint=/var/run/crio/crio.sock \ --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \ --minimum-container-ttl-duration=6m0s \ --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \ --cloud-provider=gce \ \ --v=${KUBELET_LOG_LEVEL} Restart=always RestartSec=10 [Install] WantedBy=multi-user.target # /etc/systemd/system/kubelet.service.d/10-default-env.conf
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581