Bug 1797908 - kubelet service failed to start on RHEL worker
Summary: kubelet service failed to start on RHEL worker
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.4.0
Assignee: Antonio Murdaca
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-04 08:10 UTC by Gaoyun Pei
Modified: 2020-05-04 11:33 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:33:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1439 0 None closed Bug 1797908: kubelet: Remove quotes from log level argument 2021-02-16 16:29:16 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:33:36 UTC

Description Gaoyun Pei 2020-02-04 08:10:15 UTC
Description of problem:
Add RHEL worker to a running 4.4 cluster, playbook failed at TASK [openshift_node : Approve node-bootstrapper CSR] 

TASK [openshift_node : Approve node-bootstrapper CSR] **************************
Tuesday 04 February 2020  13:43:02 +0800 (0:00:35.343)       0:09:46.254 ****** 
FAILED - RETRYING: Approve node-bootstrapper CSR (6 retries left).
...
FAILED - RETRYING: Approve node-bootstrapper CSR (2 retries left).
FAILED - RETRYING: Approve node-bootstrapper CSR (1 retries left).
failed: [10.0.32.5 -> localhost] (item=10.0.32.5) => {"ansible_loop_var": "item", "attempts": 6, "changed": true, "cmd": "count=0; for csr in `oc --config=/tmp/installer-_NKK7v/auth/kubeconfig get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/tmp/installer-_NKK7v/auth/kubeconfig describe csr/$csr    | grep \" system:node:gpei1-02032223-w-a-l-rhel-0$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/tmp/installer-_NKK7v/auth/kubeconfig adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:00.420416", "end": "2020-02-04 13:43:36.570825", "item": "10.0.32.5", "msg": "non-zero return code", "rc": 1, "start": "2020-02-04 13:43:36.150409", "stderr": "No resources found in default namespace.", "stderr_lines": ["No resources found in default namespace."], "stdout": "", "stdout_lines": []}


Check the kubelet service on RHEL worker

[root@gpei1-02032223-w-a-l-rhel-0 cloud-user]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf
   Active: activating (auto-restart) (Result: exit-code) since Tue 2020-02-04 07:44:12 UTC; 6s ago
  Process: 14676 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=gce --v="${KUBELET_LOG_LEVEL}" (code=exited, status=255)
  Process: 14673 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 14671 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 14676 (code=exited, status=255)

Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-cipher-suites strings                                                                                 Comma-separat...CDSA_WITH_A
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-min-version string                                                                                    Minimum TLS version supp...
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --tls-private-key-file string                                                                               File containing x509 pri...
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --topology-manager-policy string                                                                            Topology Manager policy ...
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: -v, --v Level                                                                                                   number fo...l verbosity
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --version version[=true]                                                                                    Print version...on and quit
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --vmodule moduleSpec                                                                                        comma-separat...red logging
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --volume-plugin-dir string                                                                                  The full path...ume/exec/")
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: --volume-stats-agg-period duration                                                                          Specifies interval for k...
Feb 04 07:44:12 gpei1-02032223-w-a-l-rhel-0 hyperkube[14676]: F0204 07:44:12.380198   14676 server.go:156] invalid argument "\"3\"" for "-v, --v" flag: strconv.ParseInt: parsing "\"3\...alid syntax
Hint: Some lines were ellipsized, use -l to show in full.


[root@gpei1-02032223-w-a-l-rhel-0 ~]# systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=gce \
       \
      --v="${KUBELET_LOG_LEVEL}"


The --v="3" flag couldn't be recognized by the kubelet service.


Version-Release number of the following components:
openshift-hyperkube-4.4.0-202001311131.git.0.8c9e9b7.el7.x86_64
Cluster version 4.4.0-0.nightly-2020-02-03-043955


How reproducible:
Always

Steps to Reproduce:
1.

Actual results:

Expected results:

Additional info:

Comment 1 Russell Teague 2020-02-04 18:34:29 UTC
--v="${KUBELET_LOG_LEVEL}"

Quotes around KUBELET_LOG_LEVEL seem to be crashing this.  Removing quotes allowed the kubelet to start up.

Introduced here, https://github.com/openshift/machine-config-operator/pull/1390

Moving to MCO

Comment 3 Gaoyun Pei 2020-02-07 08:59:03 UTC
Verify this bug on payload 4.4.0-0.nightly-2020-02-07-012035.

After launching a cluster using payload 4.4.0-0.nightly-2020-02-07-012035, add RHEL worker to the cluster.
RHEL worker could be added, kubelet service is running well.

[root@gpei-4-kx6l2-w-a-l-rhel-0 ~]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf
   Active: active (running) since Fri 2020-02-07 08:47:49 UTC; 1min 59s ago
 Main PID: 1928 (kubelet)
   Memory: 65.8M
   CGroup: /system.slice/kubelet.service
           └─1928 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-...

Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593479    1928 helpers.go:781] eviction manager: observations: signal=imagefs.available, available: 55124216K...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593524    1928 helpers.go:781] eviction manager: observations: signal=imagefs.inodesFree, available: 31368371...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593537    1928 helpers.go:781] eviction manager: observations: signal=pid.available, available: 4193911, capa...7.405779327
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593548    1928 helpers.go:781] eviction manager: observations: signal=memory.available, available: 14378836Ki...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593557    1928 helpers.go:781] eviction manager: observations: signal=allocatableMemory.available, available:...7.406378244
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593566    1928 helpers.go:781] eviction manager: observations: signal=nodefs.available, available: 55124216Ki...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593575    1928 helpers.go:781] eviction manager: observations: signal=nodefs.inodesFree, available: 31368371,...7.390078863
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.593636    1928 eviction_manager.go:320] eviction manager: no resources are starved
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.777180    1928 prober.go:129] Liveness probe for "ovs-4wzcz_openshift-sdn(1d3517e7-a06b-4962-98b0-b9672bf5ff2..." succeeded
Feb 07 08:49:46 gpei-4-kx6l2-w-a-l-rhel-0 hyperkube[1928]: I0207 08:49:46.826080    1928 prober.go:129] Readiness probe for "ovs-4wzcz_openshift-sdn(1d3517e7-a06b-4962-98b0-b9672bf5ff..." succeeded
Hint: Some lines were ellipsized, use -l to show in full.

[root@gpei-4-kx6l2-w-a-l-rhel-0 ~]# systemctl cat  kubelet
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=gce \
       \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-default-env.conf

Comment 5 errata-xmlrpc 2020-05-04 11:33:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581


Note You need to log in before you can comment on or make changes to this bug.