Bug 1938192 - machine-config-daemon pod in CrashLoopBackOff state
Summary: machine-config-daemon pod in CrashLoopBackOff state
Keywords:
Status: CLOSED DUPLICATE of bug 1933772
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Yu Qi Zhang
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-12 12:32 UTC by Tania Kapoor
Modified: 2021-03-12 16:02 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-12 16:02:31 UTC
Target Upstream Version:
Embargoed:



Description Tania Kapoor 2021-03-12 12:32:25 UTC
Description of problem:

The machine-config-daemon pod on worker-1 repeatedly crashes and enters CrashLoopBackOff shortly after installation (17 restarts in 63 minutes).

Version-Release number of selected component:
4.8.0-0.nightly-ppc64le-2021-03-11-155720

[root@tania-ovn-bastion ~]# oc get pods -A | grep -v "Running\|Completed"
NAMESPACE                                          NAME                                                      READY   STATUS             RESTARTS   AGE
openshift-machine-config-operator                  machine-config-daemon-mktx5                               1/2     CrashLoopBackOff   17         63m


[root@tania-ovn-bastion ~]# oc get nodes
NAME       STATUS   ROLES    AGE     VERSION
master-0   Ready    master   3h16m   v1.20.0+69d7e87
master-1   Ready    master   3h16m   v1.20.0+69d7e87
master-2   Ready    master   3h16m   v1.20.0+69d7e87
worker-0   Ready    worker   3h      v1.20.0+69d7e87
worker-1   Ready    worker   3h1m    v1.20.0+69d7e87

[root@tania-ovn-bastion ~]# oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
OVNKubernetes


[root@tania-ovn-bastion ~]# oc describe pod machine-config-daemon-w7tjh -n openshift-machine-config-operator
Name:                 machine-config-daemon-w7tjh
Namespace:            openshift-machine-config-operator
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 worker-1/9.114.99.74
Start Time:           Fri, 12 Mar 2021 06:47:00 -0500
Labels:               controller-revision-hash=555648969d
                      k8s-app=machine-config-daemon
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   9.114.99.74
IPs:
  IP:           9.114.99.74
Controlled By:  DaemonSet/machine-config-daemon
Containers:
  machine-config-daemon:
    Container ID:  cri-o://0b610466f8cb5cba88b2dea140e473b1d79a5098ea14e5e7bb66b9b6c1762e00
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2db73c40bf8c77de51172f79e656143be7f8361d357be13b7958141870c164fc
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2db73c40bf8c77de51172f79e656143be7f8361d357be13b7958141870c164fc
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/machine-config-daemon
    Args:
      start
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   480725 update.go:1905] Starting to manage node: worker-1
I0312 11:57:53.310503  480725 rpm-ostree.go:258] Running captured: rpm-ostree status
I0312 11:57:53.313195  480725 daemon.go:668] Detected a new login session: New session 1 of user core.
I0312 11:57:53.313240  480725 daemon.go:669] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh
I0312 11:57:53.362342  480725 daemon.go:850] State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3ef6bba62d4720a42d0173d350fed1db6f1c36590f4cff16918dc2a4fb1430a5
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.83.202102231119-0 (2021-02-23T11:24:59Z)

  ostree://3d3c288b8245c9b2af925f13f7dce13cdd9dacfd8434f5c291858bcdc8230f94
                   Version: 48.83.202102100223-0 (2021-02-10T02:29:03Z)
I0312 11:57:53.362358  480725 rpm-ostree.go:258] Running captured: journalctl --list-boots
I0312 11:57:53.368371  480725 daemon.go:857] journalctl --list-boots:
-1 5a82a8e7581b4c9c875089c9259a22c0 Fri 2021-03-12 08:52:52 UTC—Fri 2021-03-12 08:56:48 UTC
 0 2559135d79134d109c57efc03683df9b Fri 2021-03-12 09:00:54 UTC—Fri 2021-03-12 11:57:53 UTC
I0312 11:57:53.368418  480725 rpm-ostree.go:258] Running captured: systemctl list-units --state=failed --no-legend
I0312 11:57:53.376319  480725 daemon.go:872] systemd service state: OK
I0312 11:57:53.376364  480725 daemon.go:606] Starting MachineConfigDaemon
I0312 11:57:53.376372  480725 daemon.go:576] Guarding against sigterm signal
I0312 11:57:53.376393  480725 daemon.go:613] Enabling Kubelet Healthz Monitor
W0312 11:57:53.376452  480725 daemon.go:634] Got an error from auxiliary tools: error: cannot apply annotation for SSH access due to: unable to update node "nil": node "worker-1" not found
I0312 11:57:53.376486  480725 daemon.go:635] Shutting down MachineConfigDaemon
F0312 11:57:53.376557  480725 helpers.go:147] error: cannot apply annotation for SSH access due to: unable to update node "nil": node "worker-1" not found

      Exit Code:    255
      Started:      Fri, 12 Mar 2021 06:57:53 -0500
      Finished:     Fri, 12 Mar 2021 06:57:53 -0500
    Ready:          False
    Restart Count:  7
    Requests:
      cpu:     20m
      memory:  50Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /rootfs from rootfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-config-daemon-token-7d6fp (ro)
  oauth-proxy:
    Container ID:  cri-o://86f7812a6d46baf8a67ff761706d347bf4c41037d2d3d70b97498e05f6ab475b
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419af1383fdc27ee1495813a89721abe68f63b0ab1e932a64937336d60f78054
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419af1383fdc27ee1495813a89721abe68f63b0ab1e932a64937336d60f78054
    Port:          9001/TCP
    Host Port:     9001/TCP
    Args:
      --https-address=:9001
      --provider=openshift
      --openshift-service-account=machine-config-daemon
      --upstream=http://127.0.0.1:8797
      --tls-cert=/etc/tls/private/tls.crt
      --tls-key=/etc/tls/private/tls.key
      --cookie-secret-file=/etc/tls/cookie-secret/cookie-secret
      --openshift-sar={"resource": "namespaces", "verb": "get"}
      --openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
    State:          Running
      Started:      Fri, 12 Mar 2021 06:47:02 -0500
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        20m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /etc/tls/cookie-secret from cookie-secret (rw)
      /etc/tls/private from proxy-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-config-daemon-token-7d6fp (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  rootfs:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  
  proxy-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  proxy-tls
    Optional:    false
  cookie-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cookie-secret
    Optional:    false
  machine-config-daemon-token-7d6fp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-config-daemon-token-7d6fp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  14m                   default-scheduler  Successfully assigned openshift-machine-config-operator/machine-config-daemon-w7tjh to worker-1
  Normal   Pulled     14m                   kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419af1383fdc27ee1495813a89721abe68f63b0ab1e932a64937336d60f78054" already present on machine
  Normal   Created    14m                   kubelet            Created container oauth-proxy
  Normal   Started    14m                   kubelet            Started container oauth-proxy
  Normal   Started    13m (x4 over 14m)     kubelet            Started container machine-config-daemon
  Normal   Pulled     12m (x5 over 14m)     kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2db73c40bf8c77de51172f79e656143be7f8361d357be13b7958141870c164fc" already present on machine
  Normal   Created    12m (x5 over 14m)     kubelet            Created container machine-config-daemon
  Warning  BackOff    3m59s (x49 over 14m)  kubelet            Back-off restarting failed container


[root@tania-ovn-bastion ~]# oc logs pod/machine-config-daemon-mktx5 -n openshift-machine-config-operator -c machine-config-daemon
Error from server (NotFound): pods "machine-config-daemon-mktx5" not found
[root@tania-ovn-bastion ~]# oc logs pod/machine-config-daemon-w7tjh -n openshift-machine-config-operator -c machine-config-daemon
I0312 11:57:53.184903  480725 start.go:108] Version: v4.8.0-202103091116.p0-dirty (82868e63176fee2bc806c1deb308ed1fc8965d84)
I0312 11:57:53.188144  480725 start.go:121] Calling chroot("/rootfs")
I0312 11:57:53.188267  480725 rpm-ostree.go:258] Running captured: rpm-ostree status --json
I0312 11:57:53.292508  480725 daemon.go:218] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3ef6bba62d4720a42d0173d350fed1db6f1c36590f4cff16918dc2a4fb1430a5 (48.83.202102231119-0)
I0312 11:57:53.298758  480725 start.go:97] Copied self to /run/bin/machine-config-daemon on host
I0312 11:57:53.302707  480725 metrics.go:105] Registering Prometheus metrics
I0312 11:57:53.302848  480725 metrics.go:110] Starting metrics listener on 127.0.0.1:8797
I0312 11:57:53.305408  480725 update.go:1905] Starting to manage node: worker-1
I0312 11:57:53.310503  480725 rpm-ostree.go:258] Running captured: rpm-ostree status
I0312 11:57:53.313195  480725 daemon.go:668] Detected a new login session: New session 1 of user core.
I0312 11:57:53.313240  480725 daemon.go:669] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh
I0312 11:57:53.362342  480725 daemon.go:850] State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3ef6bba62d4720a42d0173d350fed1db6f1c36590f4cff16918dc2a4fb1430a5
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.83.202102231119-0 (2021-02-23T11:24:59Z)

  ostree://3d3c288b8245c9b2af925f13f7dce13cdd9dacfd8434f5c291858bcdc8230f94
                   Version: 48.83.202102100223-0 (2021-02-10T02:29:03Z)
I0312 11:57:53.362358  480725 rpm-ostree.go:258] Running captured: journalctl --list-boots
I0312 11:57:53.368371  480725 daemon.go:857] journalctl --list-boots:
-1 5a82a8e7581b4c9c875089c9259a22c0 Fri 2021-03-12 08:52:52 UTC—Fri 2021-03-12 08:56:48 UTC
 0 2559135d79134d109c57efc03683df9b Fri 2021-03-12 09:00:54 UTC—Fri 2021-03-12 11:57:53 UTC
I0312 11:57:53.368418  480725 rpm-ostree.go:258] Running captured: systemctl list-units --state=failed --no-legend
I0312 11:57:53.376319  480725 daemon.go:872] systemd service state: OK
I0312 11:57:53.376364  480725 daemon.go:606] Starting MachineConfigDaemon
I0312 11:57:53.376372  480725 daemon.go:576] Guarding against sigterm signal
I0312 11:57:53.376393  480725 daemon.go:613] Enabling Kubelet Healthz Monitor
W0312 11:57:53.376452  480725 daemon.go:634] Got an error from auxiliary tools: error: cannot apply annotation for SSH access due to: unable to update node "nil": node "worker-1" not found
I0312 11:57:53.376486  480725 daemon.go:635] Shutting down MachineConfigDaemon
F0312 11:57:53.376557  480725 helpers.go:147] error: cannot apply annotation for SSH access due to: unable to update node "nil": node "worker-1" not found

[root@tania-ovn-bastion ~]# oc get co
NAME                                       VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      167m
baremetal                                  4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m
cloud-credential                           4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h20m
cluster-autoscaler                         4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
config-operator                            4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m
console                                    4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      175m
csi-snapshot-controller                    4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
dns                                        4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
etcd                                       4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
image-registry                             4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      163m
ingress                                    4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h
insights                                   4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h5m
kube-apiserver                             4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h7m
kube-controller-manager                    4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h10m
kube-scheduler                             4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h10m
kube-storage-version-migrator              4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h
machine-api                                4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m
machine-approver                           4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
machine-config                             4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      82s
marketplace                                4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h10m
monitoring                                 4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      178m
network                                    4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m
node-tuning                                4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m
openshift-apiserver                        4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h7m
openshift-controller-manager               4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h9m
openshift-samples                          4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h7m
operator-lifecycle-manager                 4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
operator-lifecycle-manager-catalog         4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h11m
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h8m
service-ca                                 4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m
storage                                    4.8.0-0.nightly-ppc64le-2021-03-11-155720   True        False         False      3h12m

Comment 1 Dan Li 2021-03-12 13:29:02 UTC
Hi MCO team, we found the bug below:

https://bugzilla.redhat.com/show_bug.cgi?id=1933772

Could that be related to this bug?

Comment 2 Yu Qi Zhang 2021-03-12 16:02:31 UTC
Hi,

Yes, https://bugzilla.redhat.com/show_bug.cgi?id=1933772 reports the same error. Marking this bug as a duplicate.

*** This bug has been marked as a duplicate of bug 1933772 ***

