Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1946713

Summary: SNO: upgrade gets stuck waiting on machine-config: error: cannot apply annotation for SSH access due to: unable to update node "nil": node <FQDN> not found
Product: OpenShift Container Platform
Reporter: Alexander Chuzhoy <sasha>
Component: Machine Config Operator
Assignee: Yu Qi Zhang <jerzhang>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.8
CC: omichael
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-06 17:55:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alexander Chuzhoy 2021-04-06 17:44:56 UTC
SNO: upgrade gets stuck waiting on machine-config: error: cannot apply annotation for SSH access due to: unable to update node "nil": node <FQDN> not found

Version:
4.8.0-0.nightly-2021-04-01-072432
Attempted to upgrade to 4.8.0-0.nightly-2021-04-03-044912

Result:
The upgrade gets stuck at 84%:

[kni@r640-u01 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-01-072432   True        True          91m     Working towards 4.8.0-0.nightly-2021-04-03-044912: 567 of 675 done (84% complete)
[kni@r640-u01 ~]$ 

[kni@r640-u01 ~]$ oc get pod -A|grep -v Run|grep -v Comple;
NAMESPACE                                          NAME                                                                         READY   STATUS             RESTARTS   AGE
openshift-machine-config-operator                  machine-config-daemon-n286c                                                  1/2     CrashLoopBackOff   18         67m
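
The `grep -v Run|grep -v Comple` filter above matches text anywhere on the line, so it would also hide a pod whose STATUS is, for example, `RunContainerError`. A minimal sketch of a column-based alternative, assuming the default `oc get pod -A --no-headers` column order (NAMESPACE NAME READY STATUS ...); the function name `unhealthy_pods` is hypothetical:

```shell
# Print "namespace/name STATUS" for every pod whose STATUS column (4th field
# of `oc get pod -A --no-headers`) is neither Running nor Completed.
# Comparing the whole field avoids hiding statuses such as RunContainerError,
# which a plain `grep -v Run` would drop.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Usage, assuming a logged-in `oc` session: `oc get pod -A --no-headers | unhealthy_pods`.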

[kni@r640-u01 ~]$ oc logs -n openshift-machine-config-operator machine-config-daemon-n286c -c machine-config-daemon
I0406 17:41:21.488360  544460 start.go:108] Version: v4.8.0-202104030047.p0-dirty (86270f3375f894ff1dc21eee74247f04790dd0e1)
I0406 17:41:21.490593  544460 start.go:121] Calling chroot("/rootfs")
I0406 17:41:21.490652  544460 rpm-ostree.go:258] Running captured: rpm-ostree status --json
I0406 17:41:21.552522  544460 daemon.go:219] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0581b43ed5c3db21620c09711e102acf7837bacb99e22cd4dce1fc9c3800ec4c (48.83.202104010252-0)
I0406 17:41:21.577196  544460 start.go:97] Copied self to /run/bin/machine-config-daemon on host
I0406 17:41:21.578921  544460 metrics.go:105] Registering Prometheus metrics
I0406 17:41:21.579608  544460 metrics.go:110] Starting metrics listener on 127.0.0.1:8797
I0406 17:41:21.580261  544460 update.go:1851] Starting to manage node: openshift-master-0.qe1.kni.lab.eng.bos.redhat.com
I0406 17:41:21.583051  544460 rpm-ostree.go:258] Running captured: rpm-ostree status
I0406 17:41:21.585604  544460 daemon.go:669] Detected a new login session: New session 1 of user core.
I0406 17:41:21.585614  544460 daemon.go:670] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh
I0406 17:41:21.612682  544460 daemon.go:851] State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0581b43ed5c3db21620c09711e102acf7837bacb99e22cd4dce1fc9c3800ec4c
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.83.202104010252-0 (2021-04-01T02:55:23Z)

  ostree://646a9832dd0dc9fe174a2fc005863a9582186518a5476522a0e9bdccc0e5252a
                   Version: 47.83.202102090044-0 (2021-02-09T00:47:36Z)
I0406 17:41:21.612703  544460 rpm-ostree.go:258] Running captured: journalctl --list-boots
I0406 17:41:21.618321  544460 daemon.go:858] journalctl --list-boots:
-1 be1322d40e1d4a599ad3b51ccd383f91 Sat 2021-04-03 16:27:18 UTC—Sat 2021-04-03 16:28:51 UTC
 0 ad6204a549f94f7da2a60ab4bff96f36 Sat 2021-04-03 16:30:11 UTC—Tue 2021-04-06 17:41:21 UTC
I0406 17:41:21.618334  544460 rpm-ostree.go:258] Running captured: systemctl list-units --state=failed --no-legend
I0406 17:41:21.623970  544460 daemon.go:871] systemctl --failed:
NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online
I0406 17:41:21.623980  544460 daemon.go:607] Starting MachineConfigDaemon
I0406 17:41:21.623986  544460 daemon.go:577] Guarding against sigterm signal
I0406 17:41:21.623996  544460 daemon.go:614] Enabling Kubelet Healthz Monitor
W0406 17:41:21.624004  544460 daemon.go:635] Got an error from auxiliary tools: error: cannot apply annotation for SSH access due to: unable to update node "nil": node "openshift-master-0.qe1.kni.lab.eng.bos.redhat.com" not found
I0406 17:41:21.624013  544460 daemon.go:636] Shutting down MachineConfigDaemon
F0406 17:41:21.624051  544460 helpers.go:147] error: cannot apply annotation for SSH access due to: unable to update node "nil": node "openshift-master-0.qe1.kni.lab.eng.bos.redhat.com" not found
[kni@r640-u01 ~]$ 
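
The daemon logs in klog format, where each entry starts with a severity letter (I, W, E or F) immediately followed by the MMDD date. A small sketch for pulling only the warning, error and fatal entries out of a long MCD log like the one above; `klog_problems` is a hypothetical helper name, and continuation lines of multi-line payloads (such as the rpm-ostree status dump) are dropped because they carry no klog header:

```shell
# Keep only warning (W), error (E) and fatal (F) klog entries read from stdin.
# A klog header looks like "W0406 17:41:21.624004  544460 daemon.go:635] ...",
# i.e. severity letter, four-digit MMDD date, then a space.
klog_problems() {
  grep -E '^[WEF][0-9]{4} '
}
```

Usage: `oc logs -n openshift-machine-config-operator machine-config-daemon-n286c -c machine-config-daemon | klog_problems`.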

[kni@r640-u01 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-81fbf9897a1ee0bf6cce5faf66c9f24f   False     True       False      1              0                   0                     0                      3d1h
worker   rendered-worker-f91909329df6dca62a07551ac0a530f3   True      False      False      0              0                   0                     0                      3d1h
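
In the output above the master pool reports UPDATED=False, UPDATING=True and READYMACHINECOUNT=0 of MACHINECOUNT=1, which is consistent with the single node never finishing its update. A sketch that flags any pool in that state, assuming `--no-headers` input with the column order shown above; `mcp_not_settled` is a hypothetical name:

```shell
# Flag pools that are not settled: UPDATED is not True, or UPDATING or
# DEGRADED is True. Expected columns (from `oc get mcp --no-headers`):
# NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT ...
mcp_not_settled() {
  awk '$3 != "True" || $4 == "True" || $5 == "True" {
    printf "%s updated=%s updating=%s degraded=%s ready=%s/%s\n", $1, $3, $4, $5, $7, $6
  }'
}
```

Usage: `oc get mcp --no-headers | mcp_not_settled`. For the cluster in this report it would print only the master pool.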

[kni@r640-u01 ~]$ oc describe mcp master
Name:         master
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              operator.machineconfiguration.openshift.io/required-for-upgrade=
              pools.operator.machineconfiguration.openshift.io/master=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2021-04-03T16:36:51Z
  Generation:          5
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:operator.machineconfiguration.openshift.io/required-for-upgrade:
          f:pools.operator.machineconfiguration.openshift.io/master:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/master:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2021-04-03T16:36:51Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2021-04-03T16:40:46Z
  Resource Version:  995781
  UID:               4e4c15c5-bedb-4012-ad26-0dfc0357b742
Spec:
  Configuration:
    Name:  rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-master
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         10-hostname
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         10-static-ips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  master
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/master:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2021-04-03T16:40:46Z
    Message:               
    Reason:                
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2021-04-03T16:40:46Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2021-04-03T16:40:47Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2021-04-06T16:36:42Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2021-04-06T16:36:42Z
    Message:               All nodes are updating to rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
    Reason:                
    Status:                True
    Type:                  Updating
  Configuration:
    Name:  rendered-master-81fbf9897a1ee0bf6cce5faf66c9f24f
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-master
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-master-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-master-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   10-hostname
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   10-static-ips
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-generated-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-ssh
  Degraded Machine Count:     0
  Machine Count:              1
  Observed Generation:        5
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:
  Type    Reason            Age   From                                    Message
  ----    ------            ----  ----                                    -------
  Normal  SetDesiredConfig  67m   machineconfigcontroller-nodecontroller  Targeted node openshift-master-0.qe1.kni.lab.eng.bos.redhat.com to config rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
  Normal  AnnotationChange  67m   machineconfigcontroller-nodecontroller  Node openshift-master-0.qe1.kni.lab.eng.bos.redhat.com now has machineconfiguration.openshift.io/desiredConfig=rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
[kni@r640-u01 ~]$

Comment 1 Alexander Chuzhoy 2021-04-06 17:55:10 UTC

*** This bug has been marked as a duplicate of bug 1933772 ***

Comment 2 Alexander Chuzhoy 2021-04-06 22:34:02 UTC
Workaround (tested successfully):

SSH to the SNO node and run, as root:
 journalctl --flush
 rm -rf /var/log/journal/*
 systemctl restart systemd-journald

Then restart the affected pods (those in CrashLoopBackOff) and confirm that they come up correctly.
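
The pod-restart step of the workaround can be scripted. This is a sketch, not part of the tested workaround: it assumes the journal cleanup above has already been done on the node, that `oc get pod -n <namespace> --no-headers` prints NAME READY STATUS ... in that order, and that deleting a DaemonSet-managed pod such as machine-config-daemon causes the DaemonSet to recreate it. The function names are hypothetical.

```shell
# Print the names of pods whose STATUS column (3rd field of
# `oc get pod -n <namespace> --no-headers`) is CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Delete every crashlooping pod in a namespace so its owning controller
# recreates it. Defaults to the namespace from this report.
restart_crashlooping_pods() {
  local ns="${1:-openshift-machine-config-operator}"
  oc get pod -n "$ns" --no-headers | crashlooping_pods |
    while read -r pod; do
      oc delete pod -n "$ns" "$pod"
    done
}
```

Usage: `restart_crashlooping_pods`, then watch the replacement with `oc get pod -n openshift-machine-config-operator -w`. Deleting the pod, rather than waiting for the next retry, also discards the accumulated CrashLoopBackOff back-off, because the replacement is a new pod.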