Bug 1946713 - SNO: upgrade gets stuck waiting on machine-config: error: cannot apply annotation for SSH access due to: unable to update node "nil": node <FQDN> not found
Keywords:
Status: CLOSED DUPLICATE of bug 1933772
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Yu Qi Zhang
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-06 17:44 UTC by Alexander Chuzhoy
Modified: 2021-04-06 22:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-06 17:55:10 UTC
Target Upstream Version:
Embargoed:


Description Alexander Chuzhoy 2021-04-06 17:44:56 UTC
SNO (single-node OpenShift): upgrade gets stuck waiting on machine-config: error: cannot apply annotation for SSH access due to: unable to update node "nil": node <FQDN> not found


Version:
4.8.0-0.nightly-2021-04-01-072432
Attempted to upgrade to 4.8.0-0.nightly-2021-04-03-044912


Result:
The upgrade gets stuck at 84%:



[kni@r640-u01 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-01-072432   True        True          91m     Working towards 4.8.0-0.nightly-2021-04-03-044912: 567 of 675 done (84% complete)
[kni@r640-u01 ~]$ 

[kni@r640-u01 ~]$ oc get pod -A|grep -v Run|grep -v Comple;
NAMESPACE                                          NAME                                                                         READY   STATUS             RESTARTS   AGE
openshift-machine-config-operator                  machine-config-daemon-n286c                                                  1/2     CrashLoopBackOff   18         67m
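
The grep -v chain above matches "Run" and "Comple" anywhere on a line, so a pod whose name or namespace happens to contain those strings would also be hidden. A minimal sketch of a filter keyed on the STATUS column instead; it assumes the default `oc get pod -A` column layout shown above (STATUS in the fourth column), and `unhealthy_pods` is a hypothetical helper name, not part of oc:

```shell
# Hypothetical helper: keep the header row plus every pod whose STATUS
# (4th column of `oc get pod -A`) is neither Running nor Completed.
unhealthy_pods() {
    awk 'NR == 1 || ($4 != "Running" && $4 != "Completed")'
}

# Usage (not executed here): oc get pod -A | unhealthy_pods
```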

[kni@r640-u01 ~]$ oc logs -n openshift-machine-config-operator                  machine-config-daemon-n286c  -c machine-config-daemon
I0406 17:41:21.488360  544460 start.go:108] Version: v4.8.0-202104030047.p0-dirty (86270f3375f894ff1dc21eee74247f04790dd0e1)
I0406 17:41:21.490593  544460 start.go:121] Calling chroot("/rootfs")
I0406 17:41:21.490652  544460 rpm-ostree.go:258] Running captured: rpm-ostree status --json
I0406 17:41:21.552522  544460 daemon.go:219] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0581b43ed5c3db21620c09711e102acf7837bacb99e22cd4dce1fc9c3800ec4c (48.83.202104010252-0)
I0406 17:41:21.577196  544460 start.go:97] Copied self to /run/bin/machine-config-daemon on host
I0406 17:41:21.578921  544460 metrics.go:105] Registering Prometheus metrics
I0406 17:41:21.579608  544460 metrics.go:110] Starting metrics listener on 127.0.0.1:8797
I0406 17:41:21.580261  544460 update.go:1851] Starting to manage node: openshift-master-0.qe1.kni.lab.eng.bos.redhat.com
I0406 17:41:21.583051  544460 rpm-ostree.go:258] Running captured: rpm-ostree status
I0406 17:41:21.585604  544460 daemon.go:669] Detected a new login session: New session 1 of user core.
I0406 17:41:21.585614  544460 daemon.go:670] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh
I0406 17:41:21.612682  544460 daemon.go:851] State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0581b43ed5c3db21620c09711e102acf7837bacb99e22cd4dce1fc9c3800ec4c
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.83.202104010252-0 (2021-04-01T02:55:23Z)

  ostree://646a9832dd0dc9fe174a2fc005863a9582186518a5476522a0e9bdccc0e5252a
                   Version: 47.83.202102090044-0 (2021-02-09T00:47:36Z)
I0406 17:41:21.612703  544460 rpm-ostree.go:258] Running captured: journalctl --list-boots
I0406 17:41:21.618321  544460 daemon.go:858] journalctl --list-boots:
-1 be1322d40e1d4a599ad3b51ccd383f91 Sat 2021-04-03 16:27:18 UTC—Sat 2021-04-03 16:28:51 UTC
 0 ad6204a549f94f7da2a60ab4bff96f36 Sat 2021-04-03 16:30:11 UTC—Tue 2021-04-06 17:41:21 UTC
I0406 17:41:21.618334  544460 rpm-ostree.go:258] Running captured: systemctl list-units --state=failed --no-legend
I0406 17:41:21.623970  544460 daemon.go:871] systemctl --failed:
NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online
I0406 17:41:21.623980  544460 daemon.go:607] Starting MachineConfigDaemon
I0406 17:41:21.623986  544460 daemon.go:577] Guarding against sigterm signal
I0406 17:41:21.623996  544460 daemon.go:614] Enabling Kubelet Healthz Monitor
W0406 17:41:21.624004  544460 daemon.go:635] Got an error from auxiliary tools: error: cannot apply annotation for SSH access due to: unable to update node "nil": node "openshift-master-0.qe1.kni.lab.eng.bos.redhat.com" not found
I0406 17:41:21.624013  544460 daemon.go:636] Shutting down MachineConfigDaemon
F0406 17:41:21.624051  544460 helpers.go:147] error: cannot apply annotation for SSH access due to: unable to update node "nil": node "openshift-master-0.qe1.kni.lab.eng.bos.redhat.com" not found
[kni@r640-u01 ~]$ 

[kni@r640-u01 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-81fbf9897a1ee0bf6cce5faf66c9f24f   False     True       False      1              0                   0                     0                      3d1h
worker   rendered-worker-f91909329df6dca62a07551ac0a530f3   True      False      False      0              0                   0                     0                      3d1h

[kni@r640-u01 ~]$ oc describe mcp master
Name:         master
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              operator.machineconfiguration.openshift.io/required-for-upgrade=
              pools.operator.machineconfiguration.openshift.io/master=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2021-04-03T16:36:51Z
  Generation:          5
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:operator.machineconfiguration.openshift.io/required-for-upgrade:
          f:pools.operator.machineconfiguration.openshift.io/master:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/master:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2021-04-03T16:36:51Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2021-04-03T16:40:46Z
  Resource Version:  995781
  UID:               4e4c15c5-bedb-4012-ad26-0dfc0357b742
Spec:
  Configuration:
    Name:  rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-master
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         10-hostname
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         10-static-ips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  master
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/master:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2021-04-03T16:40:46Z
    Message:               
    Reason:                
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2021-04-03T16:40:46Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2021-04-03T16:40:47Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2021-04-06T16:36:42Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2021-04-06T16:36:42Z
    Message:               All nodes are updating to rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
    Reason:                
    Status:                True
    Type:                  Updating
  Configuration:
    Name:  rendered-master-81fbf9897a1ee0bf6cce5faf66c9f24f
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-master
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-master-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-master-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   10-hostname
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   10-static-ips
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-generated-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-ssh
  Degraded Machine Count:     0
  Machine Count:              1
  Observed Generation:        5
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:
  Type    Reason            Age   From                                    Message
  ----    ------            ----  ----                                    -------
  Normal  SetDesiredConfig  67m   machineconfigcontroller-nodecontroller  Targeted node openshift-master-0.qe1.kni.lab.eng.bos.redhat.com to config rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
  Normal  AnnotationChange  67m   machineconfigcontroller-nodecontroller  Node openshift-master-0.qe1.kni.lab.eng.bos.redhat.com now has machineconfiguration.openshift.io/desiredConfig=rendered-master-064c2e13dbbd9d85f2614bb0979c31c4
[kni@r640-u01 ~]$

Comment 1 Alexander Chuzhoy 2021-04-06 17:55:10 UTC

*** This bug has been marked as a duplicate of bug 1933772 ***

Comment 2 Alexander Chuzhoy 2021-04-06 22:34:02 UTC
Workaround (tested successfully):

SSH to the SNO node and run:
 journalctl --flush
 rm -rf /var/log/journal/*
 systemctl restart systemd-journald

Then restart the affected pods (those in CrashLoopBackOff) and verify that they come up correctly.
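
The steps above could be wrapped in a small script. This is a sketch under assumptions, not a supported procedure: it assumes a root shell on the SNO node for the journal commands and a logged-in `oc` for the pod restart; `run`, `reset_journal` and `restart_pod` are hypothetical helper names; and "restart" is interpreted here as deleting the pod so that its DaemonSet recreates it (the comment does not say how the pods were restarted). `DRY_RUN` defaults to 1, so by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the workaround above. With DRY_RUN=1 (the default) nothing is
# executed; each command is only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# The three journal commands from the comment, in the same order.
reset_journal() {
    run journalctl --flush
    # The glob has to be expanded by a shell at execution time, hence sh -c.
    run sh -c 'rm -rf /var/log/journal/*'
    run systemctl restart systemd-journald
}

# Assumption: deleting the pod lets its controller create a fresh one.
# $1 = namespace, $2 = pod name.
restart_pod() {
    run oc delete pod -n "$1" "$2"
}
```

For example, `DRY_RUN=0 reset_journal` on the node, followed by `DRY_RUN=0 restart_pod openshift-machine-config-operator machine-config-daemon-n286c` from a host where `oc` is logged in; afterwards, check with `oc get pod -n openshift-machine-config-operator` that the new pod reaches Running.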

