Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1847351

Summary: Upgrade hanging due to unexpected on-disk state validating against rendered...
Product: OpenShift Container Platform
Reporter: huirwang
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Docs Contact:
Priority: high
Version: 4.5
CC: walters
Target Milestone: ---
Keywords: TestBlocker
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-16 10:30:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description huirwang 2020-06-16 09:18:59 UTC
Description:
Upgrading from 4.4.8 to 4.5.0-0.nightly-2020-06-16-014907 gets stuck due to "unexpected on-disk state validating against rendered..."

Steps to Reproduce:
1. Install OCP 4.4.8 on bare metal.
2. Upgrade to 4.5.0-0.nightly-2020-06-16-014907 with the command: oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-16-014907 --force=true --allow-explicit-upgrade=true

Result:
Two nodes are stuck in the SchedulingDisabled state.
 
oc get machineconfigpools.machineconfiguration.openshift.io 
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6d45441c2be99b0d129ec345f6b6e114   False     True       True       3              0                   0                     1                      6h23m
worker   rendered-worker-e4e32d87b18decf05a89b1af84fe9075   False     True       True       3              0                   0                     1                      6h23m
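
To spot degraded pools quickly in output like the table above, the text can be parsed column by column. The following is only a minimal sketch: the function names `parse_pools` and `degraded_pools` are hypothetical, and it assumes every cell is non-empty and contains no spaces, which holds for the output shown here.

```python
# Sample copied from the report above. In practice the text would be the
# captured output of `oc get machineconfigpools`.
SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6d45441c2be99b0d129ec345f6b6e114   False     True       True       3              0                   0                     1                      6h23m
worker   rendered-worker-e4e32d87b18decf05a89b1af84fe9075   False     True       True       3              0                   0                     1                      6h23m
"""


def parse_pools(text):
    """Turn the whitespace-separated table into one dict per pool.

    Assumption: no cell is empty or contains spaces, so a plain split()
    lines up each value with its header.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def degraded_pools(text):
    """Return the names of pools whose DEGRADED column reads 'True'."""
    return [pool["NAME"] for pool in parse_pools(text) if pool["DEGRADED"] == "True"]


if __name__ == "__main__":
    for pool in parse_pools(SAMPLE):
        print(pool["NAME"], "degraded machines:", pool["DEGRADEDMACHINECOUNT"])
    print("degraded pools:", degraded_pools(SAMPLE))
```

For the sample above, both pools are reported as degraded, each with one degraded machine.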

oc describe machineconfigpools.machineconfiguration.openshift.io master
Name:         master
Namespace:    
Labels:       custom-kubelet=small-pods
              machineconfiguration.openshift.io/mco-built-in=
              operator.machineconfiguration.openshift.io/required-for-upgrade=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-06-16T02:36:16Z
  Generation:          4
  Resource Version:    178817
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/master
  UID:                 4f35005c-1297-4952-8444-95003516e5ca
Spec:
  Configuration:
    Name:  rendered-master-9a5283ce51b09cb5b76bc796e74e5940
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-master
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-master-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-4f35005c-1297-4952-8444-95003516e5ca-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-4f35005c-1297-4952-8444-95003516e5ca-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-fips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-master-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  master
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/master:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2020-06-16T02:37:02Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-06-16T08:02:36Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2020-06-16T08:02:36Z
    Message:               All nodes are updating to rendered-master-9a5283ce51b09cb5b76bc796e74e5940
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2020-06-16T08:09:30Z
    Message:               Node huir-upg-jlqd2-control-plane-0 is reporting: "unexpected on-disk state validating against rendered-master-513bb8d25aeba1f69d4ccf1708ce5a6e"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-06-16T08:09:30Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-master-6d45441c2be99b0d129ec345f6b6e114
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-master
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-master-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-master-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-4f35005c-1297-4952-8444-95003516e5ca-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-4f35005c-1297-4952-8444-95003516e5ca-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-fips
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-master-ssh
  Degraded Machine Count:     1
  Machine Count:              3
  Observed Generation:        4
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:                       <none>

oc describe machineconfigpools.machineconfiguration.openshift.io worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-06-16T02:36:16Z
  Generation:          3
  Resource Version:    177371
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:                 690b3f0d-6b01-433d-89a1-76ecae732a0e
Spec:
  Configuration:
    Name:  rendered-worker-5b16814212becca0a0e2432af89fbeb5
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-690b3f0d-6b01-433d-89a1-76ecae732a0e-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-fips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2020-06-16T02:37:02Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-06-16T08:02:36Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2020-06-16T08:02:36Z
    Message:               All nodes are updating to rendered-worker-5b16814212becca0a0e2432af89fbeb5
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2020-06-16T08:07:49Z
    Message:               Node huir-upg-jlqd2-compute-2 is reporting: "unexpected on-disk state validating against rendered-worker-e4e32d87b18decf05a89b1af84fe9075"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-06-16T08:07:49Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-e4e32d87b18decf05a89b1af84fe9075
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-690b3f0d-6b01-433d-89a1-76ecae732a0e-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-fips
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     1
  Machine Count:              3
  Observed Generation:        3
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:                       <none>
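
In the master pool output above, the rendered config named in the NodeDegraded message (rendered-master-513bb8d2...) matches neither the Spec configuration nor the Status configuration. A comparison like that can be sketched as follows. This is an illustration only: `summarize` is a hypothetical helper, the excerpt is a condensed copy of the `oc describe` output, and the parsing assumes the layout shown in this report.

```python
import re

# Rendered config names in this report look like rendered-<pool>-<32 hex chars>.
RENDERED = re.compile(r"rendered-[a-z]+-[0-9a-f]{32}")

# Condensed excerpt of the master pool description from the report above.
EXCERPT = """\
Spec:
  Configuration:
    Name:  rendered-master-9a5283ce51b09cb5b76bc796e74e5940
Status:
  Conditions:
    Message:               All nodes are updating to rendered-master-9a5283ce51b09cb5b76bc796e74e5940
    Type:                  Updating
    Message:               Node huir-upg-jlqd2-control-plane-0 is reporting: "unexpected on-disk state validating against rendered-master-513bb8d25aeba1f69d4ccf1708ce5a6e"
    Type:                  NodeDegraded
  Configuration:
    Name:  rendered-master-6d45441c2be99b0d129ec345f6b6e114
"""


def summarize(describe_text):
    """Collect the spec config, the status config, and configs named in
    'unexpected on-disk state' messages."""
    result = {"spec": None, "status": None, "reported": []}
    section = None
    for line in describe_text.splitlines():
        # Top-level keys start at column 0 in `oc describe` output.
        if line.startswith("Spec:"):
            section = "spec"
        elif line.startswith("Status:"):
            section = "status"
        stripped = line.strip()
        match = RENDERED.search(stripped)
        if match is None:
            continue
        if stripped.startswith("Name:") and section is not None:
            # Keep only the first rendered name seen in each section.
            if result[section] is None:
                result[section] = match.group(0)
        elif "unexpected on-disk state" in stripped:
            result["reported"].append(match.group(0))
    return result


if __name__ == "__main__":
    info = summarize(EXCERPT)
    print("spec config:  ", info["spec"])
    print("status config:", info["status"])
    for name in info["reported"]:
        known = name in (info["spec"], info["status"])
        print("reported:     ", name, "(matches spec/status)" if known else "(matches neither)")
```

Applied to the worker pool description instead, the same check would show the reported config equal to the Status configuration (rendered-worker-e4e32d87...), so the two pools differ in this respect.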


Expected results:
The upgrade succeeds.

Comment 4 Antonio Murdaca 2020-06-16 10:23:32 UTC
So, yesterday https://github.com/openshift/machine-config-operator/pull/1822 merged in 4.5 and your cluster is experiencing what that PR was trying to solve - is this consistent and happening all the time?

Comment 5 Antonio Murdaca 2020-06-16 10:30:22 UTC
The payload you're upgrading to contains the following MCO commit: https://github.com/openshift/machine-config-operator/commit/908117045fe9ef32662554ed9ed557b3c1e1a965
The fix I referenced above (PR https://github.com/openshift/machine-config-operator/pull/1822) fixes this behavior.
Please use a newer payload and also take a look at some testing that went into the duplicate BZ http://bugzilla.redhat.com/show_bug.cgi?id=1846690 and https://bugzilla.redhat.com/show_bug.cgi?id=1842906#c55

*** This bug has been marked as a duplicate of bug 1846690 ***

Comment 6 W. Trevor King 2021-04-05 17:46:01 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475