Description of problem:
Restoring the cluster to its previous state after a 4.6 to 4.7 upgrade leaves the machine-config operator in a Degraded state.

Version-Release number of selected component (if applicable):
Upgrade from 4.6.9 to 4.7.0-fc.0

[kni@provisionhost-0-0 ~]$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.9
built from commit a48ad4a15b42102d1747d2f5f3b635deffb950b5
release image registry.svc.ci.openshift.org/ocp/release@sha256:43d5c84169a4b3ff307c29d7374f6d69a707de15e9fa90ad352b432f77c0cead

How reproducible:
Every time.

Steps to Reproduce:
1. Back up the cluster.
2. Mirror the release image to the disconnected registry.
3. Create an ImageContentSourcePolicy.
4. Create a ConfigMap for the image signature.
5. Create a custom upgrade graph.
6. Point the CVO to the custom upgrade graph.
7. Upgrade to a 4.7 nightly.
8. Restore the cluster to its previous state (per https://docs.openshift.com/container-platform/4.6/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html).

Actual results:
The machine-config operator is in a Degraded state after the restore.

Expected results:
The cluster is restored to its previous state.

Additional info:
Virtual env: 3 masters + 2 workers, disconnected deployment.
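For anyone reproducing this, steps 2, 3, and 6 look roughly like the following. This is a sketch only: the registry host, repository, pull-secret path, and graph URL are placeholders for this environment, and the signature ConfigMap and graph hosting from steps 4-5 are omitted.

```
# Hypothetical values for the disconnected registry.
LOCAL_REGISTRY=registry.example.com:5000
LOCAL_REPO=ocp4/openshift4

# Step 2: mirror the 4.7 release payload into the local registry.
oc adm release mirror -a pull-secret.json \
  --from=quay.io/openshift-release-dev/ocp-release:4.7.0-fc.0-x86_64 \
  --to=${LOCAL_REGISTRY}/${LOCAL_REPO} \
  --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPO}:4.7.0-fc.0-x86_64

# Step 3: redirect release-image pulls to the mirror.
oc apply -f - <<EOF
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: release-mirror
spec:
  repositoryDigestMirrors:
  - mirrors:
    - ${LOCAL_REGISTRY}/${LOCAL_REPO}
    source: quay.io/openshift-release-dev/ocp-release
EOF

# Step 6: point the CVO at the custom upgrade graph.
oc patch clusterversion version --type merge \
  -p '{"spec":{"upstream":"https://graph.example.com/graph"}}'
```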
Pretty sure this is because the MCO started using Ignition spec version 3.2 in 4.7, while the MCD in 4.6 only understands up to 3.1. https://github.com/openshift/machine-config-operator/pull/2248
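One quick way to confirm this on an affected cluster (a minimal check, assuming the default rendered MachineConfig layout):

```
# List each MachineConfig with the Ignition spec version it was rendered with;
# after the 4.7 upgrade, the rendered-* configs report 3.2.0, which the 4.6 MCD rejects.
oc get mc -o custom-columns=NAME:.metadata.name,IGNITION:.spec.config.ignition.version
```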
While I don't think y-stream downgrades are really supported, I was able to break out of this by (sketched below):
- `oc get mc -oyaml` on the currentConfig MC for each node type (the currentConfig annotation on the Nodes)
- deleting all rendered MCs (one will be regenerated per node type)
- editing the currentConfig MCs from the first step, replacing ignition version 3.2.0 with 3.1.0
- recreating the currentConfig MCs with `oc create -f`
- logging into each node and deleting /etc/machine-config-daemon/currentconfig

That should get things moving.
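A rough shell transcription of those steps, for one pool; the node name, file names, and sed pattern are illustrative, not verified on this cluster:

```
# Save the currentConfig MC for the pool, taken from a node's annotation.
CURRENT=$(oc get node <some-master-node> -o \
  jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')
oc get mc "${CURRENT}" -o yaml > current-master.yaml

# Delete all rendered MCs; the controller regenerates one per pool.
oc get mc -o name | grep rendered- | xargs oc delete

# Downgrade the Ignition spec version in the saved config and recreate it.
# (Strip metadata.uid/resourceVersion from the saved yaml before recreating.)
sed -i 's/version: 3\.2\.0/version: 3.1.0/' current-master.yaml
oc create -f current-master.yaml

# On each node, remove the MCD's cached config so it re-evaluates its state.
ssh core@<node> sudo rm /etc/machine-config-daemon/currentconfig
```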
Adding the Keywords because this bug blocks the testing of https://issues.redhat.com/browse/API-1055, and the testing of the upgrade subteam's downgrade cases.
Could you say more about why you are testing a downgrade path? The MCO has been operating under the understanding that we do not support downgrade paths. There might be some reasoning here that we are not understanding. Thank you for providing more context.
I think you meant to set needinfo on the reporter.
> Could you say more about why you are testing a downgrade path?

The upgrade QE team (like Yang Yang above) has downgrade test cases. They say that, though downgrade is not officially supported, Dev requires QE to do a basic check of downgrade, so that cluster function can be ensured to work when an upgrade hits a problem and puts the cluster into an urgent situation. In addition, while doing this basic downgrade checking, QE hit/reported many issues that made the cluster malfunction, like bug 1907812, bug 1913620, bug 1916586, etc., but they were all fixed. This is why we test downgrade.
(In reply to Michelle Krejci from comment #5)
> Could you say more about why you are testing a downgrade path?

As Xingxing commented, we are testing disaster recovery as part of updates/upgrades testing on IPI BM.
Hi, this is Jerry from the MCO team. Regarding major y-stream downgrades, the MCO has never guaranteed that ability, and in this case, as Seth mentions, the MCO in 4.6 does not have the Ignition version bump; we are unfortunately unlikely to backport that functionality, given the priority of other work.

In terms of a workaround, as Seth mentions, it's possible to set all Ignition 3.2 machineconfigs to 3.1 manually, and that should get past this error (3.1->3.2 should not have changed anything unless you are using LUKS encryption, which would not be backwards compatible).

Apologies for the disruption to the QE process. If you believe "MCO downgradeability" should be a supported flow, please raise the issue as a new epic. The MCO today does not consider this a bug.
Closing this as NOTABUG for now. If we would like to discuss this further, perhaps Jira is a better place to continue