Description of problem:
Upgrade from 4.6.8 stalls on machine-config operator.
Upgrade from 4.6.9 stalls on machine-config operator.

Version-Release number of selected component (if applicable):
4.6.8 -> 4.7.0-fc.0
4.6.9 -> 4.7.0-fc.0

How reproducible:
Every time.

Steps to Reproduce:
1. Deploy a 4.6.8 cluster on VMware using the UPI method.
2. Set Release Channel to 4.7 Candidate.
3. Upgrade from 4.6.8.

Actual results:
Upgrade stalls on machine-config operator.

Expected results:
Master/Worker nodes should be updated to 4.7.0-fc.0.

Additional info:
Master and Worker MachineConfigPool report the following:

    message: 'Node openshift-pjq7x-worker2 is reporting: "error enabling unit: Failed to enable unit: Unit file nodeip-configuration.service does not exist.\n"'
    message: 'Node openshift-pjq7x-master1 is reporting: "error enabling unit: Failed to enable unit: Unit file nodeip-configuration.service does not exist.\n"'

Both 00-master and 00-worker MachineConfig have the following configuration:

    - contents: ""
      enabled: true
      name: nodeip-configuration.service

PlatformSpec is type: VSphere

oc get infrastructures.config.openshift.io cluster -o yaml

    apiVersion: config.openshift.io/v1
    kind: Infrastructure
    metadata:
      creationTimestamp: "2020-12-18T21:59:51Z"
      generation: 1
      name: cluster
      resourceVersion: "547"
      uid: 10f3785a-11ea-45d9-99b4-f36b8ea24a49
    spec:
      cloudConfig:
        key: config
        name: cloud-provider-config
      platformSpec:
        type: VSphere
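For anyone triaging a similar stall, the pattern in the MachineConfig above (a unit with enabled: true but empty contents) can be checked for mechanically. The following is only a minimal sketch, not part of the machine-config operator: the function name and the SAMPLE dict are hypothetical, and it assumes the units live under spec.config.systemd.units of the MachineConfig as parsed from JSON.

```python
def enabled_units_without_contents(machine_config):
    """Return names of systemd units that are enabled but carry no unit file contents.

    machine_config is a MachineConfig object already parsed into a dict.
    Assumption: units are listed under spec.config.systemd.units.
    """
    spec = machine_config.get("spec") or {}
    config = spec.get("config") or {}
    systemd = config.get("systemd") or {}
    units = systemd.get("units") or []

    suspects = []
    for unit in units:
        contents = unit.get("contents") or ""
        # An enabled unit with blank contents has no unit file for systemd to enable.
        if unit.get("enabled") and not contents.strip():
            suspects.append(unit.get("name", "<unnamed>"))
    return suspects


# Hypothetical sample mirroring the fragment quoted in this report,
# plus one made-up unit (example.service) that has contents.
SAMPLE = {
    "spec": {
        "config": {
            "systemd": {
                "units": [
                    {"contents": "", "enabled": True, "name": "nodeip-configuration.service"},
                    {"contents": "[Unit]\nDescription=example\n", "enabled": True, "name": "example.service"},
                ]
            }
        }
    }
}

if __name__ == "__main__":
    print(enabled_units_without_contents(SAMPLE))
```

To run it against a real cluster, the SAMPLE dict would be replaced by the parsed output of oc get machineconfig 00-worker -o json (and likewise for 00-master).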
Deployed a new cluster using provider: None. Cluster upgraded from 4.6.8 -> 4.7.0-fc.0.

I believe the issue is not in 4.7 but in 4.6. The nodeip-configuration.service should not be enabled in a VMware UPI install based on the following if statement. However, as seen in my first comment, the service is enabled but the Cluster and the Nodes do not complain and allow upgrades from 4.6.z to a higher z-release.

https://github.com/openshift/machine-config-operator/blob/ca283c2500df8cdc787600e8fcbd311b99859538/templates/common/_base/units/nodeip-configuration.service.yaml#L2

$ oc get infrastructures.config.openshift.io cluster -o yaml

    apiVersion: config.openshift.io/v1
    kind: Infrastructure
    metadata:
      creationTimestamp: "2020-12-21T16:13:48Z"
      generation: 1
      name: cluster
      resourceVersion: "548"
      uid: 866bf03a-9566-4c21-9b48-2ff3e3787edf
    spec:
      cloudConfig:
        name: ""
      platformSpec:
        type: None
    status:
      apiServerInternalURI: https://api-int.openshift.lab.int:6443
      apiServerURL: https://api.openshift.lab.int:6443
      etcdDiscoveryDomain: openshift.lab.int
      infrastructureName: openshift-w2sbh
      platform: None
      platformStatus:
        type: None
Reproduced again; adding info in case it is helpful.

[miyadav@miyadav Upgrade_tests]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        True          4h43m   Unable to apply 4.7.0-0.nightly-2021-01-07-181010: the cluster operator openshift-apiserver is degraded
[miyadav@miyadav Upgrade_tests]$

Steps:
vSphere 4.6.8 UPI cluster
Upgraded to 4.7 nightly

Additional info:
http://pastebin.test.redhat.com/929793
http://pastebin.test.redhat.com/929825
Thanks for all of the detailed info. This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1910738, which has an open PR that is close to merging. *** This bug has been marked as a duplicate of bug 1910738 ***
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days