*** Bug 1815179 has been marked as a duplicate of this bug. ***
Verified with an upgrade from 4.3.5 to 4.4.0-0.nightly-2020-03-23-010639; some resources were added to etcd before the upgrade.
Upgraded a cluster from 4.2 -> 4.3 -> 4.4; it still failed at 4.3 -> 4.4. The error reported by the master MCP is:

message: 'Node ip-10-0-172-254.us-east-2.compute.internal is reporting: "rename /etc/machine-config-daemon/orig/usr/local/bin/etcd-member-add.sh.mcdorig /usr/local/bin/etcd-member-add.sh: invalid cross-device link"'

Here is some more information:

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
cloud-credential                           4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
cluster-autoscaler                         4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
console                                    4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
csi-snapshot-controller                    4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
dns                                        4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
etcd                                       4.4.0-0.nightly-2020-03-23-115620   True        False         False      16m
image-registry                             4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
ingress                                    4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
insights                                   4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
kube-apiserver                             4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
kube-controller-manager                    4.4.0-0.nightly-2020-03-23-115620   True        False         False      14h
kube-scheduler                             4.4.0-0.nightly-2020-03-23-115620   True        False         False      14h
kube-storage-version-migrator              4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
machine-api                                4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
machine-config                             4.3.0-0.nightly-2020-03-20-053743   False       True          True       13h
marketplace                                4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
monitoring                                 4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
network                                    4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
node-tuning                                4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
openshift-apiserver                        4.4.0-0.nightly-2020-03-23-115620   True        False         True       13h
openshift-controller-manager               4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
openshift-samples                          4.4.0-0.nightly-2020-03-23-115620   True        False         False      14h
operator-lifecycle-manager                 4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
service-ca                                 4.4.0-0.nightly-2020-03-23-115620   True        False         False      18h
service-catalog-apiserver                  4.4.0-0.nightly-2020-03-23-115620   True        False         False      13h
service-catalog-controller-manager         4.4.0-0.nightly-2020-03-23-115620   True        False         False      15h
storage                                    4.4.0-0.nightly-2020-03-23-115620   True        False         False      14h

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-20-053743   True        True          14h     Unable to apply 4.4.0-0.nightly-2020-03-23-115620: the cluster operator openshift-apiserver is degraded

# oc get node
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-129-165.us-east-2.compute.internal   Ready                      master   17h   v1.16.2
ip-10-0-143-184.us-east-2.compute.internal   Ready                      worker   17h   v1.17.1
ip-10-0-148-46.us-east-2.compute.internal    Ready                      worker   17h   v1.17.1
ip-10-0-158-80.us-east-2.compute.internal    Ready                      master   17h   v1.16.2
ip-10-0-160-88.us-east-2.compute.internal    Ready                      worker   17h   v1.17.1
ip-10-0-172-254.us-east-2.compute.internal   Ready,SchedulingDisabled   master   17h   v1.16.2

[root@MiWiFi-R1CM ~]# oc get co machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2020-03-24T09:07:48Z"
  generation: 1
  name: machine-config
  resourceVersion: "434388"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: eebea21a-6dae-11ea-91ef-02b5a04d55cc
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-03-24T13:34:42Z"
    message: Cluster not available for 4.4.0-0.nightly-2020-03-23-115620
    status: "False"
    type: Available
  - lastTransitionTime: "2020-03-24T13:36:44Z"
    message: Working towards 4.4.0-0.nightly-2020-03-23-115620
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-03-24T13:34:41Z"
    message: 'Unable to apply 4.4.0-0.nightly-2020-03-23-115620: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-ae1f3d111090dc50b108976cfe9743cb expected d5d9a488c1e0e19e1d3044bd0fac90096b0224d6 has ab4d62a3bf3774b77b6f9b04a2028faec1568aca, retrying'
    reason: RequiredPoolsFailed
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-03-24T09:08:46Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: {}
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: machine-config-controller
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.3.0-0.nightly-2020-03-20-053743

[root@MiWiFi-R1CM ~]# oc get co openshift-apiserver -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2020-03-24T09:07:59Z"
  generation: 1
  name: openshift-apiserver
  resourceVersion: "145634"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-apiserver
  uid: f5477f87-6dae-11ea-91ef-02b5a04d55cc
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-03-24T13:42:46Z"
    message: 'APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable'
    reason: APIServerDeployment_UnavailablePod
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-03-24T13:22:42Z"
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2020-03-24T13:30:08Z"
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2020-03-24T09:07:59Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: openshiftapiservers
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-apiserver-operator
    resource: namespaces
  - group: ""
    name: openshift-apiserver
    resource: namespaces
  - group: ""
    name: host-etcd-2
    namespace: openshift-etcd
    resource: endpoints
  - group: apiregistration.k8s.io
    name: v1.apps.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.authorization.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.build.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.image.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.oauth.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.project.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.quota.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.route.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.security.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.template.openshift.io
    resource: apiservices
  - group: apiregistration.k8s.io
    name: v1.user.openshift.io
    resource: apiservices
  versions:
  - name: operator
    version: 4.4.0-0.nightly-2020-03-23-115620
  - name: openshift-apiserver
    version: 4.4.0-0.nightly-2020-03-23-115620

[root@MiWiFi-R1CM ~]# oc get mcp master -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: "2020-03-24T09:07:53Z"
  generation: 4
  labels:
    machineconfiguration.openshift.io/mco-built-in: ""
    operator.machineconfiguration.openshift.io/required-for-upgrade: ""
  name: master
  resourceVersion: "144629"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/master
  uid: f1ea987f-6dae-11ea-91ef-02b5a04d55cc
spec:
  configuration:
    name: rendered-master-c9dbe1108410107d4942f4fd2c14cc4f
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-f1ea987f-6dae-11ea-91ef-02b5a04d55cc-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: master
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: ""
  paused: false
status:
  conditions:
  - lastTransitionTime: "2020-03-24T09:08:21Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2020-03-24T13:40:25Z"
    message: ""
    reason: ""
    status: "False"
    type: Updated
  - lastTransitionTime: "2020-03-24T13:40:25Z"
    message: All nodes are updating to rendered-master-c9dbe1108410107d4942f4fd2c14cc4f
    reason: ""
    status: "True"
    type: Updating
  - lastTransitionTime: "2020-03-24T13:41:01Z"
    message: 'Node ip-10-0-172-254.us-east-2.compute.internal is reporting: "rename /etc/machine-config-daemon/orig/usr/local/bin/etcd-member-add.sh.mcdorig /usr/local/bin/etcd-member-add.sh: invalid cross-device link"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
  - lastTransitionTime: "2020-03-24T13:41:01Z"
    message: ""
    reason: ""
    status: "True"
    type: Degraded
  configuration:
    name: rendered-master-ae1f3d111090dc50b108976cfe9743cb
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-f1ea987f-6dae-11ea-91ef-02b5a04d55cc-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  degradedMachineCount: 1
  machineCount: 3
  observedGeneration: 4
  readyMachineCount: 0
  unavailableMachineCount: 1
  updatedMachineCount: 0
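For context on the node error above: "invalid cross-device link" is the message for EXDEV, which rename(2) returns when the source and destination paths are on different filesystems. The following is only a minimal sketch of the common workaround (copy into the destination directory, then rename within that filesystem). It is not the machine-config-daemon's actual code or fix, and the function name `move_across_devices` is made up for illustration.

```python
import errno
import os
import shutil
import tempfile


def move_across_devices(src, dst):
    """Move src to dst; fall back to copy + rename if rename fails with EXDEV.

    Hypothetical helper for illustration only, not taken from the MCO.
    """
    try:
        os.rename(src, dst)
        return
    except OSError as exc:
        # Anything other than "invalid cross-device link" is a real failure.
        if exc.errno != errno.EXDEV:
            raise

    # src and dst are on different filesystems. Stage a copy next to dst so
    # that the final rename stays within a single filesystem.
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    os.close(fd)
    try:
        shutil.copy2(src, tmp)  # copies contents plus mode and timestamps
        os.replace(tmp, dst)    # rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    os.unlink(src)
```

Python's standard `shutil.move` already performs a similar fallback; the explicit version is shown only to make the two steps visible.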
I can also see this bug when upgrading from 4.3.8 to 4.4.0-0.nightly-2020-03-24-225110:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.8     True        True          126m    Unable to apply 4.4.0-0.nightly-2020-03-24-225110: the cluster operator openshift-apiserver is degraded

Status:
  Available Updates:  <nil>
  Conditions:
    Last Transition Time:  2020-03-25T02:29:17Z
    Message:               Done applying 4.3.8
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-03-25T04:34:12Z
    Message:               Cluster operator openshift-apiserver is reporting a failure: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable
    Reason:                ClusterOperatorDegraded
    Status:                True
    Type:                  Failing
    Last Transition Time:  2020-03-25T03:50:04Z
    Message:               Unable to apply 4.4.0-0.nightly-2020-03-24-225110: the cluster operator openshift-apiserver is degraded
    Reason:                ClusterOperatorDegraded
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-03-25T02:32:29Z
    Status:                True
    Type:                  RetrievedUpdates
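When triaging a stalled upgrade like this one, it can help to reduce the operator list to the entries that report Degraded=True or Available=False. The sketch below assumes the JSON produced by `oc get co -o json` (a list object with `items`, each carrying `status.conditions` entries with `type`, `status` and `message`); the function name `unhealthy_operators` is made up for illustration.

```python
import json
import sys


def unhealthy_operators(co_list):
    """Return (name, reason) pairs for operators that are Degraded or not Available.

    co_list is the parsed JSON from `oc get co -o json` (assumed layout).
    """
    results = []
    for item in co_list.get("items", []):
        name = item["metadata"]["name"]
        # Index the conditions by type for direct lookup.
        conditions = {
            c["type"]: c for c in item.get("status", {}).get("conditions", [])
        }
        degraded = conditions.get("Degraded", {})
        available = conditions.get("Available", {})
        if degraded.get("status") == "True":
            results.append((name, "Degraded: " + degraded.get("message", "")))
        elif available.get("status") == "False":
            results.append((name, "Unavailable: " + available.get("message", "")))
    return results


def main():
    # Intended use: oc get co -o json | python3 this_script.py
    for name, why in unhealthy_operators(json.load(sys.stdin)):
        print(f"{name}: {why}")
```

Calling `main()` with the `oc` output piped to stdin would print one line per unhealthy operator; on the cluster in this report that would be expected to include machine-config and openshift-apiserver.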
Must-gather stopped with the error below:

$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c67d14314241030909c260f2e67667df4f499997c40bb7144413e8ede5abe53
[must-gather      ] OUT namespace/openshift-must-gather-llnm6 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-rhjnh created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c67d14314241030909c260f2e67667df4f499997c40bb7144413e8ede5abe53 created
[must-gather-7r5xp] POD Wrote inspect data to must-gather.
[must-gather-7r5xp] POD Gathering data for ns/openshift-cluster-version...
[must-gather-7r5xp] POD Wrote inspect data to must-gather.
[must-gather-7r5xp] POD Gathering data for ns/openshift-config...
[must-gather-7r5xp] POD Gathering data for ns/openshift-config-managed...
[must-gather-7r5xp] POD Gathering data for ns/openshift-authentication...
[must-gather-7r5xp] POD Gathering data for ns/openshift-authentication-operator...
[must-gather-7r5xp] POD Gathering data for ns/openshift-ingress...
[must-gather-7r5xp] POD Gathering data for ns/openshift-cloud-credential-operator...
[must-gather-7r5xp] POD Gathering data for ns/openshift-machine-api...
[must-gather-7r5xp] POD Gathering data for ns/openshift-console-operator...
[must-gather-7r5xp] POD Gathering data for ns/openshift-console...
[must-gather-7r5xp] POD Gathering data for ns/openshift-csi-snapshot-controller...
[must-gather-7r5xp] POD Gathering data for ns/openshift-csi-snapshot-controller-operator...
[must-gather-7r5xp] POD Gathering data for ns/openshift-dns-operator...
[must-gather-7r5xp] POD Gathering data for ns/openshift-dns...
[must-gather-7r5xp] OUT waiting for gather to complete
[must-gather-7r5xp] OUT gather never finished: pods "must-gather-7r5xp" not found
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-rhjnh deleted
[must-gather      ] OUT namespace/openshift-must-gather-llnm6 deleted
error: gather never finished for pod must-gather-7r5xp: pods "must-gather-7r5xp" not found
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again. [1]: https://github.com/openshift/enhancements/pull/475