Bug 1781141 - Upgrade from 4.2.9 to 4.3 failed for MCO: controller version mismatch
Summary: Upgrade from 4.2.9 to 4.3 failed for MCO: controller version mismatch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Yu Qi Zhang
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1778904
Blocks:
 
Reported: 2019-12-09 12:08 UTC by Mike Fiedler
Modified: 2020-01-23 11:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:18:16 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:18:41 UTC

Description Mike Fiedler 2019-12-09 12:08:44 UTC
Description of problem:

Upgrading 4.2.9 to 4.3.0-0.nightly-2019-12-06-094536 fails:

    message: 'Unable to apply 4.3.0-0.nightly-2019-12-06-094536: timed out waiting
      for the condition during syncRequiredMachineConfigPools: pool master has not
      progressed to latest configuration: controller version mismatch for rendered-master-348056f16abd630d3dced666f8bc9080
      expected 2789973d61a0011415e2d019c09bbcb0f1bd3383 has d780d197a9c5848ba786982c0c4aaa7487297046,
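
When triaging a message like the one above, it can help to pull out the rendered config name and the two controller hashes so they can be compared. The following is only a minimal sketch: the function name parse_controller_mismatch is hypothetical, and it assumes the message wording stays exactly as quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the MCO): reads status text on stdin
# and prints the rendered config name plus the expected and actual controller
# version hashes. Assumes the wording "controller version mismatch for <name>
# expected <hash> has <hash>" seen in this report.
parse_controller_mismatch() {
  # Join wrapped lines first, because the YAML output folds the message.
  tr '\n' ' ' |
    sed -n 's/.*controller version mismatch for \([^ ]*\)  *expected \([0-9a-f]*\) has \([0-9a-f]*\).*/config=\1 expected=\2 actual=\3/p'
}
```

One possible use is to pipe the output of `oc get clusterversion version -o yaml` into the function; if the text contains no mismatch message, nothing is printed.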


oc adm must-gather fails in this cluster:

[root@ip-172-31-53-199 must-gather]# oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5f927c199fa9fa90a5793d012eda90cd4163d4d2ff4d0ad04534401faba5b24
[must-gather      ] OUT namespace/openshift-must-gather-z27xz created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-97k45 created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5f927c199fa9fa90a5793d012eda90cd4163d4d2ff4d0ad04534401faba5b24 created
[must-gather-ftwtk] OUT gather did not start: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-97k45 deleted
[must-gather      ] OUT namespace/openshift-must-gather-z27xz deleted
error: gather did not start for pod must-gather-ftwtk: timed out waiting for the condition


I'll grab the MCO logs.



Version-Release number of selected component (if applicable): Upgrading 4.2.9 to 4.3.0-0.nightly-2019-12-06-094536


How reproducible: Unknown

Comment 1 Mike Fiedler 2019-12-09 12:11:10 UTC
Trying to retrieve MCO logs:

# oc logs machine-config-operator-5c4c599bc7-7dzh7 > machine-config-operator-5c4c599bc7-7dzh7
Error from server: Get https://10.0.157.10:10250/containerLogs/openshift-machine-config-operator/machine-config-operator-5c4c599bc7-7dzh7/machine-config-operator: x509: certificate signed by unknown authority

Comment 2 Mike Fiedler 2019-12-09 12:34:50 UTC
Tried to get the logs via oc debug node, and that failed too:

# oc debug node/ip-10-0-157-10.us-west-2.compute.internal
Starting pod/ip-10-0-157-10us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.157.10
If you don't see a command prompt, try pressing enter.

Removing debug pod ...
Error from server: error dialing backend: x509: certificate signed by unknown authority

Comment 10 Mike Fiedler 2019-12-13 13:37:56 UTC
Moving back to MODIFIED since the fix depends on the unmerged PR https://github.com/openshift/installer/pull/2777

Comment 16 Michael Nguyen 2019-12-18 21:47:31 UTC
Verified. I was able to upgrade successfully from 4.2.9 to 4.3.0-0.nightly-2019-12-18-145749.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.9     True        False         15m     Cluster version is 4.2.9
$ oc adm upgrade --force --to-image=registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-18-145749
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-18-145749
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.9     True        False         15m     Cluster version is 4.2.9
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.9     True        True          4s      Working towards registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-18-145749: downloading update
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.9     True        True          7m39s   Working towards 4.3.0-0.nightly-2019-12-18-145749: 65% complete
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.9     True        True          30m     Working towards 4.3.0-0.nightly-2019-12-18-145749: 84% complete
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-12-18-145749   True        False         2m9s    Cluster version is 4.3.0-0.nightly-2019-12-18-145749
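
The manual re-checking in the transcript above could be scripted. This is a rough sketch only: the function name upgrade_reached is hypothetical, and it assumes the default column layout of `oc get clusterversion` shown above (NAME, VERSION, AVAILABLE, PROGRESSING, SINCE, STATUS).

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get clusterversion` table output on stdin and
# exits 0 only when the "version" row reports the target version with
# AVAILABLE=True and PROGRESSING=False. Assumes the default column order.
upgrade_reached() {
  target=$1
  awk -v target="$target" '
    $1 == "version" {
      found = 1
      ok = ($2 == target && $3 == "True" && $4 == "False")
    }
    END { exit !(found && ok) }
  '
}

# Possible polling loop (needs a logged-in oc client; the interval is arbitrary):
#   until oc get clusterversion | upgrade_reached "$TARGET"; do sleep 30; done
```

Checking the VERSION column against the target matters because PROGRESSING is also False before the upgrade starts, as the second `oc get clusterversion` call above shows.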

Comment 18 errata-xmlrpc 2020-01-23 11:18:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

