Description of problem:
Upgraded cluster from 4.1.9 to 4.2.0-0.nightly-2019-08-01-113533; only the machine-config operator is not upgraded.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-01-113533

How reproducible:
50%

Steps to Reproduce:
1.
2.
3.

Actual results:
$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      22h
cloud-credential                           4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
cluster-autoscaler                         4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
console                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
dns                                        4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
image-registry                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      7h19m
ingress                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
kube-apiserver                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
kube-controller-manager                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
kube-scheduler                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
machine-api                                4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
machine-config                             4.1.9                               False       True          True       7h50m
marketplace                                4.2.0-0.nightly-2019-08-01-113533   True        False         False      8h
monitoring                                 4.2.0-0.nightly-2019-08-01-113533   False       True          True       7h4m
network                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
node-tuning                                4.2.0-0.nightly-2019-08-01-113533   True        False         False      8h
openshift-apiserver                        4.2.0-0.nightly-2019-08-01-113533   True        False         False      92m
openshift-controller-manager               4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
openshift-samples                          4.2.0-0.nightly-2019-08-01-113533   True        False         False      18h
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-01-113533   True        False         False      4h7m
service-ca                                 4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-01-113533   True        False         False      4h6m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-01-113533   True        False         False      17h
storage                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      18h
support                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      18h

$ oc get co/machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-08-05T03:47:57Z"
  generation: 1
  name: machine-config
  resourceVersion: "1094784"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: d04f2a1b-b733-11e9-ad70-02e77de128dc
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-08-05T19:05:28Z"
    message: Cluster not available for 4.2.0-0.nightly-2019-08-01-113533
    status: "False"
    type: Available
  - lastTransitionTime: "2019-08-05T18:42:37Z"
    message: Working towards 4.2.0-0.nightly-2019-08-01-113533
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-08-05T19:05:28Z"
    message: 'Unable to apply 4.2.0-0.nightly-2019-08-01-113533: timed out waiting
      for the condition during syncRequiredMachineConfigPools: pool master has not
      progressed to latest configuration: controller version mismatch for rendered-master-ce8adbbe7a871e63d2f9fe30bf489c6f
      expected 6e75b3fe9bb02eeef9756d8b6ff1a85e790944e3 has 83392b13a5c17e56656acf3f7b0031e3303fb5c0,
      retrying'
    reason: FailedToSync
    status: "True"
    type: Degraded
  - lastTransitionTime: "2019-08-05T19:05:28Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    lastSyncError: 'pool master has not progressed to latest configuration: controller
      version mismatch for rendered-master-ce8adbbe7a871e63d2f9fe30bf489c6f expected
      6e75b3fe9bb02eeef9756d8b6ff1a85e790944e3 has 83392b13a5c17e56656acf3f7b0031e3303fb5c0,
      retrying'
    worker: all 3 nodes are at latest configuration rendered-worker-5f6dd4e5c2ad1322fbf6120f4d0916d7
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: cluster
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.1.9

Expected results:
machine-config operator can be upgraded from 4.1 to 4.2.

Additional info:
Tried to upgrade another cluster with the same upgrade path and it succeeded.
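For anyone hitting the same mismatch, one way to see both sides of the error is to compare the controller hash recorded on the rendered MachineConfig against the machine-config-controller that is actually running. A rough sketch (the generated-by-controller-version annotation name and the k8s-app=machine-config-controller label are from memory, so treat them as assumptions):

# Hash the rendered config was generated with ("has" in the error)
$ oc get machineconfig rendered-master-ce8adbbe7a871e63d2f9fe30bf489c6f \
    -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/generated-by-controller-version}'

# Image of the controller currently running (should correspond to "expected")
$ oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller \
    -o jsonpath='{.items[0].spec.containers[0].image}'

If the annotation still shows the old hash, the new MCC never re-rendered the master pool's configuration.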
This looks similar to the error in BZ#1734531
Does this reconcile eventually? That message says the MCC hasn't generated the newest rendered MachineConfigs for the new version, which is OK if the MCC simply hasn't run yet.
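If it doesn't reconcile, a quick way to check whether the new MCC has come up and what it is doing would be something like the following (the deployment name matches the usual MCO layout; adjust if it differs on your cluster):

$ oc -n openshift-machine-config-operator get deployments
$ oc -n openshift-machine-config-operator logs deployment/machine-config-controller | tail -n 50
$ oc get machineconfigpool master -o yaml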
No, it was stuck in that state for more than one day, until the cluster was destroyed. The issue is not 100% reproducible; another cluster was upgraded successfully.
Two other QE engineers (including myself) reproduced this yesterday. Let me know what additional info is required and I can attempt it again.
This could mean the new MCC hasn't rolled out yet. We need a must-gather to check the system logs as well.
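For reference, a must-gather covering the MCO pod logs can be collected and packaged for attachment roughly like this (the output lands in a must-gather.local.* directory by default):

$ oc adm must-gather
$ tar czf must-gather.tar.gz must-gather.local.*/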
@sgreen, can you confirm that you are seeing etcd-quorum-guard issues that might be impacting this upgrade? If so, you can mark this as a dupe of bug 1742744.
Yep, I can confirm that this is an etcd-quorum-guard issue, visible in must-gather/namespaces/openshift-machine-config-operator/pods/machine-config-daemon-z8l6v/machine-config-daemon/machine-config-daemon/logs/current.log. There are several thousand lines of the following error:

2019-08-06T03:14:57.218366261Z I0806 03:14:57.218315  123397 update.go:89] error when evicting pod "etcd-quorum-guard-8646778784-phtjq" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Marking as a dupe.
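For anyone verifying the same signature on their own cluster, the disruption budget that blocks the eviction can be inspected directly. A sketch, assuming the etcd-quorum-guard deployment lives in openshift-machine-config-operator on 4.1/4.2 (adjust the namespace if the grep below shows otherwise):

$ oc get pdb --all-namespaces | grep etcd-quorum-guard
$ oc -n openshift-machine-config-operator describe pdb etcd-quorum-guard
$ oc -n openshift-machine-config-operator get pods -o wide | grep etcd-quorum-guard

The eviction fails when allowed disruptions is 0, i.e. one of the guard pods is already unready, typically because a master is down or has not yet come back from a reboot.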
*** This bug has been marked as a duplicate of bug 1742744 ***