Bug 1737420 - Upgrade of Cloud-credential operator is failed during upgrade cluster 4.1.9->4.2
Summary: Upgrade of Cloud-credential operator is failed during upgrade cluster 4.1.9->4.2
Keywords:
Status: CLOSED DUPLICATE of bug 1726451
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Credential Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ---
: 4.2.0
Assignee: Devan Goodwin
QA Contact: Oleg Nesterov
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-08-05 10:47 UTC by Oleg Nesterov
Modified: 2019-08-06 17:27 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-06 17:27:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logs from cloud-credential-operator pod (44.24 KB, text/plain)
2019-08-05 10:47 UTC, Oleg Nesterov
no flags Details

Description Oleg Nesterov 2019-08-05 10:47:34 UTC
Created attachment 1600617 [details]
logs from cloud-credential-operator pod

Description of problem:

Cloud-credential operator is not upgraded after upgrade cluster from 4.1.9->4.2

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Setup 4.1.9 version on AWS 
2. Make upgrade to 4.2.0-0.nightly-2019-08-01-113533

Actual results:

CCO has status 
Progressing is true
Degraded is true

Expected results:
Upgrade of CCO is successfully

Additional info:

[onest@localhost ~]$ oc get clusteroperator
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      6h13m
cloud-credential                           4.1.9                               True        True          True       6h23m


[onest@localhost ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.9     True        True          97m     Unable to apply 4.2.0-0.nightly-2019-08-01-113533: the cluster operator cloud-credential has not yet successfully rolled out

Comment 1 Devan Goodwin 2019-08-06 17:27:55 UTC
I'm going to mark this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1726451

We have never been able to reproduce this but the current working theory was that this is caused by a lack of leader election, the old pod is still running while the new is spinning up, and if they happen to reconcile in the wrong order the old resets the version to the old again. (this seems consistent with your log)

The fix (if the theory is correct) has been live in 4.2 for some time, however because this was a 4.1 -> 4.2 upgrade it would not have helped, the old pod would still remain running and the new would not be contesting anything for leadership election. As such the theory still appears to hold.

I am going to propose a backport for 4.1 this week.

*** This bug has been marked as a duplicate of bug 1726451 ***


Note You need to log in before you can comment on or make changes to this bug.