Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1737420

Summary:

Upgrade of Cloud-credential operator is failed during upgrade cluster 4.1.9->4.2

Product:

OpenShift Container Platform

Reporter:

Oleg Nesterov <olnester>

Component:

Cloud Credential Operator

Assignee:

Devan Goodwin <dgoodwin>

Status:

CLOSED DUPLICATE

QA Contact:

Oleg Nesterov <olnester>

Severity:

urgent

Docs Contact:

Priority:

high

Version:

4.1.z

CC:

hongli

Target Milestone:

---

Target Release:

4.2.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2019-08-06 17:27:55 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
logs from cloud-credential-operator pod	none

Description Oleg Nesterov 2019-08-05 10:47:34 UTC

Created attachment 1600617 [details]
logs from cloud-credential-operator pod

Description of problem:

Cloud-credential operator is not upgraded after upgrade cluster from 4.1.9->4.2

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Setup 4.1.9 version on AWS 
2. Make upgrade to 4.2.0-0.nightly-2019-08-01-113533

Actual results:

CCO has status 
Progressing is true
Degraded is true

Expected results:
Upgrade of CCO is successfully

Additional info:

[onest@localhost ~]$ oc get clusteroperator
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      6h13m
cloud-credential                           4.1.9                               True        True          True       6h23m


[onest@localhost ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.9     True        True          97m     Unable to apply 4.2.0-0.nightly-2019-08-01-113533: the cluster operator cloud-credential has not yet successfully rolled out

Comment 1 Devan Goodwin 2019-08-06 17:27:55 UTC

I'm going to mark this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1726451

We have never been able to reproduce this but the current working theory was that this is caused by a lack of leader election, the old pod is still running while the new is spinning up, and if they happen to reconcile in the wrong order the old resets the version to the old again. (this seems consistent with your log)

The fix (if the theory is correct) has been live in 4.2 for some time, however because this was a 4.1 -> 4.2 upgrade it would not have helped, the old pod would still remain running and the new would not be contesting anything for leadership election. As such the theory still appears to hold.

I am going to propose a backport for 4.1 this week.

*** This bug has been marked as a duplicate of bug 1726451 ***