1952266 – etcd operator bumps status.version[name=operator] before operands update

Summary: etcd operator bumps status.version[name=operator] before operands update

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Etcd
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Sam Batschelet
QA Contact:	ge liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-04-21 21:52 UTC by W. Trevor King
Modified:	2021-07-27 23:02 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: On upgrade, the etcd operator immediately marked itself as fully upgraded, before it had begun rolling out the new versions of the openshift-etcd pods.
Consequence: If the etcd pod upgrade failed, the cluster could mistakenly report that it had fully upgraded to the new version while some nodes were still running an older etcd. (The etcd operator would be Degraded in this case, but it would report that it was on the newer version and Degraded, rather than on the older version and Degraded.)
Fix: The etcd operator now waits for the etcd operands to be upgraded before declaring itself upgraded.
Result: Version reporting should be correct, and upgrades should proceed in the proper sequence.
Clone Of:
Environment:
Last Closed:	2021-07-27 23:02:36 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-etcd-operator pull 576	0	None	open	Bug 1952266: Don't set operator version before operands update	2021-04-26 16:28:22 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:02:54 UTC

Description W. Trevor King 2021-04-21 21:52:29 UTC

Similar to bug 1928157 and bug 1952174, but for a different operator. Example update from 4.7.8 to 4.8.0-0.ci-2021-04-21-123839 [1]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'I clusteroperator/etcd.*versions'
  Apr 21 13:31:30.629 I clusteroperator/etcd versions: operator 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839, raw-internal 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839
  Apr 21 13:35:29.825 I clusteroperator/etcd versions: etcd 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839
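The ordering problem in these two lines (the 'operator' entry changes at 13:31:30, the 'etcd' operand entry only at 13:35:29) can be checked mechanically. What follows is a minimal POSIX shell/awk sketch, assuming log lines in the `clusteroperator/etcd versions: NAME OLD -> NEW, ...` format shown above. The function name `check_version_order` and its output strings are hypothetical and not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper (not OpenShift tooling): reads e2e.log-style lines on
# stdin and reports whether the 'operator' version entry for
# clusteroperator/etcd changed before the 'etcd' operand entry.
check_version_order() {
  awk '
    /clusteroperator\/etcd versions:/ {
      # Keep only the text after "versions: ",
      # e.g. "operator A -> B, raw-internal A -> B"
      sub(/.*versions: /, "")
      n = split($0, entries, ", ")
      for (i = 1; i <= n; i++) {
        split(entries[i], f, " ")
        name = f[1]
        # Remember the input line where each entry name first changed;
        # entries changed on the same line get the same line number.
        if (!(name in seen)) { seen[name] = NR }
      }
    }
    END {
      if (!("operator" in seen) || !("etcd" in seen)) { print "incomplete"; exit }
      if (seen["operator"] < seen["etcd"]) print "operator-bumped-early"
      else print "ok"
    }
  '
}
```

Piping the e2e.log from [1] into `check_version_order` should print `operator-bumped-early` for the two lines above; once the 'operator' bump waits for the operands, it should print `ok`.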

But from [2]:

  An operator reports a new "operator" version when it has rolled out the new version to all of its operands.

The operator should delay the 'operator' version bump until all of its operands have leveled (reached the new version). This isn't as bad as the bug 1952174 case, because the etcd operator asks the cluster-version operator to wait on both the 'operator' and 'etcd' entries [3,4], so the cluster-version operator still waits for the etcd operands to update. But consumers such as the cluster_operator_conditions metrics will be confused about which version the etcd component is at until this bug is fixed.
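The rule quoted from [2] amounts to a simple gate: keep reporting the old 'operator' version until every operand reports the desired one. A minimal shell sketch of that gate, assuming the desired release version, the currently reported operator version, and each operand's observed version are already known. `reported_operator_version` is a hypothetical name, and this is not the actual change from cluster-etcd-operator pull 576.

```shell
#!/bin/sh
# Hypothetical sketch of the gating rule: bump the 'operator' version only
# once every operand reports the desired version.
# Usage: reported_operator_version DESIRED CURRENT OPERAND_VERSION...
reported_operator_version() {
  desired=$1
  current=$2
  shift 2
  for v in "$@"; do
    if [ "$v" != "$desired" ]; then
      # At least one operand (e.g. an etcd pod) has not leveled yet:
      # keep reporting the old operator version.
      printf '%s\n' "$current"
      return 0
    fi
  done
  # Every operand is at the desired version (vacuously true if none given).
  printf '%s\n' "$desired"
}
```

For example, `reported_operator_version 4.8.0 4.7.8 4.8.0 4.7.8 4.8.0` prints `4.7.8` because one operand is still on 4.7.8, and the output switches to `4.8.0` only after all operands report 4.8.0.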

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328
[2]: https://github.com/openshift/api/blob/a99ffa1cac6709edf8f502b16890b16f9a557e00/config/v1/types_cluster_operator.go#L43-L47
[3]: https://github.com/openshift/cluster-etcd-operator/blob/a254ec3cfafed3cc4787cbf511070d6e5dd1517c/manifests/0000_12_etcd-operator_07_clusteroperator.yaml#L10-L15
[4]: https://github.com/openshift/cluster-version-operator/blob/6fdd1e0f313f9c67ddf93037a0d4e17ce62e89ab/docs/user/reconciliation.md#clusteroperator

Comment 2 ge liu 2021-04-29 10:15:23 UTC

Verified: the co (ClusterOperator) version was not updated prematurely while the upgrade was in progress.

Comment 6 errata-xmlrpc 2021-07-27 23:02:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
