Bug 1952266 - etcd operator bumps status.version[name=operator] before operands update
Summary: etcd operator bumps status.version[name=operator] before operands update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-21 21:52 UTC by W. Trevor King
Modified: 2021-07-27 23:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: On upgrade, the etcd operator immediately marked itself as fully upgraded, before it began rolling out the 4.7 versions of the openshift-etcd pods. Consequence: If the etcd pod upgrade failed, the cluster might mistakenly report that it had fully upgraded to the new version despite some nodes still running an older etcd. (The etcd operator would be Degraded in this case, but it would mistakenly report that it was on the newer version and Degraded, rather than on the older version and Degraded.) Fix: The etcd operator now waits for etcd to be upgraded before declaring itself upgraded. Result: Version reporting should be correct, and upgrades should proceed in the proper sequence.
Clone Of:
Environment:
Last Closed: 2021-07-27 23:02:36 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-etcd-operator pull 576 | 0 | None | open | Bug 1952266: Don't set operator version before operands update | 2021-04-26 16:28:22 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:02:54 UTC

Description W. Trevor King 2021-04-21 21:52:29 UTC
Similar to bug 1928157 and bug 1952174, but different operator.  Example update from 4.7.8 to 4.8.0-0.ci-2021-04-21-123839 [1]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'I clusteroperator/etcd.*versions'
  Apr 21 13:31:30.629 I clusteroperator/etcd versions: operator 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839, raw-internal 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839
  Apr 21 13:35:29.825 I clusteroperator/etcd versions: etcd 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839

But from [2]:

  An operator reports a new "operator" version when it has rolled out the new version to all of its operands.

The operator should delay the 'operator' version bump until the operands have all leveled.  This isn't as bad as the bug 1952174 case, because the etcd operator is asking the cluster-version operator to wait on both the 'operator' and 'etcd' entries [3,4].  So we're still waiting on you to update.  But things like the cluster_operator_conditions metrics will be confused about what version the etcd component is at until this bug gets fixed.
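
To make the expected ordering concrete, here is a minimal sketch of the gating rule in Python. It is not the cluster-etcd-operator code (that is written in Go), and the function name `next_status_versions` and the per-instance version lists are hypothetical, chosen only for illustration. The idea is that an operand entry moves to the target version once every instance of that operand reports it, and the 'operator' entry moves only after all operand entries have leveled.

```python
from typing import Dict, List


def next_status_versions(
    previous: Dict[str, str],
    target: str,
    operands: Dict[str, List[str]],
) -> Dict[str, str]:
    """Compute the versions to report for a ClusterOperator (illustrative only).

    previous: versions currently reported, keyed by entry name.
    target:   version the operator is trying to roll out.
    operands: for each operand name, the versions observed on each
              instance (for example, one per control-plane node).
    """
    versions = dict(previous)
    all_leveled = True
    for name, observed in operands.items():
        # An operand is leveled only when every observed instance is at target.
        if observed and all(v == target for v in observed):
            versions[name] = target
        else:
            all_leveled = False
    # Bump the 'operator' entry last, and only once every operand has leveled.
    if all_leveled:
        versions["operator"] = target
    return versions
```

With one of three etcd instances updated, the sketch keeps both entries at the old version; once all three report the target, 'etcd' and 'operator' move together.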

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328
[2]: https://github.com/openshift/api/blob/a99ffa1cac6709edf8f502b16890b16f9a557e00/config/v1/types_cluster_operator.go#L43-L47
[3]: https://github.com/openshift/cluster-etcd-operator/blob/a254ec3cfafed3cc4787cbf511070d6e5dd1517c/manifests/0000_12_etcd-operator_07_clusteroperator.yaml#L10-L15
[4]: https://github.com/openshift/cluster-version-operator/blob/6fdd1e0f313f9c67ddf93037a0d4e17ce62e89ab/docs/user/reconciliation.md#clusteroperator

Comment 2 ge liu 2021-04-29 10:15:23 UTC
Verified: the co version is not updated while the upgrade is still in progress.
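
A check like this could also be scripted against e2e.log. The following Python sketch assumes the log line format quoted in the description ("clusteroperator/etcd versions: NAME OLD -> NEW, ..."). The helper names `version_changes` and `operator_bumped_before` are hypothetical and are not part of any OpenShift tooling.

```python
import re

# Matches the "versions:" event lines quoted in the description.
LINE_RE = re.compile(r"clusteroperator/(?P<name>\S+) versions: (?P<changes>.+)$")
# Matches one "ENTRY OLD -> NEW" change inside such a line.
CHANGE_RE = re.compile(r"(?P<entry>[^,\s]+) (?P<old>\S+) -> (?P<new>[^,\s]+)")


def version_changes(log_lines, operator_name="etcd"):
    """Return (line_index, entry_name) for each version change, in log order."""
    changes = []
    for index, line in enumerate(log_lines):
        match = LINE_RE.search(line)
        if not match or match.group("name") != operator_name:
            continue
        for change in CHANGE_RE.finditer(match.group("changes")):
            changes.append((index, change.group("entry")))
    return changes


def operator_bumped_before(log_lines, operand="etcd", operator_name="etcd"):
    """True if the 'operator' entry changed on an earlier line than the operand."""
    first_seen = {}
    for index, entry in version_changes(log_lines, operator_name):
        first_seen.setdefault(entry, index)
    if "operator" not in first_seen:
        return False
    # An operand that never changes counts as "later than the operator bump".
    return first_seen["operator"] < first_seen.get(operand, float("inf"))
```

Run against the two log lines from the description, `operator_bumped_before` returns True, which is the behavior this bug reports. A log where the 'etcd' entry changes first, or on the same line as 'operator', returns False.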

Comment 6 errata-xmlrpc 2021-07-27 23:02:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

