Bug 1979303 - [release-4.8] CI update from 4.7 to 4.8 sticks on: EncryptionMigrationController_Error: EncryptionMigrationControllerDegraded: etcdserver: request timed out
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.z
Assignee: Lukasz Szaszkiewicz
QA Contact: Xingxing Xia
URL:
Whiteboard: tag-ci
Depends On: 1974520
Blocks:
Reported: 2021-07-05 14:43 UTC by Lukasz Szaszkiewicz
Modified: 2021-08-10 11:28 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1974520
Environment:
Last Closed: 2021-08-10 11:27:39 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-authentication-operator pull 467 | 0 | None | open | Bug 1979303: clear encryption conditions when there is no work to be done | 2021-07-30 22:15:59 UTC
Github | openshift cluster-kube-apiserver-operator pull 1188 | 0 | None | open | Bug 1979303: clear encryption conditions when there is no work to be done | 2021-07-30 22:15:57 UTC
Github | openshift cluster-openshift-apiserver-operator pull 463 | 0 | None | open | Bug 1979303: clear encryption conditions when there is no work to be done | 2021-07-30 22:15:54 UTC
Github | openshift library-go pull 1127 | 0 | None | closed | clear encryption conditions when there is no work to be done | 2021-07-15 18:50:36 UTC
Red Hat Product Errata | RHSA-2021:2983 | 0 | None | None | None | 2021-08-10 11:28:01 UTC

Comment 3 Xingxing Xia 2021-08-06 03:59:42 UTC
This bug is a corner-case defect with no reliable way to reproduce it; the corner case is explained in the comment above. To verify it, I investigated the code's operatorv1helpers.UpdateStatus to work out:
- how to manually update the status condition so that it carries the error "message" and "reason" above while keeping the status "False", and at the same time delete the OLD operator pod instance.
- how to perform that update: the code uses https://github.com/openshift/library-go/blob/master/pkg/operator/genericoperatorclient/dynamic_operator_client.go#L66, so a dynamicOperatorClient would need to be constructed to do it.
In the end, I have not yet been able to construct a client that performs the update and reproduces the bug.
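A minimal Go sketch of the stale state described above (status "False" while an old error reason and message are still attached) and of the idea named in the fix PRs, "clear encryption conditions when there is no work to be done". The Condition type and the clearIfNoWork helper are hypothetical simplifications, not the openshift/api or library-go code, and the Reason value "Error" is an assumption taken from the bug title.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for an operator status
// condition; it is NOT the openshift/api operator/v1 type.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// clearIfNoWork is a hypothetical helper illustrating the fix idea: when the
// controller has no work to do, reset the condition to a clean "False"
// instead of returning early and leaving the old Reason/Message behind.
func clearIfNoWork(c Condition, hasWork bool) Condition {
	if hasWork {
		// The normal sync path would compute and set the condition here.
		return c
	}
	return Condition{Type: c.Type, Status: "False"}
}

func main() {
	// The stale state from the comment: "False", but with leftover error
	// details (Reason "Error" is assumed from the bug title).
	stale := Condition{
		Type:    "EncryptionMigrationControllerDegraded",
		Status:  "False",
		Reason:  "Error",
		Message: "etcdserver: request timed out",
	}
	fmt.Printf("before: %+v\n", stale)
	fmt.Printf("after:  %+v\n", clearIfNoWork(stale, false))
}
```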

So, to verify the fix, I could only check 4.8 CI results for "Encrypted|Encryption\w*Controller" since the time the PRs merged:
https://search.ci.openshift.org/?search=Encrypted%7CEncryption%5Cw*Controller&maxAge=168h&context=1&type=junit&name=4%5C.8&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

The symptom has not been seen again.

In addition, although the bug was reported for EncryptionMigrationControllerDegraded, five other conditions use the same code logic:
Encrypted, EncryptionMigrationControllerProgressing, EncryptionPruneControllerDegraded, EncryptionStateControllerDegraded, EncryptionKeyControllerDegraded

The PRs include fixes in the files for all of these conditions, so the fix covers them well. Given that the CI check above uses the regular expression Encrypted|Encryption\w*Controller, which matches all six condition types, moving to VERIFIED.
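To check the claim that the CI search covers every affected condition, a short Go sketch runs the same pattern with the standard regexp package (RE2 syntax, where \w also matches "_") against the six condition type names listed above. The unmatched helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// encryptionPattern is the regular expression used in the CI search.
// It is unanchored, so it matches anywhere inside a longer string.
var encryptionPattern = regexp.MustCompile(`Encrypted|Encryption\w*Controller`)

// conditionTypes are the six condition types named in this comment.
var conditionTypes = []string{
	"EncryptionMigrationControllerDegraded",
	"Encrypted",
	"EncryptionMigrationControllerProgressing",
	"EncryptionPruneControllerDegraded",
	"EncryptionStateControllerDegraded",
	"EncryptionKeyControllerDegraded",
}

// unmatched (hypothetical helper) returns the names the pattern misses.
func unmatched(names []string) []string {
	var missed []string
	for _, n := range names {
		if !encryptionPattern.MatchString(n) {
			missed = append(missed, n)
		}
	}
	return missed
}

func main() {
	for _, t := range conditionTypes {
		// FindString shows which part of the name the pattern matched.
		fmt.Printf("%s: %q\n", t, encryptionPattern.FindString(t))
	}
	fmt.Println("unmatched:", unmatched(conditionTypes))
}
```

Because the pattern is unanchored and \w includes the underscore, it also matches reason strings such as "EncryptionMigrationController_Error" from the bug title, not only the bare condition types.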

Comment 5 errata-xmlrpc 2021-08-10 11:27:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.4 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2983

