Bug 1979303 - [release-4.8] CI update from 4.7 to 4.8 sticks on: EncryptionMigrationController_Error: EncryptionMigrationControllerDegraded: etcdserver: request timed out
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.z
Assignee: Lukasz Szaszkiewicz
QA Contact: Xingxing Xia
Whiteboard: tag-ci
Depends On: 1974520
Reported: 2021-07-05 14:43 UTC by Lukasz Szaszkiewicz
Modified: 2021-08-10 11:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1974520
Last Closed: 2021-08-10 11:27:39 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-authentication-operator pull 467 | 0 | None | open | Bug 1979303: clear encryption conditions when there is no work to be done | 2021-07-30 22:15:59 UTC
Github | openshift cluster-kube-apiserver-operator pull 1188 | 0 | None | open | Bug 1979303: clear encryption conditions when there is no work to be done | 2021-07-30 22:15:57 UTC
Github | openshift cluster-openshift-apiserver-operator pull 463 | 0 | None | open | Bug 1979303: clear encryption conditions when there is no work to be done | 2021-07-30 22:15:54 UTC
Github | openshift library-go pull 1127 | 0 | None | closed | clear encryption conditions when there is no work to be done | 2021-07-15 18:50:36 UTC
Red Hat Product Errata | RHSA-2021:2983 | 0 | None | None | None | 2021-08-10 11:28:01 UTC

Comment 3 Xingxing Xia 2021-08-06 03:59:42 UTC
This bug is a corner case with no reliable way to reproduce it. The corner case is explained in the comment above. To verify it, I investigated how the code uses operatorv1helpers.UpdateStatus, specifically:
- how to manually set the status condition to the error "message" and "reason" above while keeping the status "False" and, at the same time, deleting the old operator pod instance.
- the code uses https://github.com/openshift/library-go/blob/master/pkg/operator/genericoperatorclient/dynamic_operator_client.go#L66 to do the update, so reproducing it requires constructing such a dynamicOperatorClient.
In the end, I was not able to construct a client that could do this and reproduce the bug.
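For readers following the reproduction attempt, here is a minimal sketch of the state being described and of what "clear encryption conditions when there is no work to be done" could look like. It assumes the stale condition keeps status "False" with a leftover reason and message, as the wording above suggests. The Condition type and the setCondition and clearWhenIdle helpers are hypothetical simplifications for illustration. They are not the library-go API and not the code from the linked PRs.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for an operator
// status condition. It is not the real operator API type.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// setCondition replaces the condition with the same Type, or appends
// it when no such condition exists yet.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// clearWhenIdle models (as an assumption) the idea behind the fix:
// when a controller has nothing to do, it rewrites its condition
// with an empty reason and message so nothing stale is left behind.
func clearWhenIdle(conds []Condition, condType string) []Condition {
	return setCondition(conds, Condition{Type: condType, Status: "False"})
}

func main() {
	// Assumed stale state: status "False" with the old error details.
	status := []Condition{{
		Type:    "EncryptionMigrationControllerDegraded",
		Status:  "False",
		Reason:  "Error",
		Message: "etcdserver: request timed out",
	}}

	status = clearWhenIdle(status, "EncryptionMigrationControllerDegraded")
	fmt.Printf("%+v\n", status[0])
}
```

The sketch only shows the shape of the stale data. Reproducing the bug on a real cluster would still need a working operator client, which is the part that could not be constructed.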

So the only way to verify it is to check 4.8 CI for "Encrypted|Encryption\w*Controller" from the time the PRs merged onward.

The symptom has not been seen again.

In addition, although the bug was reported for EncryptionMigrationControllerDegraded, five other conditions use the same code logic:
Encrypted EncryptionMigrationControllerProgressing EncryptionPruneControllerDegraded EncryptionStateControllerDegraded EncryptionKeyControllerDegraded

The PRs include the fix in the files for all of these conditions, so the fix covers them well. Since the CI check above uses the regular expression Encrypted|Encryption\w*Controller, which matches all six condition names, moving to VERIFIED.
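As a quick sanity check of the claim that the search expression covers every condition listed above, the following sketch runs the pattern against the six names. The helper name unmatched and the variable names are made up for illustration. The pattern and the condition names are taken from this comment.

```go
package main

import (
	"fmt"
	"regexp"
)

// ciSearchPattern is the expression quoted in the comment above.
var ciSearchPattern = regexp.MustCompile(`Encrypted|Encryption\w*Controller`)

// encryptionConditions lists the six condition types named in the comment.
var encryptionConditions = []string{
	"EncryptionMigrationControllerDegraded",
	"Encrypted",
	"EncryptionMigrationControllerProgressing",
	"EncryptionPruneControllerDegraded",
	"EncryptionStateControllerDegraded",
	"EncryptionKeyControllerDegraded",
}

// unmatched returns the names that the pattern does not match.
func unmatched(re *regexp.Regexp, names []string) []string {
	var missed []string
	for _, n := range names {
		if !re.MatchString(n) {
			missed = append(missed, n)
		}
	}
	return missed
}

func main() {
	for _, name := range encryptionConditions {
		fmt.Printf("%-42s matched=%v\n", name, ciSearchPattern.MatchString(name))
	}
	fmt.Println("unmatched:", len(unmatched(ciSearchPattern, encryptionConditions)))
}
```

Because the pattern is unanchored, it also matches longer strings that merely contain one of these names, which is what a search over CI log output needs.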

Comment 5 errata-xmlrpc 2021-08-10 11:27:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.4 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

