Bug 1979303

Summary: [release-4.8] CI update from 4.7 to 4.8 sticks on: EncryptionMigrationController_Error: EncryptionMigrationControllerDegraded: etcdserver: request timed out
Product: OpenShift Container Platform
Reporter: Lukasz Szaszkiewicz <lszaszki>
Component: apiserver-auth
Assignee: Lukasz Szaszkiewicz <lszaszki>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.8
CC: aos-bugs, liyao, lszaszki, mfojtik, surbania, wking, xxia
Target Milestone: ---
Keywords: Upgrades
Target Release: 4.8.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: tag-ci
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1974520
Environment:
Last Closed: 2021-08-10 11:27:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1974520
Bug Blocks:

Comment 3 Xingxing Xia 2021-08-06 03:59:42 UTC
This bug is a corner case with no definite way to reproduce it. The corner case is explained in the comment above. To verify the fix, I investigated how the code uses operatorv1helpers.UpdateStatus, in terms of:
- how to manually set the status condition to the error "message" and "reason" above while keeping the condition status "False", and in the meantime delete the old operator pod instance.
- the code uses https://github.com/openshift/library-go/blob/master/pkg/operator/genericoperatorclient/dynamic_operator_client.go#L66 to do the update, so a dynamicOperatorClient client like that needs to be constructed to do the update.
However, I have not yet been able to construct one that can do the update and reproduce the bug.

So, to verify it, the only option is to check for "Encrypted|Encryption\w*Controller" in 4.8 CI results since the time the PRs merged:
https://search.ci.openshift.org/?search=Encrypted%7CEncryption%5Cw*Controller&maxAge=168h&context=1&type=junit&name=4%5C.8&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

The symptom has not appeared again.

In addition, the bug was reported for EncryptionMigrationControllerDegraded, but there are also five other conditions that use the same code logic:
Encrypted, EncryptionMigrationControllerProgressing, EncryptionPruneControllerDegraded, EncryptionStateControllerDegraded, EncryptionKeyControllerDegraded

The PRs include fixes in the files for all of these conditions, so the fix covers them well. Given that the CI check above uses the regular expression Encrypted|Encryption\w*Controller, moving to VERIFIED.
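
For anyone who wants to double-check that the search pattern really covers all six condition names, here is a minimal sketch in Python. The pattern and the condition names are copied from this comment; the helper name `unmatched` is made up for illustration and is not part of any OpenShift tooling. It only checks the regular expression against the names, it does not query CI.

```python
import re

# Pattern copied from the CI search URL in this comment.
PATTERN = re.compile(r"Encrypted|Encryption\w*Controller")

# The six condition types named in this comment.
CONDITIONS = [
    "Encrypted",
    "EncryptionMigrationControllerProgressing",
    "EncryptionMigrationControllerDegraded",
    "EncryptionPruneControllerDegraded",
    "EncryptionStateControllerDegraded",
    "EncryptionKeyControllerDegraded",
]


def unmatched(names):
    """Return the names in which the pattern finds no match (hypothetical helper)."""
    return [name for name in names if PATTERN.search(name) is None]


if __name__ == "__main__":
    # An empty list means every condition name is covered by the pattern.
    print(unmatched(CONDITIONS))
```

An empty result means that a CI search with this pattern would surface a failure message mentioning any of the six conditions.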

Comment 5 errata-xmlrpc 2021-08-10 11:27:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.4 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2983