Bug 1816704
| Summary: | Cloud Credential Operator pod crashlooping with golang segfault | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Scott Dodson <sdodson> | |
| Component: | Cloud Credential Operator | Assignee: | Joel Diaz <jdiaz> | |
| Status: | CLOSED ERRATA | QA Contact: | wang lin <lwan> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.2.z | CC: | bbrownin, cblecker, ccoleman, jdiaz, lmohanty, lwan, nmalik, vrutkovs, wking | |
| Target Milestone: | --- | Keywords: | ServiceDeliveryBlocker, Upgrades | |
| Target Release: | 4.3.z | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1813998 | |||
| : | 1819183 | Environment: | ||
| Last Closed: | 2020-04-14 16:18:53 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1813998 | |||
| Bug Blocks: | 1819183 | |||
Description
Scott Dodson
2020-03-24 15:12:58 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges.

Who is impacted?
  Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
What is the impact?
  Up to 2 minute disruption in edge routing
  Up to 90 seconds of API downtime
  etcd loses quorum and you have to restore from backup
How involved is remediation?
  Issue resolves itself after five minutes
  Admin uses oc to fix things
  Admin must SSH to hosts, restore from backups, or other non standard admin activities
Is this a regression?
  No, it’s always been like this we just never noticed
  Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

(In reply to Scott Dodson from comment #2)
> Who is impacted?
Clusters originally installed with 4.1 and upgraded to affected versions.

> What is the impact?
The cloud-credential-operator is unable to process any CredentialsRequests. If the permissions requested in a CredentialsRequest changed along with the introduction of the bug, that CredentialsRequest cannot be processed. Alerts may also be firing, since the CCO is unhealthy.

> How involved is remediation?
Manual patching of the Infrastructure CR to add the new/updated Status fields.

> Is this a regression?
Yes, since the backport of the enhanced permissions simulation: https://github.com/openshift/cloud-credential-operator/pull/157

The upgrade path tested was: 4.1.24 -> 4.2.20 -> 4.3.0-0.nightly-2020-04-06-093556
The bug has been fixed.
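The reply above does not say which Status fields were missing or where the operator read them. As a minimal sketch only, assuming the segfault came from dereferencing a Status field that was never populated on clusters installed with 4.1, the defensive pattern might look like the following. The types `Infrastructure`, `InfrastructureStatus`, `PlatformStatus`, the `Region` field, and the helper `regionOf` are hypothetical stand-ins for illustration, not the real OpenShift API types or the actual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// PlatformStatus is a hypothetical stand-in for the newer Status fields
// mentioned in the remediation note. It is not the real API type.
type PlatformStatus struct {
	Region string
}

// InfrastructureStatus is a hypothetical, trimmed-down status block.
type InfrastructureStatus struct {
	// Assumption: this pointer is nil on clusters whose Infrastructure CR
	// was created before the newer Status fields existed.
	PlatformStatus *PlatformStatus
}

// Infrastructure is a hypothetical stand-in for the Infrastructure CR.
type Infrastructure struct {
	Status InfrastructureStatus
}

var errMissingStatus = errors.New("infrastructure status is missing platform details")

// regionOf returns the region, or an error when the optional status block
// is absent, instead of dereferencing a nil pointer and crashing.
func regionOf(infra *Infrastructure) (string, error) {
	if infra == nil || infra.Status.PlatformStatus == nil {
		return "", errMissingStatus
	}
	return infra.Status.PlatformStatus.Region, nil
}

func main() {
	// A CR that was never given the newer Status fields.
	old := &Infrastructure{}
	if _, err := regionOf(old); err != nil {
		fmt.Println("skipping:", err)
	}

	// The same CR after the Status fields have been filled in.
	patched := &Infrastructure{
		Status: InfrastructureStatus{
			PlatformStatus: &PlatformStatus{Region: "us-east-1"},
		},
	}
	region, _ := regionOf(patched)
	fmt.Println("region:", region)
}
```

Returning an error for the unpopulated case lets a controller report a degraded condition and retry rather than crashloop. Whether the shipped fix took this approach is not stated in this report.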
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1393