Bug 1874549

Summary: Alarm when multiple CredentialsRequests have the same target secret
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Cloud Credential Operator
Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED WONTFIX
QA Contact: wang lin <lwan>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: gshereme, jdiaz, lwan
Target Milestone: ---
Keywords: UpcomingSprint
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-04 13:46:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description W. Trevor King 2020-09-01 15:09:52 UTC
Bug 1874328 describes an issue with the machine-API operator where some clusters had two CredentialsRequests targeting the same secret, and those requests eventually had divergent permission sets, breaking machine provisioning.  Ideally the cred operator would alert on this in such a way that CI would notice and fail promotion, so we'd discover and fix these issues more quickly.  Also ideally, clusters where this happened would not completely break.  Joel suggests the following discovery strategy:

1. Cred operator annotates secrets to link the owning CredentialsRequests.  Maybe this could happen via metadata.ownerReferences [1]; I can't think of anything else that would deserve to own a cred-operator-created secret.  But I guess there could be trouble if someone intended the cred operator to fill in cred options in a secret that had additional properties set by other actors.
2. Cred operator sets a failing condition on CredentialsRequests that target a secret owned by a different CredentialsRequest.
3. Cred operator sets itself Degraded=True when it has some CredentialsRequests with failing conditions.
4. Cluster-version operator sets a critical alert when a ClusterOperator has Degraded=True.
5. CI fails on critical alerts, even if they are only pending, at the end of a run.
6. We notice the failing CI runs and fix the overlapping CredentialsRequests before shipping releases that might create them.

[1]: https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents
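
Steps 1 through 3 above could be sketched roughly as follows. This is only an illustration of the detection logic, not the cred operator's actual code: the CredentialsRequest class is a simplified, hypothetical stand-in, its secret_namespace and secret_name fields are assumed to mirror the request's target secret reference, and "first request seen owns the secret" is an arbitrary rule picked for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialsRequest:
    """Hypothetical, simplified stand-in for a CredentialsRequest object."""
    namespace: str
    name: str
    secret_namespace: str  # assumed to mirror the target secret's namespace
    secret_name: str       # assumed to mirror the target secret's name


def evaluate(requests):
    """Return (owners, failing, degraded) for a list of requests.

    owners:   target secret -> the request treated as its owner (step 1)
    failing:  requests targeting a secret owned by a different request (step 2)
    degraded: True if any request is failing (step 3)
    """
    owners = {}
    failing = []
    for req in requests:
        secret = (req.secret_namespace, req.secret_name)
        # The first request seen for a secret becomes its owner;
        # this ordering rule is an assumption made for the sketch.
        owner = owners.setdefault(secret, req)
        if owner is not req:
            failing.append(req)
    return owners, failing, bool(failing)


if __name__ == "__main__":
    # Hypothetical names, chosen only for illustration.
    reqs = [
        CredentialsRequest("example-ns", "request-a", "target-ns", "cloud-creds"),
        CredentialsRequest("example-ns", "request-b", "target-ns", "cloud-creds"),
    ]
    owners, failing, degraded = evaluate(reqs)
    for req in failing:
        print(f"{req.namespace}/{req.name} targets a secret owned by another request")
    print("Degraded:", degraded)
```

A real implementation would presumably record ownership on the secret itself (for example via metadata.ownerReferences, as suggested in step 1) rather than in an in-memory map, so that the result does not depend on the order in which requests are processed.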

Comment 1 Devan Goodwin 2020-09-10 12:08:45 UTC
For consideration in a future sprint.

Comment 2 Devan Goodwin 2020-09-24 12:27:37 UTC
Hoping to look at this during 4.7 dev sprints.

Comment 5 Devan Goodwin 2021-02-04 13:46:11 UTC
This is a good idea, but there is no real priority need for it, as evidenced by how long it has been floating around. Apologies, but closing for now.