Bug 2030186

Summary: cv.status.conditions.Available goes to False if conditionalEdges have invalid risk names
Product: OpenShift Container Platform
Reporter: Yang Yang <yanyang>
Component: Cluster Version Operator
Assignee: W. Trevor King <wking>
Status: CLOSED WONTFIX
QA Contact: Yang Yang <yanyang>
Severity: medium
Priority: medium
Version: 4.10
CC: aos-bugs, jokerman, wking
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-12-09 06:06:41 UTC
Type: Bug

Description Yang Yang 2021-12-08 07:03:19 UTC
Description of problem:
If a cluster is patched to use a dummy Cincinnati graph whose conditional edges have invalid risk names, the CVO sets the Available condition to False, which is confusing. It would be better to report the invalid risk name error while keeping the cluster Available.

# oc get clusterversion/version -ojson | jq -r .status.conditions
[
  {
    "lastTransitionTime": "2021-12-08T05:53:39Z",
    "status": "False",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-12-08T05:53:39Z",
    "message": "ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates.conditions.reason: Invalid value: \"Multiple releases\": status.conditionalUpdates.conditions.reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'",
    "status": "True",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2021-12-07T08:51:24Z",
    "message": "Error ensuring the cluster version is up to date: ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates.conditions.reason: Invalid value: \"Multiple releases\": status.conditionalUpdates.conditions.reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-12-08T05:50:33Z",
    "status": "True",
    "type": "RetrievedUpdates"
  }
]
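The Failing message shows the API server refusing the whole status write because the reason value "Multiple releases" (the invalid risk name from the graph, per the bug summary) contains a space, which the CRD pattern does not allow. A minimal Go sketch that checks candidate names against that pattern, copied verbatim from the error message; `validRiskName` is a hypothetical helper, not a CVO function:

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern is copied from the API server's validation error for
// status.conditionalUpdates.conditions.reason.
var reasonPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`)

// validRiskName (hypothetical) reports whether a risk name would be accepted
// if it were copied into a condition reason.
func validRiskName(name string) bool {
	return reasonPattern.MatchString(name)
}

func main() {
	for _, name := range []string{"Multiple releases", "MultipleReleases", "Multiple_releases", "1Risk"} {
		// Prints valid=false for the names with a space or a leading digit.
		fmt.Printf("%q valid=%v\n", name, validRiskName(name))
	}
}
```

Under this pattern a name must start with a letter, may contain only letters, digits, underscores, commas and colons, and may not end with a comma or colon.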

Version-Release number of the following components:
4.10.0-0.nightly-2021-12-03-213835

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster
2. Patch the ClusterVersion to use the dummy Cincinnati upstream:
# oc patch clusterversion/version --patch '{"spec":{"upstream":"https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-multi-payloads.json"}}' --type=merge
clusterversion.config.openshift.io/version patched

Actual results:
The Available condition in ClusterVersion status.conditions is set to False.

Expected results:
The invalid risk name error is reported, but the cluster stays Available.


Comment 1 Yang Yang 2021-12-08 07:53:05 UTC
The CVO log is available at https://drive.google.com/file/d/1zcPyDqTePN6Hdey4je2Y6pqCXkKkG2U6/view?usp=sharing

Comment 2 W. Trevor King 2021-12-09 06:06:41 UTC
This turned out to be trickier than just the invalid risk name, since other properties are currently validated only by the Kubernetes API server. We've opened [1] to discuss and agree on a plan that covers all of them (or to decide that we're OK leaving them uncovered, because the risk of graph-data admins creating such invalid data seems low). I'm closing this as WONTFIX for now, but depending on how [1] works out, we may re-open it later.

[1]: https://issues.redhat.com/browse/OTA-537