Bug 1968448

Summary: [master] KubeAPI CVO progress is not available on CR/conditions only in events.
Product: OpenShift Container Platform Reporter: Fred Rolland <frolland>
Component: assisted-installer Assignee: Fred Rolland <frolland>
assisted-installer sub component: assisted-service QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: medium CC: aos-bugs, asegurap, atraeger, ercohen, mhrivnak
Version: 4.8 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AI-Team-Hive KNI-EDGE-JUKE-4.8 KNI-EDGE-4.8
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1974677 (view as bug list) Environment:
Last Closed: 2021-10-18 17:32:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1974677    
Attachments:
Description Flags
CVO progress in the UI none

Description Fred Rolland 2021-06-07 12:29:21 UTC
Description of problem:

When the installation reaches the Finalizing stage, there is no indication of the CVO progress/state.
Note that events do show:
"Cluster version status: progressing message: Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Install SNO cluster via Kube API
2. Wait for Finalizing stage
3. Check ACI CR conditions/status

Actual results:
No indication of CVO progress

Expected results:
CVO progress is available in CR

Additional info:

Comment 1 Michael Filanov 2021-06-10 13:17:17 UTC
Where do we have this info in the cloud?

Comment 2 Fred Rolland 2021-06-16 09:25:22 UTC
In the UI, I see progress only in events.
In the REST API there is:

    "monitored_operators": [
      {
        "cluster_id": "641f8e78-8058-48fa-bfd1-207dd9077f96",
        "name": "console",
        "operator_type": "builtin",
        "status_updated_at": "0001-01-01T00:00:00.000Z",
        "timeout_seconds": 3600
      },
      {
        "cluster_id": "641f8e78-8058-48fa-bfd1-207dd9077f96",
        "name": "cvo",
        "operator_type": "builtin",
        "status": "progressing",
        "status_info": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
        "status_updated_at": "2021-06-16T09:24:00.181Z",
        "timeout_seconds": 3600
      }
    ],
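For illustration, a minimal Python sketch of reading the CVO entry out of a response like the one above. The `operator_status` helper is hypothetical and not part of assisted-service; the payload is a trimmed copy of the fragment above, wrapped in braces so that it parses on its own.

```python
import json

# Trimmed copy of the REST fragment above, wrapped in braces so it parses.
payload = json.loads("""
{
  "monitored_operators": [
    {"name": "console", "operator_type": "builtin",
     "status_updated_at": "0001-01-01T00:00:00.000Z", "timeout_seconds": 3600},
    {"name": "cvo", "operator_type": "builtin", "status": "progressing",
     "status_info": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
     "status_updated_at": "2021-06-16T09:24:00.181Z", "timeout_seconds": 3600}
  ]
}
""")


def operator_status(cluster, name):
    """Return (status, status_info) for a monitored operator.

    Hypothetical helper: fields that are absent come back as None,
    as is the case for the console entry above.
    """
    for op in cluster.get("monitored_operators", []):
        if op.get("name") == name:
            return op.get("status"), op.get("status_info")
    return None, None


cvo_status, cvo_info = operator_status(payload, "cvo")
console_status, console_info = operator_status(payload, "console")
```

Note that the console entry carries no `status` at all at this point, so a consumer has to treat the field as optional.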

Comment 3 Eran Cohen 2021-06-16 16:58:17 UTC
Created attachment 1791586
CVO progress in the UI

Comment 4 Eran Cohen 2021-06-16 17:12:33 UTC
Some background on why the assisted-installer monitors the clusterversion API:

While finalizing the installation (once all nodes have joined the cluster), the assisted-controller watches the installation progress by monitoring the clusterversion (equivalent to: `oc get clusterversion`).
As you can see in the attached image from the UI or in frolland's comments above, the progress message usually looks like this:
"Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)"
In some cases the clusterversion message might look like an installation failure, e.g.:
"Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"
or
"Unable to apply 4.8.0-fc.9: the cluster operator authentication is degraded"

While that looks bad, it doesn't mean that the installation failed. CVO keeps retrying to apply the desired configuration and monitoring the clusteroperators status, and eventually (in most cases) reaches: "Cluster version is 4.8.0-fc.9"
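As a rough sketch of how a consumer could tell these message shapes apart, assuming the wording stays as in the examples quoted in this bug (it may differ between CVO versions). `parse_cvo_message` and the `kind` labels are hypothetical names, not anything the assisted-controller exposes.

```python
import re

# Pattern for the usual progress text; the wording is taken from the
# examples in this bug and is an assumption, not a stable API.
PROGRESS_RE = re.compile(
    r"Working towards (\S+): (\d+) of (\d+) done \((\d+)% complete\)"
)


def parse_cvo_message(message):
    """Classify a clusterversion message as progress, retrying, done or other."""
    message = message.strip()
    match = PROGRESS_RE.match(message)
    if match:
        version, done, total, percent = match.groups()
        return {
            "kind": "progress",
            "version": version,
            "done": int(done),
            "total": int(total),
            "percent": int(percent),
        }
    if message.startswith("Unable to apply "):
        # Not a failure on its own: CVO keeps retrying.
        return {"kind": "retrying", "detail": message}
    prefix = "Cluster version is "
    if message.startswith(prefix):
        return {"kind": "done", "version": message[len(prefix):]}
    return {"kind": "other", "detail": message}
```

The point of the "retrying" bucket is that an "Unable to apply" message is reported as still in progress rather than as a failed installation.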

I think it is desirable to add this information to a condition.
I don't think the misleading messages are an issue while the condition status is set to "progressing".

Comment 5 Fred Rolland 2021-06-17 07:57:41 UTC
@atraeger @mhrivnak Thoughts?

Comment 6 Michael Hrivnak 2021-06-18 17:20:50 UTC
It seems reasonable to reflect that progress message into the AgentClusterInstall's "Complete" condition message when its Reason == "InstallationInProgress".
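A minimal sketch of what that could look like, written in Python for illustration only. The function name and the exact message wording are assumptions; only the condition type "Complete" and the reason "InstallationInProgress" come from this discussion.

```python
def complete_condition_in_progress(cvo_status, cvo_info):
    """Build a "Complete" condition for an installation that is still running.

    Hypothetical helper: the CVO status and message are appended to the
    condition message only when both are known.
    """
    message = "The installation is in progress"
    if cvo_status and cvo_info:
        message += f": Cluster version status: {cvo_status}, message: {cvo_info}"
    return {
        "type": "Complete",
        "status": "False",
        "reason": "InstallationInProgress",
        "message": message,
    }


condition = complete_condition_in_progress(
    "progressing",
    "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
)
```

The status and reason stay the same whatever CVO reports, so a misleading "Unable to apply" text only ever appears inside the message of a condition that is still in progress.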

Comment 7 Michael Filanov 2021-06-20 06:04:57 UTC
There are two operators here that we monitor, console and CVO. I'm not sure it will be clear if we combine them in the same condition.
Maybe we should have a separate condition for each?

Comment 8 Eran Cohen 2021-06-20 08:32:40 UTC
The CVO progress should go into the Complete condition.
The console progress can go into a new condition (console progress is not a must to start with).

Comment 9 Michael Filanov 2021-06-20 08:47:52 UTC
type: ConsoleCompleted
status: unknown - no info (the same as in the example above), false - still installing or failed to install, true - installed
reason: unknown - NoProgress, false - status, true - status
message: unknown - NoProgress, false - "status info" - status info

sounds good?
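The mapping proposed above could be sketched like this in Python. This is an illustration only: the function name is hypothetical, and it assumes that the monitored operator's `status` is "available" once the console is installed and that the field is missing when there is no info yet.

```python
def console_completed_condition(operator):
    """Map a monitored-operator entry to the proposed ConsoleCompleted condition."""
    status = operator.get("status")
    if not status:
        # No info reported yet (as for the console entry in the REST example).
        return {
            "type": "ConsoleCompleted",
            "status": "Unknown",
            "reason": "NoProgress",
            "message": "NoProgress",
        }
    return {
        "type": "ConsoleCompleted",
        # Assumption: "available" means installed; anything else is not done.
        "status": "True" if status == "available" else "False",
        "reason": status,
        "message": operator.get("status_info", ""),
    }
```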

Comment 10 Michael Hrivnak 2021-06-21 17:46:57 UTC
I assume we are watching the "Available" condition on the ClusterVersion? And when that becomes "true", we mark our "Complete" condition as true?

For the console, I assume we are also looking at the "Available" condition on its ClusterOperator resource? That seems reasonable to reflect in a separate local condition, especially if its availability is not a requirement for declaring an installation "complete".
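To make the question concrete, a small Python sketch of checking a condition on a resource that has already been fetched and parsed into a dict. The helper name and the sample data are made up for illustration; whether the assisted-controller really watches "Available" is exactly the open question here.

```python
def condition_is_true(resource, condition_type):
    """Return True if the resource has the given condition with status "True"."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == condition_type:
            return cond.get("status") == "True"
    # A missing condition is treated as not true.
    return False


# Made-up ClusterVersion snapshot taken in the middle of an installation.
cluster_version = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "False"},
            {
                "type": "Progressing",
                "status": "True",
                "message": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
            },
        ]
    }
}

installation_complete = condition_is_true(cluster_version, "Available")
```

The same helper would apply to the console ClusterOperator, which also reports an "Available" condition in its status.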

Comment 11 Fred Rolland 2021-06-28 05:01:47 UTC
CVO status will be part of the "Completed" condition: https://github.com/openshift/assisted-service/pull/2110

Opened an issue to handle other operators: https://issues.redhat.com/browse/MGMT-7099

Comment 15 errata-xmlrpc 2021-10-18 17:32:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759