Bug 1968448 - [master] KubeAPI CVO progress is not available on CR/conditions only in events.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Fred Rolland
QA Contact: Yuri Obshansky
URL:
Whiteboard: AI-Team-Hive KNI-EDGE-JUKE-4.8 KNI-ED...
Depends On:
Blocks: 1974677
 
Reported: 2021-06-07 12:29 UTC by Fred Rolland
Modified: 2021-10-18 17:33 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1974677 (view as bug list)
Environment:
Last Closed: 2021-10-18 17:32:56 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
CVO progress in the UI (70.38 KB, image/png)
2021-06-16 16:58 UTC, Eran Cohen


Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 2110 0 None open Bug 1968448: Add CVO progress to condition 2021-06-27 15:09:08 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:33:23 UTC

Internal Links: 1974677

Description Fred Rolland 2021-06-07 12:29:21 UTC
Description of problem:

When the installation reaches the Finalizing stage, there is no indication of the CVO progress/state.
Note that events do show:
"Cluster version status: progressing message: Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Install SNO cluster via Kube API
2. Wait for Finalizing stage
3. Check ACI CR conditions/status

Actual results:
No indication on CVO progress

Expected results:
CVO progress is available in CR

Additional info:
Note that events do show:
"Cluster version status: progressing message: Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"

Comment 1 Michael Filanov 2021-06-10 13:17:17 UTC
Where do we have this info in the cloud?

Comment 2 Fred Rolland 2021-06-16 09:25:22 UTC
In the UI, I see progress only in the events.
In REST there is:

    "monitored_operators": [
      {
        "cluster_id": "641f8e78-8058-48fa-bfd1-207dd9077f96",
        "name": "console",
        "operator_type": "builtin",
        "status_updated_at": "0001-01-01T00:00:00.000Z",
        "timeout_seconds": 3600
      },
      {
        "cluster_id": "641f8e78-8058-48fa-bfd1-207dd9077f96",
        "name": "cvo",
        "operator_type": "builtin",
        "status": "progressing",
        "status_info": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
        "status_updated_at": "2021-06-16T09:24:00.181Z",
        "timeout_seconds": 3600
      }
    ],
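
For illustration, a minimal sketch of how a client could pull the CVO progress out of the REST fragment above. The helper name `operator_progress` is hypothetical, and the fragment is trimmed and wrapped in braces here so that it parses as standalone JSON:

```python
import json

# Trimmed copy of the REST fragment above, wrapped in braces to form valid JSON.
cluster_json = """
{
  "monitored_operators": [
    {
      "name": "console",
      "operator_type": "builtin",
      "status_updated_at": "0001-01-01T00:00:00.000Z",
      "timeout_seconds": 3600
    },
    {
      "name": "cvo",
      "operator_type": "builtin",
      "status": "progressing",
      "status_info": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
      "status_updated_at": "2021-06-16T09:24:00.181Z",
      "timeout_seconds": 3600
    }
  ]
}
"""


def operator_progress(cluster, name):
    """Return (status, status_info) for the named monitored operator.

    Hypothetical helper: returns (None, None) when the operator is missing
    or has not reported any status yet (as with "console" above).
    """
    for op in cluster.get("monitored_operators", []):
        if op.get("name") == name:
            return op.get("status"), op.get("status_info")
    return None, None


cluster = json.loads(cluster_json)
cvo_status, cvo_info = operator_progress(cluster, "cvo")
```

Note that the console entry carries no `status` or `status_info` keys at this point, so any consumer has to treat those fields as optional.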

Comment 3 Eran Cohen 2021-06-16 16:58:17 UTC
Created attachment 1791586 [details]
CVO progress in the UI

Comment 4 Eran Cohen 2021-06-16 17:12:33 UTC
Some background about why the assisted-installer is monitoring the clusterversion API
  
While finalizing the installation (once all nodes have joined the cluster), the assisted-controller watches the installation progress by monitoring the clusterversion (equivalent to: `oc get clusterversion`).
As you can see in the attached image from the UI or in frolland's comments above, the progress message usually looks like this:
"Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)"
In some cases the clusterversion message might look like an installation failure, e.g.:
"Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"
or
"Unable to apply 4.8.0-fc.9: the cluster operator authentication is degraded"

While that looks bad, it doesn't mean that the installation failed. CVO will keep retrying to apply the desired configuration and monitoring the clusteroperators status, and will eventually (in most cases) reach: "Cluster version is 4.8.0-fc.9"

I think it is desirable to add this information to a condition.
I don't think the misleading messages are an issue while the condition status is set to "progressing".
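
As a rough sketch of the distinction described above, the following separates the usual "N of M done" progress messages from the other message shapes, using only the message texts quoted in this comment. The function name `parse_cvo_progress` and the regular expression are assumptions made for illustration, not part of assisted-installer:

```python
import re

# Matches the "N of M done (P% complete)" part of a CVO progress message.
_PROGRESS_RE = re.compile(r"(\d+) of (\d+) done \((\d+)% complete\)")


def parse_cvo_progress(message):
    """Return (done, total, percent) if the message carries progress counts.

    Returns None for messages without counts, such as "Unable to apply ...",
    which by itself does not mean the installation has failed.
    """
    match = _PROGRESS_RE.search(message)
    if match is None:
        return None
    done, total, percent = (int(group) for group in match.groups())
    return done, total, percent
```

A message that returns None here only lacks counts; whether the installation actually failed still has to come from the reported status, not from the message text.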

Comment 5 Fred Rolland 2021-06-17 07:57:41 UTC
@atraeger @mhrivnak Thoughts?

Comment 6 Michael Hrivnak 2021-06-18 17:20:50 UTC
It seems reasonable to reflect that progress message into the AgentClusterInstall's "Complete" condition message when its Reason == "InstallationInProgress".
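
A minimal sketch of that idea, assuming the CVO status and status info are already available as strings. The function name, the base message, and the exact message layout (modelled on the event text quoted in the description) are hypothetical and not taken from the assisted-service code:

```python
def completed_condition_message(base_message, reason, cvo_status, cvo_status_info):
    """Append CVO progress to the condition message while installing.

    Hypothetical helper: the message is decorated only when the reason is
    "InstallationInProgress" and a CVO status has been reported.
    """
    if reason != "InstallationInProgress" or not cvo_status:
        return base_message
    detail = f"Cluster version status: {cvo_status}"
    if cvo_status_info:
        detail += f" message: {cvo_status_info}"
    return f"{base_message}. {detail}"
```

For any other reason the base message is returned unchanged, so a finished or failed installation would not show a stale progress line.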

Comment 7 Michael Filanov 2021-06-20 06:04:57 UTC
There are two operators here that we monitor, console and CVO. I'm not sure it will be clear to combine them in the same condition.
Maybe we should have a separate condition for each?

Comment 8 Eran Cohen 2021-06-20 08:32:40 UTC
The CVO progress should go to the Complete condition.
Console progress can go to a new condition (the console progress is not a must to start with).

Comment 9 Michael Filanov 2021-06-20 08:47:52 UTC
type: ConsoleCompleted
status: unknown - no info (the same as in the example above), false - still installing or failed to install, true - installed
reason: unknown - NoProgress, false - status, true - status
message: unknown - NoProgress, false - status info, true - status info

sounds good?
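
The proposed mapping could be sketched roughly as follows. This is an illustration only: the function name is hypothetical, and it assumes that an operator status of "available" means installed, which the proposal above does not spell out:

```python
def console_condition(operator):
    """Map a monitored-operator entry to the proposed ConsoleCompleted condition.

    Assumption: an operator status of "available" means installed; any other
    reported status (still installing, failed) maps to "False".
    """
    status = operator.get("status")
    info = operator.get("status_info", "")
    if not status:
        # No info reported yet, as for the console entry in comment 2.
        return {
            "type": "ConsoleCompleted",
            "status": "Unknown",
            "reason": "NoProgress",
            "message": "NoProgress",
        }
    return {
        "type": "ConsoleCompleted",
        "status": "True" if status == "available" else "False",
        "reason": status,
        "message": info,
    }
```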

Comment 10 Michael Hrivnak 2021-06-21 17:46:57 UTC
I assume we are watching the "Available" condition on the ClusterVersion? And when that becomes "true", we mark our "Complete" condition as true?

For the console, I assume we are also looking at the "Available" condition on its ClusterOperator resource? That seems reasonable to reflect in a separate local condition, especially if its availability is not a requirement for declaring an installation "complete".
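
To make the question concrete, here is a small sketch of checking the "Available" condition on a resource that has already been fetched as a dictionary (for example from `oc get clusterversion -o json`). The helper names are hypothetical; whether the assisted-controller checks exactly this condition is the assumption being asked about, not something confirmed here:

```python
def condition_status(resource, condition_type):
    """Return the status string of the given condition type, or None if absent."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == condition_type:
            return cond.get("status")
    return None


def is_available(resource):
    """True only when the resource reports Available == "True"."""
    return condition_status(resource, "Available") == "True"


# Hand-written example objects, not captured from a real cluster.
installing = {"status": {"conditions": [
    {"type": "Available", "status": "False"},
    {"type": "Progressing", "status": "True"},
]}}
installed = {"status": {"conditions": [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
]}}
```

The same check would apply to the console ClusterOperator resource, since it exposes conditions in the same shape.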

Comment 11 Fred Rolland 2021-06-28 05:01:47 UTC
CVO status will be part of the "Completed" condition: https://github.com/openshift/assisted-service/pull/2110

Opened an issue to handle other operators: https://issues.redhat.com/browse/MGMT-7099

Comment 15 errata-xmlrpc 2021-10-18 17:32:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

