Bug 1974677 - [4.8.0] KubeAPI CVO progress is not available on CR/conditions only in events.
Summary: [4.8.0] KubeAPI CVO progress is not available on CR/conditions only in events.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Fred Rolland
QA Contact: Yuri Obshansky
URL:
Whiteboard: AI-Team-Hive KNI-EDGE-JUKE-4.8 KNI-ED...
Depends On: 1968448
Blocks:
Reported: 2021-06-22 09:44 UTC by Michael Filanov
Modified: 2021-07-27 23:13 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1968448
Environment:
Last Closed: 2021-07-27 23:13:19 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 2117 0 None open [ocm-2.3] Bug 1974677: Add CVO progress to condition 2021-06-29 12:27:55 UTC
Red Hat Bugzilla 1968448 1 medium CLOSED [master] KubeAPI CVO progress is not available on CR/conditions only in events. 2021-10-18 17:33:23 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:13:37 UTC

Description Michael Filanov 2021-06-22 09:44:33 UTC
+++ This bug was initially created as a clone of Bug #1968448 +++

Description of problem:

When reaching the Finalizing stage, there is no indication of the CVO progress/state.
Note that events do show:
"Cluster version status: progressing message: Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Install SNO cluster via Kube API
2. Wait for Finalizing stage
3. Check ACI CR conditions/status

Actual results:
No indication on CVO progress

Expected results:
CVO progress is available in CR

Additional info:
Note that events do show:
"Cluster version status: progressing message: Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"

--- Additional comment from mfilanov on 20210610T13:17:17

Where do we have this info in the cloud?

--- Additional comment from frolland on 20210616T09:25:22

I see progress only in the events in the UI.
In REST there is:

    "monitored_operators": [
      {
        "cluster_id": "641f8e78-8058-48fa-bfd1-207dd9077f96",
        "name": "console",
        "operator_type": "builtin",
        "status_updated_at": "0001-01-01T00:00:00.000Z",
        "timeout_seconds": 3600
      },
      {
        "cluster_id": "641f8e78-8058-48fa-bfd1-207dd9077f96",
        "name": "cvo",
        "operator_type": "builtin",
        "status": "progressing",
        "status_info": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
        "status_updated_at": "2021-06-16T09:24:00.181Z",
        "timeout_seconds": 3600
      }
    ],
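
As a rough illustration of reading this payload, the sketch below pulls the CVO entry out of "monitored_operators" and parses the counts from "status_info". The helper names find_operator and parse_progress are hypothetical, and the regular expression assumes the "N of M done" wording shown above; none of this is taken from the assisted-service code.

```python
import json
import re

# Trimmed copy of the REST payload quoted above.
PAYLOAD = """
{
  "monitored_operators": [
    {"name": "console", "operator_type": "builtin", "timeout_seconds": 3600},
    {"name": "cvo", "operator_type": "builtin", "status": "progressing",
     "status_info": "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)",
     "timeout_seconds": 3600}
  ]
}
"""


def find_operator(cluster, name):
    """Return the monitored operator entry with the given name, or None."""
    for op in cluster.get("monitored_operators", []):
        if op.get("name") == name:
            return op
    return None


def parse_progress(status_info):
    """Extract (done, total) from an 'N of M done' message, or None."""
    match = re.search(r"(\d+) of (\d+) done", status_info or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


cluster = json.loads(PAYLOAD)
cvo = find_operator(cluster, "cvo")
console = find_operator(cluster, "console")

print(cvo["status"], parse_progress(cvo["status_info"]))
# The console entry carries no "status" key yet, so .get() returns None.
print(console.get("status"))
```

Note that the console entry has no status at this point, which is the "no info" case discussed later in this thread.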

--- Additional comment from ercohen on 20210616T16:58:17

Created attachment 1791586 [details]
CVO progress in the UI

--- Additional comment from ercohen on 20210616T17:12:33

Some background about why the assisted-installer is monitoring the clusterversion API
  
While finalizing the installation (once all nodes have joined the cluster), the assisted-controller watches the installation progress by monitoring the clusterversion (equivalent to: `oc get clusterversion`).
As you can see in the attached image from the UI, or in frolland's comment above, the progress message usually looks like this:
"Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)"
In some cases, the clusterversion message might look like an installation failure, e.g.:
"Unable to apply 4.8.0-fc.7: the cluster operator console has not yet successfully rolled out"
or
"Unable to apply 4.8.0-fc.9: the cluster operator authentication is degraded"

While that looks bad, it doesn't mean that the installation failed. The CVO will keep retrying to apply the desired configuration while monitoring the clusteroperators status, and will eventually (in most cases) reach: "Cluster version is 4.8.0-fc.9"

I think it is desirable to add this information to a condition.
I don't think the misleading messages are an issue while the condition status is set to "progressing".
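
To make the point above concrete, here is a minimal sketch of how a client might interpret a CVO report so that an "Unable to apply ..." message is not treated as a failure while the status is still progressing. The function name describe_cvo is hypothetical, and the status strings "available" and "failed" are assumptions (only "progressing" appears in this report).

```python
def describe_cvo(status, message):
    """Return a short interpretation of a CVO status/message pair.

    An "Unable to apply ..." message is only a retry notice while the
    status is still "progressing"; it is not reported as a failure.
    """
    status = (status or "").lower()
    message = (message or "").strip()
    if status == "available":  # assumed status string
        return "installed"
    if status == "failed":  # assumed status string
        return "failed"
    if status == "progressing":
        if message.startswith("Unable to apply"):
            return "progressing (CVO is retrying)"
        return "progressing"
    return "unknown"


samples = [
    ("progressing", "Working towards 4.8.0-fc.7: 590 of 676 done (87% complete)"),
    ("progressing", " Unable to apply 4.8.0-fc.9: the cluster operator authentication is degraded"),
    (None, None),
]
for status, message in samples:
    print(describe_cvo(status, message))
```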

--- Additional comment from frolland on 20210617T07:57:41

@atraeger @mhrivnak Thoughts?

--- Additional comment from mhrivnak on 20210618T17:20:50

It seems reasonable to reflect that progress message into the AgentClusterInstall's "Complete" condition message when its Reason == "InstallationInProgress".
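
A minimal sketch of that suggestion, assuming the message layout seen in the verification output later in this report ("... Cluster version status: <status>, message: <message>"). The function name and parameters are hypothetical and do not reflect the actual assisted-service implementation.

```python
IN_PROGRESS_REASON = "InstallationInProgress"


def complete_condition_message(reason, base_message, cvo_status, cvo_message):
    """Append CVO progress to the condition message while installing.

    For any other reason, or when no CVO status has been reported yet,
    the base message is returned unchanged.
    """
    if reason != IN_PROGRESS_REASON or not cvo_status:
        return base_message
    return (
        f"{base_message} Cluster version status: {cvo_status}, "
        f"message: {cvo_message}"
    )


print(
    complete_condition_message(
        IN_PROGRESS_REASON,
        "The installation is in progress: Finalizing cluster installation.",
        "progressing",
        "Working towards 4.8.0-rc.1: 585 of 676 done (86% complete)",
    )
)
```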

--- Additional comment from mfilanov on 20210620T06:04:57

There are two operators here that we monitor, console and CVO. I'm not sure it will be clear if we combine them in the same condition.
Maybe we should have a separate condition for each?

--- Additional comment from ercohen on 20210620T08:32:40

The CVO progress should go to the Complete condition.
Console progress can go to a new condition (console progress is not a must to start with).

--- Additional comment from mfilanov on 20210620T08:47:52

type: ConsoleCompleted 
status: unknown - no info (the same as in the example above), false - still installing or failed to install, true - installed
reason: unknown - NoProgress, false - status, true - status
message: unknown - NoProgress, false - status info, true - status info

Sounds good?
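
One possible reading of the mapping proposed above, as a sketch. The function name console_condition is hypothetical, "available" is an assumed status string for an installed console, and the "True"/"False"/"Unknown" values follow the usual Kubernetes condition convention.

```python
def console_condition(operator):
    """Map a monitored-operator entry to a ConsoleCompleted condition."""
    status = operator.get("status")
    if not status:
        # No status reported yet: the "unknown" row of the proposal.
        return {
            "type": "ConsoleCompleted",
            "status": "Unknown",
            "reason": "NoProgress",
            "message": "NoProgress",
        }
    return {
        "type": "ConsoleCompleted",
        # Assumption: "available" means the console is installed.
        "status": "True" if status == "available" else "False",
        "reason": status,
        "message": operator.get("status_info", ""),
    }


# Console entry as it appears in the REST example earlier in this report.
print(console_condition({"name": "console", "operator_type": "builtin"}))
```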

--- Additional comment from mhrivnak on 20210621T17:46:57

I assume we are watching the "Available" condition on the ClusterVersion? And when that becomes "true", we mark our "Complete" condition as true?

For the console, I assume we are also looking at the "Available" condition on its ClusterOperator resource? That seems reasonable to reflect in a separate local condition, especially if its availability is not a requirement for declaring an installation "complete".

Comment 3 nshidlin 2021-07-05 13:47:18 UTC
Verified with 2.3.0-DOWNSTREAM-2021-07-02-22-02-33

oc get AgentClusterInstall -n sno-0  -o=custom-columns='STATUS:status.conditions[-3].message'
STATUS
The installation is in progress: Finalizing cluster installation. Cluster version status: progressing, message: Working towards 4.8.0-rc.1: 585 of 676 done (86% complete)
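
The check above selects the condition by position ([-3]), which depends on the order of the conditions list. As a sketch, the same message can be looked up by condition type from the JSON form of the resource. The condition type name "Completed" and the surrounding sample conditions are assumptions used for illustration only.

```python
def condition_message(resource, condition_type):
    """Return the message of the first condition with the given type."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == condition_type:
            return cond.get("message")
    return None


# Illustrative resource; only the "Completed" message is taken from the
# verification output above, the other entries are placeholders.
aci = {
    "status": {
        "conditions": [
            {"type": "Completed",
             "message": "The installation is in progress: Finalizing cluster "
                        "installation. Cluster version status: progressing, "
                        "message: Working towards 4.8.0-rc.1: 585 of 676 done "
                        "(86% complete)"},
            {"type": "Failed", "message": "placeholder"},
            {"type": "Stopped", "message": "placeholder"},
        ]
    }
}

print(condition_message(aci, "Completed"))
```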

Comment 5 errata-xmlrpc 2021-07-27 23:13:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

