Bug 1851856 - Deployment not progressing due to PriorityClass missing
Summary: Deployment not progressing due to PriorityClass missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.4.0
Assignee: Yuval Turgeman
QA Contact: Asher Shoshan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-29 09:15 UTC by Asher Shoshan
Modified: 2020-07-28 19:10 UTC
CC List: 7 users

Fixed In Version: hco-bundle-registry-container-v2.3.0-449, hyperconverged-cluster-operator-container-v2.4.0-63
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-28 19:10:39 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Failed CSV (99.67 KB, image/png)
2020-06-29 14:03 UTC, Simone Tiraboschi
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github kubevirt hyperconverged-cluster-operator | pull 669 | 0 | None | closed | Create a PriorityClass for KubeVirt on startup | 2020-07-19 13:00:35 UTC
Github kubevirt hyperconverged-cluster-operator | pull 686 | 0 | None | closed | [release-2.4] Create a PriorityClass for KubeVirt on startup (#669) | 2020-07-19 13:00:35 UTC
Red Hat Product Errata | RHSA-2020:3194 | 0 | None | None | None | 2020-07-28 19:10:50 UTC

Description Asher Shoshan 2020-06-29 09:15:24 UTC
Description of problem:
When deploying CNV 2.4, virt-operator requires the PriorityClass kubevirt-cluster-critical, which does not yet exist in the cluster. It is created only later by the HCO operator, when the HCO CR is created (after virt-operator is up and running).


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy CNV 2.4 on a clean 4.5 OCP cluster (without any previous CNV versions installed)

Actual results:
Deployment is not progressing

Expected results:
To be completed

Additional info:
Using bare-metal workers

Comment 1 Simone Tiraboschi 2020-06-29 12:02:15 UTC
(In reply to Asher Shoshan from comment #0)
> Description of problem:
> When deploying CNV 2.4, virt-operator requires PriorityClass
> (kubevirt-cluster-critical), not yet in cluster.  This is created only later
> by HCO operator, when hco CR is created (after virt-operator is up and
> running).

This is exactly how we designed it;
why do you think it's a bug?

Comment 2 Asher Shoshan 2020-06-29 12:45:19 UTC
(In reply to Simone Tiraboschi from comment #1)
> (In reply to Asher Shoshan from comment #0)
> > Description of problem:
> > When deploying CNV 2.4, virt-operator requires PriorityClass
> > (kubevirt-cluster-critical), not yet in cluster.  This is created only later
> > by HCO operator, when hco CR is created (after virt-operator is up and
> > running).
> 
> This is exactly as we designed it,
> why do you think it's a bug?

If your script waits for the HCO operator to become ready before creating the HCO CR, then it's a deadlock.

Comment 3 Simone Tiraboschi 2020-06-29 14:03:12 UTC
Created attachment 1699157
Failed CSV

Comment 4 Simone Tiraboschi 2020-06-29 14:03:50 UTC
This is more a GUI glitch than a real deadlock.
The HCO pod will become ready: its readiness depends on conditions on the virt-operator CR, and that CR does not exist until the user creates the HCO CR, so the HCO pod is guaranteed to become ready.

The real issue here is that OLM will try to start virt-operator, and it will fail because its priority class is missing.
After a certain timeout, the CSV will be declared failed. This can confuse the user, even though the user can still create the HCO CR exactly as documented, and doing so is enough to trigger the creation of the priority class and get virt-operator up and running.

I'm attaching a screenshot of the issue.

The only option I see is to create the KV priority class when HCO starts.
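A minimal sketch of this approach follows, in Python for illustration only: HCO itself is a Go operator, and this is not the code from pull 669. It assumes a hypothetical in-memory stand-in for the Kubernetes API; `FakeCluster`, `ensure_priority_class`, `hco_startup` and the PriorityClass value are illustrative assumptions, and only the name kubevirt-cluster-critical comes from this bug. What it shows: the operator creates the PriorityClass idempotently at startup, before and independently of the HCO CR, so virt-operator pods can be admitted whether or not the user has created the CR yet.

```python
# Hypothetical sketch: create KubeVirt's PriorityClass when the operator starts.
# FakeCluster is an in-memory stand-in for the Kubernetes API (assumption).
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityClass:
    name: str
    value: int
    global_default: bool = False
    description: str = ""


class AlreadyExistsError(Exception):
    """Raised by the fake API when an object with the same name exists."""


class FakeCluster:
    """Hypothetical stand-in for the Kubernetes API (PriorityClasses only)."""

    def __init__(self):
        self.priority_classes = {}

    def create_priority_class(self, pc):
        if pc.name in self.priority_classes:
            raise AlreadyExistsError(pc.name)
        self.priority_classes[pc.name] = pc

    def can_admit_pod(self, priority_class_name):
        # Kubernetes' Priority admission plugin rejects pods whose
        # priorityClassName does not match an existing PriorityClass.
        return priority_class_name in self.priority_classes


KUBEVIRT_PRIORITY_CLASS = "kubevirt-cluster-critical"


def ensure_priority_class(cluster):
    """Create the PriorityClass if missing; treat 'already exists' as success.

    Returns True if it was created now, False if it already existed.
    """
    pc = PriorityClass(
        name=KUBEVIRT_PRIORITY_CLASS,
        # Assumed value for illustration: 1e9 is the maximum allowed for
        # non-system PriorityClasses.
        value=1_000_000_000,
        description="Priority class for KubeVirt core components.",
    )
    try:
        cluster.create_priority_class(pc)
        return True
    except AlreadyExistsError:
        return False  # e.g. created by a previous HCO pod


def hco_startup(cluster):
    """Hypothetical operator start-up: runs before any HCO CR exists."""
    ensure_priority_class(cluster)
    # ...then start watching for the HCO CR as before.


if __name__ == "__main__":
    cluster = FakeCluster()
    # Without the PriorityClass, virt-operator pods would be rejected.
    print(cluster.can_admit_pod(KUBEVIRT_PRIORITY_CLASS))  # False
    hco_startup(cluster)
    print(cluster.can_admit_pod(KUBEVIRT_PRIORITY_CLASS))  # True
    hco_startup(cluster)  # a restart is harmless: creation is idempotent
```

Treating "already exists" as success keeps the startup step safe across HCO pod restarts and upgrades, where the PriorityClass may already be present.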

Comment 5 Asher Shoshan 2020-06-29 17:44:49 UTC
(In reply to Simone Tiraboschi from comment #4)
> This is more a GUI glitch than a real deadlock.
> HCO pod will become ready because it's checking conditions on the CR for
> virt-operator that doesn't exist before the user creates HCO CR so HCO pod
> will be ready for sure.
> 
> The real issue here is that OLM will try to start virt-operator and it will
> fail due to the lack of its priority class.
> So, after a certain timeout the CSV will be declared as failed and this can
> confuse the user although the user will be still allowed to create the CR
> for HCO exactly as documented and this is enough to trigger the creation of
> the priority class and get it up and running.
> 
> I'm attaching a screenshot with the issue.
> 
> The only option that I see is trying to create KV priority class on HCO
> start.

An end user deploying the CNV package from the UI should not have to hurry to create the HCO CR just to unblock virt-operator's pending start.
The user usually waits until the UI shows the operator as installed, and only then proceeds to create all the CRs.
Moreover, if the user does not create the HCO CR in time, the UI deployment enters a "failed" state.

CLI deployment scripts do not wait for OLM to successfully deploy all of the CSV's underlying deployments/pods; once the HCO operator pod is ready, they create the HCO CR (which resolves the pending state).

The missing PriorityClass should be created by the HCO operator (implemented as an init container), or by a separate OLM deployment/pod.

Comment 9 errata-xmlrpc 2020-07-28 19:10:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3194

