Bug 1851856

Summary: Deployment not progressing due to PriorityClass missing
Product: Container Native Virtualization (CNV)
Reporter: Asher Shoshan <ashoshan>
Component: Installation
Assignee: Yuval Turgeman <yturgema>
Status: CLOSED ERRATA
QA Contact: Asher Shoshan <ashoshan>
Severity: high
Priority: unspecified
Version: 2.4.0
CC: cnv-qe-bugs, lbednar, ncredi, pelauter, royoung, stirabos, yturgema
Target Release: 2.4.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: hco-bundle-registry-container-v2.3.0-449, hyperconverged-cluster-operator-container-v2.4.0-63
Last Closed: 2020-07-28 19:10:39 UTC
Type: Bug
Attachments:
Failed CSV (flags: none)

Description Asher Shoshan 2020-06-29 09:15:24 UTC
Description of problem:
When deploying CNV 2.4, virt-operator requires the PriorityClass kubevirt-cluster-critical, which does not yet exist in the cluster. That PriorityClass is created only later, by the HCO operator, when the HCO CR is created (after virt-operator is up and running).
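For context, a sketch of the object involved. In Kubernetes, a PriorityClass is a cluster-scoped `scheduling.k8s.io/v1` resource, and the Priority admission plugin rejects any pod whose `priorityClassName` refers to a PriorityClass that does not exist; this is why the virt-operator Deployment cannot create its pods until kubevirt-cluster-critical is present. In the minimal Python sketch below, only the name comes from this report: the numeric value, the description text and both helper functions are assumptions for illustration, not the manifest HCO actually ships.

```python
import json

# The name comes from this report; the value and description are assumptions.
PRIORITY_CLASS_NAME = "kubevirt-cluster-critical"
ASSUMED_VALUE = 1_000_000_000  # user-defined PriorityClass values must be <= 1 billion


def priority_class_manifest(name: str, value: int) -> dict:
    """Build a minimal scheduling.k8s.io/v1 PriorityClass object as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},  # cluster-scoped: no namespace
        "value": value,
        "globalDefault": False,
        "description": "Priority for KubeVirt core components (illustrative text).",
    }


def pod_admitted(priority_class_name: str, existing: set) -> bool:
    """Simplified model of the Priority admission check for a single pod."""
    if not priority_class_name:
        return True  # no class named: the default priority applies
    return priority_class_name in existing


if __name__ == "__main__":
    print(json.dumps(priority_class_manifest(PRIORITY_CLASS_NAME, ASSUMED_VALUE), indent=2))
    # Before HCO creates the class, virt-operator's pods are rejected:
    print(pod_admitted(PRIORITY_CLASS_NAME, set()))  # False
```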


Steps to Reproduce:
1. Deploy CNV 2.4 on a clean OCP 4.5 cluster (with no previous CNV version installed).

Actual results:
The deployment does not progress.

Expected results:
To be completed

Additional info:
Using bare-metal workers.

Comment 1 Simone Tiraboschi 2020-06-29 12:02:15 UTC
(In reply to Asher Shoshan from comment #0)
> Description of problem:
> When deploying CNV 2.4, virt-operator requires PriorityClass
> (kubevirt-cluster-critical), not yet in cluster.  This is created only later
> by HCO operator, when hco CR is created (after virt-operator is up and
> running).

This is exactly as we designed it;
why do you think it's a bug?

Comment 2 Asher Shoshan 2020-06-29 12:45:19 UTC
(In reply to Simone Tiraboschi from comment #1)
> This is exactly as we designed it,
> why do you think it's a bug?

If your script waits for the HCO operator to become ready and only then creates the HCO CR, it's a deadlock.

Comment 3 Simone Tiraboschi 2020-06-29 14:03:12 UTC
Created attachment 1699157
Failed CSV

Comment 4 Simone Tiraboschi 2020-06-29 14:03:50 UTC
This is more of a GUI glitch than a real deadlock.
The HCO pod will become ready regardless: its readiness check looks at the conditions on the CR for virt-operator, and that CR doesn't exist until the user creates the HCO CR, so the HCO pod will certainly be ready.
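The readiness argument above amounts to an "all component conditions are healthy" check, which is vacuously true when no component CR exists yet. A minimal sketch, assuming a simplified condition model: `Condition` and `hco_ready` are hypothetical names, and the Available/Degraded rule is an illustrative assumption rather than HCO's actual readiness logic.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    type: str
    status: str  # "True", "False" or "Unknown", as in Kubernetes conditions


def hco_ready(component_crs: dict[str, list[Condition]]) -> bool:
    """Ready unless an existing component CR is Degraded or not Available."""
    for conditions in component_crs.values():
        by_type = {c.type: c.status for c in conditions}
        if by_type.get("Degraded") == "True" or by_type.get("Available") != "True":
            return False
    # No component CRs (the HCO CR has not been created yet): nothing can fail.
    return True
```

With an empty mapping the loop body never runs, so the pod reports ready before the HCO CR exists, which is the behaviour described above.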

The real issue here is that OLM will try to start virt-operator, and it will fail because its priority class is missing.
So, after a certain timeout, the CSV will be declared as failed. This can confuse the user, even though the user is still allowed to create the HCO CR exactly as documented, and that alone is enough to trigger the creation of the priority class and get virt-operator up and running.
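The sequence above can be illustrated with a toy timeline. This is a sketch under loud assumptions: a single fixed install timeout (the hypothetical `timeout` parameter), integer time steps, and phase names borrowed from OLM's CSV phases; OLM's real install and retry logic is more involved. The model keeps the phase at Failed once set; whether OLM later re-evaluates the CSV is not modelled here.

```python
from typing import Optional, Tuple


def simulate(hco_cr_created_at: Optional[int], timeout: int, horizon: int) -> Tuple[str, bool]:
    """Return (csv_phase, virt_operator_running) after `horizon` time steps."""
    phase = "Installing"
    virt_operator_running = False
    for t in range(horizon):
        # The PriorityClass appears once the user has created the HCO CR.
        priority_class_exists = hco_cr_created_at is not None and t >= hco_cr_created_at
        # The Deployment keeps retrying pod creation, so virt-operator starts as soon as it can.
        if priority_class_exists:
            virt_operator_running = True
        if phase == "Installing":
            if virt_operator_running:
                phase = "Succeeded"
            elif t >= timeout:
                phase = "Failed"  # simplification: this model never leaves Failed
    return phase, virt_operator_running
```

For example, `simulate(8, 5, 10)` ends with a Failed CSV even though virt-operator is running: the confusing state described above.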

I'm attaching a screenshot with the issue.

The only option I see is to try to create the KubeVirt (KV) priority class when HCO starts.
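A minimal sketch of that option, assuming a create-and-tolerate-AlreadyExists pattern so the step is idempotent across HCO restarts. `FakeCluster`, `AlreadyExists` and `ensure_priority_class` are hypothetical stand-ins, not HCO or Kubernetes client code; a real operator would call the Kubernetes API and treat a 409 Conflict response with reason AlreadyExists as success.

```python
class AlreadyExists(Exception):
    """Stand-in for the API server's 409 Conflict / AlreadyExists error."""


class FakeCluster:
    """Hypothetical in-memory stand-in for the cluster-scoped PriorityClass API."""

    def __init__(self) -> None:
        self.priority_classes: dict[str, dict] = {}

    def create(self, obj: dict) -> None:
        name = obj["metadata"]["name"]
        if name in self.priority_classes:
            raise AlreadyExists(name)
        self.priority_classes[name] = obj


def ensure_priority_class(cluster: FakeCluster, obj: dict) -> bool:
    """Create obj unless it already exists; return True only if this call created it."""
    try:
        cluster.create(obj)
        return True
    except AlreadyExists:
        # A previous HCO start (or another replica) already created it: treat as success.
        return False


if __name__ == "__main__":
    cluster = FakeCluster()
    pc = {"metadata": {"name": "kubevirt-cluster-critical"}}
    print(ensure_priority_class(cluster, pc))  # True: created on first start
    print(ensure_priority_class(cluster, pc))  # False: already present after a restart
```

Attempting the create and treating "already exists" as success avoids the race that a separate get-then-create check would have if two replicas started at once.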

Comment 5 Asher Shoshan 2020-06-29 17:44:49 UTC
(In reply to Simone Tiraboschi from comment #4)
> This is more a GUI glitch than a real deadlock.
> HCO pod will become ready because it's checking conditions on the CR for
> virt-operator that doesn't exist before the user creates HCO CR so HCO pod
> will be ready for sure.
> 
> The real issue here is that OLM will try to start virt-operator and it will
> fail due to the lack of its priority class.
> So, after a certain timeout the CSV will be declared as failed and this can
> confuse the user although the user will be still allowed to create the CR
> for HCO exactly as documented and this is enough to trigger the creation of
> the priority class and get it up and running.
> 
> I'm attaching a screenshot with the issue.
> 
> The only option that I see is trying to create KV priority class on HCO
> start.

An end user deploying the CNV package from the UI should not have to hurry to create the HCO CR just to unblock the pending virt-operator start.
Users usually wait until the UI shows the operator as installed, and only then proceed to create all the CRs.
Moreover, if the HCO CR is not created in time, the UI deployment enters the "failed" state.

CLI deployment scripts do not wait for OLM to successfully roll out all the underlying deployments/pods of the CSV; as soon as the HCO operator pod is ready, they create the HCO CR, which resolves the pending state.
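The CLI flow described here is a poll-until-ready loop followed by a create. A minimal sketch with injected, hypothetical `is_ready` and `create_cr` callables; a real script would more likely use `oc wait` followed by `oc create`/`oc apply`. Because the HCO pod's readiness does not depend on virt-operator (see comment 4), the loop terminates and the CR creation unblocks virt-operator.

```python
import time
from typing import Callable


def wait_then_create(is_ready: Callable[[], bool], create_cr: Callable[[], None],
                     timeout_s: float, interval_s: float = 0.0) -> bool:
    """Poll is_ready() until it returns True, then call create_cr() once.

    Returns False, without creating anything, if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            create_cr()  # e.g. create the HCO CR once the HCO operator pod is ready
            return True
        time.sleep(interval_s)
    return False
```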

The missing PriorityClass should be created by the HCO operator (implemented as an init container) or by a separate deployment/pod managed by OLM.
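For the init-container variant, the relevant Kubernetes guarantee is that a pod's init containers run one at a time, in order, and must each complete successfully before any regular container starts. A sketch of a pod spec fragment expressing this, as a plain Python dict: the image, command and container names are hypothetical placeholders, not real HCO artifacts, and such an init container would also need RBAC permission to create PriorityClasses.

```python
# Hypothetical placeholder image: not a real HCO artifact.
INIT_IMAGE = "example.invalid/priorityclass-init:latest"


def hco_pod_spec(operator_image: str) -> dict:
    """Pod spec fragment whose init container must finish before the operator starts."""
    return {
        "initContainers": [{
            "name": "create-priority-class",
            "image": INIT_IMAGE,
            # Hypothetical command that would create kubevirt-cluster-critical if missing.
            "command": ["/usr/bin/create-priority-class", "kubevirt-cluster-critical"],
        }],
        # Assumed container name for the operator itself.
        "containers": [{"name": "hyperconverged-cluster-operator", "image": operator_image}],
    }


def startup_order(spec: dict) -> list:
    """Init containers run sequentially before any regular container starts."""
    return [c["name"] for c in spec["initContainers"]] + [c["name"] for c in spec["containers"]]
```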

Comment 9 errata-xmlrpc 2020-07-28 19:10:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3194