Bug 1851856
Summary: | Deployment not progressing due to PriorityClass missing
---|---
Product: | Container Native Virtualization (CNV)
Component: | Installation
Version: | 2.4.0
Status: | CLOSED ERRATA
Severity: | high
Priority: | unspecified
Reporter: | Asher Shoshan <ashoshan>
Assignee: | Yuval Turgeman <yturgema>
QA Contact: | Asher Shoshan <ashoshan>
CC: | cnv-qe-bugs, lbednar, ncredi, pelauter, royoung, stirabos, yturgema
Target Release: | 2.4.0
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | hco-bundle-registry-container-v2.3.0-449, hyperconverged-cluster-operator-container-v2.4.0-63
Doc Type: | If docs needed, set a value
Last Closed: | 2020-07-28 19:10:39 UTC
Type: | Bug

Description
Asher Shoshan
2020-06-29 09:15:24 UTC
(In reply to Asher Shoshan from comment #0)
> Description of problem:
> When deploying CNV 2.4, virt-operator requires PriorityClass
> (kubevirt-cluster-critical), not yet in cluster. This is created only later
> by HCO operator, when hco CR is created (after virt-operator is up and
> running).

This is exactly as we designed it, why do you think it's a bug?

(In reply to Simone Tiraboschi from comment #1)
> (In reply to Asher Shoshan from comment #0)
> > Description of problem:
> > When deploying CNV 2.4, virt-operator requires PriorityClass
> > (kubevirt-cluster-critical), not yet in cluster. This is created only later
> > by HCO operator, when hco CR is created (after virt-operator is up and
> > running).
>
> This is exactly as we designed it,
> why do you think it's a bug?

If your script is waiting for HCO operator to become ready, and then creating the HCO cr -- then it's a deadlock.

Created attachment 1699157 [details]
Failed CSV

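For reference, the object that virt-operator is waiting for is a cluster-scoped PriorityClass. The following is only a sketch of what such a manifest could look like, built as a plain Python dict and printed as JSON. The name `kubevirt-cluster-critical` comes from this report; the numeric `value`, the description text, and the helper name `priority_class_manifest` are assumptions made for illustration and are not taken from the HCO or KubeVirt sources.

```python
import json


def priority_class_manifest(name: str, value: int, description: str = "") -> dict:
    """Build a scheduling.k8s.io/v1 PriorityClass manifest as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},  # PriorityClass is cluster-scoped: no namespace
        "value": value,
        "globalDefault": False,
        "description": description,
    }


# The name is the one mentioned in this bug; the value is an assumption
# (user-defined priority classes may not exceed 1,000,000,000).
manifest = priority_class_manifest(
    "kubevirt-cluster-critical",
    1_000_000_000,
    "Priority class required by virt-operator (illustrative).",
)
print(json.dumps(manifest, indent=2))
```

Until an object with this name exists in the cluster, a pod whose spec sets `priorityClassName` to it is rejected at admission, which is consistent with virt-operator failing to start as described above.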
This is more a GUI glitch than a real deadlock.
HCO pod will become ready because it's checking conditions on the CR for virt-operator that doesn't exist before the user creates HCO CR so HCO pod will be ready for sure.

The real issue here is that OLM will try to start virt-operator and it will fail due to the lack of its priority class.
So, after a certain timeout the CSV will be declared as failed and this can confuse the user although the user will be still allowed to create the CR for HCO exactly as documented and this is enough to trigger the creation of the priority class and get it up and running.

I'm attaching a screenshot with the issue.

The only option that I see is trying to create KV priority class on HCO start.

(In reply to Simone Tiraboschi from comment #4)
> This is more a GUI glitch than a real deadlock.
> HCO pod will become ready because it's checking conditions on the CR for
> virt-operator that doesn't exist before the user creates HCO CR so HCO pod
> will be ready for sure.
>
> The real issue here is that OLM will try to start virt-operator and it will
> fail due to the lack of its priority class.
> So, after a certain timeout the CSV will be declared as failed and this can
> confuse the user although the user will be still allowed to create the CR
> for HCO exactly as documented and this is enough to trigger the creation of
> the priority class and get it up and running.
>
> I'm attaching a screenshot with the issue.
>
> The only option that I see is trying to create KV priority class on HCO
> start.

An end user deploying the CNV package through the UI should not need to hurry and create the HCO CR in order to rectify the pending virt-operator start. The user usually waits until the UI shows the operator as installed, and only then proceeds to create the CRs. Moreover, if the HCO CR is not created in time, the UI deployment enters the "failed" state.
CLI deployment scripts do not wait for OLM to successfully deploy all the underlying deployments and pods of the CSV; once the HCO operator pod is ready, they create the HCO CR, which rectifies the pending state. The missing PriorityClass should be created by the HCO operator (implemented as an init container), or by a separate OLM deployment/pod.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3194
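
The ordering problem discussed in this thread can be summarized with a small toy model. It is only a sketch of the reasoning in the comments, not of how OLM or HCO are implemented: the function names, the `wait_for` labels, and the returned strings are all hypothetical. It assumes, as stated above, that the HCO pod becomes ready on its own, that the CSV only succeeds once virt-operator can start, and that creating the HCO CR triggers creation of the priority class.

```python
def csv_state(priority_class_exists: bool) -> str:
    """Toy rule: the CSV fails (after a timeout) while virt-operator cannot start."""
    return "Succeeded" if priority_class_exists else "Failed"


def simulate(wait_for: str, create_class_on_hco_start: bool) -> str:
    """Return 'deployed' or 'stuck' for a given wait strategy.

    wait_for: 'hco_pod' (CLI scripts) or 'csv' (user waiting on the UI).
    create_class_on_hco_start: models the fix proposed in this thread.
    """
    # Per the thread, HCO readiness does not depend on virt-operator.
    hco_pod_ready = True
    # With the proposed fix, the class exists as soon as HCO has started.
    priority_class_exists = create_class_on_hco_start and hco_pod_ready

    if wait_for == "hco_pod":
        gate_open = hco_pod_ready
    elif wait_for == "csv":
        gate_open = csv_state(priority_class_exists) == "Succeeded"
    else:
        raise ValueError(f"unknown wait strategy: {wait_for}")

    # The HCO CR is created only once the gate opens; creating it is what
    # triggers priority class creation, so a closed gate never reopens.
    return "deployed" if gate_open else "stuck"


for strategy, fix in [("hco_pod", False), ("csv", False), ("csv", True)]:
    print(f"wait_for={strategy:8} fix={fix!s:5} -> {simulate(strategy, fix)}")
```

In this model, waiting on the HCO pod always proceeds, while waiting on the CSV only proceeds when the priority class is created at HCO start, which matches the two positions taken in the comments.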