Bug 2031727

Summary: [CNV-4.10] kubemacpool & nmstate pods stuck in pending state
Product: Container Native Virtualization (CNV)
Reporter: Lukas Bednar <lbednar>
Component: Installation
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Debarati Basu-Nag <dbasunag>
Severity: high
Priority: urgent
Version: 4.10.0
CC: cnv-qe-bugs, ibesso, ocohen, stirabos, ycui
Keywords: Regression
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: CNV-v4.10.0-505
Doc Type: If docs needed, set a value
Last Closed: 2022-03-16 15:57:31 UTC
Type: Bug

Attachments: nodes.yaml

Description Lukas Bednar 2021-12-13 10:57:03 UTC
Created attachment 1846057 [details]
nodes.yaml

Description of problem:

When deploying CNV-4.10, the kubemacpool and nmstate pods are stuck in the Pending state.

$ oc -n openshift-cnv get pods |grep -v Running
NAME                                                            READY   STATUS    RESTARTS       AGE
kubemacpool-mac-controller-manager-76fcbd7d66-66q79             0/1     Pending   0              144m
nmstate-cert-manager-8587fc7fc8-8wqrz                           0/1     Pending   0              144m
nmstate-webhook-6f74c58966-fgjqf                                0/1     Pending   0              144m
nmstate-webhook-6f74c58966-tqncp                                0/1     Pending   0              144m


Version-Release number of selected component (if applicable):
HCO-v4.10.0-445

How reproducible: 100%


Steps to Reproduce:
1. Deploy CNV-4.10
2. Observe nmstate & kubemacpool pods

Actual results: The pods are stuck in Pending.


Expected results: The pods are up and running.


Additional info:
Attaching pod and node logs.

Comment 4 Petr Horáček 2021-12-13 11:03:09 UTC
This is due to the default node selector changing from "master" to "control-plane".

I believe the issue is that HCO does not set any defaults in the placement API. That means every component is scheduled based on the defaults set by the upstream component. We should not assume any control over upstream and should expect these defaults to change at any time.

Moving to HCO as it can control the default placement for all components it ships.
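
As an illustration of the label change (a hedged example, not part of this report), the role labels actually present on the nodes can be compared; on a cluster where nodes carry only the new control-plane label, pods that still select the old master label have nowhere to schedule:

$ oc get nodes -l node-role.kubernetes.io/control-plane
$ oc get nodes -l node-role.kubernetes.io/master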

Comment 6 Oren Cohen 2021-12-16 17:22:17 UTC
Fixed in upstream CNAO - https://github.com/kubevirt/cluster-network-addons-operator/pull/1118
The pods in question will be scheduled using "nodeSelectorTerms" with a logical OR between the node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master labels.
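
A minimal sketch of what such an affinity looks like (assuming the standard Kubernetes nodeAffinity syntax; the exact manifest is in the PR above). Entries under nodeSelectorTerms are ORed, so a pod carrying this affinity can be scheduled onto nodes labelled with either role:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists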

Comment 7 Oren Cohen 2021-12-21 09:44:43 UTC
The fix is included in:
hco-bundle-registry-container-v4.10.0-506
cluster-network-addons-operator-container-v4.10.0-35
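
To confirm which build ended up installed (a generic OLM check, not quoted from this report), the ClusterServiceVersion in the namespace can be listed:

$ oc -n openshift-cnv get csv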

Comment 8 Lukas Bednar 2021-12-21 10:39:42 UTC
Verified with v4.10.0-506

Comment 13 errata-xmlrpc 2022-03-16 15:57:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947