Bug 2031727 - [CNV-4.10] kubemacpool & nmstate pods stuck in pending state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Simone Tiraboschi
QA Contact: Debarati Basu-Nag
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-13 10:57 UTC by Lukas Bednar
Modified: 2022-03-16 15:57 UTC (History)

Fixed In Version: CNV-v4.10.0-505
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:57:31 UTC
Target Upstream Version:
Embargoed:


Attachments
nodes.yaml (103.07 KB, text/plain)
2021-12-13 10:57 UTC, Lukas Bednar


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt cluster-network-addons-operator pull 1118 | 0 | None | Merged | placement, infra: fallback to master node | 2021-12-16 17:22:17 UTC

Description Lukas Bednar 2021-12-13 10:57:03 UTC
Created attachment 1846057
nodes.yaml

Description of problem:

When deploying CNV-4.10, the kubemacpool and nmstate pods remain stuck in the Pending state.

$ oc -n openshift-cnv get pods |grep -v Running
NAME                                                            READY   STATUS    RESTARTS       AGE
kubemacpool-mac-controller-manager-76fcbd7d66-66q79             0/1     Pending   0              144m
nmstate-cert-manager-8587fc7fc8-8wqrz                           0/1     Pending   0              144m
nmstate-webhook-6f74c58966-fgjqf                                0/1     Pending   0              144m
nmstate-webhook-6f74c58966-tqncp                                0/1     Pending   0              144m


Version-Release number of selected component (if applicable):
HCO-v4.10.0-445

How reproducible: 100%


Steps to Reproduce:
1. Deploy CNV-4.10
2. Observe nmstate & kubemacpool pods

Actual results: The pods stay stuck in Pending


Expected results: Up and running


Additional info:
Attaching pods & nodes logs

Comment 4 Petr Horáček 2021-12-13 11:03:09 UTC
This happens because the default node selector changed from "master" to "control-plane".

I believe the issue is that HCO does not set any defaults in the placement API, so every component is scheduled according to the defaults set by its upstream component. We should not assume any control over upstream (U/S) and should expect these defaults to change at any time.

Moving to HCO as it can control the default placement for all components it ships.
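
To illustrate the diagnosis in the comment above, here is a minimal Python sketch of how a plain nodeSelector is evaluated. The two label keys come from this bug; the node inventory, the helper names, and the assumption that the cluster's control-plane nodes carry only the older "master" label are hypothetical and only serve to show why no node qualified.

```python
# Label keys named in this bug; everything else below is a hypothetical example.
MASTER = "node-role.kubernetes.io/master"
CONTROL_PLANE = "node-role.kubernetes.io/control-plane"


def matches_node_selector(node_labels, node_selector):
    """A plain nodeSelector matches only if every key/value pair is set on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Assumed inventory: control-plane nodes still labeled only with the old key.
nodes = {
    "master-0": {MASTER: ""},
    "master-1": {MASTER: ""},
    "worker-0": {"node-role.kubernetes.io/worker": ""},
}


def schedulable(node_selector):
    """Return the sorted names of the nodes that satisfy the selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
    )


old_candidates = schedulable({MASTER: ""})         # previous default selector
new_candidates = schedulable({CONTROL_PLANE: ""})  # new default selector

print("old default:", old_candidates)
print("new default:", new_candidates)
```

With the assumed inventory, the new default selects no node at all, which is the situation in which the scheduler leaves a pod in Pending.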

Comment 6 Oren Cohen 2021-12-16 17:22:17 UTC
Fixed in upstream CNAO - https://github.com/kubevirt/cluster-network-addons-operator/pull/1118
The pods in question will be scheduled using "nodeSelectorTerms" with a logical OR between node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master
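
The logical OR described above can be sketched in Python as follows. The structure mirrors the Kubernetes node affinity fields (nodeSelectorTerms are ORed, the matchExpressions inside one term are ANDed), but the "Exists" operator, the helper names, and the sample labels are assumptions made for illustration; the exact manifest is in the linked pull request.

```python
MASTER = "node-role.kubernetes.io/master"
CONTROL_PLANE = "node-role.kubernetes.io/control-plane"

# Affinity fragment shaped like the Kubernetes NodeAffinity API.
# Using "Exists" here is an assumption, not a copy of the actual fix.
affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {"matchExpressions": [{"key": CONTROL_PLANE, "operator": "Exists"}]},
                {"matchExpressions": [{"key": MASTER, "operator": "Exists"}]},
            ]
        }
    }
}


def expression_matches(labels, expr):
    """Evaluate one match expression; only the operators used in this sketch are handled."""
    if expr["operator"] == "Exists":
        return expr["key"] in labels
    if expr["operator"] == "In":
        return labels.get(expr["key"]) in expr.get("values", [])
    raise ValueError("operator not covered by this sketch: " + expr["operator"])


def node_matches(labels, affinity):
    """Terms are ORed; all expressions inside a single term must match (AND)."""
    required = affinity["nodeAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"]
    for term in required["nodeSelectorTerms"]:
        expressions = term.get("matchExpressions", [])
        # An empty term selects nothing, so require at least one expression.
        if expressions and all(expression_matches(labels, e) for e in expressions):
            return True
    return False


print(node_matches({MASTER: ""}, affinity))
print(node_matches({CONTROL_PLANE: ""}, affinity))
print(node_matches({"node-role.kubernetes.io/worker": ""}, affinity))
```

A node carrying either label satisfies the affinity, so the pods stay schedulable whichever of the two labels the cluster uses.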

Comment 7 Oren Cohen 2021-12-21 09:44:43 UTC
The fix is included in:
hco-bundle-registry-container-v4.10.0-506
cluster-network-addons-operator-container-v4.10.0-35

Comment 8 Lukas Bednar 2021-12-21 10:39:42 UTC
Verified with v4.10.0-506

Comment 13 errata-xmlrpc 2022-03-16 15:57:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

