2031727 – [CNV-4.10] kubemacpool & nmstate pods stuck in pending state

Summary: [CNV-4.10] kubemacpool & nmstate pods stuck in pending state

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Installation
Sub Component:
Version:	4.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Simone Tiraboschi
QA Contact:	Debarati Basu-Nag
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-12-13 10:57 UTC by Lukas Bednar
Modified:	2022-03-16 15:57 UTC
CC List:	5 users
Fixed In Version:	CNV-v4.10.0-505
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-16 15:57:31 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
nodes.yaml (103.07 KB, text/plain) 2021-12-13 10:57 UTC, Lukas Bednar	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	kubevirt cluster-network-addons-operator pull 1118	0	None	Merged	placement, infra: fallback to master node	2021-12-16 17:22:17 UTC

Description Lukas Bednar 2021-12-13 10:57:03 UTC

Created attachment 1846057
nodes.yaml

Description of problem:

When deploying CNV-4.10, the kubemacpool and nmstate pods remain stuck in the Pending state.

$ oc -n openshift-cnv get pods |grep -v Running
NAME                                                            READY   STATUS    RESTARTS       AGE
kubemacpool-mac-controller-manager-76fcbd7d66-66q79             0/1     Pending   0              144m
nmstate-cert-manager-8587fc7fc8-8wqrz                           0/1     Pending   0              144m
nmstate-webhook-6f74c58966-fgjqf                                0/1     Pending   0              144m
nmstate-webhook-6f74c58966-tqncp                                0/1     Pending   0              144m


Version-Release number of selected component (if applicable):
HCO-v4.10.0-445

How reproducible: 100%


Steps to Reproduce:
1. Deploy CNV-4.10
2. Observe nmstate & kubemacpool pods

Actual results: The kubemacpool and nmstate pods stay in Pending.


Expected results: All pods are up and Running.


Additional info:
Attaching pods & nodes logs

Comment 4 Petr Horáček 2021-12-13 11:03:09 UTC

This happens because the default node selector changed from "master" to "control-plane".

I believe the underlying issue is that HCO does not set any defaults in the placement API. As a result, every component is scheduled according to the defaults set by its upstream project. We do not control upstream, so we should expect those defaults to change at any time.

Moving to HCO as it can control the default placement for all components it ships.

Comment 6 Oren Cohen 2021-12-16 17:22:17 UTC

Fixed in upstream CNAO - https://github.com/kubevirt/cluster-network-addons-operator/pull/1118
The affected pods are now scheduled using "nodeSelectorTerms" with a logical OR between node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master, so a node carrying either label satisfies the selector.
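
To illustrate the OR semantics described above, here is a minimal Python sketch of how a list of node selector terms is evaluated: terms are ORed together, while the expressions inside one term are ANDed. This is a simplified model for illustration only, not code from the CNAO pull request. The function names, the FALLBACK_TERMS constant, and the choice of the "Exists" operator are assumptions; the bug only states which two label keys are combined.

```python
from typing import Dict, List

Labels = Dict[str, str]

# Assumed shape of the fallback selector: one term per label key.
# The "Exists" operator is an assumption, not taken from the upstream fix.
FALLBACK_TERMS = [
    {"matchExpressions": [
        {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists"},
    ]},
    {"matchExpressions": [
        {"key": "node-role.kubernetes.io/master", "operator": "Exists"},
    ]},
]


def expression_matches(labels: Labels, expr: dict) -> bool:
    """Evaluate one match expression (only Exists and In are modeled)."""
    key, op = expr["key"], expr["operator"]
    if op == "Exists":
        return key in labels
    if op == "In":
        return key in labels and labels[key] in expr.get("values", [])
    raise ValueError(f"operator not modeled in this sketch: {op}")


def node_matches(labels: Labels, terms: List[dict]) -> bool:
    """Terms are ORed; expressions within a single term are ANDed."""
    for term in terms:
        exprs = term.get("matchExpressions", [])
        # An empty term is treated as matching nothing.
        if exprs and all(expression_matches(labels, e) for e in exprs):
            return True
    return False


if __name__ == "__main__":
    master_only = {"node-role.kubernetes.io/master": ""}
    control_plane_only_terms = FALLBACK_TERMS[:1]
    # A selector that knows only the new label rejects a master-labeled node.
    print(node_matches(master_only, control_plane_only_terms))
    # The fallback selector accepts it through the second term.
    print(node_matches(master_only, FALLBACK_TERMS))
```

Under this model, a node that carries only the master label is rejected by a selector that names control-plane alone, which would presumably leave the pods Pending as reported, and is accepted once the master term is added as a fallback.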

Comment 7 Oren Cohen 2021-12-21 09:44:43 UTC

The fix is included in:
hco-bundle-registry-container-v4.10.0-506
cluster-network-addons-operator-container-v4.10.0-35

Comment 8 Lukas Bednar 2021-12-21 10:39:42 UTC

Verified with v4.10.0-506

Comment 13 errata-xmlrpc 2022-03-16 15:57:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947
