Bug 2031727

Summary: [CNV-4.10] kubemacpool & nmstate pods stuck in pending state
Product: Container Native Virtualization (CNV)
Reporter: Lukas Bednar <lbednar>
Component: Installation
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Debarati Basu-Nag <dbasunag>
Severity: high
Priority: urgent
Version: 4.10.0
CC: cnv-qe-bugs, ibesso, ocohen, stirabos, ycui
Keywords: Regression
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: CNV-v4.10.0-505
Doc Type: If docs needed, set a value
Last Closed: 2022-03-16 15:57:31 UTC
Type: Bug

Attachments: nodes.yaml

Description Lukas Bednar 2021-12-13 10:57:03 UTC
Created attachment 1846057 [details]
nodes.yaml

Description of problem:

When deploying CNV-4.10, the kubemacpool and nmstate pods are stuck in the Pending state.

$ oc -n openshift-cnv get pods |grep -v Running
NAME                                                            READY   STATUS    RESTARTS       AGE
kubemacpool-mac-controller-manager-76fcbd7d66-66q79             0/1     Pending   0              144m
nmstate-cert-manager-8587fc7fc8-8wqrz                           0/1     Pending   0              144m
nmstate-webhook-6f74c58966-fgjqf                                0/1     Pending   0              144m
nmstate-webhook-6f74c58966-tqncp                                0/1     Pending   0              144m


Version-Release number of selected component (if applicable):
HCO-v4.10.0-445

How reproducible: 100%


Steps to Reproduce:
1. Deploy CNV-4.10
2. Observe nmstate & kubemacpool pods

Actual results: The pods are stuck in Pending.


Expected results: The pods are up and running.


Additional info:
Attaching pod and node logs.

Comment 4 Petr Horáček 2021-12-13 11:03:09 UTC
This is due to the default node selector changing from "master" to "control-plane".

I believe the issue is that HCO does not set any defaults in the placement API. That means every component is scheduled based on the defaults set by the upstream component. We should not assume any control over upstream and should expect these defaults to change at any time.

Moving to HCO as it can control the default placement for all components it ships.
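
As an illustration of the label change (a hedged example, not part of this report), the role labels actually present on the nodes can be compared; on a cluster where nodes carry only the new control-plane label, pods that still select the old master label have nowhere to schedule:

$ oc get nodes -l node-role.kubernetes.io/control-plane
$ oc get nodes -l node-role.kubernetes.io/master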

Comment 6 Oren Cohen 2021-12-16 17:22:17 UTC
Fixed in upstream CNAO - https://github.com/kubevirt/cluster-network-addons-operator/pull/1118
The pods in question will be scheduled using "nodeSelectorTerms" with a logical OR between the node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master labels.
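
A minimal sketch of what such an affinity looks like (assuming the standard Kubernetes nodeAffinity syntax; the exact manifest is in the PR above). Entries under nodeSelectorTerms are ORed, so a pod carrying this affinity can be scheduled onto nodes labelled with either role:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists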

Comment 7 Oren Cohen 2021-12-21 09:44:43 UTC
The fix is included in:
hco-bundle-registry-container-v4.10.0-506
cluster-network-addons-operator-container-v4.10.0-35
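
To confirm which build ended up installed (a generic OLM check, not quoted from this report), the ClusterServiceVersion in the namespace can be listed:

$ oc -n openshift-cnv get csv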

Comment 8 Lukas Bednar 2021-12-21 10:39:42 UTC
Verified with v4.10.0-506

Comment 13 errata-xmlrpc 2022-03-16 15:57:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947