Created attachment 1847607 [details]
log from kubemacpool-cert-manager

Description of problem:
On a specific big cluster (~500 nodes), the kubemacpool-mac-controller-manager pod never reaches the Ready state, making it impossible to define VMs.

Version-Release number of selected component (if applicable):
CNV-4.9.1

$ oc version
Client Version: 4.9.12
Server Version: 4.9.12
Kubernetes Version: v1.22.3+e790d7f

How reproducible:
Repeatedly, on one specific cluster.

Steps to Reproduce:
1. Install OpenShift Virtualization

Actual results:
$ oc get pod -n openshift-cnv -l app=kubemacpool
NAME                                                 READY   STATUS    RESTARTS        AGE
kubemacpool-cert-manager-7b7bcfc9db-2c8p6            1/1     Running   0               3h13m
kubemacpool-mac-controller-manager-88b9c5b99-tt4tk   0/1     Running   33 (112s ago)   5h27m

$ ./vm.sh | oc apply -f -
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "mutatevirtualmachines.kubemacpool.io": failed to call webhook: Post "https://kubemacpool-service.openshift-cnv.svc:443/mutate-virtualmachines?timeout=10s": dial tcp 10.129.0.72:8000: connect: connection refused

Expected results:
kubemacpool is READY, serving unique MAC addresses to VMs. The VM is defined.

Additional info:
Use
$ oc label namespace mynamespace mutatevirtualmachines.kubemacpool.io=ignore
to disable kubemacpool in your namespace. Make sure that your VMs do not refer to kubemacpool.io in their finalizers.
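One quick way to spot VMs that still carry a kubemacpool.io finalizer (not from the original report; assumes the standard vms short name for KubeVirt VirtualMachines):

$ oc get vms -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.metadata.finalizers}{"\n"}{end}' | grep kubemacpool.io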
Created attachment 1847608 [details] log from non-ready kubemacpool-mac-controller-manager
Created attachment 1847610 [details] describe non-ready kubemacpool-mac-controller-manager
After some digging, it looks like InitMap takes a long time to finish (it performs an API access per pod/VM), so the process never reaches the webhook start and the readiness probe hits its timeout.

Possible solutions:
1. Increase the readiness probe timeout.
2. Remove the per-pod/VM API accesses in InitMap by:
   a. Using the controller-runtime client
   b. Caching namespaces and the webhook configuration at the beginning
3. Parallelize InitMap and use a sync.Map for the data structure.

I suggest we go for 2.a: we get the cache for free and it is already well tested. A rough sketch of that approach follows.
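To illustrate 2.a, a minimal sketch (not the actual kubemacpool code; names such as initMacMap and the pool package are made up for illustration):

// Sketch only: pods are listed through the controller-runtime client.
// When c is the manager's client (mgr.GetClient()) and the cache has
// synced, the List call is answered from the shared informer cache in
// memory rather than with one API round trip per pod, as happens today.
package pool

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func initMacMap(ctx context.Context, c client.Client, macMap map[string]struct{}) error {
	podList := &corev1.PodList{}
	if err := c.List(ctx, podList); err != nil {
		return err
	}
	for _, pod := range podList.Items {
		// Parse MAC addresses out of the pod's network annotations and
		// record them in macMap (parsing omitted here).
		_ = pod
	}
	return nil
}

The same pattern would apply to VMs (listed via the KubeVirt API types) instead of fetching each object individually.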
Deployment of KubeMacPool can be avoided completely with:

$ kubectl annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'networkaddonsconfigs.kubevirt.io/jsonpatch=[{"op": "replace","path": "/spec/kubeMacPool","value": null}]'

(note the plural form of networkaddonsconfigs)
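Once the annotation is applied, the cluster-network-addons operator should remove the KubeMacPool deployment; this can be confirmed with the same label used in the report above:

$ oc get pod -n openshift-cnv -l app=kubemacpool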
Targeting 4.11. We want to take our time to properly design the solution. The workaround is described above.
https://github.com/k8snetworkplumbingwg/kubemacpool/pull/354
@ralavi Please update the Doc Type and Doc Text fields. Because this issue is now resolved, it is no longer a known issue. The documentation team will exclude the known issue from the 4.11 release notes. On a large cluster, the OpenShift Virtualization MAC pool manager might take too much time to boot and OpenShift Virtualization might not become ready. As a workaround, if you do not require MAC pooling functionality, then disable this sub-component by running the following command: `oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'networkaddonsconfigs.kubevirt.io/jsonpatch=[{"op": "replace","path": "/spec/kubeMacPool","value": null}]'`.
@ctomc Is removing the release note from the BZ and setting the status to "If docs needed, set a value" good enough?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6526