Bug 2035344 - kubemacpool-mac-controller-manager not ready
Summary: kubemacpool-mac-controller-manager not ready
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.9.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Ram Lavi
QA Contact: awax
URL:
Whiteboard:
Depends On:
Blocks: 2056619
 
Reported: 2021-12-23 17:12 UTC by Dan Kenigsberg
Modified: 2025-04-04 13:58 UTC (History)
7 users

Fixed In Version: v4.11.0-156
Doc Type: Known Issue
Doc Text:
On a large cluster, the OpenShift Virtualization MAC pool manager might take too much time to boot and OpenShift Virtualization might not become ready. As a workaround, if you do not require MAC pooling functionality, then disable this sub-component by running the following command: `oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'networkaddonsconfigs.kubevirt.io/jsonpatch=[{"op": "replace","path": "/spec/kubeMacPool","value": null}]'`.
Clone Of:
Cloned By: 2056619
Environment:
Last Closed: 2022-09-14 19:28:30 UTC
Target Upstream Version:
Embargoed:
awax: needinfo-


Attachments
log from kubemacpool-cert-manager (18.74 KB, text/plain)
2021-12-23 17:12 UTC, Dan Kenigsberg
describe non-ready kubemacpool-mac-controller-manager (5.45 KB, text/plain)
2021-12-23 17:16 UTC, Dan Kenigsberg


Links
Red Hat Issue Tracker CNV-15529 (last updated 2023-03-20 20:25:08 UTC)
Red Hat Knowledge Base (Solution) 6027991 (last updated 2022-02-11 01:48:18 UTC)
Red Hat Product Errata RHSA-2022:6526 (last updated 2022-09-14 19:28:56 UTC)

Description Dan Kenigsberg 2021-12-23 17:12:22 UTC
Created attachment 1847607
log from kubemacpool-cert-manager

Description of problem:
On a specific big cluster (~500 nodes), the kubemacpool-mac-controller-manager pod never reaches the Ready state, making it impossible to define VMs.

Version-Release number of selected component (if applicable):
CNV-4.9.1
$ oc version
Client Version: 4.9.12
Server Version: 4.9.12
Kubernetes Version: v1.22.3+e790d7f

How reproducible:
repeatedly, on one specific cluster

Steps to Reproduce:
1. Install OpenShift Virtualization

Actual results:

$ oc get pod -n openshift-cnv -l app=kubemacpool
NAME                                                 READY   STATUS    RESTARTS        AGE
kubemacpool-cert-manager-7b7bcfc9db-2c8p6            1/1     Running   0               3h13m
kubemacpool-mac-controller-manager-88b9c5b99-tt4tk   0/1     Running   33 (112s ago)   5h27m

$ ./vm.sh |oc apply -f -
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "mutatevirtualmachines.kubemacpool.io": failed to call webhook: Post "https://kubemacpool-service.openshift-cnv.svc:443/mutate-virtualmachines?timeout=10s": dial tcp 10.129.0.72:8000: connect: connection refused
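
(The attached describe and log for this pod can be reproduced with the pod name from the listing above:

$ oc describe pod -n openshift-cnv kubemacpool-mac-controller-manager-88b9c5b99-tt4tk
$ oc logs -n openshift-cnv kubemacpool-mac-controller-manager-88b9c5b99-tt4tk
)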


Expected results:
kubemacpool is READY, serving unique MAC addresses to VMs. VM is defined.

Additional info:
Use
$ oc label namespace mynamespace mutatevirtualmachines.kubemacpool.io=ignore
to disable kubemacpool in your namespace. Make sure that your VMs do not refer to kubemacpool.io in their finalizers.
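
A minimal sketch of that finalizer check before labeling the namespace (mynamespace is a placeholder; assumes jq is installed):

# list VMs whose finalizers still mention kubemacpool.io (illustrative pipeline, not a supported tool)
$ oc get vms -n mynamespace -o json \
    | jq -r '.items[] | select((.metadata.finalizers // []) | any(contains("kubemacpool.io"))) | .metadata.name'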

Comment 1 Dan Kenigsberg 2021-12-23 17:15:06 UTC
Created attachment 1847608
log from non-ready kubemacpool-mac-controller-manager

Comment 2 Dan Kenigsberg 2021-12-23 17:16:07 UTC
Created attachment 1847610
describe non-ready kubemacpool-mac-controller-manager

Comment 4 Quique Llorente 2021-12-24 12:31:55 UTC
After some digging, it looks like it takes a lot of time for InitMap to finish (it does some API access per pod/VM), so it never reaches the webhook start
and the readiness probe hits its timeout.

Possible solutions:
1. Increase the readiness probe timeout (see the sketch after this list)
2. Remove all the API accesses per pod/VM in InitMap:
  a. Using the controller-runtime client
  b. Caching namespaces and the webhook configuration at the beginning
3. Parallelize InitMap and use a sync.Map for the data structure
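
For illustration only, option 1 could be tried with a patch along these lines (the container index and delay value are guesses, and the operator may reconcile such a manual edit right back):

# sketch: bump the readiness probe's initial delay; index and value are placeholders
$ oc patch deployment -n openshift-cnv kubemacpool-mac-controller-manager --type json \
    -p '[{"op": "add", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 300}]'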


I suggest we go for 2.a, so we get the cache for free and it's already well tested.

Comment 5 Dan Kenigsberg 2021-12-28 12:12:36 UTC
Completely avoid deployment of KubeMacPool with

kubectl annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'networkaddonsconfigs.kubevirt.io/jsonpatch=[{"op": "replace","path": "/spec/kubeMacPool","value": null}]'

(note the plural form of networkaddonsconfigs)
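
To revert this later, removing the annotation should let the operator restore the default kubeMacPool deployment (the trailing dash is kubectl's remove-annotation syntax; reconciliation back to defaults is assumed):

# remove the jsonpatch annotation so the operator can reconcile defaults
kubectl annotate -n openshift-cnv hco kubevirt-hyperconverged networkaddonsconfigs.kubevirt.io/jsonpatch-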

Comment 6 Petr Horáček 2022-01-13 12:21:40 UTC
Targeting 4.11. We want to take our time to properly design the solution. The workaround is described above.

Comment 23 ctomasko 2022-07-25 02:25:10 UTC
@ralavi Please update the Doc Type and Doc Text fields.

Because this issue is now resolved, it is no longer a known issue. The documentation team will exclude it from the 4.11 release notes.

On a large cluster, the OpenShift Virtualization MAC pool manager might take too much time to boot and OpenShift Virtualization might not become ready.

As a workaround, if you do not require MAC pooling functionality, then disable this sub-component by running the following command: `oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'networkaddonsconfigs.kubevirt.io/jsonpatch=[{"op": "replace","path": "/spec/kubeMacPool","value": null}]'`.

Comment 24 Ram Lavi 2022-07-25 06:17:48 UTC
@ctomc, is removing the release note from the BZ and setting the status to "If docs needed, set a value" good enough?

Comment 26 errata-xmlrpc 2022-09-14 19:28:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6526

