Description of problem:
While the upgrade is in progress, pods in the `openshift-vertical-pod-autoscaler` namespace fail to be created with `no endpoints available for service "vpa-webhook"` errors. The vpa-admission-plugin-default pod, which backs that service, fails to be created with the same error, which causes a deadlock.
Steps to Reproduce:
1. Install VPA operator.
2. Start an upgrade from OCP v4.6.x to v4.6.y.
Actual results:
The following was observed during the upgrade:
$ oc get pod -n openshift-vertical-pod-autoscaler
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/vertical-pod-autoscaler-operator-6c64cd877b-46rmd   1/1     Running   0          14h
pod/vpa-recommender-default-649f9f4479-jd4jx            1/1     Running   0          13h
pod/vpa-updater-default-59bf95f4db-bvwld                1/1     Running   0          13h
- No vpa-admission-plugin-default pod is running. Because the endpoints of the vpa-webhook service are the IPs of the vpa-admission-plugin-default pods, the service has no endpoints, and requests to it fail with `no endpoints available for service "vpa-webhook"`.
[*] Description of the ReplicaSet for the vpa-admission-plugin-default pod:
$ oc describe replicaset.apps/vpa-admission-plugin-default-7d4c654465
Controlled By: Deployment/vpa-admission-plugin-default
Replicas: 0 current / 1 desired
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Type     Reason            Age                  From                   Message
----     ------            ----                 ----                   -------
Normal   SuccessfulCreate  145m                 replicaset-controller  Created pod: vpa-admission-plugin-default-7d4c654465-nq9ht
Warning  FailedCreate      105s (x21 over 24m)  replicaset-controller  Error creating: Internal error occurred: failed calling webhook "vpa.k8s.io": Post "https://vpa-webhook.openshift-vertical-pod-autoscaler.svc:443/?timeout=10s": no endpoints available for service "vpa-webhook"
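- The ReplicaSet cannot create the vpa-admission-plugin-default pod because the creation request must first pass the "vpa.k8s.io" webhook, and the vpa-webhook service has no endpoints. That pod is what would provide those endpoints, so the situation never resolves on its own.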
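To make the circular dependency concrete, here is a toy Python model of it. It is not OpenShift or VPA code, and every name in it (`ToyCluster`, `create_admission_pod`, `reconcile`) is hypothetical. It assumes the webhook rejects requests it cannot deliver (failure policy Fail, as the rejected pod creations suggest) and that the previous admission-plugin pod was removed during the upgrade. Under those assumptions, pod creation is refused while the webhook is registered and has no endpoints, so the only pod that could supply endpoints is never created; dropping the webhook registration breaks the cycle.

```python
"""Toy model of the vpa-webhook admission deadlock (hypothetical names, not VPA code).

Assumptions: the webhook rejects requests it cannot deliver (failurePolicy: Fail),
and the old admission-plugin pod was removed during the upgrade.
"""

ERROR = 'no endpoints available for service "vpa-webhook"'


class ToyCluster:
    def __init__(self, webhook_registered, backend_pods=None):
        # Whether a MutatingWebhookConfiguration routes pod creation to vpa-webhook.
        self.webhook_registered = webhook_registered
        # Running admission-plugin pods; their IPs are the service endpoints.
        self.backend_pods = list(backend_pods or [])

    def endpoints(self):
        return list(self.backend_pods)

    def create_admission_pod(self, name):
        """Return None on success, or the admission error string on rejection."""
        if self.webhook_registered and not self.endpoints():
            # The webhook cannot be reached, so the create request is rejected.
            return ERROR
        self.backend_pods.append(name)
        return None


def reconcile(cluster, attempts=3):
    """Retry pod creation like a ReplicaSet controller; collect the errors seen."""
    errors = []
    for _ in range(attempts):
        err = cluster.create_admission_pod("vpa-admission-plugin-default")
        if err is None:
            break
        errors.append(err)
    return errors


if __name__ == "__main__":
    # During the upgrade: webhook registered, no admission-plugin pod left.
    stuck = ToyCluster(webhook_registered=True)
    print(reconcile(stuck))    # three identical errors
    print(stuck.endpoints())   # [] because the cycle never resolves

    # Workaround: with the webhook configuration deleted, creation succeeds.
    fixed = ToyCluster(webhook_registered=False)
    print(reconcile(fixed))    # []
    print(fixed.endpoints())   # ['vpa-admission-plugin-default']
```

In the model, passing an already running pod through `backend_pods` lets creation succeed even with the webhook registered. That is consistent with the failure appearing only while the upgrade is replacing pods.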
Expected results:
The pods should be created without these errors, and the upgrade should complete.
[ Workaround ] Deleting the vpa-webhook-config MutatingWebhookConfiguration works around the issue:
$ oc delete mutatingwebhookconfigurations vpa-webhook-config
*** Bug 1909982 has been marked as a duplicate of this bug. ***
This should already be working in 4.7. We have a 4.6 PR open to fix it, which we're working to merge: