Bug 2117439
| Summary: | change controlplanemachineset machineType to other type trigger RollingUpdate cause cluster error | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Huali Liu <huliu> |
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
| Cloud Compute sub component: | Other Providers | QA Contact: | Huali Liu <huliu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | jspeed |
| Version: | 4.12 | Flags: | huliu: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | 4.12.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-17 19:54:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Huali Liu
2022-08-11 01:48:02 UTC
The must-gather didn't gather any logs, so we will have to try to reproduce this to work out what's gone wrong there. I think maybe the drain fix hasn't made it to GCP yet, so we need to make that update and test again.

I've spent some time trying to reproduce this today and didn't hit the same issues. Looking at the cluster version you've shown, are you sure the cluster was installed correctly before you started testing? I wouldn't expect it to be showing errors like that in a correctly installed cluster. Can you try again with a fresh cluster and make sure that the cluster is installed and stable before you install the CPMS? Then we can make sure the symptoms are definitely a factor of the CPMS creating a new master machine.

Scratch my previous message; I've noticed the rollout has completed, but the cluster is now stuck in the same circumstances. It seems there's an extraneous kube-controller-manager pod that should have been removed but, for some reason, wasn't removed correctly. I suspect this is causing everything to block.

Ok, so after some further digging, the root cause here is that when new Machines are being created, they are not being added to the instance groups for the internal load balancer API. This means that the internal API (used by things such as KCM) has no backing endpoints and therefore no KCM can run to schedule pods and the like. We need to work out why these instance groups aren't being updated when Machine API creates new control plane instances.

Instance groups aren't currently supported by Machine API on GCP, which is why this isn't working; we have https://issues.redhat.com/browse/OCPCLOUD-672 and https://issues.redhat.com/browse/OCPCLOUD-1562 in Jira tracking the implementation here. Looking at the upstream code, they check if a Machine is a control plane machine and then look up the instance group for the subnet and ensure the instance is registered. We could do this automatically in a similar way.
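The upstream pattern described above can be sketched roughly as follows. This is a toy model, not the real provider code: the function name and data shapes are illustrative, though the machine-role label is the one Machine API actually uses.

```python
# Toy sketch of the reconciliation pattern described above: when a control
# plane machine is created, look up the instance group backing the internal
# load balancer for its zone and make sure the instance is registered.
# Names and data shapes are illustrative.

def ensure_instance_registered(machine, instance_groups):
    """Register a control plane instance in its zone's instance group.

    machine: dict with 'name', 'zone', and a 'labels' dict.
    instance_groups: dict mapping zone -> set of registered instance names.
    Returns True if the instance was added, False if nothing needed doing.
    """
    # Only control plane (master) machines back the internal API load balancer.
    role = machine["labels"].get("machine.openshift.io/cluster-api-machine-role")
    if role != "master":
        return False

    group = instance_groups.setdefault(machine["zone"], set())
    if machine["name"] in group:
        return False  # already registered, nothing to do

    # In the real provider this would be a GCP instanceGroups.addInstances call.
    group.add(machine["name"])
    return True
```

Registering on creation (and the symmetric removal on deletion) is what keeps the internal API endpoints in sync during a rollout.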
Will need someone to backport the feature from upstream.

On Azure, a CPMS RollingUpdate also causes cluster errors.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-azure12b-w6zxn-master-bvrvl-0 Running Standard_D4s_v3 eastus 2 155m
huliu-azure12b-w6zxn-master-gq87p-2 Running Standard_D4s_v3 eastus 1 123m
huliu-azure12b-w6zxn-master-qd8jg-1 Running Standard_D4s_v3 eastus 3 140m
huliu-azure12b-w6zxn-worker-eastus1-tvxvr Running Standard_D4s_v3 eastus 1 4h
huliu-azure12b-w6zxn-worker-eastus2-dtxrd Running Standard_D4s_v3 eastus 2 4h
huliu-azure12b-w6zxn-worker-eastus3-j7qjw Running Standard_D4s_v3 eastus 3 4h
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.nightly-2022-08-15-150248 True False 3h35m Error while reconciling 4.12.0-0.nightly-2022-08-15-150248: an unknown error has occurred: MultipleErrors
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.0-0.nightly-2022-08-15-150248 False False True 105m OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.huliu-azure12b.qe.azure.devcluster.openshift.com/healthz": dial tcp 20.124.44.69:443: connect: connection refused...
baremetal 4.12.0-0.nightly-2022-08-15-150248 True False False 4h
cloud-controller-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 4h3m
cloud-credential 4.12.0-0.nightly-2022-08-15-150248 True False False 4h13m
cluster-autoscaler 4.12.0-0.nightly-2022-08-15-150248 True False False 4h
config-operator 4.12.0-0.nightly-2022-08-15-150248 True False False 4h2m
console 4.12.0-0.nightly-2022-08-15-150248 False False False 105m RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.huliu-azure12b.qe.azure.devcluster.openshift.com): Get "https://console-openshift-console.apps.huliu-azure12b.qe.azure.devcluster.openshift.com": dial tcp 20.124.44.69:443: connect: connection refused
control-plane-machine-set 0.0.1-snapshot True False False 170m
csi-snapshot-controller 4.12.0-0.nightly-2022-08-15-150248 True False False 4h1m
dns 4.12.0-0.nightly-2022-08-15-150248 True True False 4h DNS "default" reports Progressing=True: "Have 3 available node-resolver pods, want 6."
etcd 4.12.0-0.nightly-2022-08-15-150248 True False False 3h51m
image-registry 4.12.0-0.nightly-2022-08-15-150248 False True True 105m Available: The deployment does not have available replicas...
ingress 4.12.0-0.nightly-2022-08-15-150248 False True True 105m The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights 4.12.0-0.nightly-2022-08-15-150248 True False False 3h55m
kube-apiserver 4.12.0-0.nightly-2022-08-15-150248 True False False 3h51m
kube-controller-manager 4.12.0-0.nightly-2022-08-15-150248 True False True 3h52m GarbageCollectorDegraded: error querying alerts: Post "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query": dial tcp 172.30.97.149:9091: i/o timeout
kube-scheduler 4.12.0-0.nightly-2022-08-15-150248 True False False 3h52m
kube-storage-version-migrator 4.12.0-0.nightly-2022-08-15-150248 True False False 102m
machine-api 4.12.0-0.nightly-2022-08-15-150248 True False False 3h47m
machine-approver 4.12.0-0.nightly-2022-08-15-150248 True False False 4h1m
machine-config 4.12.0-0.nightly-2022-08-15-150248 False False True 95m Cluster not available for [{operator 4.12.0-0.nightly-2022-08-15-150248}]
marketplace 4.12.0-0.nightly-2022-08-15-150248 True False False 4h
monitoring 4.12.0-0.nightly-2022-08-15-150248 False True True 88m Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network 4.12.0-0.nightly-2022-08-15-150248 True True True 4h2m DaemonSet "/openshift-multus/multus" rollout is not making progress - last change 2022-08-16T04:00:18Z...
node-tuning 4.12.0-0.nightly-2022-08-15-150248 True False False 4h
openshift-apiserver 4.12.0-0.nightly-2022-08-15-150248 True False False 3h49m
openshift-controller-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 3h52m
openshift-samples 4.12.0-0.nightly-2022-08-15-150248 True False False 3h49m
operator-lifecycle-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 4h1m
operator-lifecycle-manager-catalog 4.12.0-0.nightly-2022-08-15-150248 True False False 4h1m
operator-lifecycle-manager-packageserver 4.12.0-0.nightly-2022-08-15-150248 True False False 3h49m
service-ca 4.12.0-0.nightly-2022-08-15-150248 True False False 4h2m
storage 4.12.0-0.nightly-2022-08-15-150248 True True False 4h1m AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...
liuhuali@Lius-MacBook-Pro huali-test %

On Azure, the installer does not set the `internalLoadBalancer` field on the Machine provider spec. This means that Azure then presents the same symptoms as GCP. However, unlike GCP, we do support this. The public load balancer field will look like `publicLoadBalancer: <cluster-id>`. If we copy this and add `-internal` to the end, we can configure `internalLoadBalancer: <cluster-id>-internal` and the rollout can proceed correctly. There are a few actions we need to take knowing this:

- We should document, for Azure users setting up CPMS, that they should add the internal load balancer to their existing machines, and to the CPMS spec from there on
- We should fix the installer to add this field by default
- We _could_ prevent a CPMS from being installed if there is no internal load balancer, since we know this is required

I think if we repeat the process on Azure with this PR to the installer, https://github.com/openshift/installer/pull/6230, it should work.
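Following the naming convention above, the relevant Azure providerSpec fields would look something like this (the cluster ID value is illustrative; only the two load balancer field names come from this report):

```yaml
# Illustrative fragment of an Azure master Machine providerSpec.
# The internal load balancer name is the public one with "-internal" appended.
providerSpec:
  value:
    publicLoadBalancer: mycluster-abc12
    internalLoadBalancer: mycluster-abc12-internal
```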
This should add the load balancer field that's missing and, as long as we add that to the CPMS spec as well, it should be able to complete a control plane replacement.

Thanks @jspeed. Yes, on Azure, after adding `internalLoadBalancer: huliu-azure12c-6mpmd-internal` to the CPMS spec, the CPMS RollingUpdate proceeded correctly and didn't cause cluster errors.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-azure12c-6mpmd-master-84wqx-1 Running Standard_D4s_v3 eastus 3 48m
huliu-azure12c-6mpmd-master-sqc6v-0 Running Standard_D4s_v3 eastus 2 65m
huliu-azure12c-6mpmd-master-vkgsj-2 Running Standard_D4s_v3 eastus 1 33m
huliu-azure12c-6mpmd-worker-eastus1-87778 Running Standard_D4s_v3 eastus 1 3h1m
huliu-azure12c-6mpmd-worker-eastus2-28gjm Running Standard_D4s_v3 eastus 2 3h1m
huliu-azure12c-6mpmd-worker-eastus3-xnlrx Running Standard_D4s_v3 eastus 3 3h1m
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.nightly-2022-08-15-150248 True False 159m Cluster version is 4.12.0-0.nightly-2022-08-15-150248
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.0-0.nightly-2022-08-15-150248 True False False 19m
baremetal 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
cloud-controller-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 3h2m
cloud-credential 4.12.0-0.nightly-2022-08-15-150248 True False False 3h11m
cluster-autoscaler 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
config-operator 4.12.0-0.nightly-2022-08-15-150248 True False False 3h1m
console 4.12.0-0.nightly-2022-08-15-150248 True False False 50m
control-plane-machine-set 0.0.1-snapshot True False False 137m
csi-snapshot-controller 4.12.0-0.nightly-2022-08-15-150248 True False False 3h1m
dns 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
etcd 4.12.0-0.nightly-2022-08-15-150248 True False False 179m
image-registry 4.12.0-0.nightly-2022-08-15-150248 True False False 170m
ingress 4.12.0-0.nightly-2022-08-15-150248 True False False 170m
insights 4.12.0-0.nightly-2022-08-15-150248 True False False 175m
kube-apiserver 4.12.0-0.nightly-2022-08-15-150248 True False False 163m
kube-controller-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 178m
kube-scheduler 4.12.0-0.nightly-2022-08-15-150248 True False False 178m
kube-storage-version-migrator 4.12.0-0.nightly-2022-08-15-150248 True False False 99m
machine-api 4.12.0-0.nightly-2022-08-15-150248 True False False 167m
machine-approver 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
machine-config 4.12.0-0.nightly-2022-08-15-150248 True False False 179m
marketplace 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
monitoring 4.12.0-0.nightly-2022-08-15-150248 True False False 159m
network 4.12.0-0.nightly-2022-08-15-150248 True False False 3h2m
node-tuning 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
openshift-apiserver 4.12.0-0.nightly-2022-08-15-150248 True False False 50m
openshift-controller-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 171m
openshift-samples 4.12.0-0.nightly-2022-08-15-150248 True False False 171m
operator-lifecycle-manager 4.12.0-0.nightly-2022-08-15-150248 True False False 3h
operator-lifecycle-manager-catalog 4.12.0-0.nightly-2022-08-15-150248 True False False 3h1m
operator-lifecycle-manager-packageserver 4.12.0-0.nightly-2022-08-15-150248 True False False 174m
service-ca 4.12.0-0.nightly-2022-08-15-150248 True False False 3h1m
storage 4.12.0-0.nightly-2022-08-15-150248 True False False 3h1m
liuhuali@Lius-MacBook-Pro huali-test %

The plan forward here is to prevent a GCP CPMS from being created until we can fix the load balancing issues in 4.13. Then for Azure, return an error if the user doesn't have the internal load balancer set; this should prompt the user to configure it.
To prevent awkward rollouts on install, we will set up Machine API Azure to populate the load balancer where possible.

Verified the issue before the PR merge.
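The plan described above (deny CPMS creation on GCP, require `internalLoadBalancer` on Azure) can be modelled as a toy check. The field paths and error strings mirror the webhook errors seen during verification, but the function itself is illustrative; the real logic lives in the `controlplanemachineset.machine.openshift.io` admission webhook:

```python
# Toy model of the CPMS admission checks described above (illustrative only).

def validate_cpms(platform, provider_spec):
    """Return a list of admission errors for a ControlPlaneMachineSet."""
    errors = []
    field = "spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value"
    if platform == "GCP":
        # GCP is rejected outright until instance-group handling is implemented.
        errors.append(f"{field}: Forbidden: automatic replacement of control "
                      "plane machines on GCP is not currently supported")
    elif platform == "Azure" and not provider_spec.get("internalLoadBalancer"):
        # Azure requires the internal LB so new masters back the internal API.
        errors.append(f"{field}.internalLoadBalancer: Required value: "
                      "internalLoadBalancer is required for control plane machines")
    return errors
```

Failing at admission time, rather than letting a rollout start, is what turns the original cluster-breaking symptom into an actionable error message.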
On GCP:
1. Create a new release image from the pull requests using Cluster Bot
build openshift/installer#6230,openshift/machine-api-provider-azure#31,openshift/cluster-control-plane-machine-set-operator#84
2. Install a cluster on GCP using the image built in the previous step
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False 3h15m Cluster version is 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-operator-b6c8d658-9rs65 2/2 Running 0 3h36m
cluster-baremetal-operator-7c9cb8d8cb-qfqk2 2/2 Running 0 3h36m
control-plane-machine-set-operator-7fc8897c6b-m4tgh 1/1 Running 0 3h36m
machine-api-controllers-596c49fcd9-x5jzk 7/7 Running 0 3h34m
machine-api-operator-54d9869b57-t6ndz 2/2 Running 0 3h37m
machine-api-termination-handler-62pm9 1/1 Running 0 3h27m
machine-api-termination-handler-6pr4c 1/1 Running 0 3h27m
machine-api-termination-handler-m5bq4 1/1 Running 0 3h26m
liuhuali@Lius-MacBook-Pro huali-test %
3. Create a ControlPlaneMachineSet
liuhuali@Lius-MacBook-Pro huali-test % oc create -f controlpanemachineset-gcp.yaml
Error from server (spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value: Forbidden: automatic replacement of control plane machines on GCP is not currently supported): error when creating "controlpanemachineset-gcp.yaml": admission webhook "controlplanemachineset.machine.openshift.io" denied the request: spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value: Forbidden: automatic replacement of control plane machines on GCP is not currently supported
liuhuali@Lius-MacBook-Pro huali-test %
On Azure:
1. Create a new release image from the pull requests using Cluster Bot
build openshift/installer#6230,openshift/machine-api-provider-azure#31,openshift/cluster-control-plane-machine-set-operator#84
2. Install a cluster on Azure using the image built in the previous step
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False 67m Cluster version is 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-operator-b6c8d658-spdhd 2/2 Running 1 (85m ago) 96m
cluster-baremetal-operator-7c9cb8d8cb-pgcgb 2/2 Running 0 96m
control-plane-machine-set-operator-7fc8897c6b-87jkf 1/1 Running 2 (84m ago) 96m
machine-api-controllers-586b447cdc-d56h4 7/7 Running 11 (84m ago) 89m
machine-api-operator-54d9869b57-gc928 2/2 Running 1 (85m ago) 96m
machine-api-termination-handler-6mm5m 1/1 Running 0 79m
machine-api-termination-handler-dbvpl 1/1 Running 0 72m
machine-api-termination-handler-thmvh 1/1 Running 0 74m
3. Check that the `internalLoadBalancer` field is set on the master machine provider spec by default
liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o yaml |grep internalLoadBalancer
internalLoadBalancer: huliu-azure412pr-rqhjd-internal
internalLoadBalancer: huliu-azure412pr-rqhjd-internal
internalLoadBalancer: huliu-azure412pr-rqhjd-internal
liuhuali@Lius-MacBook-Pro huali-test %
4. Create a ControlPlaneMachineSet with the same configuration as the current master machines
liuhuali@Lius-MacBook-Pro huali-test % oc create -f controlpanemachineset-azure.yaml
controlplanemachineset.machine.openshift.io/cluster created
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME DESIRED CURRENT READY UPDATED UNAVAILABLE AGE
cluster 3 3 3 3 15s
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-azure412pr-rqhjd-master-0 Running Standard_D8s_v3 eastus 2 105m
huliu-azure412pr-rqhjd-master-1 Running Standard_D8s_v3 eastus 3 105m
huliu-azure412pr-rqhjd-master-2 Running Standard_D8s_v3 eastus 1 105m
huliu-azure412pr-rqhjd-worker-eastus1-bk56k Running Standard_D4s_v3 eastus 1 98m
huliu-azure412pr-rqhjd-worker-eastus2-8dv6h Running Standard_D4s_v3 eastus 2 98m
huliu-azure412pr-rqhjd-worker-eastus3-p68w2 Running Standard_D4s_v3 eastus 3 98m
5. Edit the ControlPlaneMachineSet and change something to trigger a RollingUpdate; the RollingUpdate succeeds and the cluster stays healthy
liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset cluster
controlplanemachineset.machine.openshift.io/cluster edited
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-azure412pr-rqhjd-master-25txd-0 Running Standard_D4s_v3 eastus 3 44m
huliu-azure412pr-rqhjd-master-5vtbs-1 Running Standard_D4s_v3 eastus 3 60m
huliu-azure412pr-rqhjd-master-7bjrk-2 Running Standard_D4s_v3 eastus 3 25m
huliu-azure412pr-rqhjd-worker-eastus1-bk56k Running Standard_D4s_v3 eastus 1 3h52m
huliu-azure412pr-rqhjd-worker-eastus2-8dv6h Running Standard_D4s_v3 eastus 2 3h52m
huliu-azure412pr-rqhjd-worker-eastus3-p68w2 Running Standard_D4s_v3 eastus 3 3h52m
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False 3h30m Cluster version is 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 15m
baremetal 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
cloud-controller-manager 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h54m
cloud-credential 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h59m
cluster-autoscaler 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
config-operator 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h53m
console 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 46m
control-plane-machine-set 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 52m
csi-snapshot-controller 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h40m
dns 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h40m
etcd 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h44m
image-registry 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h38m
ingress 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h39m
insights 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h46m
kube-apiserver 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h41m
kube-controller-manager 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h44m
kube-scheduler 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h44m
kube-storage-version-migrator 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 107m
machine-api 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h37m
machine-approver 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
machine-config 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h51m
marketplace 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
monitoring 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h35m
network 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h54m
node-tuning 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h40m
openshift-apiserver 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 117m
openshift-controller-manager 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h40m
openshift-samples 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h39m
operator-lifecycle-manager 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
operator-lifecycle-manager-catalog 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
operator-lifecycle-manager-packageserver 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h40m
service-ca 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h53m
storage 4.12.0-0.ci.test-2022-08-24-025537-ci-ln-xj7tqhb-latest True False False 3h52m
liuhuali@Lius-MacBook-Pro huali-test %
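The RollingUpdate behaviour observed in step 5 can be modelled roughly as a surge-then-remove loop: for each outdated master, a replacement is created and must come up Running before the old machine is deleted, so control plane capacity never drops. This is a simplification of the real operator logic:

```python
# Rough model of the CPMS RollingUpdate flow observed above (illustrative;
# the real operator also waits on etcd membership, node readiness, etc.).

def rolling_update(machines, make_replacement):
    """machines: list of names needing replacement.
    make_replacement: callable name -> new name (simulates a replacement
    machine coming up Running). Returns the final machine list."""
    current = list(machines)
    for old in machines:
        new = make_replacement(old)  # surge: the new machine is created first
        current.append(new)          # ...and reaches Running
        current.remove(old)          # only then is the old machine deleted
    return current
```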
6. Edit ControlPlaneMachineSet, change `internalLoadBalancer` to an invalid value
liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset cluster
controlplanemachineset.machine.openshift.io/cluster edited
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-azure412pr-rqhjd-master-25txd-0 Running Standard_D4s_v3 eastus 3 48m
huliu-azure412pr-rqhjd-master-5vtbs-1 Running Standard_D4s_v3 eastus 3 65m
huliu-azure412pr-rqhjd-master-7bjrk-2 Running Standard_D4s_v3 eastus 3 30m
huliu-azure412pr-rqhjd-master-zvt22-0 Failed 5s
huliu-azure412pr-rqhjd-worker-eastus1-bk56k Running Standard_D4s_v3 eastus 1 3h56m
huliu-azure412pr-rqhjd-worker-eastus2-8dv6h Running Standard_D4s_v3 eastus 2 3h56m
huliu-azure412pr-rqhjd-worker-eastus3-p68w2 Running Standard_D4s_v3 eastus 3 3h56m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-azure412pr-rqhjd-master-zvt22-0 -o yaml
...
errorMessage: 'failed to reconcile machine "huliu-azure412pr-rqhjd-master-zvt22-0":
network.LoadBalancersClient#Get: Failure responding to request: StatusCode=404
-- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound"
Message="The Resource ''Microsoft.Network/loadBalancers/invalid'' under resource
group ''huliu-azure412pr-rqhjd-rg'' was not found. For more details please go
to https://aka.ms/ARMResourceNotFoundFix"'
errorReason: InvalidConfiguration
...
7. Edit the ControlPlaneMachineSet and remove the `internalLoadBalancer` field
liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset cluster
error: controlplanemachinesets.machine.openshift.io "cluster" could not be patched: admission webhook "controlplanemachineset.machine.openshift.io" denied the request: spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.internalLoadBalancer: Required value: internalLoadBalancer is required for control plane machines
You can run `oc replace -f /var/folders/yc/y9zy01jn3f51r9knbpsm_55r0000gn/T/oc-edit-889700000.yaml` to try this update again.
liuhuali@Lius-MacBook-Pro huali-test %
Already verified this before the PR merged (refer to Comment 11), so moving this to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399