Bug 1854787
Summary: [RHV] New machine stuck at 'Provisioned' phase
Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: oVirt Provider
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Assignee: Roy Golan <rgolan>
QA Contact: Jan Zmeskal <jzmeskal>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: aprajapa, asonmez, bjarolim, dmoessne, dvercill, eslutsky, fhirtz, gpulido, hpopal, jmalde, jortialc, jzmeskal, lleistne, lmartinh, michal.skrivanek, mleonard, mnunes, openshift-bugs-escalate, pelauter, ramon.gordillo, rgolan, rkshirsa, zhsun
Version: 4.4
Keywords: Reopened
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-09-08 10:54:03 UTC
Bug Depends On: 1817853
Comment 2
Roy Golan
2020-07-09 12:41:45 UTC
*** Bug 1849387 has been marked as a duplicate of this bug. ***

Verification attempted with:
openshift-install-linux-4.5.0-0.nightly-2020-07-29-051236 (the fix landed in 4.5.0-0.nightly-2020-07-28-182449)
RHV 4.3.11.2-0.1.el7

I scaled up the existing worker MachineSet and waited for the new worker machine to get into the Running state. I waited for almost an hour and a half, but it got stuck in the Provisioned state. See here:

~~~
# oc get machine -n openshift-machine-api
NAME                           PHASE         TYPE   REGION   ZONE   AGE
primary-spfb8-master-0         Running                              178m
primary-spfb8-master-1         Running                              178m
primary-spfb8-master-2         Running                              178m
primary-spfb8-worker-0-9qgp8   Running                              169m
primary-spfb8-worker-0-b8s89   Provisioned                          88m
primary-spfb8-worker-0-ktqv2   Running                              169m
primary-spfb8-worker-0-nxr82   Running                              169m
~~~

It was not present among nodes either:

~~~
# oc get nodes
NAME                           STATUS   ROLES    AGE    VERSION
primary-spfb8-master-0         Ready    master   177m   v1.18.3+012b3ec
primary-spfb8-master-1         Ready    master   177m   v1.18.3+012b3ec
primary-spfb8-master-2         Ready    master   177m   v1.18.3+012b3ec
primary-spfb8-worker-0-9qgp8   Ready    worker   162m   v1.18.3+012b3ec
primary-spfb8-worker-0-ktqv2   Ready    worker   151m   v1.18.3+012b3ec
primary-spfb8-worker-0-nxr82   Ready    worker   161m   v1.18.3+012b3ec
~~~

This error can be seen in the machine-controller container:

~~~
# oc logs machine-api-controllers-5d75cbdb7d-4pm8j -c machine-controller
...
E0729 12:06:46.244009 1 actuator.go:295] failed to lookup the VM IP lookup primary-spfb8-worker-0-b8s89 on 172.30.0.10:53: no such host - skip setting addresses for this machine
E0729 12:06:46.244052 1 controller.go:286] Error updating machine "openshift-machine-api/primary-spfb8-worker-0-b8s89": lookup primary-spfb8-worker-0-b8s89 on 172.30.0.10:53: no such host
{"level":"error","ts":1596024406.2441392,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/primary-spfb8-worker-0-b8s89","error":"lookup primary-spfb8-worker-0-b8s89 on 172.30.0.10:53: no such host","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
...
~~~
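The failure above is a plain DNS lookup of the machine name against the cluster resolver (172.30.0.10 is the resolver IP shown in the log line). As a debugging aid, the hostname and resolver can be pulled out of such error lines; this is only a sketch, and it assumes the exact `lookup NAME on IP:53: no such host` wording from the log above:

```shell
# Extract "<hostname> <resolver-ip>" from a machine-controller lookup error line.
# Assumes the "lookup NAME on IP:53: no such host" format shown in the log above.
parse_lookup_error() {
  sed -n 's/.*lookup \([^ ]*\) on \([^:]*\):53: no such host.*/\1 \2/p'
}

# Hypothetical usage against a live cluster (pod name is an example):
#   oc logs machine-api-controllers-5d75cbdb7d-4pm8j -c machine-controller \
#     | parse_lookup_error | sort -u
# The resulting name can then be queried manually, e.g. with nslookup, from
# inside the cluster to confirm the record really is missing.
```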
A couple of CSRs are Pending (I checked that none were Pending right after cluster deployment):

~~~
# oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-8x46q   25m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-9h7ft   56m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-bb98m   41m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-d6plc   87m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-v2q29   10m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-vjjgw   72m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
~~~

I'll attach other log files as well.

One additional piece of information: the Provider State of the new worker Machine in the web console is reboot_in_progress, but in reality the VM is not rebooting.

Reading previous comments, it seems the workaround is in c#7 and c#8. Giving this a try ...
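Pending node-bootstrapper CSRs like these can be approved manually with `oc adm certificate approve`. A small sketch that filters the tabular `oc get csr` output for Pending rows (the column layout, with CONDITION last, is assumed from the listing above):

```shell
# Print the names of CSRs whose CONDITION column (the last field) is "Pending".
# Assumes the tabular `oc get csr` output shown above; skips the header row.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Hypothetical usage against a live cluster (requires approver permissions):
#   oc get csr | pending_csrs | xargs -r oc adm certificate approve
```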
Setting up a new OCP 4.4.10 cluster:

~~~
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.10    True        False         36m     Cluster version is 4.4.10

# oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
cluster-46vks-master-0         Ready    master   25m     v1.17.1+9d33dd3
cluster-46vks-master-1         Ready    master   24m     v1.17.1+9d33dd3
cluster-46vks-master-2         Ready    master   24m     v1.17.1+9d33dd3
cluster-46vks-worker-0-2bzzm   Ready    worker   7m41s   v1.17.1+9d33dd3
cluster-46vks-worker-0-gbkdn   Ready    worker   9m31s   v1.17.1+9d33dd3
cluster-46vks-worker-0-ks6dk   Ready    worker   12m     v1.17.1+9d33dd3
cluster-46vks-worker-0-kt6jp   Ready    worker   11m     v1.17.1+9d33dd3

# oc get machineset -n openshift-machine-api
NAME                     DESIRED   CURRENT   READY   AVAILABLE   AGE
cluster-46vks-worker-0   4         4                             26m

# oc get machine -n openshift-machine-api
NAME                           PHASE         TYPE   REGION   ZONE   AGE
cluster-46vks-master-0         Running                              26m
cluster-46vks-master-1         Running                              26m
cluster-46vks-master-2         Running                              26m
cluster-46vks-worker-0-2bzzm   Provisioned                          17m
cluster-46vks-worker-0-gbkdn   Provisioned                          17m
cluster-46vks-worker-0-ks6dk   Provisioned                          17m
cluster-46vks-worker-0-kt6jp   Provisioned                          17m
~~~

-> So, getting the VM ID from the RHV UI (Compute -> Virtual Machines -> click the VM in question -> on the right see "VM ID:") gives the following list:

node/machine name            | VM ID
-----------------------------|-------------------------------------
cluster-46vks-worker-0-2bzzm | ee7488fb-ac4f-4bed-85c8-1d75b2cd3798
cluster-46vks-worker-0-gbkdn | ad826aae-2d39-401a-b95a-02dc14b902ea
cluster-46vks-worker-0-ks6dk | 9dafee85-5cca-406b-8f73-fcd438fc67b1
cluster-46vks-worker-0-kt6jp | aa1b34ce-1c4c-44d6-8566-ae4521067fe1

--> Those IDs need to go into the node and machine configs, via either possibility:

a) direct edit:

~~~
# oc edit node cluster-46vks-worker-0-2bzzm
...
spec:
  providerID: ee7488fb-ac4f-4bed-85c8-1d75b2cd3798
status:
  addresses:
...
node/cluster-46vks-worker-0-2bzzm edited

# oc edit machine cluster-46vks-worker-0-2bzzm -n openshift-machine-api
...
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      apiVersion: ovirtproviderconfig.machine.openshift.io/v1beta1
      cluster_id: 587fa27d-0229-00d8-0323-000000000290
      cpu:
        cores: 4
        sockets: 1
        threads: 1
      credentialsSecret:
        name: ovirt-credentials
      id: ee7488fb-ac4f-4bed-85c8-1d75b2cd3798
      kind: OvirtMachineProviderSpec
      memory_mb: 16348
...
machine.machine.openshift.io/cluster-46vks-worker-0-2bzzm edited
~~~

b) using oc patch:

~~~
# oc patch node cluster-46vks-worker-0-gbkdn --type merge --patch '{"spec":{"providerID":"ad826aae-2d39-401a-b95a-02dc14b902ea"}}'
# oc -n openshift-machine-api patch machine cluster-46vks-worker-0-gbkdn --type merge --patch '{"spec":{"providerSpec":{"value":{"id":"ad826aae-2d39-401a-b95a-02dc14b902ea"}}}}'
~~~

And when checking, the machines and CSRs are getting sorted:

~~~
# oc get machine -n openshift-machine-api
NAME                           PHASE         TYPE   REGION   ZONE   AGE
cluster-46vks-master-0         Running                              41m
cluster-46vks-master-1         Running                              41m
cluster-46vks-master-2         Running                              41m
cluster-46vks-worker-0-2bzzm   Running                              32m
cluster-46vks-worker-0-gbkdn   Running                              32m
cluster-46vks-worker-0-ks6dk   Provisioned                          32m
cluster-46vks-worker-0-kt6jp   Provisioned                          32m

# oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-5hmph   29m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bcxhr   12m     system:node:cluster-46vks-worker-0-ks6dk                                    Pending
csr-cvj6x   26m     system:node:cluster-46vks-worker-0-kt6jp                                    Pending
csr-drsgg   22m     system:node:cluster-46vks-worker-0-2bzzm                                    Approved,Issued
csr-dtfl8   27m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ffzwq   40m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-gzg75   11m     system:node:cluster-46vks-worker-0-kt6jp                                    Pending
csr-ljkbq   40m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mmnwq   40m     system:node:cluster-46vks-master-0                                          Approved,Issued
csr-npkp6   22m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-nxh8k   39m     system:node:cluster-46vks-master-1                                          Approved,Issued
csr-qrhmh   24m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-qswsj   9m25s   system:node:cluster-46vks-worker-0-gbkdn                                    Pending
csr-qxmks   27m     system:node:cluster-46vks-worker-0-ks6dk                                    Pending
csr-r6wtx   24m     system:node:cluster-46vks-worker-0-gbkdn                                    Pending
csr-wtt9z   39m     system:node:cluster-46vks-master-2                                          Approved,Issued
csr-xrb69   40m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
~~~

Doing the rest:

~~~
# oc patch node cluster-46vks-worker-0-ks6dk --type merge --patch '{"spec":{"providerID":"9dafee85-5cca-406b-8f73-fcd438fc67b1"}}'
# oc patch node cluster-46vks-worker-0-kt6jp --type merge --patch '{"spec":{"providerID":"aa1b34ce-1c4c-44d6-8566-ae4521067fe1"}}'
# oc -n openshift-machine-api patch machine cluster-46vks-worker-0-ks6dk --type merge --patch '{"spec":{"providerSpec":{"value":{"id":"9dafee85-5cca-406b-8f73-fcd438fc67b1"}}}}'
# oc -n openshift-machine-api patch machine cluster-46vks-worker-0-kt6jp --type merge --patch '{"spec":{"providerSpec":{"value":{"id":"aa1b34ce-1c4c-44d6-8566-ae4521067fe1"}}}}'

# oc get machine -n openshift-machine-api
NAME                           PHASE     TYPE   REGION   ZONE   AGE
cluster-46vks-master-0         Running                          46m
cluster-46vks-master-1         Running                          46m
cluster-46vks-master-2         Running                          46m
cluster-46vks-worker-0-2bzzm   Running                          36m
cluster-46vks-worker-0-gbkdn   Running                          36m
cluster-46vks-worker-0-ks6dk   Running                          36m
cluster-46vks-worker-0-kt6jp   Running                          36m

# oc get machineset -n openshift-machine-api
NAME                     DESIRED   CURRENT   READY   AVAILABLE   AGE
cluster-46vks-worker-0   4         4         4       4           46m

# oc get csr
NAME        AGE    REQUESTOR                                                                   CONDITION
csr-5hmph   33m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-5p28b   2m3s   system:node:cluster-46vks-worker-0-ks6dk                                    Approved,Issued
csr-bcxhr   17m    system:node:cluster-46vks-worker-0-ks6dk                                    Pending
csr-cvj6x   31m    system:node:cluster-46vks-worker-0-kt6jp                                    Pending
csr-drsgg   27m    system:node:cluster-46vks-worker-0-2bzzm                                    Approved,Issued
csr-dtfl8   32m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ffzwq   45m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-gzg75   16m    system:node:cluster-46vks-worker-0-kt6jp                                    Pending
csr-ljkbq   45m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mmnwq   44m    system:node:cluster-46vks-master-0                                          Approved,Issued
csr-npkp6   27m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-nxh8k   44m    system:node:cluster-46vks-master-1                                          Approved,Issued
csr-qrhmh   29m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-qswsj   14m    system:node:cluster-46vks-worker-0-gbkdn                                    Approved,Issued
csr-qxmks   32m    system:node:cluster-46vks-worker-0-ks6dk                                    Pending
csr-r6wtx   29m    system:node:cluster-46vks-worker-0-gbkdn                                    Pending
csr-wtt9z   44m    system:node:cluster-46vks-master-2                                          Approved,Issued
csr-wzncq   87s    system:node:cluster-46vks-worker-0-kt6jp                                    Approved,Issued
csr-xrb69   45m    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
~~~

Waiting another ~10 minutes:

~~~
# oc get csr
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-5hmph   54m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-5p28b   22m   system:node:cluster-46vks-worker-0-ks6dk                                    Approved,Issued
csr-bcxhr   38m   system:node:cluster-46vks-worker-0-ks6dk                                    Approved,Issued
csr-cvj6x   52m   system:node:cluster-46vks-worker-0-kt6jp                                    Approved,Issued
csr-drsgg   48m   system:node:cluster-46vks-worker-0-2bzzm                                    Approved,Issued
csr-dtfl8   53m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ffzwq   65m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-gzg75   37m   system:node:cluster-46vks-worker-0-kt6jp                                    Approved,Issued
csr-ljkbq   65m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mmnwq   65m   system:node:cluster-46vks-master-0                                          Approved,Issued
csr-npkp6   48m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-nxh8k   65m   system:node:cluster-46vks-master-1                                          Approved,Issued
csr-qrhmh   50m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-qswsj   35m   system:node:cluster-46vks-worker-0-gbkdn                                    Approved,Issued
csr-qxmks   53m   system:node:cluster-46vks-worker-0-ks6dk                                    Approved,Issued
csr-r6wtx   50m   system:node:cluster-46vks-worker-0-gbkdn                                    Approved,Issued
csr-wtt9z   65m   system:node:cluster-46vks-master-2                                          Approved,Issued
csr-wzncq   22m   system:node:cluster-46vks-worker-0-kt6jp                                    Approved,Issued
csr-xrb69   65m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
~~~

All seems to be sorted, so this could be a workaround until it is fixed.

I couldn't reproduce this issue with 4.6.0-0.nightly-2020-08-18-055142. Tried:
1. Running an IPI installation with a cluster with 3 masters and 2 workers.
2.
Manually scaled the machineset to 3 using the oc command:

~~~
# ./oc scale --replicas=3 machineset ovirt10-26k9v-worker-0 -n openshift-machine-api

[root@eslutsky-proxy-vm ~]# ./oc get machineset -n openshift-machine-api
NAME                     DESIRED   CURRENT   READY   AVAILABLE   AGE
ovirt10-26k9v-worker-0   3         3         3       3           67m

[root@eslutsky-proxy-vm ~]# ./oc get machine -n openshift-machine-api
NAME                           PHASE     TYPE   REGION   ZONE   AGE
ovirt10-26k9v-master-0         Running                          67m
ovirt10-26k9v-master-1         Running                          67m
ovirt10-26k9v-master-2         Running                          67m
ovirt10-26k9v-worker-0-bmndg   Running                          57m
ovirt10-26k9v-worker-0-dbptj   Running                          13m
ovirt10-26k9v-worker-0-ghppj   Running                          57m

[root@eslutsky-proxy-vm ~]# ./oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ovirt10-26k9v-master-0         Ready    master   65m     v1.19.0-rc.2+99cb93a-dirty
ovirt10-26k9v-master-1         Ready    master   65m     v1.19.0-rc.2+99cb93a-dirty
ovirt10-26k9v-master-2         Ready    master   65m     v1.19.0-rc.2+99cb93a-dirty
ovirt10-26k9v-worker-0-bmndg   Ready    worker   50m     v1.19.0-rc.2+99cb93a-dirty
ovirt10-26k9v-worker-0-dbptj   Ready    worker   5m46s   v1.19.0-rc.2+99cb93a-dirty
ovirt10-26k9v-worker-0-ghppj   Ready    worker   39m     v1.19.0-rc.2+99cb93a-dirty
~~~

Hi Evgeny, one more thing comes to mind: try scaling the existing worker MachineSet to 0 and then back to 3.

The issue reproduced when I tried scaling up again to 4 workers, but this time RHV was unable to spawn this extra worker (out of resources). In the RHV events:

~~~
Failed to run VM ovirt10-26k9v-worker-0-6mzq8 due to a failed validation: [Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host es-host-01 did not satisfy internal filter Memory because its available memory is too low (10415 MB) to run the VM., The host es-host-01 did not satisfy internal filter Memory because its available memory is too low (10415 MB) to run the VM.] (User: admin@internal-authz).
~~~

~~~
[root@eslutsky-proxy-vm ~]# ./oc get machines -A
NAMESPACE               NAME                           PHASE         TYPE   REGION   ZONE   AGE
openshift-machine-api   ovirt10-26k9v-master-0         Running                              83m
openshift-machine-api   ovirt10-26k9v-master-1         Running                              83m
openshift-machine-api   ovirt10-26k9v-master-2         Running                              83m
openshift-machine-api   ovirt10-26k9v-worker-0-6mzq8   Provisioned                          6m57s
openshift-machine-api   ovirt10-26k9v-worker-0-bmndg   Running                              73m
openshift-machine-api   ovirt10-26k9v-worker-0-dbptj   Running                              29m
openshift-machine-api   ovirt10-26k9v-worker-0-ghppj   Running                              73m

# ./oc describe machine/ovirt10-26k9v-worker-0-6mzq8 -n openshift-machine-api
Spec:
  Metadata:
  Provider ID:  ac81b8df-ce0b-4e74-a393-4bb4d83607b6
  Provider Spec:
    Value:
      API Version:  ovirtproviderconfig.machine.openshift.io/v1beta1
      cluster_id:   2b76bbe8-38c3-49a5-b5f3-23b4dd1f4326
      Cpu:
        Cores:    4
        Sockets:  1
        Threads:  1
      Credentials Secret:
        Name:  ovirt-credentials
      Id:
      Kind:       OvirtMachineProviderSpec
      memory_mb:  16348
      Metadata:
        Creation Timestamp:  <nil>
      Name:
      os_disk:
        size_gb:      120
      template_name:  ovirt10-26k9v-rhcos
      Type:           server
      User Data Secret:
        Name:  worker-user-data-managed
Status:
  Last Updated:  2020-08-18T09:59:12Z
  Phase:         Provisioned
  Provider Status:
    Conditions:
      Last Probe Time:       2020-08-18T09:59:12Z
      Last Transition Time:  2020-08-18T09:59:12Z
      Message:               Machine successfully created
      Reason:                MachineCreateSucceeded
      Status:                True
      Type:                  MachineCreated
    Instance Id:     ac81b8df-ce0b-4e74-a393-4bb4d83607b6
    Instance State:  down
    Metadata:
      Creation Timestamp:  <nil>
~~~

Can you please confirm whether the workers are starting up in the RHV engine?

I think that doesn't count as a reproduction, since it's expected to run into problems when there aren't enough compute resources. I can confirm that when I hit this issue, the VMs were successfully started.

(In reply to Jan Zmeskal from comment #25)
> Hi Evgeny, one more thing comes to mind: Try scaling the existing worker
> MachineSet to 0 and then back to 3

When scaling to 0, the last worker wasn't deleted and got stuck in the `Deleting' phase.
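For both the Provisioned and Deleting hangs, a quick way to list only the stuck machines is to filter the PHASE column of the `oc get machine` output. This is a sketch, not part of the original reproduction; the column positions (name first, phase second) are assumed from the non-namespaced listings in this report:

```shell
# Print machine names whose PHASE column matches the given phase.
# Assumes `oc get machine -n openshift-machine-api` output (no -A flag,
# so name is field 1 and phase is field 2); skips the header row.
machines_in_phase() {
  local phase=$1
  awk -v p="$phase" 'NR > 1 && $2 == p { print $1 }'
}

# Hypothetical usage:
#   oc get machine -n openshift-machine-api | machines_in_phase Provisioned
#   oc get machine -n openshift-machine-api | machines_in_phase Deleting
```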
~~~
# ./oc scale --replicas=0 machineset ovirt10-26k9v-worker-0 -n openshift-machine-api

[root@eslutsky-proxy-vm ~]# ./oc get machines -A
NAMESPACE               NAME                           PHASE      TYPE   REGION   ZONE   AGE
openshift-machine-api   ovirt10-26k9v-master-0         Running                           128m
openshift-machine-api   ovirt10-26k9v-master-1         Running                           128m
openshift-machine-api   ovirt10-26k9v-master-2         Running                           128m
openshift-machine-api   ovirt10-26k9v-worker-0-dbptj   Deleting                          74m
~~~

After reproducing this issue in the QE rhv-4.3 env with the latest OCP (4.6.0-0.nightly-2020-08-18-055142), it appears to be caused by a failed DNS lookup attempt:

~~~
# oc logs machine-api-controllers-5bc7996949-wszvn -c machine-controller
I0818 13:49:18.142340 1 machineservice.go:270] Got VM by ID: primary-5tk29-worker-0-ml2v5
E0818 13:49:18.151995 1 actuator.go:295] failed to lookup the VM IP lookup primary-5tk29-worker-0-ml2v5 on 172.30.0.10:53: no such host - skip setting addresses for this machine
E0818 13:49:18.152023 1 controller.go:286] Error updating machine "openshift-machine-api/primary-5tk29-worker-0-ml2v5": lookup primary-5tk29-worker-0-ml2v5 on 172.30.0.10:53: no such host
{"level":"error","ts":1597758558.15205,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/primary-5tk29-worker-0-ml2v5","error":"lookup primary-5tk29-worker-0-ml2v5 on 172.30.0.10:53: no such host","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
~~~

So the IP never gets reported back to the machine object:

~~~
# oc get machine -o=custom-columns="name:.metadata.name,Status:status.phase,Address:.status.addresses[1].address"
name                           Status        Address
primary-5tk29-master-0         Running       10.35.71.95
primary-5tk29-master-1         Running       10.35.71.94
primary-5tk29-master-2         Running       10.35.71.232
primary-5tk29-worker-0-86mmj   Running       10.35.71.98
primary-5tk29-worker-0-ml2v5   Provisioned   <none>
primary-5tk29-worker-0-nm9lt   Running       10.35.71.99
primary-5tk29-worker-0-pv48v   Running       10.35.71.97
~~~

The issue resolved after removing the failed pod:

~~~
# oc delete pod/machine-api-controllers-5bc7996949-wszvn
~~~

(In reply to Jan Zmeskal from comment #13)
> Verification attempted with:
> openshift-install-linux-4.5.0-0.nightly-2020-07-29-051236 (The fix landed in
> 4.5.0-0.nightly-2020-07-28-182449)
> RHV 4.3.11.2-0.1.el7
>
> I scaled up the existing worker MachineSet and waited for the new worker
> machine to get into Running state. I waited for almost hour and half but it
> got stuck in Provisioned state. See here:

Can you please update and confirm whether you have verified the IPI part of the problem and only see an issue with scaling up? If so, please verify this bug and work with Evgeny to open a new one tracking scale-up issues in 4.5+.

This merged for 4.6 only; we need to cherry-pick it to 4.5.

This bug has TR 4.5.z; it's either the wrong bug or it should not be MODIFIED then.

Yes, sorry, I didn't notice.

Verified using the same method as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1817853#c32
Using: openshift-install 4.5.0-0.nightly-2020-08-29-080432

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.8 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3510