+++ This bug was initially created as a clone of Bug #1815435 +++

Description of problem:
Scale up a machineset on RHV; the new machine is stuck at the 'Provisioned' phase and cannot become 'Running'.

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-03-18-102708

How reproducible:
Always

Steps to Reproduce:
1. On an IPI-on-RHV deployment, clone (create) a machineset with replicas=1.
2. The machine phase never becomes 'Running'; it is stuck at 'Provisioned'. The YAML output of the machine has a status.addresses entry with type=InternalDNS, which is the node being provisioned.
3. List the cluster nodes: the node is already added to the cluster with 'Ready' status before the machine becomes 'Running'.
4. oc get csr: the CSR is still pending even though its node is already 'Ready'.

oc get csr
NAME        AGE     REQUESTOR                                                                    CONDITION
csr-2dblb   23m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-l5495   8m21s   system:node:machineset-clone-25436-dz8zk                                     Pending
csr-xcv9q   23m     system:node:machineset-clone-25436-dz8zk                                     Pending

machine-approver log:
I0320 08:34:06.591051 1 main.go:146] CSR csr-xcv9q added
I0320 08:34:06.643897 1 csr_check.go:418] retrieving serving cert from machineset-clone-25436-dz8zk (10.37.142.239:10250)
W0320 08:34:06.648505 1 csr_check.go:178] Failed to retrieve current serving cert: remote error: tls: internal error
I0320 08:34:06.649046 1 csr_check.go:183] Falling back to machine-api authorization for machineset-clone-25436-dz8zk
I0320 08:34:06.649116 1 main.go:181] CSR csr-xcv9q not authorized: No target machine for node "machineset-clone-25436-dz8zk"
I0320 08:34:06.649282 1 main.go:217] Error syncing csr csr-xcv9q: No target machine for node "machineset-clone-25436-dz8zk"

Actual results:
After step 2:
# oc get machine machineset-clone-25436-dz8zk -o yaml -n openshift-machine-api
```
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    instance-status: |
{"kind":"Machine","apiVersion":"machine.openshift.io/v1beta1","metadata":{"name":"machineset-clone-25436-dz8zk","generateName":"machineset-clone-25436-","namespace":"openshift-machine-api","selfLink":"/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/machineset-clone-25436-dz8zk","uid":"268636af-2605-4940-808b-4c93af9f4439","resourceVersion":"1696820","generation":1,"creationTimestamp":"2020-03-20T08:24:00Z","labels":{"machine.openshift.io/cluster-api-cluster":"six-g7gng","machine.openshift.io/cluster-api-machine-role":"worker","machine.openshift.io/cluster-api-machine-type":"worker","machine.openshift.io/cluster-api-machineset":"machineset-clone-25436"},"annotations":{"VmId":"7c32b1e1-2d64-4d8f-a2bc-b8ebd3effa93","instance-status":""},"ownerReferences":[{"apiVersion":"machine.openshift.io/v1beta1","kind":"MachineSet","name":"machineset-clone-25436","uid":"30021136-c1ad-4a3b-afd5-b80aec5454fb","controller":true,"blockOwnerDeletion":true}],"finalizers":["machine.machine.openshift.io"]},"spec":{"metadata":{"creationTimestamp":null},"providerSpec":{"value":{"apiVersion":"ovirtproviderconfig.openshift.io/v1beta1","cluster_id":"7321c6b7-9abb-4de6-affa-b4ce9b4124a0","credentialsSecret":{"name":"ovirt-credentials"},"id":"","kind":"OvirtMachineProviderSpec","metadata":{"creationTimestamp":null},"name":"","template_name":"six-g7gng-rhcos","userDataSecret":{"name":"worker-user-data"}}},"providerID":"7c32b1e1-2d64-4d8f-a2bc-b8ebd3effa93"},"status":{"lastUpdated":"2020-03-20T08:40:49Z","providerStatus":{"metadata":{"creationTimestamp":null},"instanceId":"machineset-clone-25436-dz8zk","instanceState":"up","conditions":[{"type":"MachineCreated","status":"True","lastProbeTime":"2020-03-20T08:40:49Z","lastTransitionTime":"2020-03-20T08:40:49Z","reason":"machineCreationSucceedReason","message":"machineCreationSucceedMessage"}]},"addresses":[{"type":"InternalDNS","address":"machineset-clone-25436-dz8zk"},{"type":"InternalIP","address":"10.37.142.239"}],"phase":"Provisioned"}} creationTimestamp: "2020-03-20T08:24:00Z" finalizers: - machine.machine.openshift.io generateName: machineset-clone-25436- generation: 1 labels: machine.openshift.io/cluster-api-cluster: six-g7gng machine.openshift.io/cluster-api-machine-role: worker machine.openshift.io/cluster-api-machine-type: worker machine.openshift.io/cluster-api-machineset: machineset-clone-25436 name: machineset-clone-25436-dz8zk namespace: openshift-machine-api ownerReferences: - apiVersion: machine.openshift.io/v1beta1 blockOwnerDeletion: true controller: true kind: MachineSet name: machineset-clone-25436 uid: 30021136-c1ad-4a3b-afd5-b80aec5454fb resourceVersion: "1696850" selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/machineset-clone-25436-dz8zk uid: 268636af-2605-4940-808b-4c93af9f4439 spec: metadata: creationTimestamp: null providerSpec: value: apiVersion: ovirtproviderconfig.openshift.io/v1beta1 cluster_id: 7321c6b7-9abb-4de6-affa-b4ce9b4124a0 credentialsSecret: name: ovirt-credentials id: "" kind: OvirtMachineProviderSpec metadata: creationTimestamp: null name: "" template_name: six-g7gng-rhcos userDataSecret: name: worker-user-data status: addresses: - address: machineset-clone-25436-dz8zk type: InternalDNS - address: 10.37.142.239 type: InternalIP lastUpdated: "2020-03-20T08:40:49Z" phase: Provisioned providerStatus: conditions: - lastProbeTime: "2020-03-20T08:40:49Z" lastTransitionTime: "2020-03-20T08:40:49Z" message: machineCreationSucceedMessage reason: machineCreationSucceedReason 
status: "True" type: MachineCreated instanceId: machineset-clone-25436-dz8zk instanceState: up metadata: creationTimestamp: null ``` After step 3: # oc get node machineset-clone-25436-dz8zk -o yaml apiVersion: v1 kind: Node metadata: annotations: machineconfiguration.openshift.io/currentConfig: rendered-worker-3a7892df1405153af5843ced2eb2124f machineconfiguration.openshift.io/desiredConfig: rendered-worker-3a7892df1405153af5843ced2eb2124f machineconfiguration.openshift.io/reason: "" machineconfiguration.openshift.io/state: Done volumes.kubernetes.io/controller-managed-attach-detach: "true" creationTimestamp: "2020-03-20T08:28:36Z" labels: beta.kubernetes.io/arch: amd64 beta.kubernetes.io/os: linux kubernetes.io/arch: amd64 kubernetes.io/hostname: machineset-clone-25436-dz8zk kubernetes.io/os: linux node-role.kubernetes.io/worker: "" node.openshift.io/os_id: rhcos name: machineset-clone-25436-dz8zk resourceVersion: "1700847" selfLink: /api/v1/nodes/machineset-clone-25436-dz8zk uid: 8191284b-25fd-44a7-8ecd-1d7839c94e0c spec: {} status: addresses: - address: 10.37.142.239 type: InternalIP - address: machineset-clone-25436-dz8zk type: Hostname allocatable: cpu: 3500m ephemeral-storage: "13879457768" hugepages-2Mi: "0" memory: 15258944Ki pods: "250" capacity: cpu: "4" ephemeral-storage: 16225260Ki hugepages-2Mi: "0" memory: 16409920Ki pods: "250" conditions: - lastHeartbeatTime: "2020-03-20T08:46:18Z" lastTransitionTime: "2020-03-20T08:28:36Z" message: kubelet has sufficient memory available reason: KubeletHasSufficientMemory status: "False" type: MemoryPressure - lastHeartbeatTime: "2020-03-20T08:46:18Z" lastTransitionTime: "2020-03-20T08:28:36Z" message: kubelet has no disk pressure reason: KubeletHasNoDiskPressure status: "False" type: DiskPressure - lastHeartbeatTime: "2020-03-20T08:46:18Z" lastTransitionTime: "2020-03-20T08:28:36Z" message: kubelet has sufficient PID available reason: KubeletHasSufficientPID status: "False" type: PIDPressure - lastHeartbeatTime: "2020-03-20T08:46:18Z" lastTransitionTime: "2020-03-20T08:30:27Z" message: kubelet is posting ready status reason: KubeletReady status: "True" type: Ready daemonEndpoints: kubeletEndpoint: Port: 10250 images: - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:683de47331b97c6b9fda23e5e4d30ca6ed204cb1af255d92fee0609beb35e4e8 - <none>:<none> sizeBytes: 819364305 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7a2d829994da699c14da9ec713c697ad81ae4667c6f33b3842246fb644ffb5a2 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 467168186 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1be47db5f7ade2960581367a23c2574b774b0f60a5672bdafd8589b1c87779f6 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 443387144 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:108cef723b7ea6e1322e7addc6d8ede14200b4801d2f6acb1d4c2c446376421c - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 429699040 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feab9b86f8829fc5e98a5e04ee5dfd2169e165fdefed183d544f56829409dd32 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 364316978 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d71d621566014b622fcc830e390d77fd831d28cf51770309f50aa1a541aec3ee - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 342462123 - names: - 
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f2a1a49204ec502c0c567688faa921f1165d8b130e1360c8fe2916932a6ad7c8 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 337457196 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3b9c1bd3aea24770c1a6daf3448d84d46f3e76433041d1ba21132c240eb49ef9 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 325417773 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4d1150eeb6e587161b278595f90580e79f21479a3fed95a56b7f9b3165e2bb58 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 318520021 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:480153ace2b8733a3cfc1c3f521f972b605efd1ea89ccd5786bfd9d7667af2d2 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 317938245 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f0de2e2f25da5df864b7ccf993d84a4032c912ad6d8bf4fccd62d3ee6877a8b - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 312586096 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8b1ef25de733758ec14635f4e4a4951cd0d11cfbe031ae3fb030bada9db4abc2 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 305426192 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6662962a1e3cf0867e520206bf679609d59a9ec3b6c257cd9eeb4bf331323bda - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 283277682 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eea59d098d0a88657c3e268adc6c117a4050a377ec373d4806e21e2b6cb39dbb - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 267495412 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5782342a322303a684ac6de9bd05f931b7422e866285a8ba869c2ccaed209914 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 264047880 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc45c8a26642cee797c6daa32ff10f9059192bc7c0bbab3ac955a6c53935e21d - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 255672168 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b31063f8cb53a137f515394838dd4ede503bcf94125e64c2e87401b5243bd067 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 250871136 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a79b8c47f97205626b95fba4d54f356c4f8f30491becd33155582744530aba2 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 237179055 nodeInfo: architecture: amd64 bootID: 3eed5f48-772d-4f41-8e17-e519ae2eef80 containerRuntimeVersion: cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8 kernelVersion: 4.18.0-147.5.1.el8_1.x86_64 kubeProxyVersion: v1.17.1 kubeletVersion: v1.17.1 machineID: 7c32b1e12d644d8fa2bcb8ebd3effa93 operatingSystem: linux osImage: Red Hat Enterprise Linux CoreOS 44.81.202003180730-0 (Ootpa) systemUUID: 7c32b1e1-2d64-4d8f-a2bc-b8ebd3effa93 Expected results: Additional info: --- Additional comment from Jianwei Hou on 2020-03-20 17:29:54 CST --- --- Additional comment from Jianwei Hou on 2020-03-20 17:42:56 CST --- On deleting the machineset and machine, node can not be removed automatically. 
# oc get nodes
NAME                           STATUS     ROLES    AGE   VERSION
machineset-clone-25436-dz8zk   NotReady   worker   70m   v1.17.1
six-g7gng-master-0             Ready      master   41h   v1.17.1
six-g7gng-master-1             Ready      master   41h   v1.17.1
six-g7gng-master-2             NotReady   master   41h   v1.17.1
six-g7gng-worker-0-4lfbj       Ready      worker   41h   v1.17.1
six-g7gng-worker-0-569xv       Ready      worker   41h   v1.17.1
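For anyone hitting the same symptom, a hedged stop-gap (it does not fix the root cause in the machine-approver) is to approve the stuck kubelet serving CSRs by hand and, if the Machine was already deleted, remove the orphaned NotReady Node object manually. A minimal sketch using the resource names from the output above (yours will differ):

# Inspect the pending CSRs first
oc get csr

# Manually approve the pending serving CSRs -- only after confirming
# the requesting node is legitimate
oc adm certificate approve csr-l5495 csr-xcv9q

# If the machineset/machine were deleted but the node lingers in
# NotReady, the Node object can be removed by hand
oc delete node machineset-clone-25436-dz8zk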
Tested on 4.4.0-0.nightly-2020-03-26-041820; the problem no longer reproduces.
1. On an IPI on RHV deployment, create a machineset with replicas=1.
2. The machine phase becomes 'Running' and the node becomes 'Ready'.
3. oc get csr: the CSRs were automatically approved.

$ oc get machineset
NAME                      DESIRED   CURRENT   READY   AVAILABLE   AGE
leopard-hw72r-worker-0    3         3         3       3           4d13h
leopard-hw72r-worker-00   1         1         1       1           64m

$ oc get machine
NAME                            PHASE     TYPE   REGION   ZONE   AGE
leopard-hw72r-master-0          Running                          4d13h
leopard-hw72r-master-1          Running                          4d13h
leopard-hw72r-master-2          Running                          4d13h
leopard-hw72r-worker-0-9snp8    Running                          4d13h
leopard-hw72r-worker-0-dhflh    Running                          4d13h
leopard-hw72r-worker-0-wr85v    Running                          4d13h
leopard-hw72r-worker-00-nqqzv   Running                          64m

$ oc get node
NAME                            STATUS   ROLES    AGE     VERSION
leopard-hw72r-master-0          Ready    master   4d13h   v1.17.1
leopard-hw72r-master-1          Ready    master   4d13h   v1.17.1
leopard-hw72r-master-2          Ready    master   4d13h   v1.17.1
leopard-hw72r-worker-0-9snp8    Ready    worker   4d13h   v1.17.1
leopard-hw72r-worker-0-dhflh    Ready    worker   4d13h   v1.17.1
leopard-hw72r-worker-0-wr85v    Ready    worker   4d13h   v1.17.1
leopard-hw72r-worker-00-nqqzv   Ready    worker   60m     v1.17.1

$ oc get csr
NAME        AGE   REQUESTOR                                                                    CONDITION
csr-2b2b7   60m   system:node:leopard-hw72r-worker-00-nqqzv                                    Approved,Issued
csr-ghrgb   60m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
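For reference, a minimal sketch of the create-and-watch sequence used above, assuming ms.yaml is a copy of an existing worker machineset with a new name and spec.replicas: 1 (the manifest file name is illustrative):

# Create the cloned machineset in the machine-api namespace
oc apply -f ms.yaml -n openshift-machine-api

# Watch the new machine progress Provisioning -> Provisioned -> Running,
# and confirm the CSRs are auto-approved
oc get machine -n openshift-machine-api -w
oc get csr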
Reproduced this bug.

clusterversion: 4.4.0-0.nightly-2020-04-04-025830

Created a new machineset with replicas=3: 1 machine joined the cluster and its node status became Ready, while 2 machines were stuck at the Provisioned phase. "oc describe machine" shows "instanceState: down" for those machines.

# oc get machineset
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
secondary-xgbq9-worker-0    3         3         3       3           18h
secondary-xgbq9-worker-00   3         3         1       1           26m

# oc get machine
NAME                              PHASE         TYPE   REGION   ZONE   AGE
secondary-xgbq9-master-0          Running                              18h
secondary-xgbq9-master-1          Running                              18h
secondary-xgbq9-master-2          Running                              18h
secondary-xgbq9-worker-0-hw22t    Running                              18h
secondary-xgbq9-worker-0-lcrzj    Running                              18h
secondary-xgbq9-worker-0-spwkj    Running                              18h
secondary-xgbq9-worker-00-8x7hq   Running                              26m
secondary-xgbq9-worker-00-g89d6   Provisioned                          26m
secondary-xgbq9-worker-00-q8nqs   Provisioned                          26m

# oc get node
NAME                              STATUS   ROLES    AGE   VERSION
secondary-xgbq9-master-0          Ready    master   18h   v1.17.1
secondary-xgbq9-master-1          Ready    master   18h   v1.17.1
secondary-xgbq9-master-2          Ready    master   18h   v1.17.1
secondary-xgbq9-worker-0-hw22t    Ready    worker   18h   v1.17.1
secondary-xgbq9-worker-0-lcrzj    Ready    worker   18h   v1.17.1
secondary-xgbq9-worker-0-spwkj    Ready    worker   18h   v1.17.1
secondary-xgbq9-worker-00-8x7hq   Ready    worker   24m   v1.17.1

# oc get machine secondary-xgbq9-worker-00-g89d6 -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    instance-status: |
      {"kind":"Machine","apiVersion":"machine.openshift.io/v1beta1","metadata":{"name":"secondary-xgbq9-worker-00-g89d6","generateName":"secondary-xgbq9-worker-00-","namespace":"openshift-machine-api","selfLink":"/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/secondary-xgbq9-worker-00-g89d6","uid":"fc95124b-6cc1-4716-a234-f025b7daafe7","resourceVersion":"1560225","generation":1,"creationTimestamp":"2020-04-10T09:10:54Z","labels":{"machine.openshift.io/cluster-api-cluster":"secondary-xgbq9","machine.openshift.io/cluster-api-machine-role":"worker","machine.openshift.io/cluster-api-machine-type":"worker","machine.openshift.io/cluster-api-machineset":"secondary-xgbq9-worker-00"},"annotations":{"VmId":"dd35df17-8766-47c3-b4c3-94579ea8d4c4","instance-status":""},"ownerReferences":[{"apiVersion":"machine.openshift.io/v1beta1","kind":"MachineSet","name":"secondary-xgbq9-worker-00","uid":"ad0d6173-9b12-4c4f-8df7-c0501d365396","controller":true,"blockOwnerDeletion":true}],"finalizers":["machine.machine.openshift.io"]},"spec":{"metadata":{"creationTimestamp":null},"providerSpec":{"value":{"apiVersion":"ovirtproviderconfig.openshift.io/v1beta1","cluster_id":"81602ae2-79a1-11ea-a904-00163e563201","credentialsSecret":{"name":"ovirt-credentials"},"id":"","kind":"OvirtMachineProviderSpec","metadata":{"creationTimestamp":null},"name":"","template_name":"rhcos_custom","userDataSecret":{"name":"worker-user-data"}}},"providerID":"dd35df17-8766-47c3-b4c3-94579ea8d4c4"},"status":{"lastUpdated":"2020-04-10T09:36:33Z","providerStatus":{"metadata":{"creationTimestamp":null},"instanceId":"secondary-xgbq9-worker-00-g89d6","instanceState":"down","conditions":[{"type":"MachineCreated","status":"True","lastProbeTime":"2020-04-10T09:36:33Z","lastTransitionTime":"2020-04-10T09:36:33Z","reason":"machineCreationSucceedReason","message":"machineCreationSucceedMessage"}]},"addresses":[{"type":"InternalDNS","address":"secondary-xgbq9-worker-00-g89d6"}],"phase":"Provisioned"}}
  creationTimestamp: "2020-04-10T09:10:54Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: secondary-xgbq9-worker-00-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: secondary-xgbq9
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: secondary-xgbq9-worker-00
  name: secondary-xgbq9-worker-00-g89d6
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: secondary-xgbq9-worker-00
    uid: ad0d6173-9b12-4c4f-8df7-c0501d365396
  resourceVersion: "1560252"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/secondary-xgbq9-worker-00-g89d6
  uid: fc95124b-6cc1-4716-a234-f025b7daafe7
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      apiVersion: ovirtproviderconfig.openshift.io/v1beta1
      cluster_id: 81602ae2-79a1-11ea-a904-00163e563201
      credentialsSecret:
        name: ovirt-credentials
      id: ""
      kind: OvirtMachineProviderSpec
      metadata:
        creationTimestamp: null
      name: ""
      template_name: rhcos_custom
      userDataSecret:
        name: worker-user-data
status:
  addresses:
  - address: secondary-xgbq9-worker-00-g89d6
    type: InternalDNS
  lastUpdated: "2020-04-10T09:36:33Z"
  phase: Provisioned
  providerStatus:
    conditions:
    - lastProbeTime: "2020-04-10T09:36:33Z"
      lastTransitionTime: "2020-04-10T09:36:33Z"
      message: machineCreationSucceedMessage
      reason: machineCreationSucceedReason
      status: "True"
      type: MachineCreated
    instanceId: secondary-xgbq9-worker-00-g89d6
    instanceState: down
    metadata:
      creationTimestamp: null
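To spot machines whose VM was created but never came up without dumping the full YAML for each one, the provider status embedded in every Machine can be queried directly. A sketch using custom columns (the field paths match the YAML above):

# List every machine with its phase and oVirt instance state
oc get machine -n openshift-machine-api \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,STATE:.status.providerStatus.instanceState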
Created attachment 1677766 [details] machine-controller.log
This is not a blocker for OCP 4.4.0 GA; targeting the earliest z-stream.
(In reply to sunzhaohua from comment #3)
> Reproduced this bug
> clusterversion: 4.4.0-0.nightly-2020-04-04-025830
>
> Created a new machineset with replicas=3: 1 machine joined the cluster and
> its node status became Ready, while 2 machines were stuck at the Provisioned
> phase. "oc describe machine" shows "instanceState: down"

This means the VM is not running - can you confirm?
Hi Roy,

We didn't have access to the RHV management console, so all the data we had was from the OpenShift side. I have downloaded the log Zhaohua attached and I see messages like this:

```
I0410 09:12:38.044200 1 machineservice.go:144] creating VM: secondary-xgbq9-worker-00-g89d6
E0410 09:12:58.857808 1 actuator.go:257] Machine error secondary-xgbq9-worker-00-g89d6: Error running oVirt VM: Fault reason is "Operation Failed". Fault detail is "[Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host dell-r640-008.dsal.lab.eng.tlv2.redhat.com did not satisfy internal filter Memory because its available memory is too low (9038 MB) to run the VM.]". HTTP response code is "409". HTTP response message is "409 Conflict".
W0410 09:12:58.857839 1 controller.go:321] Failed to create machine "secondary-xgbq9-worker-00-g89d6": Error running oVirt VM: Fault reason is "Operation Failed". Fault detail is "[Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host dell-r640-008.dsal.lab.eng.tlv2.redhat.com did not satisfy internal filter Memory because its available memory is too low (9038 MB) to run the VM.]". HTTP response code is "409". HTTP response message is "409 Conflict".
{"level":"error","ts":1586509978.8578815,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/secondary-xgbq9-worker-00-g89d6","error":"Error running oVirt VM: Fault reason is \"Operation Failed\". Fault detail is \"[Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host dell-r640-008.dsal.lab.eng.tlv2.redhat.com did not satisfy internal filter Memory because its available memory is too low (9038 MB) to run the VM.]\". HTTP response code is \"409\". HTTP response message is \"409 Conflict\".","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
```

It seems the machine hit an error during creation and then got stuck at 'Provisioning'.
Talk to your RHV admin; it seems the VM can't start because there is simply not enough memory. Look at the log: "did not satisfy internal filter Memory because its available memory is too low (9038 MB) to run the VM."
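A quick way to surface this class of oVirt scheduling failure from the OpenShift side is to grep the machine-controller container logs for the error string seen above. A sketch; the pod name is illustrative and will differ per cluster:

# Find the machine-api controllers pod
oc get pods -n openshift-machine-api

# Scan the machine-controller container for the scheduling failure
oc logs machine-api-controllers-675ccc84b7-4d5lw -c machine-controller \
  -n openshift-machine-api | grep "Cannot run VM"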
I've attached a case showing this is still an issue:

[root@bastion ~]# oc get nodes
NAME                                   STATUS   ROLES    AGE   VERSION
openshift-stage-wz4zh-master-0         Ready    master   22h   v1.17.1+912792b
openshift-stage-wz4zh-master-1         Ready    master   22h   v1.17.1+912792b
openshift-stage-wz4zh-master-2         Ready    master   22h   v1.17.1+912792b
openshift-stage-wz4zh-worker-0-8w2xb   Ready    worker   18h   v1.17.1+912792b
openshift-stage-wz4zh-worker-0-ckzxm   Ready    worker   21h   v1.17.1+912792b
openshift-stage-wz4zh-worker-0-n8qtb   Ready    worker   21h   v1.17.1+912792b

[root@bastion ~]# oc -n openshift-machine-api get machine
NAME                                   PHASE         TYPE   REGION   ZONE   AGE
openshift-stage-wz4zh-master-0         Running                              22h
openshift-stage-wz4zh-master-1         Running                              22h
openshift-stage-wz4zh-master-2         Running                              22h
openshift-stage-wz4zh-worker-0-8w2xb   Provisioned                          17h
openshift-stage-wz4zh-worker-0-ckzxm   Provisioned                          21h
openshift-stage-wz4zh-worker-0-n8qtb   Provisioned                          21h

A must-gather may not be possible, but the customer was able to run *oc adm inspect* on namespaces, so if the must-gather issue cannot be resolved, please tell us what other information you would like. I've also requested log-collector data on the RHVM and a worker node.

thank-you,
m
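For reference, a sketch of the oc adm inspect collection used in place of a full must-gather; the destination directory name is illustrative:

# Collect the two namespaces of interest when must-gather is not possible
oc adm inspect ns/openshift-machine-api ns/openshift-machine-config-operator \
  --dest-dir=inspect-machine-api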
Created attachment 1699204 [details] must-gather from cluster
Created attachment 1699206 [details] namespaces included are machine-api and machine-config-operator
Hi,

I have a similar issue on my OCP environment - OCP 4.4.11 on RHV 4.3.9.4-11.el7.

[root@bastion ~]# oc -n openshift-machine-api get machine
NAME                          PHASE         TYPE   REGION   ZONE   AGE
ocprhv-g86sn-master-0         Running                              6d23h
ocprhv-g86sn-master-1         Running                              6d23h
ocprhv-g86sn-master-2         Running                              6d23h
ocprhv-g86sn-worker-0-8z6c4   Running                              6d23h
ocprhv-g86sn-worker-0-fqdf5   Provisioned                          6d23h
ocprhv-g86sn-worker-0-s68hx   Provisioned                          5d21h
ocprhv-g86sn-worker-0-xdjlz   Provisioned                          6d23h

The provider state is set to "Image locked" for the machines in Provisioned state. I scaled up the MachineSet with one more worker; that went OK and the node is Ready, but the machine status is still Provisioned. Tell me if you need data from OCP/RHV.

Luc Pioro
Account Solutions Architect
*** Bug 1840553 has been marked as a duplicate of this bug. ***
Verification steps:
1. Scale up the existing worker MachineSet -> the new Machine ends up in Running state, the Node is in Ready status, no pending CSRs.
2. Scale down the existing worker MachineSet -> the previously created Machine is no longer listed among Machines and no longer listed among Nodes.

So all of this is fine. However, I then hit the issue again in quite a strange situation. This is exactly what I did:

1. oc get machineset primary-dlzl6-worker-0 -o yaml --export > ms.yaml (exported the original MachineSet)
2. vim ms.yaml (edited the file so that the replica count is 1; I changed nothing else)
3. oc apply -f ms.yaml

As nothing except the replica count was changed, the change was applied to the original MachineSet. It scaled down to 1 Machine without any issue.

4. In the web console, I scaled the MachineSet back to 3 replicas in one step.

And here it comes. One of the newly created Machines got stuck in Provisioned state, was not listed among Nodes, and there are two pending CSRs. According to oc get machine -o yaml, the instanceState is reboot_in_progress. However, when I log into the RHV manager I can see the machine is not rebooting.

CLI output: http://pastebin.test.redhat.com/884795

Verification conducted with:
OCP 4.6.0-0.nightly-2020-07-15-004428
RHV 4.3.11.2-0.1.el7
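The same sequence condensed into commands, as a sketch (machineset name from this cluster; the namespace flag is added here for clarity and the original session may have relied on the current project):

# Export, edit and re-apply the machineset, then scale it back up
oc get machineset primary-dlzl6-worker-0 -n openshift-machine-api -o yaml --export > ms.yaml
vim ms.yaml    # set spec.replicas to 1, change nothing else
oc apply -f ms.yaml -n openshift-machine-api

# Scale back to 3 replicas in one step (web console equivalent)
oc scale machineset primary-dlzl6-worker-0 --replicas=3 -n openshift-machine-api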
Failing QA based on comment 23
After a discussion with Roy we found out that the worker machines got into Running state eventually; it just took longer than expected. Therefore I'm moving this BZ to VERIFIED and opening a follow-up BZ with lower priority: https://bugzilla.redhat.com/show_bug.cgi?id=1858854
Failing in: openshift-4.6.0-0.nightly-2020-07-22-074636

While playing with the scaling I not only hit this issue scaling the cluster up, but one machine also got stuck in a new state (Provisioning).

The steps:
1. On project '' I scaled up the machineset '' from 3 to 4
   # oc edit machineset secondary-xpvlm-worker-0
   (changed replicas from 3 to 4)
2. Once finished, I scaled down to 1
3. Then I scaled up to 3

The results:
# oc get machines
NAME                             PHASE          TYPE   REGION   ZONE   AGE
secondary-xpvlm-master-0         Running                               3d
secondary-xpvlm-master-1         Running                               3d
secondary-xpvlm-master-2         Running                               3d
secondary-xpvlm-worker-0-2bkts   Provisioning                          18h
secondary-xpvlm-worker-0-55clj   Provisioned                           18h
secondary-xpvlm-worker-0-zwbb6   Running                               3d

# oc get machinesets
NAME                       DESIRED   CURRENT   READY   AVAILABLE   AGE
secondary-xpvlm-worker-0   3         3         1       1           3d

# oc logs machine-approver-6c6964ffb-kdr4w machine-approver-controller
E0729 19:44:18.096102 1 reflector.go:380] github.com/openshift/cluster-machine-approver/main.go:239: Failed to watch *v1beta1.CertificateSigningRequest: Get "https://127.0.0.1:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests?allowWatchBookmarks=true&resourceVersion=2575390&timeoutSeconds=579&watch=true": dial tcp 127.0.0.1:6443: connect: connection refused
E0729 19:44:18.096868 1 reflector.go:380] github.com/openshift/cluster-machine-approver/status.go:99: Failed to watch *v1.ClusterOperator: Get "https://127.0.0.1:6443/apis/config.openshift.io/v1/clusteroperators?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dmachine-approver&resourceVersion=2818758&timeoutSeconds=426&watch=true": dial tcp 127.0.0.1:6443: connect: connection refused
E0729 19:44:23.432116 1 reflector.go:380] github.com/openshift/cluster-machine-approver/status.go:99: Failed to watch *v1.ClusterOperator: the server is currently unable to handle the request (get clusteroperators.config.openshift.io)
I0729 19:47:29.895233 1 main.go:147] CSR csr-hjndh added
I0729 19:47:29.962273 1 main.go:182] CSR csr-hjndh not authorized: failed to find machine for node secondary-xpvlm-worker-0-55clj
I0729 19:47:29.962316 1 main.go:218] Error syncing csr csr-hjndh: failed to find machine for node secondary-xpvlm-worker-0-55clj
I0729 19:47:29.967613 1 main.go:147] CSR csr-hjndh added
I0729 19:47:29.993158 1 main.go:182] CSR csr-hjndh not authorized: failed to find machine for node secondary-xpvlm-worker-0-55clj
(...)

It basically keeps repeating until the CSRs fall out of the expected time range, and then it starts dumping the following:

(...)
I0730 03:00:37.931211 1 main.go:147] CSR csr-6rcxd added
I0730 03:00:37.950011 1 main.go:182] CSR csr-6rcxd not authorized: CSR csr-6rcxd creation time 2020-07-30 03:00:37 +0000 UTC not in range (2020-07-29 19:42:14 +0000 UTC, 2020-07-29 21:42:24 +0000 UTC)
I0730 03:00:37.950152 1 main.go:218] Error syncing csr csr-6rcxd: CSR csr-6rcxd creation time 2020-07-30 03:00:37 +0000 UTC not in range (2020-07-29 19:42:14 +0000 UTC, 2020-07-29 21:42:24 +0000 UTC)
(...)
# oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                    CONDITION
csr-28n6h   16h     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-29lw7   16h     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-2nq66   8h      kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-422tb   6h24m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-4298h   60m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-459kq   11h     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-49jvz   17h     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
(...)

There are 73 of these.

# oc logs machine-api-controllers-675ccc84b7-4d5lw machine-controller
I0729 19:40:45.376076 1 reflector.go:158] Listing and watching *v1beta1.Machine from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126
I0729 19:40:45.387496 1 controller.go:164] Reconciling Machine "secondary-xpvlm-master-2"
I0729 19:40:45.387553 1 controller.go:376] Machine "secondary-xpvlm-master-2" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:40:45.417705 1 machineservice.go:262] Fetching VM by ID: cbbd193d-72e2-4d02-bc70-d2961b8cc953
I0729 19:40:45.536577 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-master-2
I0729 19:40:45.536630 1 controller.go:284] Reconciling machine "secondary-xpvlm-master-2" triggers idempotent update
I0729 19:40:45.554546 1 machineservice.go:262] Fetching VM by ID: cbbd193d-72e2-4d02-bc70-d2961b8cc953
I0729 19:40:45.596294 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-master-2
I0729 19:40:45.602572 1 actuator.go:262] Updating machine resource
I0729 19:40:45.673792 1 actuator.go:271] Updating machine status sub-resource
I0729 19:40:45.691730 1 controller.go:164] Reconciling Machine "secondary-xpvlm-worker-0-zwbb6"
I0729 19:40:45.691761 1 controller.go:376] Machine "secondary-xpvlm-worker-0-zwbb6" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:40:45.702461 1 machineservice.go:262] Fetching VM by ID: eb9c15d7-7f60-4576-820f-3becbcd1b2ba
I0729 19:40:45.736967 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-worker-0-zwbb6
I0729 19:40:45.737045 1 controller.go:284] Reconciling machine "secondary-xpvlm-worker-0-zwbb6" triggers idempotent update
I0729 19:40:45.755392 1 machineservice.go:262] Fetching VM by ID: eb9c15d7-7f60-4576-820f-3becbcd1b2ba
I0729 19:40:45.789264 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-worker-0-zwbb6
I0729 19:40:45.808221 1 actuator.go:262] Updating machine resource
I0729 19:40:45.833615 1 actuator.go:271] Updating machine status sub-resource
I0729 19:40:45.846750 1 controller.go:164] Reconciling Machine "secondary-xpvlm-master-0"
I0729 19:40:45.846786 1 controller.go:376] Machine "secondary-xpvlm-master-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:40:45.888556 1 machineservice.go:262] Fetching VM by ID: 88215f56-fefc-408d-ae80-9c6e3e780292
I0729 19:40:45.921894 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-master-0
I0729 19:40:45.921940 1 controller.go:284] Reconciling machine "secondary-xpvlm-master-0" triggers idempotent update
I0729 19:40:45.956470 1 machineservice.go:262] Fetching VM by ID: 88215f56-fefc-408d-ae80-9c6e3e780292
I0729 19:40:46.003732 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-master-0
I0729 19:40:46.007976 1 actuator.go:262] Updating machine resource
I0729 19:40:46.019861 1 actuator.go:271] Updating machine status sub-resource
I0729 19:40:46.029360 1 controller.go:164] Reconciling Machine "secondary-xpvlm-master-1"
I0729 19:40:46.029392 1 controller.go:376] Machine "secondary-xpvlm-master-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:40:46.039276 1 machineservice.go:262] Fetching VM by ID: a01683dc-4071-48cd-a40f-56987a663ca3
I0729 19:40:46.138830 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-master-1
I0729 19:40:46.138885 1 controller.go:284] Reconciling machine "secondary-xpvlm-master-1" triggers idempotent update
I0729 19:40:46.147479 1 machineservice.go:262] Fetching VM by ID: a01683dc-4071-48cd-a40f-56987a663ca3
I0729 19:40:46.182617 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-master-1
I0729 19:40:46.188361 1 actuator.go:262] Updating machine resource
I0729 19:40:46.248731 1 actuator.go:271] Updating machine status sub-resource
I0729 19:42:25.007316 1 controller.go:164] Reconciling Machine "secondary-xpvlm-worker-0-55clj"
I0729 19:42:25.007371 1 controller.go:376] Machine "secondary-xpvlm-worker-0-55clj" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:42:25.026622 1 controller.go:164] Reconciling Machine "secondary-xpvlm-worker-0-2bkts"
I0729 19:42:25.026653 1 controller.go:376] Machine "secondary-xpvlm-worker-0-2bkts" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:42:46.082403 1 machineservice.go:196] Extending the OS disk from 17179869184 to 128849018880
I0729 19:42:46.420546 1 machineservice.go:209] Waiting while extending the OS disk
I0729 19:44:26.993616 1 actuator.go:262] Updating machine resource
I0729 19:44:27.014082 1 actuator.go:271] Updating machine status sub-resource
I0729 19:44:27.025247 1 controller.go:164] Reconciling Machine "secondary-xpvlm-worker-0-2bkts"
I0729 19:44:27.025311 1 controller.go:376] Machine "secondary-xpvlm-worker-0-2bkts" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:44:27.087045 1 controller.go:428] Machine "secondary-xpvlm-worker-0-2bkts" going into phase "Provisioning"
I0729 19:44:27.098363 1 controller.go:319] Reconciling machine object secondary-xpvlm-worker-0-2bkts triggers idempotent create.
I0729 19:44:27.142084 1 machineservice.go:126] creating VM: secondary-xpvlm-worker-0-2bkts
I0729 19:44:47.993861 1 machineservice.go:196] Extending the OS disk from 17179869184 to 128849018880
I0729 19:44:48.259039 1 machineservice.go:209] Waiting while extending the OS disk
I0729 19:46:09.001274 1 actuator.go:262] Updating machine resource
I0729 19:46:09.142437 1 actuator.go:271] Updating machine status sub-resource
I0729 19:46:09.175710 1 controller.go:164] Reconciling Machine "secondary-xpvlm-worker-0-55clj"
I0729 19:46:09.175784 1 controller.go:376] Machine "secondary-xpvlm-worker-0-55clj" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:46:09.191239 1 machineservice.go:262] Fetching VM by ID: a7a7e81e-098c-458f-a8fa-d1060b782b9f
I0729 19:46:09.299097 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-worker-0-55clj
I0729 19:46:09.299135 1 controller.go:284] Reconciling machine "secondary-xpvlm-worker-0-55clj" triggers idempotent update
I0729 19:46:09.310142 1 machineservice.go:262] Fetching VM by ID: a7a7e81e-098c-458f-a8fa-d1060b782b9f
I0729 19:46:09.342089 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-worker-0-55clj
E0729 19:46:09.522784 1 actuator.go:295] failed to lookup the VM IP lookup secondary-xpvlm-worker-0-55clj on 172.30.0.10:53: no such host - skip setting addresses for this machine
E0729 19:46:09.522841 1 controller.go:286] Error updating machine "openshift-machine-api/secondary-xpvlm-worker-0-55clj": lookup secondary-xpvlm-worker-0-55clj on 172.30.0.10:53: no such host
{"level":"error","ts":1596051969.5229006,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/secondary-xpvlm-worker-0-55clj","error":"lookup secondary-xpvlm-worker-0-55clj on 172.30.0.10:53: no such host","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
I0729 19:46:10.523314 1 controller.go:164] Reconciling Machine "secondary-xpvlm-worker-0-2bkts"
I0729 19:46:10.523382 1 controller.go:376] Machine "secondary-xpvlm-worker-0-2bkts" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0729 19:46:10.534959 1 machineservice.go:262] Fetching VM by ID: 8edcee4b-aa4e-4499-b8bb-044b07f8260e
I0729 19:46:10.565728 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-worker-0-2bkts
I0729 19:46:10.565778 1 controller.go:284] Reconciling machine "secondary-xpvlm-worker-0-2bkts" triggers idempotent update
I0729 19:46:10.576903 1 machineservice.go:262] Fetching VM by ID: 8edcee4b-aa4e-4499-b8bb-044b07f8260e
I0729 19:46:10.609187 1 machineservice.go:270] Got VM by ID: secondary-xpvlm-worker-0-2bkts
E0729 19:46:10.773276 1 actuator.go:295] failed to lookup the VM IP lookup secondary-xpvlm-worker-0-2bkts on 172.30.0.10:53: no such host - skip setting addresses for this machine
E0729 19:46:10.773321 1 controller.go:286] Error updating machine "openshift-machine-api/secondary-xpvlm-worker-0-2bkts": lookup secondary-xpvlm-worker-0-2bkts on 172.30.0.10:53: no such host
{"level":"error","ts":1596051970.7733457,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/secondary-xpvlm-worker-0-2bkts","error":"lookup secondary-xpvlm-worker-0-2bkts on 172.30.0.10:53: no such host","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
(...)

It basically continues repeating itself.
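When a machine is wedged like this, one hedged remediation (assuming the machineset should simply replace it, and that nothing on the VM is worth preserving) is to delete the stuck Machine and let the MachineSet reconcile a fresh one:

# Delete the stuck machine; its machineset will create a replacement
oc delete machine secondary-xpvlm-worker-0-55clj -n openshift-machine-api

# Confirm a replacement machine is created and progresses to Running
oc get machine -n openshift-machine-api -w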
(In reply to Guilherme Santos from comment #29)
> Failing in: openshift-4.6.0-0.nightly-2020-07-22-074636
>
> While playing with the scaling I not only hit this issue scaling the
> cluster up, but one machine also got stuck in a new state (Provisioning).

Hi,
Please follow up with Evgeni or open a new bug on that, and keep this one VERIFIED. This is not related to the originally reported issue on IPI installation.
The engine log is useless as it doesn't cover the period; can you attach the right one? Also, the new logging added in that PR is not displayed by default, so can you increase the log level please? The VM may be running, but the guest is not necessarily up. RHV host logs and verification that the guest actually runs would be helpful.
Evgeny found out that the worker VM hadn't booted - most probably because we've exhausted our IP address pool. Let me run one more iteration of the test; if that goes smoothly, I'll move to VERIFIED.
After some investigation, we found out that the issue described in comment #33 was caused by our DHCP infra setup and is therefore unrelated to the OCP on RHV functionality.

I performed the OCP deployment 4 times, each time doing a series of scale-downs and scale-ups, and did not encounter the original issue. I believe this is a big enough sample to verify the fix: before the fix was delivered, I used to encounter the original issue in about 75 % of scale-up attempts, so if it were still present it would have popped up during my verification.

As the scale-up issue is no longer encountered, this also suggests that the root cause of machines being stuck in Provisioned state during cluster deployment has been removed. That is far more difficult to confirm 100 %, though, as that issue reproduced only rarely during cluster deployment. If the issue during cluster deployment should re-appear, I think it's safe to assume its root cause is different from this one.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
As mentioned in comment #39, this bug was CLOSED ERRATA. If the solution does not work for you, open a new bug report. Please don't reopen this bug: the problem you're seeing may have a similar result but a different root cause, and it deserves to be analyzed accordingly.