Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-26-215024

How reproducible:
Always

Steps to Reproduce:
1. Power off a VM from the vSphere client
2. Delete it from disk
3. Check the machine phase

Actual results:
The machine is still in the Running phase.

$ oc get node
NAME                                  STATUS     ROLES    AGE    VERSION
zhsunvsphere629-z75kh-master-0        Ready      master   130m   v1.18.3+f291db1
zhsunvsphere629-z75kh-master-1        Ready      master   130m   v1.18.3+f291db1
zhsunvsphere629-z75kh-master-2        Ready      master   130m   v1.18.3+f291db1
zhsunvsphere629-z75kh-worker-gm4x9    Ready      worker   80m    v1.18.3+f291db1
zhsunvsphere629-z75kh-worker-p7xlp    Ready      worker   80m    v1.18.3+f291db1
zhsunvsphere629-z75kh-worker1-vspkl   NotReady   worker   37m    v1.18.3+f291db1

$ oc get machine -o wide
NAME                                  PHASE     TYPE   REGION   ZONE   AGE    NODE                                  PROVIDERID                                       STATE
zhsunvsphere629-z75kh-master-0        Running                          158m   zhsunvsphere629-z75kh-master-0        vsphere://420bf774-c6d5-efeb-e0ad-23d35172b2ac   poweredOn
zhsunvsphere629-z75kh-master-1        Running                          158m   zhsunvsphere629-z75kh-master-1        vsphere://420baa76-6f3b-d0aa-3e33-96060c60cb89   poweredOn
zhsunvsphere629-z75kh-master-2        Running                          158m   zhsunvsphere629-z75kh-master-2        vsphere://420ba873-ec4e-b7b2-867f-62349371f0b3   poweredOn
zhsunvsphere629-z75kh-worker-gm4x9    Running                          114m   zhsunvsphere629-z75kh-worker-gm4x9    vsphere://420bff1c-9f5f-e217-2f7c-04bd9fd618f5   poweredOn
zhsunvsphere629-z75kh-worker-p7xlp    Running                          114m   zhsunvsphere629-z75kh-worker-p7xlp    vsphere://420bdd2f-d235-57db-3fce-b55505111cb8   poweredOn
zhsunvsphere629-z75kh-worker1-vspkl   Running                          67m    zhsunvsphere629-z75kh-worker1-vspkl   vsphere://420bba46-a730-2953-de19-67e0e78897b3   poweredOff

Expected results:
The machine phase is set to "Failed".

Additional info:
Hi, I can't reproduce this bug; everything works as expected. Can you confirm that the machine controller reconciled the machine after you deleted the VM from disk? It should reconcile the machine about every 15 minutes.
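Since reconciliation is periodic, one way to check this is to poll the machine phase for longer than one reconcile interval. The following is only a sketch: the function names, the 30-minute timeout, and the 60-second interval are hypothetical, and it assumes a logged-in `oc` and that the Machine objects live in the `openshift-machine-api` namespace.

```python
import subprocess
import time


def get_phase(machine, namespace="openshift-machine-api"):
    """Return .status.phase of a Machine by calling `oc`.

    The namespace default is an assumption; adjust it for your cluster.
    """
    result = subprocess.run(
        ["oc", "get", "machine", machine, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_phase(machine, wanted="Failed", timeout_s=1800, interval_s=60,
                   fetch=get_phase, sleep=time.sleep):
    """Poll until the machine reports `wanted` or the timeout elapses.

    Returns True if the phase was observed, False on timeout.
    `fetch` and `sleep` are injectable so the loop can be tried without a cluster.
    """
    waited = 0
    while True:
        if fetch(machine) == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, `wait_for_phase("zhsunvsphere629-z75kh-worker1-vspkl")` would return False if the machine is still not Failed after 30 minutes.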
4.5.0-0.nightly-2020-07-02-190154
After powering off a VM from the vSphere client, the machine controller reconciled the machine. After deleting the VM from disk, the machine controller did not reconcile the machine. I waited for about 30 minutes.

# oc get machine -o wide
NAME                                PHASE     TYPE   REGION   ZONE   AGE     NODE                                PROVIDERID                                       STATE
zhsunvshpere73-gfbpr-master-0       Running                          2d17h   zhsunvshpere73-gfbpr-master-0       vsphere://422b8f34-078c-1798-b4f4-65580e1516d2   poweredOn
zhsunvshpere73-gfbpr-master-1       Running                          2d17h   zhsunvshpere73-gfbpr-master-1       vsphere://422b24b4-ede7-0a8f-3cef-f21f6c1cb45f   poweredOn
zhsunvshpere73-gfbpr-master-2       Running                          2d17h   zhsunvshpere73-gfbpr-master-2       vsphere://422ba8b4-3d4f-875b-e08a-20c6e33167f6   poweredOn
zhsunvshpere73-gfbpr-worker-dx9j8   Running                          2d17h   zhsunvshpere73-gfbpr-worker-dx9j8   vsphere://422b38f2-35c6-9a79-4339-0c3e750e4e0c   poweredOn
zhsunvshpere73-gfbpr-worker-gvn7j   Running                          2d17h   zhsunvshpere73-gfbpr-worker-gvn7j   vsphere://422bc235-ceeb-b3b1-a28a-bf3cb6310c1a   poweredOff
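The symptom in this listing is a machine whose PHASE is still Running while its STATE is no longer poweredOn. A small helper can flag such rows in saved `oc get machine -o wide` output. This is a rough sketch: `find_stale_machines` is a hypothetical name, and the parsing assumes that the TYPE, REGION, and ZONE columns are empty (as they are here), so that on each row the first field is the name, the second is the phase, and the last is the state.

```python
def find_stale_machines(table_text):
    """Return names of machines with PHASE Running but STATE other than poweredOn.

    Assumes whitespace-separated rows where TYPE/REGION/ZONE are empty, so
    fields[0] is NAME, fields[1] is PHASE, and fields[-1] is STATE.
    """
    stale = []
    for line in table_text.strip().splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and rows too short to interpret.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, phase, state = fields[0], fields[1], fields[-1]
        if phase == "Running" and state != "poweredOn":
            stale.append(name)
    return stale
```

Run against the listing in this comment, it would report only zhsunvshpere73-gfbpr-worker-gvn7j.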
Created attachment 1699980 [details] machine controller log
Planning to clarify on this during the next sprint.
Is it possible to check whether this bug appears on 4.6? Could I also get access to your test environment? I can't reproduce the bug: the machine goes into the Failed phase when I delete the VM from disk.

I0804 12:39:52.456421 1 reconciler.go:268] ademicev-6hc7q-worker-q7pzt: reconciling powerstate annotation
I0804 12:39:52.457774 1 reconciler.go:708] ademicev-6hc7q-worker-q7pzt: Updating provider status
I0804 12:39:52.464439 1 machine_scope.go:102] ademicev-6hc7q-worker-q7pzt: patching machine
I0804 12:49:26.494885 1 controller.go:169] ademicev-6hc7q-worker-q7pzt: reconciling Machine
I0804 12:49:26.494915 1 actuator.go:83] ademicev-6hc7q-worker-q7pzt: actuator checking if machine exists
I0804 12:49:31.604472 1 session.go:113] Find template by instance uuid: 30d9461a-cae9-42d7-8369-859a421c6a3a
I0804 12:49:31.623139 1 reconciler.go:175] ademicev-6hc7q-worker-q7pzt: does not exist
I0804 12:49:31.623159 1 controller.go:424] ademicev-6hc7q-worker-q7pzt: going into phase "Failed"
I0804 12:49:31.652301 1 controller.go:169] ademicev-6hc7q-worker-q7pzt: reconciling Machine
I0804 12:49:31.652330 1 actuator.go:83] ademicev-6hc7q-worker-q7pzt: actuator checking if machine exists
I0804 12:49:31.660480 1 session.go:113] Find template by instance uuid: 30d9461a-cae9-42d7-8369-859a421c6a3a
I0804 12:49:31.687004 1 reconciler.go:175] ademicev-6hc7q-worker-q7pzt: does not exist
I0804 12:49:31.687022 1 controller.go:424] ademicev-6hc7q-worker-q7pzt: going into phase "Failed"
I0804 12:49:31.707896 1 controller.go:169] ademicev-6hc7q-worker-q7pzt: reconciling Machine
W0804 12:49:31.707922 1 controller.go:266] ademicev-6hc7q-worker-q7pzt: machine has gone "Failed" phase. It won't reconcile
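The sequence in this log (check whether the VM exists, move to "Failed" if it does not, then stop reconciling) can be modelled roughly as below. This is a toy illustration inferred only from the log messages, not the actual controller code; the `Machine` class, the `reconcile` function, and the `vm_exists` callback are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    phase: str = "Running"


def reconcile(machine, vm_exists):
    """Run one toy reconcile pass and return log-like messages.

    `vm_exists` stands in for the provider lookup; it takes the machine
    name and returns True if the backing VM can still be found.
    """
    events = [f"{machine.name}: reconciling Machine"]
    if machine.phase == "Failed":
        # A Failed machine is terminal: the provider is not queried again.
        events.append(
            f'{machine.name}: machine has gone "Failed" phase. It won\'t reconcile'
        )
        return events
    if not vm_exists(machine.name):
        events.append(f"{machine.name}: does not exist")
        machine.phase = "Failed"
        events.append(f'{machine.name}: going into phase "Failed"')
    return events
```

In this model the behaviour reported on 4.5 would correspond to the existence check never reporting the VM as missing, so the phase stays Running.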
Alexander Demicev, I can't reproduce this bug on 4.6.
clusterversion: 4.6.0-0.nightly-2020-08-05-041346

# oc get machine -o wide
NAME                                 PHASE     TYPE   REGION   ZONE   AGE   NODE                                 PROVIDERID                                       STATE
zhsun85vsphere1-c7wnq-master-0       Running                          82m   zhsun85vsphere1-c7wnq-master-0       vsphere://422b0dd9-f3cf-c551-cb59-6c4e263c2855   poweredOn
zhsun85vsphere1-c7wnq-master-1       Running                          82m   zhsun85vsphere1-c7wnq-master-1       vsphere://422b2d75-2901-1eb7-ed24-c6cbc3791c02   poweredOn
zhsun85vsphere1-c7wnq-master-2       Running                          82m   zhsun85vsphere1-c7wnq-master-2       vsphere://422bc84f-2509-dad3-07ed-17e6afab3ed2   poweredOn
zhsun85vsphere1-c7wnq-worker-sfnj8   Running                          70m   zhsun85vsphere1-c7wnq-worker-sfnj8   vsphere://422bf3b8-1a9e-0852-d541-8f9f31ee6055   poweredOn
zhsun85vsphere1-c7wnq-worker-zbh2n   Failed                           70m   zhsun85vsphere1-c7wnq-worker-zbh2n   vsphere://422be640-7278-6637-e094-616fb3e61468   Unknown
Closing this BZ, because the bug appears only on 4.5. All progress can be tracked here https://bugzilla.redhat.com/show_bug.cgi?id=1869320
I've linked the PR that is being cherry-picked in the dependent PR. Once it's verified that this fixes the issue, we can move on with the backport.
Validated on 4.6.0-0.nightly-2020-08-23-214712

Steps:
Deleted the VM from vSphere and from disk.
The machine status became Failed after some time.

[miyadav@miyadav vsp]$ oc get machines -o wide --config vsp
NAME                            PHASE     TYPE   REGION   ZONE   AGE   NODE                            PROVIDERID                                       STATE
jima082401-hxvhx-master-0       Running                          9h    jima082401-hxvhx-master-0       vsphere://422be929-b6f7-c263-2e26-78fc44f17e8c   poweredOn
jima082401-hxvhx-master-1       Running                          9h    jima082401-hxvhx-master-1       vsphere://422bc938-1b49-aad2-fc95-bf367c3e387f   poweredOn
jima082401-hxvhx-master-2       Running                          9h    jima082401-hxvhx-master-2       vsphere://422b4670-78cb-e14e-695c-db98041ef7bb   poweredOn
jima082401-hxvhx-worker-tpplc   Failed                           8h    jima082401-hxvhx-worker-tpplc   vsphere://422ba266-fcf6-a399-4039-0f62a14b3f52   Unknown

Additional info:
Moved to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196