Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1918101

Summary: [vsphere]Delete Provisioning machine took about 12 minutes
Product: OpenShift Container Platform Reporter: sunzhaohua <zhsun>
Component: Cloud Compute Assignee: dmoiseev
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: aarapov, dmoiseev, mgugino
Version: 4.7   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The machine-controller deletion procedure did not distinguish between vCenter task types, so deletion was blocked whenever a failed task was present in vCenter.
Consequence: Deleting a machine that was never actually created (for example, because the datastore lacked space) took a long time.
Fix: The machine-controller deletion procedure now checks the vCenter task type and does not block deletion.
Result: A machine in the Provisioning phase is deleted quickly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:36:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description sunzhaohua 2021-01-20 03:26:27 UTC
Description of problem:
A newly created machine got stuck in the Provisioning phase because of "Insufficient disk space on datastore". Deleting that machine then took about 12 minutes.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-18-164445

How reproducible:
Always

Steps to Reproduce:
1. Create a new machine. It gets stuck in the Provisioning phase because of "Insufficient disk space on datastore".
2. Delete this machine.

Actual results:
It took about 12 minutes to delete the machine stuck in Provisioning.
$ ./oc get machine
NAME                              PHASE          TYPE   REGION   ZONE   AGE
jstuevervcsa-72g5s-master-0       Running                               148m
jstuevervcsa-72g5s-master-1       Running                               148m
jstuevervcsa-72g5s-master-2       Running                               148m
jstuevervcsa-72g5s-worker-4cgzd   Running                               140m
jstuevervcsa-72g5s-worker-bsh6l   Running                               140m
jstuevervcsa-72g5s-worker-cjpkx   Provisioning                          32m
jstuevervcsa-72g5s-worker-l9r5q   Running                               81m
jstuevervcsa-72g5s-worker-qgl87   Provisioning                          29m
jstuevervcsa-72g5s-worker-twhzg   Running                               140m

$ ./oc delete machine jstuevervcsa-72g5s-worker-cjpkx

$ ./oc logs -f machine-api-controllers-64ffb8bcd-z27zb -c machine-controller | grep jstuevervcsa-72g5s-worker-cjpkx
I0119 03:58:02.122325       1 controller.go:312] jstuevervcsa-72g5s-worker-cjpkx: reconciling machine triggers idempotent create
I0119 03:58:02.122330       1 actuator.go:66] jstuevervcsa-72g5s-worker-cjpkx: actuator creating machine
E0119 03:58:02.140637       1 actuator.go:57] jstuevervcsa-72g5s-worker-cjpkx error: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Create machine: Insufficient disk space on datastore ''.
I0119 03:58:02.140680       1 machine_scope.go:102] jstuevervcsa-72g5s-worker-cjpkx: patching machine
W0119 03:58:02.180596       1 controller.go:314] jstuevervcsa-72g5s-worker-cjpkx: failed to create machine: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Create machine: Insufficient disk space on datastore ''.
I0119 03:58:45.298585       1 controller.go:168] jstuevervcsa-72g5s-worker-cjpkx: reconciling Machine
I0119 03:58:45.298726       1 controller.go:426] jstuevervcsa-72g5s-worker-cjpkx: going into phase "Deleting"
I0119 03:58:45.314565       1 controller.go:208] jstuevervcsa-72g5s-worker-cjpkx: reconciling machine triggers delete
I0119 03:58:45.314657       1 actuator.go:150] jstuevervcsa-72g5s-worker-cjpkx: actuator deleting machine
I0119 03:58:45.339695       1 machine_scope.go:102] jstuevervcsa-72g5s-worker-cjpkx: patching machine
E0119 03:58:45.420267       1 actuator.go:57] jstuevervcsa-72g5s-worker-cjpkx error: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
E0119 03:58:45.420501       1 controller.go:229] jstuevervcsa-72g5s-worker-cjpkx: failed to delete machine: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
I0119 03:58:45.452358       1 controller.go:168] jstuevervcsa-72g5s-worker-cjpkx: reconciling Machine
I0119 03:58:45.452497       1 controller.go:208] jstuevervcsa-72g5s-worker-cjpkx: reconciling machine triggers delete
I0119 03:58:45.452532       1 actuator.go:150] jstuevervcsa-72g5s-worker-cjpkx: actuator deleting machine
I0119 03:58:45.475338       1 machine_scope.go:102] jstuevervcsa-72g5s-worker-cjpkx: patching machine
E0119 03:58:45.529736       1 actuator.go:57] jstuevervcsa-72g5s-worker-cjpkx error: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
E0119 03:58:45.529859       1 controller.go:229] jstuevervcsa-72g5s-worker-cjpkx: failed to delete machine: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
I0119 03:59:24.101804       1 controller.go:168] jstuevervcsa-72g5s-worker-cjpkx: reconciling Machine
I0119 03:59:24.101841       1 controller.go:208] jstuevervcsa-72g5s-worker-cjpkx: reconciling machine triggers delete
I0119 03:59:24.101847       1 actuator.go:150] jstuevervcsa-72g5s-worker-cjpkx: actuator deleting machine
I0119 03:59:24.119039       1 machine_scope.go:102] jstuevervcsa-72g5s-worker-cjpkx: patching machine
E0119 03:59:24.156785       1 actuator.go:57] jstuevervcsa-72g5s-worker-cjpkx error: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
E0119 03:59:24.157008       1 controller.go:229] jstuevervcsa-72g5s-worker-cjpkx: failed to delete machine: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
I0119 04:00:09.955183       1 controller.go:168] jstuevervcsa-72g5s-worker-cjpkx: reconciling Machine
I0119 04:00:09.955195       1 controller.go:208] jstuevervcsa-72g5s-worker-cjpkx: reconciling machine triggers delete
I0119 04:00:09.955201       1 actuator.go:150] jstuevervcsa-72g5s-worker-cjpkx: actuator deleting machine
I0119 04:00:09.968497       1 machine_scope.go:102] jstuevervcsa-72g5s-worker-cjpkx: patching machine
E0119 04:00:09.994062       1 actuator.go:57] jstuevervcsa-72g5s-worker-cjpkx error: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
E0119 04:00:09.994147       1 controller.go:229] jstuevervcsa-72g5s-worker-cjpkx: failed to delete machine: jstuevervcsa-72g5s-worker-cjpkx: reconciler failed to Delete machine: Insufficient disk space on datastore ''.
I0119 04:10:19.517488       1 controller.go:168] jstuevervcsa-72g5s-worker-cjpkx: reconciling Machine
I0119 04:10:19.517629       1 controller.go:208] jstuevervcsa-72g5s-worker-cjpkx: reconciling machine triggers delete
I0119 04:10:19.517635       1 actuator.go:150] jstuevervcsa-72g5s-worker-cjpkx: actuator deleting machine
I0119 04:10:19.573565       1 reconciler.go:240] jstuevervcsa-72g5s-worker-cjpkx: vm does not exist
I0119 04:10:19.573690       1 machine_scope.go:102] jstuevervcsa-72g5s-worker-cjpkx: patching machine
I0119 04:10:19.608631       1 actuator.go:109] jstuevervcsa-72g5s-worker-cjpkx: actuator checking if machine exists
I0119 04:10:19.629500       1 reconciler.go:199] jstuevervcsa-72g5s-worker-cjpkx: does not exist
I0119 04:10:19.660077       1 controller.go:260] jstuevervcsa-72g5s-worker-cjpkx: machine deletion successful


Expected results:
The machine is deleted quickly.

Additional info:

Comment 1 Joel Speed 2021-01-20 10:34:56 UTC
It's interesting that we can't delete a machine if there's no space on the datastore. Perhaps we need to check that there's space before we attempt to create a VM, and go into Failed if not.

Comment 2 Michael Gugino 2021-01-21 17:01:26 UTC
The bug is a case where we should set some status on the machine object if we receive an error, rather than a 'machine still exists' response.

Aside from that, cluster owners are required to ensure their infrastructure is healthy.  I don't think we should be accountable for ensuring enough space exists on the infrastructure.  The API will tell us when there isn't, and that's the check.

Comment 3 Joel Speed 2021-02-05 14:59:58 UTC
I think adding some health checks for the datacenter, to prevent us from trying to create machines on unhealthy datacenters, may be useful. I will see if someone has time to look at this next sprint.

Comment 5 sunzhaohua 2021-05-25 02:34:40 UTC
verified
clusterversion: 4.8.0-0.nightly-2021-05-21-233425

Comment 8 errata-xmlrpc 2021-07-27 22:36:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438