Bug 1600641 - Failed to detach volume due to "Boot from hard disk failure"
Summary: Failed to detach volume due to "Boot from hard disk failure"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1458798
 
Reported: 2018-07-12 16:37 UTC by Rajini Karthik
Modified: 2023-03-21 18:56 UTC (History)
24 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-27 15:14:00 UTC
Target Upstream Version:
Embargoed:


Attachments
Logs (1.90 MB, application/zip)
2018-07-12 16:42 UTC, Rajini Karthik
nova booting (85.70 KB, image/png)
2018-07-18 14:30 UTC, Rajini Karthik
dump (5.86 KB, text/plain)
2018-07-18 14:31 UTC, Rajini Karthik

Description Rajini Karthik 2018-07-12 16:37:51 UTC
Description of problem:
We have built VNX cinder and manila container images via the Red Hat auto build service, and deployed them successfully with the OSP 13 GA version.

1.	Failed to detach volume
a.	Root cause: The request failed before it reached Cinder; we believe it is not a VNX driver issue.
b.	Workaround:   N/A

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Rajini Karthik 2018-07-12 16:42:11 UTC
Nova failed to detach the volume from the instance and raised a DeviceDetachFailed exception. The request did not even reach Cinder, so I think it is not a Cinder driver issue; it may be a platform or Nova issue.
Have you met a similar issue in your test environment?

My test steps were:
1.	Deploy OSP13 with Cinder VNX backend configured
2.	Create cinder volume – passed
3.	Create nova instance – passed
4.	Attach volume to instance – passed
5.	Detach volume from instance – failed
6.	Deploy OSP13 with Cinder Unity backend configured
7.	Repeat step 2 ~ 5 – also failed at volume detach
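
For reference, the attach/detach steps above map roughly to the following CLI calls (the volume size, flavor, image, and instance names here are only placeholders, not the exact commands from my run):

$ cinder create --name test-vol 1

$ nova boot --flavor m1.small --image <image-id> --nic net-id=<net-id> test-vm

$ nova volume-attach test-vm <volume-id>

$ nova volume-detach test-vm <volume-id>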

Detailed logs are attached.

Error logs from nova-compute.log:
--------------------------------------------------
2018-07-12 00:04:21.259 1 DEBUG nova.virt.libvirt.guest [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] detach device xml: <disk type="block" device="disk">
  <driver name="qemu" type="raw" cache="none" io="native"/>
  <source dev="/dev/sdc"/>
  <target bus="virtio" dev="vdb"/>
  <serial>10341e87-805e-424c-a422-253add859849</serial>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x06" function="0x0"/>
</disk>
detach_device /usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:477
2018-07-12 00:04:26.267 1 DEBUG nova.virt.libvirt.guest [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] Successfully detached device vdb from guest. Persistent? False. Live? True _try_detach_device /usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:400
2018-07-12 00:04:26.268 1 DEBUG oslo.service.loopingcall [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] Exception which is in the suggested list of exceptions occurred while invoking function: nova.virt.libvirt.guest._do_wait_and_retry_detach. _func /usr/lib/python2.7/site-packages/oslo_service/loopingcall.py:400
2018-07-12 00:04:26.269 1 DEBUG oslo.service.loopingcall [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] Cannot retry nova.virt.libvirt.guest._do_wait_and_retry_detach upon suggested exception since retry count (7) reached max retry count (7). _func /usr/lib/python2.7/site-packages/oslo_service/loopingcall.py:410
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] Dynamic interval looping call 'oslo_service.loopingcall._func' failed: DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall Traceback (most recent call last):
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 137, in _run_loop
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 415, in _func
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall     return self._sleep_time
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall     self.force_reraise()
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall     six.reraise(self.type_, self.value, self.tb)
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 394, in _func
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall     result = f(*args, **kwargs)
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 457, in _do_wait_and_retry_detach
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall     device=alternative_device_name, reason=reason)
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.
2018-07-12 00:04:26.270 1 ERROR oslo.service.loopingcall 
2018-07-12 00:04:26.274 1 WARNING nova.virt.block_device [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] [instance: 14f3b85f-358f-4ba2-b31e-868f579dabba] Guest refused to detach volume 10341e87-805e-424c-a422-253add859849: DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.
2018-07-12 00:04:26.274 1 DEBUG oslo_concurrency.lockutils [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] Lock "4682c294-5575-4534-9805-d1958808bfda" released by "nova.virt.block_device._do_locked_detach" :: held 101.180s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] Exception during message handling: DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     self.force_reraise()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 976, in decorated_function
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 214, in decorated_function
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     self.force_reraise()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 202, in decorated_function
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5436, in detach_volume
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     attachment_id=attachment_id)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5389, in _detach_volume
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     attachment_id=attachment_id, destroy_bdm=destroy_bdm)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 415, in detach
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     attachment_id, destroy_bdm)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 412, in _do_locked_detach
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     self._do_detach(*args, **_kwargs)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 341, in _do_detach
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     self.driver_detach(context, instance, volume_api, virt_driver)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 310, in driver_detach
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     {'vol': volume_id}, instance=instance)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     self.force_reraise()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 300, in driver_detach
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     encryption=encryption)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1610, in detach_volume
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     wait_for_detach()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 423, in func
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return evt.wait()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return hubs.get_hub().switch()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return self.greenlet.switch()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 137, in _run_loop
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     result = func(*self.args, **self.kw)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 415, in _func
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     return self._sleep_time
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     self.force_reraise()
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 394, in _func
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 457, in _do_wait_and_retry_detach
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server     device=alternative_device_name, reason=reason)
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.
2018-07-12 00:04:27.155 1 ERROR oslo_messaging.rpc.server

Comment 2 Rajini Karthik 2018-07-12 16:42:44 UTC
Created attachment 1458472 [details]
Logs

Comment 3 Rajini Karthik 2018-07-12 16:49:39 UTC
There are several hits in Launchpad bugs for the same issue:
https://bugs.launchpad.net/nova/+bug/1565859

From Red Hat, a solution is in progress:
https://access.redhat.com/solutions/3359111

Comment 4 Alan Bishop 2018-07-12 16:57:49 UTC
I checked cinder's logs and cinder-api.log shows the volume 10341e87-805e-424c-a422-253add859849 was successfully detached.

This is an issue for Compute folks to look at.

Comment 5 melanie witt 2018-07-12 22:36:49 UTC
FYI, the nova-compute.log you attached to this BZ does not contain the DeviceDetachFailed message that you pasted in comment 1.

From your paste, the key error message is this one:

  2018-07-12 00:04:26.274 1 WARNING nova.virt.block_device [req-3a007970-2d27-4986-8725-73bdbbfcd4f8 d66799e3a29e46ab96b1a9ba20964221 97834a97be7240adb9473b97a3f7fa27 - default default] [instance: 14f3b85f-358f-4ba2-b31e-868f579dabba] Guest refused to detach volume 10341e87-805e-424c-a422-253add859849: DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.

A request to detach a volume from a transient/live guest domain is an ACPI request to the guest asking to detach the volume. The guest can refuse to comply with the request, which is what you pasted from the compute log: "Guest refused to detach volume."

This usually means the volume is still in use and mounted by the guest. In your test, is the guest doing anything with the volume? Did it mount the volume? Are there other things in the environment like use of LVM or multipathd?

We need to find out what is keeping the volume in-use.
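
If you can get into the guest, a few generic in-guest checks (suggested here only as examples; adjust the device name as needed) can show what is holding the device:

$ lsblk

$ lsof /dev/vdb

$ fuser -vm /dev/vdb

$ grep vdb /proc/mounts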

Comment 6 Rajini Karthik 2018-07-13 16:31:57 UTC
Actually, we do nothing with the attached volume; the test is quite simple:
attach the volume, wait several minutes, then detach the volume.

Comment 7 melanie witt 2018-07-17 21:40:56 UTC
Okay, so the guest is idle during this test.

From the logs, I agree that the request to detach the volume does not reach cinder -- it stops at nova because the guest refused to detach the volume. We need to troubleshoot what is causing the guest to refuse to detach the volume.

Is there any way for us to gain access to your test environment to do some troubleshooting?

If not, there are some virsh commands you can run while the volume is attached to the guest domain; please paste the output:

$ virsh domblklist <domain>

$ virsh domblkstat <domain>

And then after logging into the guest, list block devices:

$ lsblk

$ df

Finally, after getting the above info, we can try to detach the volume outside of nova using the same command nova uses, to get an error message directly from libvirt:

$ virsh detach-disk <domain> <target> --live --config

Upon getting this information, we'll determine the next steps we need to take to debug this.
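
If the manual detach-disk also fails, it would help to capture the libvirt view of the domain at the same time, for example (suggested checks only):

$ virsh domstate <domain> --reason

$ virsh dumpxml <domain> | grep -B2 -A6 'target dev'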

Comment 8 Rajini Karthik 2018-07-18 14:28:40 UTC
The output of domblklist and domblkstat is:

[root@osp12com0 nova]# virsh domblklist instance-00000003
Target     Source
------------------------------------------------
vda        /var/lib/nova/instances/34c6a2ea-5299-46e9-b5af-37203c0d5ab9/disk
vdb        /dev/sdd

[root@osp12com0 nova]# virsh domblkstat instance-00000003
rd_req 2
rd_bytes 1024
wr_req 0
wr_bytes 0
flush_operations 0
rd_total_times 953734
wr_total_times 0
flush_total_times 0


I failed to connect to the VM console; there was no output:

[root@osp12com0 puppet-generated]# virsh console instance-00000003
Connected to domain instance-00000003
Escape character is ^]


From the overcloud GUI, I found that the instance gets stuck at "Booting from Hard Disk..."; a screenshot is attached.
It seems the instance never starts up successfully.
I don't know why the volume attach succeeded (no error was returned when executing 'nova volume-attach instance-id volume-id').

Comment 9 Rajini Karthik 2018-07-18 14:30:50 UTC
Created attachment 1459727 [details]
nova booting

Comment 10 Rajini Karthik 2018-07-18 14:31:18 UTC
Created attachment 1459728 [details]
dump

Comment 11 melanie witt 2018-07-18 18:37:51 UTC
Thanks. From the nova-compute.log, the instance was created successfully and the volume attached successfully. But as you show from the instance console, the instance has not successfully finished booting up.

Are you able to create an instance and have it successfully complete booting up (in the console) if you wait and _don't_ attach any volume yet? I wonder if the early attachment of the volume is causing a problem with the boot process and causing it to get stuck.

Nova can't know whether the instance has completed its boot process (this can take varying amounts of time depending on the image, etc.), so it can't reject the volume attach request based on that. The code paths succeed as far as connecting the volume to the libvirt guest domain and making the cinder calls, so those must be independent of the boot process happening inside the guest.

It will help to know whether the instance can finish booting successfully if you wait and don't attach the volume until it's finished. That way, we can find out whether the attachment of the volume is interfering with the boot process. Can you try that and let us know if the instance can boot without getting stuck if you don't attach a volume right away?
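
As a concrete example, that test could look something like this (flavor, image, and server names are placeholders):

$ nova boot --flavor m1.small --image <image-id> test-boot-only

# wait until the console log shows the guest has finished booting
$ nova console-log test-boot-only | tail -20

# only then attach and detach the volume
$ nova volume-attach test-boot-only <volume-id>

$ nova volume-detach test-boot-only <volume-id>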

Comment 12 Rajini Karthik 2018-07-19 14:31:51 UTC
Hi,

I created two new nova instances last night; after running overnight, both are still stuck at "Booting from Hard Disk..."

Related configuration:

1. virt_type is set to 'kvm' in nova.conf:
[root@osp12com0 nova]# grep 'virt_type=' /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
#virt_type=kvm
virt_type=kvm
# If virt_type="kvm|qemu", it will default to "host-model", otherwise it will

2. Hypervisor type is 'QEMU':
(overcloud) [stack@manager ~]$ openstack hypervisor list
+----+-----------------------+-----------------+-------------+-------+
| ID | Hypervisor Hostname   | Hypervisor Type | Host IP     | State |
+----+-----------------------+-----------------+-------------+-------+
|  1 | osp12com0.localdomain | QEMU            | 172.16.2.18 | down  |
+----+-----------------------+-----------------+-------------+-------+

3. Virtualization is enabled in compute node:
[root@osp12com0 nova]# grep -E 'svm|vmx' /proc/cpuinfo
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm epb tpr_shadow vnmi ept vpid xsaveopt dtherm ida arat pln pts

4. Nested virtualization is also enabled:
[root@osp12com0 nova]# cat /sys/module/kvm_intel/parameters/nested
Y

5. Both the undercloud and overcloud nodes (compute node, controller node) are running on ESXi.
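
Since the compute node itself is an ESXi guest, an additional check that could confirm nested KVM is actually usable there (suggested commands, for reference only):

$ lsmod | grep kvm

$ ls -l /dev/kvm

$ virt-host-validate qemu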


Thanks
Yong

Comment 13 Rajini Karthik 2018-07-20 17:48:22 UTC
Here in the Futureville labs in Austin, we have just completed validation of basic cinder functionality.
We have successfully installed and tested OSP13 with the VNX cinder backend and did not see this issue.

The problems previously encountered were reported by the Dell EMC Shanghai team, so the issue can now be isolated to their environment. It will now be a lower-priority issue.

Comment 14 melanie witt 2018-07-20 20:05:33 UTC
Thanks for the update.

We discussed this BZ in our bug triage call today. The error message, "Guest refused to detach volume 10341e87-805e-424c-a422-253add859849: DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain." is expected and not a bug, considering that the instance is stuck during the boot process. The guest is not expected to respond to the detach request while it is stuck booting. The reason the attach succeeds is that guest participation is not required for a volume attach; for a volume detach, however, the guest must respond to the detach request.
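
In practice, a simple way to avoid this situation is to confirm from the console log that the guest has finished booting before issuing the detach, for example (illustrative only; names are placeholders):

$ nova console-log <instance> | tail -20

$ nova volume-detach <instance> <volume-id>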

Since this is expected behavior for an instance stuck booting, we are closing this BZ as NOTABUG.

Comment 15 Rajini Karthik 2018-07-26 19:51:55 UTC
This is still a nova issue. Can we get help to resolve this, please? Can we reopen?

Comment 16 melanie witt 2018-07-27 15:14:00 UTC
This is not a nova issue -- this is not a bug in nova. Nova successfully created the virtual machine and the guest is booting from the image you supplied. Your instances are failing to boot from the images you're using and you must debug that. Please open a case at https://access.redhat.com/support if you need additional support for the issue.

