Bug 1728298
| Summary: | nova virtio-scsi drivers turned to virtio-blk after rescue mode | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marc Methot <mmethot> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED EOL | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 13.0 (Queens) | CC: | dasmith, eglynn, fwissing, jhakimra, kchamart, sbauza, sgordon, ssigwald, vromanso |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | --- | Flags: | fwissing: needinfo? (mbooth) |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-11 20:40:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Unfortunately there's not quite enough detail in the reproducer steps for me to get a handle on this. Please could you provide these again, and include:

* The exact commands used to perform the openstack operations.
* The exact method used to 'break' a VM, and the resulting state of the VM.
* The exact points in the reproducer steps at which instance dumpxmls were taken.

Possibly relevant (depending on clarification of the reproducer steps), it looks like there's a bug in rescue. Before rescuing the instance we save the current domain XML to unrescue.xml, which is generated by LibvirtDriver._get_existing_domain_xml(). This method will get the XML of a running domain, but if it can't for any reason it will regenerate it *without block_device_info*, which would omit attached volumes (see the simplified sketch after the traceback below).

I have attempted to reproduce this issue on OSP13. I can't be sure I have reproduced the issue reported due to the lack of logs, but I have reproduced an issue. I have done the following:
$ openstack image create --disk-format qcow2 --file cirros-0.4.0-x86_64-disk.img cirros
$ openstack image create --disk-format qcow2 --file cirros-0.4.0-x86_64-disk.img --property hw_disk_bus=scsi --property hw_qemu_guest_agent=yes --property hw_scsi_model=virtio-scsi cirros_scsi
$ openstack server create --flavor m1.tiny --image cirros_scsi mbooth_scsitest
# Confirmed libvirt is using scsi
$ openstack server rescue --image cirros mbooth_scsitest
# Confirmed libvirt is using virtio-blk
$ openstack server unrescue mbooth_scsitest
# Confirmed libvirt is using scsi
$ openstack volume create --size 1 mbooth_scsitest_vol
$ openstack server add volume mbooth_scsitest mbooth_scsitest_vol
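(As an aside, the "Confirmed libvirt is using ..." checks above can be made by inspecting each disk's target bus in the domain XML. The snippet below is a minimal sketch using the libvirt Python bindings; the connection URI and domain name are placeholders, not values taken from this environment.)

```python
# Minimal check sketch: list each disk's target device and bus from the
# domain XML. The URI and domain name below are placeholders.
import xml.etree.ElementTree as ET

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000001")  # placeholder domain name
root = ET.fromstring(dom.XMLDesc(0))
for disk in root.findall("./devices/disk"):
    target = disk.find("target")
    # e.g. "sda scsi" for virtio-scsi, "vda virtio" for virtio-blk
    print(target.get("dev"), target.get("bus"))
```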
I got the following error in nova-compute.log:
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [req-3bf4da4c-f14c-490b-b54f-2d26bf89a729 106ac4b9ef2c4f97b8bfb732ef2be298 942db8408f734f5dbfaf0e6e2c433333 - default default] [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] Failed to attach 6f3b33d6-fbfc-492d-9a03-286a23a38cbf at /dev/sda: libvirtError: Requested operation is not valid: target sda already exists
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] Traceback (most recent call last):
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5339, in _attach_volume
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] do_driver_attach=True)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 46, in wrapped
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] ret_val = method(obj, context, *args, **kwargs)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 624, in attach
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] virt_driver, do_driver_attach)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 602, in _do_attach
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] do_driver_attach)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 549, in _volume_attach
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] attachment_id)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] self.force_reraise()
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] six.reraise(self.type_, self.value, self.tb)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 540, in _volume_attach
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] device_type=self['device_type'], encryption=encryption)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1503, in attach_volume
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] encryption=encryption)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] self.force_reraise()
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] six.reraise(self.type_, self.value, self.tb)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1476, in attach_volume
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] guest.attach_device(conf, persistent=True, live=live)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] self._domain.attachDeviceFlags(device_xml, flags=flags)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] rv = execute(f, *args, **kwargs)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] six.reraise(c, e, tb)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] rv = meth(*args, **kwargs)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 605, in attachDeviceFlags
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb] libvirtError: Requested operation is not valid: target sda already exists
2019-07-19 16:30:57.023 1 ERROR nova.compute.manager [instance: 402122de-ee60-4c0d-87e2-c5ff4ad03ffb]
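For reference, below is a minimal sketch of the rescue code path flagged earlier (the generation of unrescue.xml). This is not the actual nova source, and the helper names are hypothetical stand-ins; it only illustrates the described behaviour: when the live domain XML cannot be read, the configuration is rebuilt, and because block_device_info defaults to None on that path, attached volumes are missing from the regenerated XML.

```python
# Illustrative sketch only; not the actual nova source. Helper names are
# hypothetical stand-ins for the real driver internals.

class DomainLookupError(Exception):
    """Stand-in for 'could not fetch the live domain XML'."""


def get_existing_domain_xml(fetch_live_xml, rebuild_guest_xml,
                            block_device_info=None):
    """Return the XML that would be saved as unrescue.xml.

    fetch_live_xml    -- callable returning the running domain's XML
    rebuild_guest_xml -- callable regenerating the XML from Nova's records
    """
    try:
        # Normal path: use the definition of the running domain, which
        # still contains the attached volumes.
        return fetch_live_xml()
    except DomainLookupError:
        # Fallback path: regenerate from Nova's data. With
        # block_device_info=None (as in the rescue path), the attached
        # volumes are dropped here.
        return rebuild_guest_xml(block_device_info=block_device_info)
```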
I just confirmed that when repeating the above without the rescue/unrescue step, the attach succeeds.

I just confirmed that when repeating the above with the rescue step but using the cirros_scsi image, the attach succeeds. The attach failure seems to be related to the different rescue image.

The originally reported stop/start step may not be required, assuming the reproduced issue is also the reported issue. Note that we can't be sure of the latter due to the missing logs.

Hi guys, what else is needed from the customer to move this forward? /Freddy

OSP 13 was retired on June 27, 2023. No further work is expected to occur on this issue.
Description of problem:

Upon rescue of an instance with a different rescue image, the disk target changes. The customer releases images that use the virtio-scsi disk bus. However, they cannot stop users from booting custom images that use virtio-blk, nor enforce that the original image be used to rescue an instance (basically, a user could rescue with a different image and cause the breakage). In those cases, this anomaly breaks the whole instance and is irreversible other than by rebuilding the instance.

Version-Release number of selected component (if applicable):
latest 13

How reproducible:
Every time

Steps to Reproduce:
1) The VM boots from an image with the virtio-scsi properties, so the boot disk uses virtio-scsi (root disk is /dev/sda).
2) Attach persistent storage (it becomes /dev/sdb).
3) The VM breaks.
4) The VM is rescued with an image that has the virtio-blk property, and its disks boot as virtio-blk.
5) The VM is unrescued.
6) The VM comes back online with the virtio-scsi driver (root disk is /dev/sda, existing persistent storage is /dev/sdb up to this point). The deviation starts here:
   a) An additionally attached volume shows as attached at the OpenStack layer but is not visible from the KVM layer or inside the instance (ideally, it should be /dev/sdc).
7) Perform a stop and start via the nova CLI or Horizon.
8) The VM boots up with the following anomalies:
   a) The VM boots with the rescue image's driver, i.e. virtio-blk for the root disk. Also note that virtio-scsi is still loaded and visible in the module list in the VM. (The root disk becomes /dev/vda; the persistent storage, including the volume attached in step 6a, becomes /dev/sda and /dev/sdb.)

Rescuing with another image that has virtio-scsi, or with the same image, doesn't break anything.

Expected results:
It would be ideal to rescue the instance and then have it return to the original driver.
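For context on why the rescue image matters: the guest's disk bus follows the image metadata used to build the guest. An image carrying hw_disk_bus=scsi (with hw_scsi_model=virtio-scsi) produces SCSI targets such as /dev/sda, while an image without those properties defaults to virtio-blk (/dev/vda) on KVM. The sketch below is a simplified illustration of that selection, not nova's actual blockinfo code.

```python
# Simplified illustration, not nova's actual blockinfo logic: pick the disk
# bus a KVM guest would get from the image's hw_disk_bus property.

def pick_disk_bus(image_properties, virt_type="kvm"):
    """Return the disk bus implied by the image metadata."""
    bus = image_properties.get("hw_disk_bus")
    if bus:
        return bus
    # Without hw_disk_bus, KVM/QEMU guests default to virtio (virtio-blk).
    if virt_type in ("kvm", "qemu"):
        return "virtio"
    raise ValueError("unhandled virt_type in this sketch: %s" % virt_type)


# The two images from the reproducer above:
print(pick_disk_bus({"hw_disk_bus": "scsi",
                     "hw_scsi_model": "virtio-scsi"}))  # scsi   -> /dev/sdX
print(pick_disk_bus({}))                                # virtio -> /dev/vdX
```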