Bug 1003649 - It is impossible to hotplug a disk if the previous hotplug failed
Summary: It is impossible to hotplug a disk if the previous hotplug failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Sergey Gotliv
QA Contact: Katarzyna Jachim
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-09-02 14:40 UTC by Katarzyna Jachim
Modified: 2016-02-10 17:18 UTC
CC: 12 users

Fixed In Version: is17
Doc Type: Bug Fix
Doc Text:
Previously, HotPlugDiskToVmCommand updated both the 'isPlugged' and 'bootOrder' properties of all devices attached to the virtual machine. This could race with another thread handling the hotplug of a different disk for the same virtual machine; for example, it was not possible to hotplug a disk if the previous hotplug had failed. Now, HotPlugDiskToVmCommand updates 'isPlugged' only for the device plugged by this command, so the race no longer occurs.
Clone Of:
Environment:
Last Closed: 2014-01-21 16:14:48 UTC
oVirt Team: Storage
Target Upstream Version:
abaron: Triaged+


Attachments
test logs (vdsm, engine, server etc.) (8.45 MB, application/x-bzip2)
2013-09-02 14:40 UTC, Katarzyna Jachim


Links
Red Hat Product Errata RHBA-2014:0040 (SHIPPED_LIVE, normal): vdsm bug fix and enhancement update, last updated 2014-01-21 20:26:21 UTC
oVirt gerrit 19311 (MERGED): engine: Split update of 'isPlugged' and 'bootOrder' properties..., last updated 2020-04-20 08:20:22 UTC
oVirt gerrit 19521 (MERGED): engine: Split update of 'isPlugged' and 'bootOrder' properties..., last updated 2020-04-20 08:20:22 UTC

Description Katarzyna Jachim 2013-09-02 14:40:06 UTC
Created attachment 792887 [details]
test logs (vdsm, engine, server etc.)

Description of problem:
When we try to hotplug 10 disks at once, sometimes all the calls finish with status 'complete' and there is no error in the vdsm/engine logs, but RHEV-M reports some of the disks as unplugged. However, if we try to plug such a disk again, the operation fails with the following error:

Thread-770::ERROR::2013-08-30 01:08:22,232::vm::3252::vm.Vm::(hotplugDisk) vmId=`3758e0a3-e715-4e82-8a3f-187fd3a4f6f8`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 3250, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/vm.py", line 824, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to reserve PCI address 0:0:15.0


Version-Release number of selected component (if applicable): is12


How reproducible:
Happens sometimes in the following automated tests:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-iscsi-sdk
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-iscsi-rest
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-nfs-sdk
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-nfs-rest

Failed job: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-nfs-rest/44/

id of problematic disk: b9bdb478-5e64-4e50-a2db-6a4077c6fb6a


Steps to Reproduce:
1. hotplug 10 disks at once
2. if one of them is not reported as plugged, try to plug it again


Actual results:
* it is impossible to plug a disk if the previous hotplug failed


Expected results:
* if a hotplug fails, it should be rolled back so that the next hotplug attempt can succeed
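The expected rollback behavior can be sketched as follows. This is a hypothetical illustration, not actual vdsm/engine code: `hotplug_disk`, `HotplugError`, and the device-table shape are made up for the example. The point is that a failed attach must restore the device record so a retry starts from a clean state.

```python
# Hypothetical sketch of the expected behavior: mark the disk as plugged only
# after a successful attach, and roll back the record on failure.

class HotplugError(Exception):
    pass

def hotplug_disk(devices, disk_id, attach_fn):
    """Attach a disk; on failure, restore the device record so a retry can succeed."""
    device = devices[disk_id]
    saved_address = device.get("address")
    try:
        device["address"] = attach_fn(disk_id)  # stands in for libvirt attachDevice
        device["isPlugged"] = True
    except HotplugError:
        # Roll back: release any partially-reserved state before re-raising.
        device["address"] = saved_address
        device["isPlugged"] = False
        raise

devices = {"b9bdb478": {"isPlugged": False, "address": None}}

attempts = []
def flaky_attach(disk_id):
    attempts.append(disk_id)
    if len(attempts) == 1:
        raise HotplugError("unable to reserve PCI address")  # mimics the libvirt error
    return "0:0:15.0"

try:
    hotplug_disk(devices, "b9bdb478", flaky_attach)
except HotplugError:
    pass

assert devices["b9bdb478"]["isPlugged"] is False  # clean state after the failure

hotplug_disk(devices, "b9bdb478", flaky_attach)   # the retry now succeeds
assert devices["b9bdb478"]["isPlugged"] is True
```

With this pattern, the failed first attempt leaves no stale reservation behind, so the second attempt in step 2 of the reproduction succeeds instead of failing with the PCI-address error above.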

Comment 1 Ayal Baron 2013-09-12 10:12:47 UTC
Aharon, seems to me like there should be 2 tests here (after we figure out what the issue is), to make sure we have deterministic results.

Comment 2 Ayal Baron 2013-09-16 11:20:21 UTC
The issue is that attaching a device changes the boot order of the VM, and since we stopped taking an exclusive lock at the VM level (in 3.1), a race window exists.
So attaching multiple devices to the same VM concurrently is race-prone.
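A minimal, deterministic sketch of this lost-update race (hypothetical names, not the actual engine code): each hotplug command snapshots the VM's full device list, recomputes 'bootOrder' for every device plus its own 'isPlugged' flag, and writes the whole list back, so two concurrent commands can overwrite each other's updates.

```python
import copy

# Hypothetical illustration of the lost-update race: each command writes back a
# full snapshot of the device list taken before the other command committed.

vm_devices = {
    "disk-a": {"isPlugged": False, "bootOrder": 0},
    "disk-b": {"isPlugged": False, "bootOrder": 0},
}

def hotplug_all_devices_update(db, disk_id, snapshot):
    """The racy variant: update every device and write the whole snapshot back."""
    snapshot[disk_id]["isPlugged"] = True
    for order, dev_id in enumerate(sorted(snapshot), start=1):
        snapshot[dev_id]["bootOrder"] = order        # recompute bootOrder for ALL devices
    db.clear()
    db.update(snapshot)                              # clobbers any concurrent update

# Interleaving: both commands take their snapshot before either writes back.
snap_a = copy.deepcopy(vm_devices)
snap_b = copy.deepcopy(vm_devices)
hotplug_all_devices_update(vm_devices, "disk-a", snap_a)
hotplug_all_devices_update(vm_devices, "disk-b", snap_b)  # disk-a's update is lost

assert vm_devices["disk-a"]["isPlugged"] is False  # engine thinks it is unplugged...
# ...while libvirt already reserved its PCI address, so a retry fails.

def hotplug_single_device_update(db, disk_id):
    """The fixed variant: touch only the device this command plugged."""
    db[disk_id]["isPlugged"] = True

hotplug_single_device_update(vm_devices, "disk-a")
assert vm_devices["disk-a"]["isPlugged"] is True
```

This matches the shape of the merged fix ("Split update of 'isPlugged' and 'bootOrder' properties" in the linked gerrit patches): narrowing the write to a single device removes the overlapping full-list writes, so no lock at the VM level is needed to keep 'isPlugged' consistent.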

Comment 9 Charlie 2013-11-28 00:32:44 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 10 errata-xmlrpc 2014-01-21 16:14:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html

