Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1003649

Summary: It is impossible to hotplug a disk if the previous hotplug failed
Product: Red Hat Enterprise Virtualization Manager
Reporter: Katarzyna Jachim <kjachim>
Component: vdsm
Assignee: Sergey Gotliv <sgotliv>
Status: CLOSED ERRATA
QA Contact: Katarzyna Jachim <kjachim>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: abaron, acanan, amureini, bazulay, iheim, kjachim, lpeer, ncredi, scohen, sgotliv, tnisan, yeylon
Target Milestone: ---
Flags: abaron: Triaged+
Target Release: 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: is17
Doc Type: Bug Fix
Doc Text:
Previously, HotPlugDiskToVmCommand updated the 'isPlugged' and 'bootOrder' properties of every device attached to the virtual machine in a single operation. This could race with another thread handling the hotplug of a different disk for the same virtual machine; one visible symptom was that a disk could not be hotplugged after a previous hotplug had failed. Now, HotPlugDiskToVmCommand updates 'isPlugged' only for the device plugged by this command, so the race no longer occurs.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 16:14:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: test logs (vdsm, engine, server etc.)
Flags: none

Description Katarzyna Jachim 2013-09-02 14:40:06 UTC
Created attachment 792887 [details]
test logs (vdsm, engine, server etc.)

Description of problem:
When we try to hotplug 10 disks at once, sometimes all of the calls finish with status 'complete' and there is no error in the vdsm or engine logs, yet RHEV-M reports some of the disks as unplugged. However, if we then try to plug such a disk again, the operation fails with the following error:

Thread-770::ERROR::2013-08-30 01:08:22,232::vm::3252::vm.Vm::(hotplugDisk) vmId=`3758e0a3-e715-4e82-8a3f-187fd3a4f6f8`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 3250, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/vm.py", line 824, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to reserve PCI address 0:0:15.0
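
For context, "unable to reserve PCI address" indicates that the explicit PCI address carried in the drive XML collides with a slot the domain already has in use. The following diagnostic sketch is illustrative only and is not vdsm code: the VM name and drive XML are hypothetical, and only the libvirt calls (lookupByName, XMLDesc, attachDevice) are real. It checks for such a collision before attempting the same attachDevice() call that fails above.

# Diagnostic sketch only, not vdsm code. Assumes the python libvirt bindings,
# a hypothetical VM name and a hypothetical drive XML carrying an explicit
# PCI <address> element, as the engine-generated drive XML does.
import xml.etree.ElementTree as ET

import libvirt

VM_NAME = "example-vm"                     # hypothetical
DRIVE_XML = """<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/vgexample/lvexample'/>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x15' function='0x0'/>
</disk>"""                                 # hypothetical device XML

def used_pci_slots(dom):
    """Return the (bus, slot) pairs already present in the live domain XML."""
    root = ET.fromstring(dom.XMLDesc(0))
    return set((a.get('bus'), a.get('slot'))
               for a in root.findall(".//address")
               if a.get('type') == 'pci')

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName(VM_NAME)

requested = ET.fromstring(DRIVE_XML).find("address")
slot = (requested.get('bus'), requested.get('slot'))
if slot in used_pci_slots(dom):
    print("PCI slot %s/%s is already reserved; attachDevice() would fail" % slot)
else:
    dom.attachDevice(DRIVE_XML)            # the same call vdsm makes in hotplugDisk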


Version-Release number of selected component (if applicable): is12


How reproducible:
Happens sometimes in the following automated tests:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-iscsi-sdk
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-iscsi-rest
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-nfs-sdk
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-nfs-rest

Failed job: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_hotplug_disk_hooks-nfs-rest/44/

id of problematic disk: b9bdb478-5e64-4e50-a2db-6a4077c6fb6a


Steps to Reproduce:
1. Hotplug 10 disks at once (see the reproduction sketch below).
2. If one of the disks is not reported as plugged, try to plug it again.
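
A minimal reproduction sketch for step 1 follows. It is illustrative only: hotplug_disk() is a hypothetical stand-in for whatever REST/SDK call the automated tests issue (it is not a real RHEV-M API), and the disk ids are made up; only the VM id is taken from the traceback above.

# Reproduction sketch for step 1 -- illustrative only.
from concurrent.futures import ThreadPoolExecutor


def hotplug_disk(vm_id, disk_id):
    """Hypothetical stand-in for the REST/SDK hotplug call used by the tests."""
    pass  # issue the activate/hotplug request for vm_id/disk_id here


VM_ID = "3758e0a3-e715-4e82-8a3f-187fd3a4f6f8"      # VM id from the traceback
DISK_IDS = ["disk-%02d" % i for i in range(10)]     # hypothetical disk ids

# Step 1: fire all 10 hotplug calls at once.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(hotplug_disk, VM_ID, d) for d in DISK_IDS]
    for f in futures:
        f.result()

# Step 2 (manual): if RHEV-M reports any of the disks as unplugged even though
# every call returned 'complete', try to plug that disk again.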


Actual results:
* it is impossible to plug a disk if the previous hotplug failed


Expected results:
* if a hotplug fails, it should be rolled back so that a subsequent hotplug is possible

Comment 1 Ayal Baron 2013-09-12 10:12:47 UTC
Aharon, seems to me like there should be 2 tests here (after we figure out what the issue is), to make sure we have deterministic results.

Comment 2 Ayal Baron 2013-09-16 11:20:21 UTC
The issue is that attaching a device updates the boot order of the VM, and since we stopped taking an exclusive lock at the VM level (3.1), there is a race window.
As a result, attaching multiple devices to the same VM concurrently is race-prone.
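
For illustration, a minimal, self-contained sketch of this lost-update pattern (all names invented; this is not engine code): each hotplug does a read-modify-write of the VM's whole device table without a per-VM lock, so one writer can silently overwrite the other's 'isPlugged'/'bootOrder' changes.

# Minimal lost-update race sketch -- illustrative only, not ovirt-engine code.
import copy
import threading
import time

# Shared per-VM device table, standing in for the engine's device records.
vm_devices = {
    "disk-a": {"isPlugged": False, "bootOrder": 1},
    "disk-b": {"isPlugged": False, "bootOrder": 2},
}

def hotplug(disk_id):
    """Read-modify-write of *all* devices, mimicking the pre-fix command."""
    global vm_devices
    snapshot = copy.deepcopy(vm_devices)   # read every device
    snapshot[disk_id]["isPlugged"] = True  # plug our own disk
    time.sleep(0.01)                       # widen the race window for the demo
    vm_devices = snapshot                  # write everything back, clobbering
                                           # any concurrent update

threads = [threading.Thread(target=hotplug, args=(d,)) for d in ("disk-a", "disk-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With this timing only one disk ends up marked as plugged, even though both
# hotplug() calls returned successfully: the symptom reported in this bug.
print(vm_devices)

The fix described in the Doc Text narrows the write so that each command updates only the device it actually plugged (conceptually, vm_devices[disk_id]["isPlugged"] = True directly on the shared table), which removes the lost update.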

Comment 9 Charlie 2013-11-28 00:32:44 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 10 errata-xmlrpc 2014-01-21 16:14:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html