Description of problem:
VM fails to start after changing IDE disks boot order

Version-Release number of selected component (if applicable):
rhevm-backend-3.4.4-2.2.el6ev.noarch

How reproducible:

Steps to Reproduce:
1. Create new VM with IDE disk, verify it starts
2. Add second IDE disk
3. Turn off bootable flag from 1st disk, make 2nd disk bootable
4. Start the VM

Actual results:
VM fails to start:

2015-01-28 15:47:40,801 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-30) Correlation ID: 846da54, Job ID: 66a0ed9b-c7c7-4822-b6f8-b89b56387d12, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM fed2 on Host 10.xx.xx.xx.
2015-01-28 15:47:40,888 INFO [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-4-thread-30) Lock Acquired to object EngineLock [exclusiveLocks= key: 35f3b562-cb69-4ee1-8b47-47256520182b value: VM , sharedLocks= ]
2015-01-28 15:47:40,902 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-4-thread-30) START, IsVmDuringInitiatingVDSCommand( vmId = 35f3b562-cb69-4ee1-8b47-47256520182b), log id: 3107c06
2015-01-28 15:47:40,902 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-4-thread-30) FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 3107c06
2015-01-28 15:47:40,908 WARN [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-4-thread-30) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT

Expected results:
VM starts

Additional info:
This might be a libvirt issue. I was able to reproduce this locally using the steps described, and this is what I got: the engine sends a create request to VDSM with 3 IDE devices:
1 - cdrom, with an address (assigned by libvirt from a previous run)
2 - disk with an address (assigned by libvirt from a previous run)
3 - disk without an address (the new disk), but with a request to be the boot device (boot_order=1)

<disk device="cdrom" snapshot="no" type="file">
  <address bus="1" controller="0" target="0" type="drive" unit="0"/>
  <source file="" startupPolicy="optional"/>
  <target bus="ide" dev="hdc"/>
  <readonly/>
  <serial/>
</disk>
<disk device="disk" snapshot="no" type="file">
  <source file="/rhev/data-center/.../..."/>
  <target bus="ide" dev="hda"/>
  <serial>483d2327-84bb-4de6-90a9-d282d5237ae1</serial>
  <boot order="1"/>
  <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
</disk>
<disk device="disk" snapshot="no" type="file">
  <address bus="0" controller="0" target="0" type="drive" unit="0"/>
  <source file="/rhev/data-center/.../..."/>
  <target bus="ide" dev="hdb"/>
  <serial>901e7fe4-f479-4c9c-871e-d18ac1317cad</serial>
  <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
</disk>

This results in:

Thread-97773::ERROR::2015-02-22 18:26:41,923::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`517dce2c-f140-4079-bf61-e7eb116887a1`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3335, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/00000002-0002-0002-0002-00000000033d/f23b9542-85ac-4957-84f4-a8ed99154502/images/901e7fe4-f479-4c9c-871e-d18ac1317cad/b7e35ad0-e7f8-4958-a44d-495fa19725e2,if=none,id=drive-ide0-0-0,format=raw,serial=901e7fe4-f479-4c9c-871e-d18ac1317cad,cache=none,werror=stop,rerror=stop,aio=threads: Duplicate ID 'drive-ide0-0-0' for drive

The error above refers to the older disk. I don't think that the engine should have cleared the first disk's address, because we like to have stable addresses between boot cycles. Michal, what do you think?
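For illustration only, here is a minimal sketch (not actual libvirt or VDSM code) of why these three devices collide. It assumes libvirt builds the QEMU drive ID as drive-ide<controller>-<bus>-<unit>, using the explicit <address> when one is present and otherwise deriving bus/unit from the target dev name (hda -> bus 0 unit 0, hdb -> bus 0 unit 1, hdc -> bus 1 unit 0):

# Illustrative sketch only; the bus/unit mapping and alias format are assumptions.

def ide_addr_from_dev(dev):
    # hda -> 0, hdb -> 1, hdc -> 2, ...; two units per IDE bus
    index = ord(dev[-1]) - ord('a')
    return (0, index // 2, index % 2)          # (controller, bus, unit)

def drive_alias(disk):
    addr = disk.get('address')
    if addr is not None:
        ctrl, bus, unit = addr['controller'], addr['bus'], addr['unit']
    else:
        ctrl, bus, unit = ide_addr_from_dev(disk['dev'])
    return 'drive-ide%d-%d-%d' % (ctrl, bus, unit)

disks = [
    {'dev': 'hdc', 'address': {'controller': 0, 'bus': 1, 'unit': 0}},  # cdrom, old address
    {'dev': 'hda', 'address': None},                                    # new bootable disk
    {'dev': 'hdb', 'address': {'controller': 0, 'bus': 0, 'unit': 0}},  # old disk, stale hda address
]

print([drive_alias(d) for d in disks])
# ['drive-ide0-1-0', 'drive-ide0-0-0', 'drive-ide0-0-0'] -> Duplicate ID 'drive-ide0-0-0'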
Looks like a libvirt bug in disk address assignment?
...
  <address bus="0" controller="0" target="0" type="drive" unit="0"/>
  <source file="/rhev/data-center/.../..."/>
  <target bus="ide" dev="hdb"/>
...

This doesn't make any sense. The address is the one for hda while the target is specified as hdb. If you then try to add an hda disk, libvirt will try to assign it the right address, which is currently occupied by this second disk. Whoever changed the target of the former hda to hdb should have also changed the address (or, alternatively, removed it).
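As a purely hypothetical check (nothing like this is shown to exist in VDSM or libvirt here), the mismatch could be spotted by comparing the explicit address with the bus/unit implied by the target dev name:

# Hypothetical sanity check; the hda/hdb -> bus/unit mapping is an assumption.

def ide_dev_matches_address(dev, address):
    index = ord(dev[-1]) - ord('a')            # hda -> 0, hdb -> 1, ...
    return (address['bus'], address['unit']) == (index // 2, index % 2)

print(ide_dev_matches_address('hdb', {'bus': 0, 'unit': 0}))   # False: hdb carries hda's address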
Could you attach the original domain XML and the XML vdsm generated after changing the boot order?
I don't have the reproducer anymore, as I can't keep them around for months.
Omer, do you have the domain xmls from the first and second runs?
Let's hope the VDSM guys can provide the XMLs then. So far, looking at the only XML we got here (comment 2), the bug would be in VDSM, which provided the wrong XML.
Created attachment 1000437 [details] engine log of the scenario
Created attachment 1000439 [details] vdsm log
Attached the engine and vdsm logs of the scenario; please let me know if anything else is needed.

vm name: ideVm
vm id: 106acd37-f29b-468b-aa86-99db62c95070

First run on 2015-03-11 15:09:35, with a single IDE disk and an IDE CD-ROM. After this run, the engine db has:

engine=> select device,is_managed,boot_order,address from vm_device where vm_id in (select vm_guid from vm_static where vm_name ilike 'ideVm') and type='disk';
 device | is_managed | boot_order |                       address
--------+------------+------------+-----------------------------------------------------
 disk   | t          | 1          | {bus=0, controller=0, type=drive, target=0, unit=0}
 cdrom  | t          | 0          | {bus=1, controller=0, type=drive, target=0, unit=0}

After adding another disk and setting it to be bootable:

engine=> select device,is_managed,boot_order,address from vm_device where vm_id in (select vm_guid from vm_static where vm_name ilike 'ideVm') and type='disk';
 device | is_managed | boot_order |                       address
--------+------------+------------+-----------------------------------------------------
 disk   | t          | 0          | {bus=0, controller=0, type=drive, target=0, unit=0}
 disk   | t          | 1          |
 cdrom  | t          | 0          | {bus=1, controller=0, type=drive, target=0, unit=0}

Second run (attempt) time: 2015-03-11 15:18:16
Thanks, comment 5 is correct according to the vdsm.log. It seems the original hda disk was renamed to hdb but its address was left untouched, so it conflicts with the automatically assigned address of the new hda disk. VDSM should either add hdb without changing hda, or change (or even remove) the address of the original hda when it renames it to hdb.
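A minimal sketch of the second option (drop the stale address when the target is renamed); this is only an illustration of the idea, not the actual VDSM change:

def rename_drive(drive, new_dev):
    # If the target dev name changes, the previously assigned address no longer
    # matches it, so drop it and let libvirt assign a fresh, consistent one.
    if drive.get('dev') != new_dev:
        drive['dev'] = new_dev
        drive.pop('address', None)
    return drive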
Yep, taking this back. I suspect normalizeDrivesIndices and related code are not taking the existing drives into account, hence flagging storage.
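To make the suspicion concrete, a hypothetical sketch of what "taking the existing drives into account" could mean when indices are handed out (this is not VDSM's actual normalizeDrivesIndices):

def normalize_drives_indices(drives):
    # Drives that already carry a libvirt-assigned address keep their current
    # index, so their address stays valid across boot cycles; only the
    # remaining (new) drives get the lowest free indices.
    used = {d['index'] for d in drives
            if d.get('address') is not None and d.get('index') is not None}
    next_free = 0
    for d in drives:
        if d.get('address') is not None and d.get('index') is not None:
            continue
        while next_free in used:
            next_free += 1
        d['index'] = next_free
        used.add(next_free)
    return drives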
*** Bug 1313406 has been marked as a duplicate of this bug. ***
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
Tested with the following code:
----------------------------------------
rhevm-4.0.5-0.1.el7ev.noarch
vdsm-4.18.15-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create new VM with IDE disk, verify it starts
2. Add second IDE disk
3. Turn off bootable flag from 1st disk, make 2nd disk bootable
4. Start the VM - now starts successfully

Actual results:
The VM now starts successfully

Expected results:

Moving to VERIFIED!