Bug 1397077

Summary: [z-stream clone - 4.0.5] VM fails to start after changing IDE disk boot order
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Reporter: rhev-integ
Assignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: medium
Priority: medium
Docs Contact:
Version: 3.4.4
CC: amureini, bazulay, dyuan, eedri, frolland, fromani, iheim, jbuchta, jdenemar, jsuchane, kgoldbla, lpeer, lsurette, michal.skrivanek, mkalinin, mzhan, ppostler, ratamir, rbalakri, Rhev-m-bugs, rhodain, srevivo, tnisan, xuzhang, ykaul, ylavi
Target Milestone: ovirt-4.0.5
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1186817
Environment:
Last Closed: 2016-11-24 09:40:23 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1186817    
Bug Blocks:    

Comment 1 rhev-integ 2016-11-21 14:32:49 UTC
Description of problem:

VM fails to start after changing IDE disk boot order

Version-Release number of selected component (if applicable):

rhevm-backend-3.4.4-2.2.el6ev.noarch

How reproducible:


Steps to Reproduce:
1. Create new VM with IDE disk, verify it starts
2. Add second IDE disk
3. Turn off bootable flag from 1st disk, make 2nd disk bootable
4. Start the VM

Actual results:

VM fails to start:

2015-01-28 15:47:40,801 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-30) Correlation ID: 846da54, Job ID: 66a0ed9b-c7c7-4822-b6f8-b89b56387d12, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM fed2 on Host 10.xx.xx.xx.
2015-01-28 15:47:40,888 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-4-thread-30) Lock Acquired to object EngineLock [exclusiveLocks= key: 35f3b562-cb69-4ee1-8b47-47256520182b value: VM
, sharedLocks= ]
2015-01-28 15:47:40,902 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-4-thread-30) START, IsVmDuringInitiatingVDSCommand( vmId = 35f3b562-cb69-4ee1-8b47-47256520182b), log id: 3107c06
2015-01-28 15:47:40,902 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-4-thread-30) FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 3107c06
2015-01-28 15:47:40,908 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-4-thread-30) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT


Expected results:

VM starts

Additional info:

This comment was originally posted by akotov

Comment 4 rhev-integ 2016-11-21 14:33:08 UTC
This might be a libvirt issue. I was able to reproduce this locally using the steps described, and this is what I got:
The engine sends vdsm a create request with 3 IDE devices:
1 - cdrom, with an address (assigned by libvirt from a previous run)
2 - disk with an address (assigned by libvirt from a previous run)
3 - disk without an address (the new disk) but with a request to be the boot device (boot_order=1)

<disk device="cdrom" snapshot="no" type="file">
    <address bus="1" controller="0" target="0" type="drive" unit="0"/>
    <source file="" startupPolicy="optional"/>
    <target bus="ide" dev="hdc"/>
    <readonly/>
    <serial/>
</disk>
<disk device="disk" snapshot="no" type="file">
    <source file="/rhev/data-center/.../..."/>
    <target bus="ide" dev="hda"/>
    <serial>483d2327-84bb-4de6-90a9-d282d5237ae1</serial>
    <boot order="1"/>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
</disk>
<disk device="disk" snapshot="no" type="file">
    <address bus="0" controller="0" target="0" type="drive" unit="0"/>
    <source file="/rhev/data-center/.../..."/>
    <target bus="ide" dev="hdb"/>
    <serial>901e7fe4-f479-4c9c-871e-d18ac1317cad</serial>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
</disk>

This results in:
Thread-97773::ERROR::2015-02-22 18:26:41,923::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`517dce2c-f140-4079-bf61-e7eb116887a1`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3335, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/00000002-0002-0002-0002-00000000033d/f23b9542-85ac-4957-84f4-a8ed99154502/images/901e7fe4-f479-4c9c-871e-d18ac1317cad/b7e35ad0-e7f8-4958-a44d-495fa19725e2,if=none,id=drive-ide0-0-0,format=raw,serial=901e7fe4-f479-4c9c-871e-d18ac1317cad,cache=none,werror=stop,rerror=stop,aio=threads: Duplicate ID 'drive-ide0-0-0' for drive

The error above refers to the older disk.
I don't think the engine should have cleared the first disk's address, because we would like to keep addresses stable between boot cycles.

Michal, what do you think?

This comment was originally posted by ofrenkel
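
To make the failure mode concrete, here is a minimal sketch (illustrative only, not vdsm or libvirt code) of how two disks that end up with the same IDE address collapse onto the same qemu drive ID, assuming the drive-ide<controller>-<bus>-<unit> alias format seen in the error above:

# Illustrative sketch only (not vdsm/libvirt code). Assumption: the qemu
# drive ID follows the drive-ide<controller>-<bus>-<unit> pattern visible
# in the "Duplicate ID 'drive-ide0-0-0'" error above.

def drive_alias(controller, bus, unit):
    return "drive-ide%d-%d-%d" % (controller, bus, unit)

# The old disk, renamed hda -> hdb, kept the address from the previous run.
old_disk_alias = drive_alias(controller=0, bus=0, unit=0)

# The new bootable disk has no address, so libvirt assigns the canonical
# hda address, which is again controller=0, bus=0, unit=0.
new_disk_alias = drive_alias(controller=0, bus=0, unit=0)

print(old_disk_alias == new_disk_alias)  # True -> "Duplicate ID 'drive-ide0-0-0' for drive"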

Comment 5 rhev-integ 2016-11-21 14:33:18 UTC
Looks like a libvirt bug in disk assignment?

This comment was originally posted by michal.skrivanek

Comment 7 rhev-integ 2016-11-21 14:33:38 UTC
...
    <address bus="0" controller="0" target="0" type="drive" unit="0"/>
    <source file="/rhev/data-center/.../..."/>
    <target bus="ide" dev="hdb"/>
...

This doesn't make any sense. The address is for hda while target is specified as hdb. If you then try to add an hda disk, libvirt will try to assign it the right address, which is currently occupied by this second disk. Whoever changed the target of the former hda to hdb should have also changed the address (or alternatively, should have removed it).

This comment was originally posted by jdenemar
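
For reference, the mismatch described above can be checked mechanically. A minimal sketch (hypothetical helper, not libvirt code) deriving the canonical IDE address from the target name on the standard two-bus, two-unit layout:

# Hypothetical helper, not libvirt code. Standard IDE layout on controller 0:
# hda=(bus 0, unit 0), hdb=(bus 0, unit 1), hdc=(bus 1, unit 0), hdd=(bus 1, unit 1),
# which matches the cdrom (hdc) address bus=1/unit=0 in the XML above.

def expected_ide_address(dev):
    index = ord(dev[-1]) - ord('a')   # hda -> 0, hdb -> 1, ...
    return {"controller": 0, "bus": index // 2, "unit": index % 2}

recorded = {"controller": 0, "bus": 0, "unit": 0}       # address paired with hdb above
print(expected_ide_address("hdb"))                      # {'controller': 0, 'bus': 0, 'unit': 1}
print(expected_ide_address("hdb") == recorded)          # False: the recorded address belongs to hda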

Comment 8 rhev-integ 2016-11-21 14:33:47 UTC
Could you attach the original domain XML and the XML vdsm generated after changing the boot order?

This comment was originally posted by jdenemar

Comment 9 rhev-integ 2016-11-21 14:33:57 UTC
I don't have the reproducer anymore, as I can't keep them for months.

This comment was originally posted by akotov

Comment 10 rhev-integ 2016-11-21 14:34:07 UTC
Omer, do you have the domain xmls from the first and second runs?

This comment was originally posted by michal.skrivanek

Comment 11 rhev-integ 2016-11-21 14:34:17 UTC
Let's hope the VDSM guys can provide the XMLs then. So far, looking at the only XML we got here (comment 2), the bug would be in VDSM, which provided the wrong XML.

This comment was originally posted by jdenemar

Comment 12 rhev-integ 2016-11-21 14:34:27 UTC
Created attachment 1000437 [details]
engine log of the scenario

This comment was originally posted by ofrenkel

Comment 13 rhev-integ 2016-11-21 14:34:38 UTC
Created attachment 1000439 [details]
vdsm log

This comment was originally posted by ofrenkel

Comment 14 rhev-integ 2016-11-21 14:34:47 UTC
Attached the engine and vdsm logs of the scenario; please let me know if anything else is needed.
vm name: ideVm
vm id: 106acd37-f29b-468b-aa86-99db62c95070

First run on 2015-03-11 15:09:35:
single IDE disk and IDE CD-ROM

After this run, the engine DB has:
engine=> select device,is_managed,boot_order,address from vm_device where vm_id in (select vm_guid from vm_static where vm_name ilike 'ideVm') and type='disk';
 device | is_managed | boot_order |                       address                       
--------+------------+------------+-----------------------------------------------------
 disk   | t          |          1 | {bus=0, controller=0, type=drive, target=0, unit=0}
 cdrom  | t          |          0 | {bus=1, controller=0, type=drive, target=0, unit=0}


After adding another disk and setting it to be bootable:
engine=> select device,is_managed,boot_order,address from vm_device where vm_id in (select vm_guid from vm_static where vm_name ilike 'ideVm') and type='disk';
 device | is_managed | boot_order |                       address                       
--------+------------+------------+-----------------------------------------------------
 disk   | t          |          0 | {bus=0, controller=0, type=drive, target=0, unit=0}
 disk   | t          |          1 | 
 cdrom  | t          |          0 | {bus=1, controller=0, type=drive, target=0, unit=0}


Second run (attempt) time: 2015-03-11 15:18:16

This comment was originally posted by ofrenkel

Comment 15 rhev-integ 2016-11-21 14:34:54 UTC
Thanks, comment 5 is correct according to the vdsm.log. It seems the original hda disk was renamed to hdb but its address was left untouched, so it conflicts with the automatically assigned address of the new hda disk. VDSM should either add hdb without changing hda, or change (or even remove) the address of the original hda when renaming it to hdb.

This comment was originally posted by jdenemar

Comment 16 rhev-integ 2016-11-21 14:35:03 UTC
Yep. Taking that back, I suspect normalizeDrivesIndices and related code do not take the existing drives into account, hence flagging this for storage.

This comment was originally posted by michal.skrivanek
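
A rough sketch of the kind of fix discussed in comment 15 (illustrative only; the function below is hypothetical and is not the actual normalizeDrivesIndices code): when a drive's index changes between runs, its stale pinned address should be dropped so libvirt can re-assign a consistent one:

# Illustrative sketch only -- hypothetical function, not the actual vdsm
# normalizeDrivesIndices implementation. Idea: if a drive's pinned address
# no longer matches the index it now occupies (hda=0, hdb=1, ...), drop the
# address and let libvirt assign one, avoiding the duplicate-ID collision.

def expected_ide_address(index):
    return {"controller": 0, "bus": index // 2, "unit": index % 2}

def drop_stale_addresses(drives):
    # drives: list of dicts with 'index' and an optional 'address'
    # carried over from a previous run.
    for drive in drives:
        addr = drive.get("address")
        if addr is not None and addr != expected_ide_address(drive["index"]):
            del drive["address"]   # stale; libvirt will re-assign it
    return drives

drives = [
    {"name": "hdb", "index": 1, "address": {"controller": 0, "bus": 0, "unit": 0}},
    {"name": "hda", "index": 0},   # new bootable disk, no address yet
]
print(drop_stale_addresses(drives))   # hdb loses the stale hda address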

Comment 18 rhev-integ 2016-11-21 14:35:22 UTC
*** Bug 1313406 has been marked as a duplicate of this bug. ***

This comment was originally posted by michal.skrivanek

Comment 20 rhev-integ 2016-11-21 14:35:41 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

This comment was originally posted by ydary

Comment 26 rhev-integ 2016-11-21 14:36:35 UTC
Tested with the following builds:
----------------------------------------
rhevm-4.0.5-0.1.el7ev.noarch
vdsm-4.18.15-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create new VM with IDE disk, verify it starts
2. Add second IDE disk
3. Turn off bootable flag from 1st disk, make 2nd disk bootable
4. Start the VM - it now starts successfully



Actual results:
The VM now starts successfully

Expected results:



Moving to VERIFIED!

This comment was originally posted by kgoldbla

Comment 29 Allon Mureinik 2016-11-24 09:40:23 UTC
4.0.5 has been released, closing.

Comment 31 Marina Kalinin 2016-11-29 15:27:14 UTC
4.0.5 vdsm errata:
https://access.redhat.com/errata/RHBA-2016:2787

4.0.5-1 vdsm errata:
https://access.redhat.com/errata/RHBA-2016:2818

So, I guess, it is somewhere in there.
4.0.5-1 to be on the safe side.