Bug 1186817 - VM fails to start after changing IDE disk boot order
Summary: VM fails to start after changing IDE disk boot order
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.0-alpha
Assignee: Fred Rolland
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Duplicates: 1313406
Depends On: ovirt_refactor_disk_class_hierarchy
Blocks: 1397077
 
Reported: 2015-01-28 15:18 UTC by akotov
Modified: 2019-12-16 04:38 UTC
CC List: 22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1397077
Environment:
Last Closed: 2017-04-25 00:48:32 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
engine log of the scenario (46.06 KB, text/plain)
2015-03-11 13:28 UTC, Omer Frenkel
vdsm log (799.31 KB, text/plain)
2015-03-11 13:29 UTC, Omer Frenkel


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2184781 0 None None None 2016-03-15 15:16:43 UTC
Red Hat Product Errata RHEA-2017:0997 0 normal SHIPPED_LIVE Red Hat Virtualization Manager (ovirt-engine) 4.1 GA 2017-04-18 20:11:26 UTC
oVirt gerrit 63726 0 master MERGED engine: Clear IDE disk address on bootable change 2016-09-18 11:42:25 UTC
oVirt gerrit 64101 0 ovirt-engine-4.0 MERGED engine: Clear IDE disk address on bootable change 2016-09-18 15:46:17 UTC

Description akotov 2015-01-28 15:18:15 UTC
Description of problem:

VM fails to start after changing the IDE disk boot order

Version-Release number of selected component (if applicable):

rhevm-backend-3.4.4-2.2.el6ev.noarch

How reproducible:


Steps to Reproduce:
1. Create new VM with IDE disk, verify it starts
2. Add second IDE disk
3. Turn off bootable flag from 1st disk, make 2nd disk bootable
4. Start the VM

Actual results:

VM fails to start:

2015-01-28 15:47:40,801 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-30) Correlation ID: 846da54, Job ID: 66a0ed9b-c7c7-4822-b6f8-b89b56387d12, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM fed2 on Host 10.xx.xx.xx.
2015-01-28 15:47:40,888 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-4-thread-30) Lock Acquired to object EngineLock [exclusiveLocks= key: 35f3b562-cb69-4ee1-8b47-47256520182b value: VM
, sharedLocks= ]
2015-01-28 15:47:40,902 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-4-thread-30) START, IsVmDuringInitiatingVDSCommand( vmId = 35f3b562-cb69-4ee1-8b47-47256520182b), log id: 3107c06
2015-01-28 15:47:40,902 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-4-thread-30) FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 3107c06
2015-01-28 15:47:40,908 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-4-thread-30) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT


Expected results:

VM starts

Additional info:

Comment 2 Omer Frenkel 2015-02-22 16:47:45 UTC
This might be a libvirt issue. I was able to reproduce it locally using the steps described, and this is what I got:
The engine sends vdsm a create request with 3 IDE devices:
1 - cdrom, with an address (assigned by libvirt during a previous run)
2 - disk with an address (assigned by libvirt during a previous run)
3 - disk without an address (the new disk), but requested as the boot device (boot_order=1)

<disk device="cdrom" snapshot="no" type="file">
    <address bus="1" controller="0" target="0" type="drive" unit="0"/>
    <source file="" startupPolicy="optional"/>
    <target bus="ide" dev="hdc"/>
    <readonly/>
    <serial/>
</disk>
<disk device="disk" snapshot="no" type="file">
    <source file="/rhev/data-center/.../..."/>
    <target bus="ide" dev="hda"/>
    <serial>483d2327-84bb-4de6-90a9-d282d5237ae1</serial>
    <boot order="1"/>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
</disk>
<disk device="disk" snapshot="no" type="file">
    <address bus="0" controller="0" target="0" type="drive" unit="0"/>
    <source file="/rhev/data-center/.../..."/>
    <target bus="ide" dev="hdb"/>
    <serial>901e7fe4-f479-4c9c-871e-d18ac1317cad</serial>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
</disk>

This results in:
Thread-97773::ERROR::2015-02-22 18:26:41,923::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`517dce2c-f140-4079-bf61-e7eb116887a1`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3335, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/00000002-0002-0002-0002-00000000033d/f23b9542-85ac-4957-84f4-a8ed99154502/images/901e7fe4-f479-4c9c-871e-d18ac1317cad/b7e35ad0-e7f8-4958-a44d-495fa19725e2,if=none,id=drive-ide0-0-0,format=raw,serial=901e7fe4-f479-4c9c-871e-d18ac1317cad,cache=none,werror=stop,rerror=stop,aio=threads: Duplicate ID 'drive-ide0-0-0' for drive

The error above refers to the older disk.
I don't think the engine should have cleared the first disk's address, because we want addresses to stay stable between boot cycles.

Michal, what do you think?
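
For illustration, a rough Python sketch of why qemu complains about a duplicate ID: libvirt derives the drive alias (and the qemu -drive id "drive-ide<controller>-<bus>-<unit>") from the <address> element, so the old disk, which kept the hda address, and the new address-less hda disk, which gets that same address assigned, both end up as "drive-ide0-0-0". This is a simplified sketch, not the actual libvirt logic:

# Simplified illustration (not libvirt code): how two disks end up with the
# same qemu drive ID once their IDE addresses collide.

def qemu_drive_id(address):
    # Build an ID like 'drive-ide0-0-0' from a drive address (controller-bus-unit).
    return "drive-ide{controller}-{bus}-{unit}".format(**address)

# Old disk: renamed from hda to hdb, but its saved address stayed at unit 0.
old_disk_address = {"controller": 0, "bus": 0, "unit": 0}
# New bootable disk: sent as hda without an address, so libvirt assigns it the
# hda address, which is again controller 0 / bus 0 / unit 0.
new_disk_address = {"controller": 0, "bus": 0, "unit": 0}

ids = [qemu_drive_id(old_disk_address), qemu_drive_id(new_disk_address)]
assert len(ids) != len(set(ids))  # duplicate 'drive-ide0-0-0', as in the qemu error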

Comment 3 Michal Skrivanek 2015-03-06 12:33:19 UTC
Looks like a libvirt bug in disk address assignment?

Comment 5 Jiri Denemark 2015-03-06 12:47:16 UTC
...
    <address bus="0" controller="0" target="0" type="drive" unit="0"/>
    <source file="/rhev/data-center/.../..."/>
    <target bus="ide" dev="hdb"/>
...

This doesn't make any sense. The address is for hda while target is specified as hdb. If you then try to add an hda disk, libvirt will try to assign it the right address, which is currently occupied by this second disk. Whoever changed the target of the former hda to hdb should have also changed the address (or alternatively, should have removed it).
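
To make the mismatch concrete, a small Python sketch of the conventional IDE target-name-to-address layout (a simplification of what libvirt does internally): the address in the second <disk> above belongs to the hda slot, while the target claims hdb.

# Conventional IDE slot layout on controller 0 (simplified illustration).
IDE_SLOTS = {
    "hda": {"controller": 0, "bus": 0, "unit": 0},  # primary master
    "hdb": {"controller": 0, "bus": 0, "unit": 1},  # primary slave
    "hdc": {"controller": 0, "bus": 1, "unit": 0},  # secondary master
    "hdd": {"controller": 0, "bus": 1, "unit": 1},  # secondary slave
}

# The second <disk> above: target dev='hdb', but the address says bus=0/unit=0.
target_dev = "hdb"
address = {"controller": 0, "bus": 0, "unit": 0}

assert IDE_SLOTS[target_dev] != address   # the stored address does not match hdb
assert IDE_SLOTS["hda"] == address        # it is the hda slot, so a new hda disk collides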

Comment 6 Jiri Denemark 2015-03-06 12:58:09 UTC
Could you attach the original domain XML and the XML vdsm generated after changing the boot order?

Comment 7 akotov 2015-03-06 13:00:21 UTC
I don't have the reproducer anymore, as I can't keep them for months.

Comment 8 Michal Skrivanek 2015-03-06 13:18:04 UTC
Omer, do you have the domain XMLs from the first and second runs?

Comment 9 Jiri Denemark 2015-03-06 13:23:25 UTC
Let's hope the VDSM folks can provide the XMLs then. So far, looking at the only XML we have here (comment 2), the bug appears to be in VDSM, which provided the wrong XML.

Comment 10 Omer Frenkel 2015-03-11 13:28:48 UTC
Created attachment 1000437 [details]
engine log of the scenario

Comment 11 Omer Frenkel 2015-03-11 13:29:29 UTC
Created attachment 1000439 [details]
vdsm log

Comment 12 Omer Frenkel 2015-03-11 13:31:04 UTC
Attached the engine and vdsm logs of the scenario; please let me know if anything else is needed.
vm name: ideVm
vm id: 106acd37-f29b-468b-aa86-99db62c95070

First run at 2015-03-11 15:09:35, with a single IDE disk and an IDE CD-ROM.

After this run, the engine DB has:
engine=> select device,is_managed,boot_order,address from vm_device where vm_id in (select vm_guid from vm_static where vm_name ilike 'ideVm') and type='disk';
 device | is_managed | boot_order |                       address
--------+------------+------------+-----------------------------------------------------
 disk   | t          |          1 | {bus=0, controller=0, type=drive, target=0, unit=0}
 cdrom  | t          |          0 | {bus=1, controller=0, type=drive, target=0, unit=0}


After adding another disk and setting it as the bootable disk:
engine=> select device,is_managed,boot_order,address from vm_device where vm_id in (select vm_guid from vm_static where vm_name ilike 'ideVm') and type='disk';
 device | is_managed | boot_order |                       address
--------+------------+------------+-----------------------------------------------------
 disk   | t          |          0 | {bus=0, controller=0, type=drive, target=0, unit=0}
 disk   | t          |          1 |
 cdrom  | t          |          0 | {bus=1, controller=0, type=drive, target=0, unit=0}


Second run attempt at 2015-03-11 15:18:16.

Comment 13 Jiri Denemark 2015-03-11 13:47:49 UTC
Thanks, comment 5 is correct according to the vdsm.log. It seems the original hda disk was renamed to hdb, but its address was left untouched, so it conflicts with the automatically assigned address of the new hda disk. VDSM should either add the new disk as hdb without changing hda, or change (or even remove) the address of the original hda when renaming it to hdb.

Comment 14 Michal Skrivanek 2015-03-11 14:04:02 UTC
Yep. Taking that back, I suspect normalizeDrivesIndices and related code do not take the existing drives into account, hence flagging this for storage.
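
The fix that eventually merged (see the gerrit links above, "engine: Clear IDE disk address on bootable change") addresses this on the engine side: when the bootable flag of an IDE disk changes, the engine drops the disk's stored address so libvirt assigns a fresh, consistent one on the next run. A rough sketch of that idea, in Python purely for illustration (names are hypothetical; the real change is in the Java engine code):

# Illustration only: drop a stale IDE address when the bootable flag changes,
# so the next run gets a consistent address from libvirt.

def update_disk_bootable(disk, bootable):
    if disk["interface"] == "ide" and disk["bootable"] != bootable:
        disk["address"] = None  # forget the old bus/unit assignment
    disk["bootable"] = bootable
    return disk

old_hda = {"interface": "ide", "bootable": True,
           "address": {"bus": 0, "controller": 0, "target": 0, "unit": 0}}
update_disk_bootable(old_hda, bootable=False)
assert old_hda["address"] is None  # nothing left to collide with the new boot disk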

Comment 16 Michal Skrivanek 2016-03-15 15:16:58 UTC
*** Bug 1313406 has been marked as a duplicate of this bug. ***

Comment 18 Yaniv Lavi 2016-05-09 10:59:43 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 24 Kevin Alon Goldblatt 2016-10-05 12:44:20 UTC
Tested with the following code:
----------------------------------------
rhevm-4.0.5-0.1.el7ev.noarch
vdsm-4.18.15-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create new VM with IDE disk, verify it starts
2. Add second IDE disk
3. Turn off bootable flag from 1st disk, make 2nd disk bootable
4. Start the VM - it now starts successfully



Actual results:
The VM now starts successfully

Expected results:



Moving to VERIFIED!

