Bug 1140323

Summary: fails to run VM - duplicate ID
Product: [Retired] oVirt
Reporter: Dan Kenigsberg <danken>
Component: ovirt-engine-core
Assignee: Tomas Jelinek <tjelinek>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: high
Version: 3.5
CC: bazulay, bugs, ecohen, exploit, gklein, iheim, istein, mavital, mgoldboi, michal.skrivanek, movciari, ofrenkel, rbalakri, tjelinek, yeylon
Target Milestone: ---
Target Release: 3.4.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1109880
Environment:
Last Closed: 2014-09-15 11:11:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1109880
Bug Blocks: 1118689
Attachments:
  extract of engine log (flags: none)
  extract of vdsm log (flags: none)

Description Dan Kenigsberg 2014-09-10 16:57:55 UTC
+++ This bug was initially created as a clone of Bug #1109880 +++

Description of problem:
When I'm trying to run a VM, it fails with this error:
VM testvm is down with error. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial=: Duplicate ID 'drive-ide0-1-0' for drive .

vdsm-4.14.7-3.el6ev in 3.4 compatibility mode
ovirt-engine-3.5.0-0.0.master.20140605145557.git3ddd2de.el6.noarch

Version-Release number of selected component (if applicable):

How reproducible:
always

Actual results:
VM fails to run

Expected results:
VM should run

Additional info:

--- Additional comment from  on 2014-06-16 17:36:27 IDT ---



--- Additional comment from Dan Kenigsberg on 2014-06-16 19:38:02 IDT ---

Engine specifies cdrom twice, with two different device ids.

{'index': '2', 'iface': 'ide', 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': '713938ee-bdd0-4a84-80d2-387e3b9e13f4', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'},
{'index': '2', 'iface': 'ide', 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': 'cab66d14-1e9d-498c-82c3-0ab6545ac2c7', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}

--- Additional comment from  on 2014-06-17 13:39:24 IDT ---

Any workaround for this?

--- Additional comment from Michal Skrivanek on 2014-06-19 12:13:03 IDT ---

(In reply to movciari from comment #0)
> How reproducible:
> always

always == always for this particular VM, or for all VMs, or…?

--- Additional comment from  on 2014-06-19 15:07:02 IDT ---

(In reply to Michal Skrivanek from comment #4)
> (In reply to movciari from comment #0)
> How reproducible:
> always
> 
> always == always for this particular VM or for all VMs or…?

for all VMs (at least on my setup)

--- Additional comment from Michal Skrivanek on 2014-06-19 15:09:07 IDT ---

Even a new VM you create?
Would you please include the engine.log for that attempt?

--- Additional comment from  on 2014-06-19 17:17:12 IDT ---

In the engine.log I already posted: I created a VM called "minivm" on line 3824, and it failed to run around line 3935.

--- Additional comment from Omer Frenkel on 2014-06-22 18:35:01 IDT ---

I have a few questions to help me understand the root cause of the issue:
1. When creating the VM, do you select any ISO?

2. Is this a clean installation or an upgrade?
I suspect something is wrong with your Blank template configuration.
Could you please attach the result of the following DB query?
select type,device,is_managed,alias,spec_params from vm_device where vm_id = '00000000-0000-0000-0000-000000000000' order by device;

3. I'm interested to know whether the duplicate device is created on add VM or on run.
Can you please attach the result of this query as well? (replace <VM_NAME> with the new VM's name):

select device,is_managed,alias,spec_params from vm_device where vm_id = (select vm_guid from vm_static where vm_name='<VM_NAME>') order by device;


thanks!

--- Additional comment from  on 2014-06-23 13:14:17 IDT ---

(In reply to Omer Frenkel from comment #8)
1. I don't select any ISO; I don't even have an ISO domain.

2. Clean install, on a new VM.

engine=# select type,device,is_managed,alias,spec_params from vm_device where vm_id = '00000000-0000-0000-0000-000000000000' order by device;
 type  | device | is_managed | alias |     spec_params      
-------+--------+------------+-------+----------------------
 video | cirrus | t          |       | { "vram" : "65536" }
(1 row)

3. 
on old vm:
engine=# select device,is_managed,alias,spec_params from vm_device where vm_id = (select vm_guid from vm_static where vm_name='minivm') order by device;
 device | is_managed | alias |     spec_params     
--------+------------+-------+---------------------
 bridge | t          |       | {
                             : }
 cdrom  | t          |       | {
                             :   "path" : ""
                             : }
 cdrom  | t          |       | {
                             :   "path" : ""
                             : }
 disk   | t          |       | 
 qxl    | t          |       | {
                             :   "vram" : "32768",
                             :   "heads" : "1"
                             : }
 qxl    | t          |       | {
                             :   "vram" : "32768",
                             :   "heads" : "1"
                             : }
(6 rows)
new vm i just created:
engine=# select device,is_managed,alias,spec_params from vm_device where vm_id = (select vm_guid from vm_static where vm_name='newvm') order by device;
 device | is_managed | alias |     spec_params     
--------+------------+-------+---------------------
 bridge | t          |       | {
                             : }
 cdrom  | t          |       | {
                             :   "path" : ""
                             : }
 cdrom  | t          |       | {
                             :   "path" : ""
                             : }
 disk   | t          |       | 
 qxl    | t          |       | {
                             :   "vram" : "32768",
                             :   "heads" : "1"
                             : }
 qxl    | t          |       | {
                             :   "vram" : "32768",
                             :   "heads" : "1"
                             : }
(6 rows)

--- Additional comment from Michal Skrivanek on 2014-06-26 15:24:12 IDT ---

It seems not to happen when *not* using instance types.

--- Additional comment from Ilanit Stein on 2014-08-12 12:41:22 IDT ---

Verified on ovirt-engine 3.5-rc1

Created a VM both from template (with attached cd), and instance type (Large).
VM started successfully.

--- Additional comment from  on 2014-09-10 18:52:54 IDT ---

Hi,

This bug is also present on 3.4 and the patch needs to be backported.
It happens when using the Blank template to create a new VM and modifying the advanced options, like attaching a CD-ROM, before booting for the first time.

Thank you

Comment 1 Tomas Jelinek 2014-09-11 11:31:18 UTC
This bug was caused by a regression introduced by http://gerrit.ovirt.org/#/c/27630

The problematic patch is part of 3.5 but not of 3.4, so the regression does not affect 3.4 and there is no need to backport http://gerrit.ovirt.org/#/c/29291 to 3.4.

I have just tried:
- create a new VM from blank template and attach a CD to it
- create a new VM from a template containing the CD
- create a new VM with no CD just boot from disk
- create a new VM and boot from network

All of the above cases work without any issues.

If there indeed is a problem, it has to be a different issue.

(tested on the current origin/ovirt-engine-3.4 (e.g. on 468680520259bf688e724638453c4b6843af6874)).

@Omer: I don't think we should block the 3.4.4 on this issue, what do you think?

Comment 2 exploit 2014-09-12 10:52:44 UTC
(In reply to Tomas Jelinek from comment #1)
> This bug has been caused by a regression introduced by
> http://gerrit.ovirt.org/#/c/27630
> 
> This problematic patch is part of 3.5 but is not in 3.4 so the regression
> does not affect 3.4 so there is no need to backport the
> http://gerrit.ovirt.org/#/c/29291 to 3.4
> 
> I have just tried:
> - create a new VM from blank template and attach a CD to it
> - create a new VM from a template containing the CD
> - create a new VM with no CD just boot from disk
> - create a new VM and boot from network
> 
> all of the above cases work without any issues.
> 
> If there indeed is a problem, it has to be a different issue.
> 
> (tested on the current origin/ovirt-engine-3.4 (e.g. on
> 468680520259bf688e724638453c4b6843af6874)).
> 
> @Omer: I don't think we should block the 3.4.4 on this issue, what do you
> think?

Hi Tomas,

I'll try to be as accurate as possible.

I migrated from engine 3.2 (dreyou repo) to regular 3.3, then 3.4.
Currently I use vdsm 4.14.11.2-0 on the host and the latest 3.4.3 engine.
I'm using qemu-kvm-0.12.1.2-2.415.el6_5.10 from oVirt's jenkins for emulation.
In my engine I have 3 FC storage domains and three host clusters.
I start creating a new VM from the Blank template, setting only the VM name and disk, and in the advanced options I attach a CD to install the OS. The VM starts to boot on the first host of the cluster, after a few seconds attempts to start on the next host, and finally fails to boot anywhere, with the attached logs.
Whatever the storage, cluster, or host, the issue is the same.
On the same datacenter I have hundreds of VMs that were successfully created before the upgrade to 3.4 and run fine.
Two workarounds make them boot:
1) "run once"
2) run the first time without attaching any CD, stop the VM, then attach the CD and boot it.

The log attachments follow.
Tell me if you need more info.

Comment 3 exploit 2014-09-12 10:53:31 UTC
Created attachment 936918 [details]
extract of engine log

Comment 4 exploit 2014-09-12 10:54:22 UTC
Created attachment 936919 [details]
extract of vdsm log

Comment 5 Tomas Jelinek 2014-09-12 11:38:57 UTC
OK, adding the findings here as well:

Hi,

I still cannot reproduce it, but looking into the code, this could happen if your Blank template has 2 devices. Could you please verify it by invoking this SQL query:
select * from vm_device_view where vm_id = '00000000-0000-0000-0000-000000000000';

If it indeed returns 2 devices, then you are facing this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1075102

It is fixed for 3.5 (http://gerrit.ovirt.org/#/c/25684/) but not for 3.4.z.

@Omer: what do you say? Shall I backport the mentioned patch to 3.4.z?

Comment 6 exploit 2014-09-12 13:30:39 UTC
select * from vm_device_view where vm_id = '00000000-0000-0000-0000-000000000000';
-[ RECORD 1 ]-----+-------------------------------------
device_id         | 00000004-0004-0004-0004-000000000004
vm_id             | 00000000-0000-0000-0000-000000000000
type              | video
device            | qxl
address           |
boot_order        |
spec_params       | { "vram" : "65536" }
is_managed        | t
is_plugged        |
is_readonly       | f
alias             |
custom_properties |
snapshot_id       |
-[ RECORD 2 ]-----+-------------------------------------
device_id         | ec8e001e-6732-4792-b485-00c7361cf07d
vm_id             | 00000000-0000-0000-0000-000000000000
type              | sound
device            | ich6
address           |
boot_order        | 0
spec_params       | {
                  | }
is_managed        | t
is_plugged        | t
is_readonly       | t
alias             |
custom_properties |
snapshot_id       |

So it seems to be the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1075102

Comment 7 Tomas Jelinek 2014-09-12 13:48:52 UTC
Yeah, it seems so. Moreover, the second (sound) device seems to be malformed (the spec_params are supposed to be JSON and it is just '{'), so something went wrong during the update...

As a workaround you could just do:

delete from vm_device where device_id = 'ec8e001e-6732-4792-b485-00c7361cf07d';

and it should start working.

@Omer: shall I backport http://gerrit.ovirt.org/#/c/25684/? I don't understand how that second device could make it into the DB during the update...

Comment 8 exploit 2014-09-12 14:44:33 UTC
Thanks for the workaround; now I can start any new VM with a CD-ROM for installing.
But I still couldn't boot the VMs I created just after the upgrade.
After searching a bit, I deleted the duplicated media (cdrom) device for those as well, using the ID of the VM, and it also worked.
Thanks a lot for your help.
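[Editor's note: the per-VM cleanup described in the comment above could be sketched roughly as follows. This is a hypothetical query, not one posted in the bug; the vm_device/vm_static table and column names are taken from the queries earlier in this report, but the device_id to delete must be read from your own output, never guessed.]

```sql
-- Sketch: list the cdrom rows for one affected VM (replace <VM_NAME>):
select device_id, device, spec_params
from vm_device
where vm_id = (select vm_guid from vm_static where vm_name = '<VM_NAME>')
  and device = 'cdrom';

-- If two rows come back, delete ONE of them by the device_id
-- copied from the query above:
-- delete from vm_device where device_id = '<duplicate-device-id>';
```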

Comment 9 Omer Frenkel 2014-09-14 08:22:26 UTC
(In reply to Tomas Jelinek from comment #7)
> 
> @Omer: shell I backport the http://gerrit.ovirt.org/#/c/25684/ ? I don't
> understand how that second device could make it to the DB during update...

yes, thanks!

Comment 10 Omer Frenkel 2014-09-15 11:11:15 UTC
This clone is not right for 3.4 since it's not the same issue;
the right issue is bug 1075102.
Since it wasn't in any build yet, I've moved it back to POST as Tomas sent a backport, re-targeted it to 3.4.4, and made it block the 3.4.4 tracker.

I'm closing this bug as it's not relevant for 3.4.

Please re-open if I got anything wrong.

*** This bug has been marked as a duplicate of bug 1109880 ***