Bug 1571796 - VM fails to start with libvirtError: XML error: Multiple 'scsi' controllers with index '0'
Summary: VM fails to start with libvirtError: XML error: Multiple 'scsi' controllers with index '0'
Keywords:
Status: CLOSED DUPLICATE of bug 1570349
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.2.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sharon Gratch
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-25 12:40 UTC by Michael Burman
Modified: 2018-05-03 10:21 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-03 10:21:13 UTC
oVirt Team: Virt
Embargoed:


Attachments
logs (756.50 KB, application/x-gzip)
2018-04-25 12:40 UTC, Michael Burman

Description Michael Burman 2018-04-25 12:40:16 UTC
Created attachment 1426616 [details]
logs

Description of problem:
The VM fails to start with libvirtError: XML error: Multiple 'scsi' controllers with index '0'.

Scenario:
Run a VM on a host and reboot the host. When the host is back up, try to run the VM on it again; the run fails with:

2018-04-25 15:31:48,147+0300 ERROR (vm/f8abc451) [virt.vm] (vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2869, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: Multiple 'scsi' controllers with index '0'
2018-04-25 15:31:48,147+0300 INFO  (vm/f8abc451) [virt.vm] (vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb') Changed state to Down: XML error: Multiple 'scsi' controllers with index '0' (code=1) (vm:1683)
2018-04-25 15:31:48,165+0300 INFO  (vm/f8abc451) [virt.vm] (vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb') Stopping connection (guestagent:438)
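
For context, libvirt refuses to define a domain whose XML contains two controllers of the same type with the same index. A minimal, hypothetical fragment of the kind of domxml that triggers this error (illustrative only, not taken from the attached logs) would look like:

  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <controller type='scsi' index='0' model='virtio-scsi'/>
  </devices>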


Version-Release number of selected component (if applicable):
vdsm-4.20.26-1.el7ev.x86_64

How reproducible:
Seems to be 100% reproducible with the scenario described above.

Steps to Reproduce:
1. Run a VM.
2. Reboot the host.
3. When the host is back up, try to run the VM again on this host.

Actual results:
The run fails with: libvirtError: XML error: Multiple 'scsi' controllers with index '0'

Expected results:
The VM should start and run.

Comment 1 Michal Skrivanek 2018-04-26 05:28:48 UTC
dupe of bug 1568399 with logs?

Michael, how old is this VM? When was it created/run for the first time? Does the same happen for a newly created VM? The host restart shouldn't be significant; it should behave the same when you just stop and start the VM.

Comment 2 Michael Burman 2018-04-26 08:14:50 UTC
(In reply to Michal Skrivanek from comment #1)
> dupe of bug 1568399 with logs?
> 
> Michael, how old is this VM? When was it created/run for the first time? Does
> the same happen for a newly created VM? The host restart shouldn't be
> significant; it should behave the same when you just stop and start the VM.

Hi Michal,
Indeed, it sounds like a dup of BZ 1568399; it failed with the same error.
The VM is pretty old, created a long time ago. My 4.2 env (this specific one) has been running upgrades since the first d/s QE build, so I guess it dates from then. I can't say exactly when it first ran; where can I see that?
This VM can no longer run and fails with this error every time, on all hosts in the cluster. I have the env available for you guys to take a look if you want.
New VMs can start as expected.

Comment 3 Michal Skrivanek 2018-04-26 10:39:01 UTC
(In reply to Michael Burman from comment #2)
> (In reply to Michal Skrivanek from comment #1)
> > dupe of bug 1568399 with logs?
> > 
> > Michael, how old is this VM? When was it created/run for the first time? Does
> > the same happen for a newly created VM? The host restart shouldn't be
> > significant; it should behave the same when you just stop and start the VM.
> 
> Hi Michal,
> Indeed, it sounds like a dup of BZ 1568399; it failed with the same error.
> The VM is pretty old, created a long time ago. My 4.2 env (this specific one)
> has been running upgrades since the first d/s QE build, so I guess it dates
> from then. I can't say exactly when it first ran; where can I see that?
> This VM can no longer run and fails with this error every time, on all hosts
> in the cluster. I have the env available for you guys to take a look if you
> want.
> New VMs can start as expected.

Then it's safe to say it is the same issue as bug 1543833. All VMs created in some 4.2.2 builds will suffer from the same problem; we only have a manual workaround, as mentioned there: edit the VM, switch virtio-scsi/virtio-block, and switch back.
In bug 1568399 we're waiting for logs, but so far I assume it's also the same issue and doesn't need any further change.

If you indeed do not see it happening for newer VMs, or in VMs upgraded from 4.1 to 4.2.3+, I believe we can close it.
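
For reference, the edit/switch workaround can also be scripted. A rough sketch assuming the Python ovirt-engine-sdk4 API; the engine URL, credentials, and VM name below are placeholders:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connect to the engine API (placeholder URL and credentials).
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='...',
    insecure=True,
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
vm_service = vms_service.vm_service(vm.id)

# Toggle VirtIO-SCSI off and back on; each update makes the engine
# rewrite the VM's controller devices, roughly mirroring the manual
# edit/switch workaround described above.
vm_service.update(types.Vm(virtio_scsi=types.VirtioScsi(enabled=False)))
vm_service.update(types.Vm(virtio_scsi=types.VirtioScsi(enabled=True)))
connection.close()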

Comment 4 Michael Burman 2018-04-28 09:26:26 UTC
(In reply to Michal Skrivanek from comment #3)
> (In reply to Michael Burman from comment #2)
> > (In reply to Michal Skrivanek from comment #1)
> > > dupe of bug 1568399 with logs?
> > > 
> > > Michael, how old is this VM? When was it created/run for the first time?
> > > Does the same happen for a newly created VM? The host restart shouldn't be
> > > significant; it should behave the same when you just stop and start the VM.
> > 
> > Hi Michal,
> > Indeed, it sounds like a dup of BZ 1568399; it failed with the same error.
> > The VM is pretty old, created a long time ago. My 4.2 env (this specific
> > one) has been running upgrades since the first d/s QE build, so I guess it
> > dates from then. I can't say exactly when it first ran; where can I see
> > that?
> > This VM can no longer run and fails with this error every time, on all
> > hosts in the cluster. I have the env available for you guys to take a look
> > if you want.
> > New VMs can start as expected.
> 
> Then it's safe to say it is the same issue as bug 1543833. All VMs created
> in some 4.2.2 builds will suffer from the same problem; we only have a
> manual workaround, as mentioned there: edit the VM, switch
> virtio-scsi/virtio-block, and switch back.
> In bug 1568399 we're waiting for logs, but so far I assume it's also the
> same issue and doesn't need any further change.
> 
> If you indeed do not see it happening for newer VMs, or in VMs upgraded from
> 4.1 to 4.2.3+, I believe we can close it.

Michal, the workaround you suggested doesn't work for me. The VM can't start no matter what; it fails with the same error every time.

Comment 5 Sharon Gratch 2018-05-01 15:33:15 UTC
According to the attached logs, the following error occurred:
"managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='64de71e2-c745-4758-8fe4-259d854e682f', vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb'}', device='virtio-scsi', type='CONTROLLER', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'"

That implies that the problematic VM ran with 2 SCSI controllers due to a previous invalid upgrade process.
Those 2 SCSI controllers are:
1. device name='scsi', managed=False, plugged=True
2. device name='virtio-scsi', managed=True, plugged=False

As long as you didn't edit this VM, you could run it without noticing the issue. But once you edited it, the unplugged controller became plugged, so the VM was then started with 2 plugged SCSI controllers sharing the same index, and therefore it failed to start:
1. device name='scsi', managed=False, plugged=True
2. device name='virtio-scsi', managed=True, plugged=True

As a workaround, run this VM via Run Once a single time. That first Run Once will fail, but the unmanaged devices are removed in the process, so subsequent runs (regular or Run Once) will succeed and the VM will be cleaned of the unmanaged controller.
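
For anyone debugging a similar case, the controller state described above can be checked directly in the engine database. A sketch assuming the standard ovirt-engine schema and the default 'engine' database name (the VM id is the one from the log above):

  sudo -u postgres psql engine -c "
    SELECT device, is_managed, is_plugged
    FROM vm_device
    WHERE vm_id = 'f8abc451-ad7d-4916-ae0b-fca5cfa643bb'
      AND type = 'controller';"

Before the failed edit you would expect one plugged unmanaged 'scsi' row and one unplugged managed 'virtio-scsi' row, matching the two controllers listed above.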

Comment 6 Michael Burman 2018-05-02 05:59:01 UTC
(In reply to Sharon Gratch from comment #5)
> According to the attached logs, the following error occurred:
> "managed non pluggable device was removed unexpectedly from libvirt:
> 'VmDevice:{id='VmDeviceId:{deviceId='64de71e2-c745-4758-8fe4-259d854e682f',
> vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb'}', device='virtio-scsi',
> type='CONTROLLER', specParams='[]', address='', managed='true',
> plugged='false', readOnly='false', deviceAlias='', customProperties='[]',
> snapshotId='null', logicalName='null', hostDevice='null'}'"
> 
> That implies that the problematic VM ran with 2 SCSI controllers due to a
> previous invalid upgrade process.
> Those 2 SCSI controllers are:
> 1. device name='scsi', managed=False, plugged=True
> 2. device name='virtio-scsi', managed=True, plugged=False
> 
> As long as you didn't edit this VM, you could run it without noticing the
> issue. But once you edited it, the unplugged controller became plugged, so
> the VM was then started with 2 plugged SCSI controllers sharing the same
> index, and therefore it failed to start:
> 1. device name='scsi', managed=False, plugged=True
> 2. device name='virtio-scsi', managed=True, plugged=True
> 
> As a workaround, run this VM via Run Once a single time. That first Run Once
> will fail, but the unmanaged devices are removed in the process, so
> subsequent runs (regular or Run Once) will succeed and the VM will be
> cleaned of the unmanaged controller.

Thank you Sharon, it helped, the VM can run now :)

Comment 7 Sharon Gratch 2018-05-02 10:44:22 UTC
So in reply to Michal Skrivanek from comment #3:
> Then it's safe to say it is the same issue as bug 1543833. All VMs created
> in some 4.2.2 builds will suffer from the same problem; we only have a
> manual workaround, as mentioned there: edit the VM, switch
> virtio-scsi/virtio-block, and switch back.
> In bug 1568399 we're waiting for logs, but so far I assume it's also the
> same issue and doesn't need any further change.
> 
> If you indeed do not see it happening for newer VMs, or in VMs upgraded from
> 4.1 to 4.2.3+, I believe we can close it.

Michael, can we close this bug?

Comment 8 Michael Burman 2018-05-02 11:59:22 UTC
(In reply to Sharon Gratch from comment #7)
> So in reply to Michal Skrivanek from comment #3:
> > Then it's safe to say it is the same issue as bug 1543833. All VMs created
> > in some 4.2.2 builds will suffer from the same problem; we only have a
> > manual workaround, as mentioned there: edit the VM, switch
> > virtio-scsi/virtio-block, and switch back.
> > In bug 1568399 we're waiting for logs, but so far I assume it's also the
> > same issue and doesn't need any further change.
> > 
> > If you indeed do not see it happening for newer VMs, or in VMs upgraded
> > from 4.1 to 4.2.3+, I believe we can close it.
> 
> Michael, can we close this bug?

From my side (the QE side), no: it's a real bug and I don't feel OK with this. But if you (dev) think we should close it (based on Michal's comment quoted in comment #7), I won't fight it.
If we decide to close it, the close resolution should be anything except NOTABUG.
Thanks,

Comment 9 Michal Skrivanek 2018-05-03 10:21:13 UTC
The invalid upgrade triggers the same problem as in bug 1570349.

*** This bug has been marked as a duplicate of bug 1570349 ***

