Bug 1553305 - [PPC] - Starting VM for the 2nd time failed after snapshots created- XML error: target 'sdc' duplicated for disk sources - libvirt.py", line 3676, in defineXML
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.3
Target Release: 4.2.3
Assignee: Sharon Gratch
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-08 16:26 UTC by Avihai
Modified: 2018-05-10 06:26 UTC
CC List: 5 users

Fixed In Version: ovirt-engine-4.2.3
Clone Of:
Environment:
Last Closed: 2018-05-10 06:26:53 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+


Attachments
relevant engine , vdsm log (740.85 KB, application/x-gzip)
2018-03-08 16:26 UTC, Avihai


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 88760 | 0 | master | MERGED | core: prevent overlapping names for scsi cd-rom and disks | 2020-10-26 15:17:08 UTC
oVirt gerrit | 88998 | 0 | master | MERGED | core: prevent overlapping addresses for scsi cd-rom and disks | 2020-10-26 15:17:23 UTC
oVirt gerrit | 89424 | 0 | ovirt-engine-4.2 | MERGED | core: prevent overlapping addresses for scsi cd-rom and disks | 2020-10-26 15:17:08 UTC
oVirt gerrit | 89425 | 0 | ovirt-engine-4.2 | MERGED | core: prevent overlapping names for scsi cd-rom and disks | 2020-10-26 15:17:08 UTC

Description Avihai 2018-03-08 16:26:09 UTC
Created attachment 1405914
relevant engine , vdsm log

Description of problem:
Seen on a PPC environment.

Several automation tests start a VM with a bootable disk plus several other disks with file systems, stop the VM, perform an operation such as creating snapshots or moving a disk from one storage domain to another, and then start the VM a second time. The second start fails with the following engine error:

2018-03-07 19:11:45,631+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-0) [] EVENT_ID: VM_DOWN_ERROR(119), VM copy_disk_vm_iscsi is down with error. Exit message: XML error: target 'sdc' duplicated for disk sources '/rhev/data-center/mnt/blockSD/9d2e9fdb-2af9-4291-a669-3501563adb2a/images/4721eeb0-6fe2-45a5-9daa-8bbf5cf54f09/efd0bc8c-95b5-4171-a5f6-88dd57004d79' and '/rhev/data-center/mnt/blockSD/9d2e9fdb-2af9-4291-a669-3501563adb2a/images/4721eeb0-6fe2-45a5-9daa-8bbf5cf54f09/efd0bc8c-95b5-4171-a5f6-88dd57004d79'.

VDSM ERROR:
2018-03-07 19:11:40,700+0200 ERROR (vm/642f82fe) [virt.vm] (vmId='642f82fe-8118-4faa-8d94-c489264dc1e1') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
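libvirt rejects the domain XML in defineXML because two <disk> elements share the same <target dev> name. The minimal Python sketch below performs the same kind of check on a domain XML string so that a collision can be spotted before the XML reaches libvirt. It is an illustration only: the function name and sample XML are hypothetical, and this is not VDSM's or libvirt's code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_targets(domxml: str) -> dict:
    """Map each <target dev> that appears on more than one <disk> to its sources."""
    root = ET.fromstring(domxml)
    sources = defaultdict(list)
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("dev") is None:
            continue  # no target name, nothing to compare
        src = disk.find("source")
        # Block disks carry 'dev', file-based images carry 'file';
        # an empty CD-ROM drive may have no <source> at all.
        path = None if src is None else (src.get("dev") or src.get("file"))
        sources[target.get("dev")].append(path)
    return {dev: paths for dev, paths in sources.items() if len(paths) > 1}


# Hypothetical domain XML: a SCSI CD-ROM that was given the same name as a disk.
SAMPLE = """<domain><devices>
  <disk type='block' device='disk'><source dev='/dev/a'/><target dev='sda' bus='scsi'/></disk>
  <disk type='block' device='disk'><source dev='/dev/b'/><target dev='sdc' bus='scsi'/></disk>
  <disk type='file' device='cdrom'><target dev='sdc' bus='scsi'/></disk>
</devices></domain>"""

print(find_duplicate_targets(SAMPLE))  # {'sdc': ['/dev/b', None]}
```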


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.2.2-0.1.el7.noarch
vdsm-4.20.20-1.el7ev.ppc64le


How reproducible:
High; it occurred several times while running the copy disk and live/cold merge test cases.

Steps to Reproduce:
One example scenario:
Test Setup   1: Creating VM vm_TestCase18894_0720273583
Test Setup   2: Starting VM vm_TestCase18894_0720273583
Test Setup   3: [class] Start VM vm_TestCase18894_0720273583 with {'wait_for_ip': True, 'pause': False, 'use_cloud_init': False, 'timeout': 600,
Test Setup   4: Creating disks with filesystem and attach to VM
Test Step   5: Creating files on vm's 'vm_TestCase18894_0720273583' disks
Test Step   6: Creating file /mount-point_0720301020/test_file_0
Test Step   7: Creating file /mount-point_0720303314/test_file_0
Test Step   8: Creating file /mount-point_0720305527/test_file_0
Test Step   9: Creating file /mount-point_0720311764/test_file_0
Test Step  10: Creating file test_file_0
Test Step  11: Creating file test_file_0
Test Step  12: Creating snapshot of vm vm_TestCase18894_0720273583
Test Step  13: Before snapshot: 7 volumes
Test Step  14: Stop vm vm_TestCase18894_0720273583 with {'async': 'true'}
Test Step  15: Adding new snapshot to vm vm_TestCase18894_0720273583
Test Step  16: Add snapshot to VM vm_TestCase18894_0720273583 with 
Test Step  17: Start the VM for the second time


Actual results:
Starting the VM fails.

Expected results:
The VM starts successfully.


Additional info:
The issue did not occur on non-PPC environments.

It occurred in automation in the following test cases (tests affected):
1) Copy disk - test_same_domain_same_alias[iscsi]
2) TestCase18894.test_basic_snapshot_deletion[iscsi]
3) TestCase18894.test_basic_snapshot_deletion[nfs]
4) TestCase6038.test_basic_snapshot_deletion[iscsi]
5) test_live_merge.TestCase6052.test_basic_snapshot_deletion_with_io[iscsi]
6) test_live_merge.TestCase6052.test_basic_snapshot_deletion_with_io[nfs]

Comment 1 Michal Skrivanek 2018-03-09 06:30:49 UTC
Seems the scsi cd disk allocation doesn't work correctly and overlaps with disks

Comment 2 Yaniv Kaul 2018-03-09 07:01:58 UTC
(In reply to Michal Skrivanek from comment #1)
> Seems the scsi cd disk allocation doesn't work correctly and overlaps with
> disks

And is it PPC only?

Comment 3 Michal Skrivanek 2018-03-09 07:03:15 UTC
Yes, we still use IDE CD-ROM on x86

Comment 4 Michal Skrivanek 2018-03-09 07:59:46 UTC
Arik, btw,
- cdDiskInterface in LibvirtVmXmlBuilder::writeDisks is not able to handle type "sata" which is a valid type (in osinfo, not in use by default)
- switch(diskInterface) case VirtIO doesn't need to handle CDs as there is no virtio-block CDROM possible, but that switch doesn't have a "sata" case either.
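To illustrate why the overlap only shows up on PPC (Comment 3) and why an unhandled "sata" case matters, here is a minimal Python sketch. It assumes libvirt's conventional target-name prefixes per bus (hd for IDE, sd for SCSI and SATA, vd for virtio); the dictionary and helper names are hypothetical and do not reproduce the engine's LibvirtVmXmlBuilder logic.

```python
# libvirt's conventional <target dev> prefix for each disk bus.
# Only buses relevant to this report are listed.
TARGET_PREFIX = {
    "ide": "hd",
    "scsi": "sd",
    "sata": "sd",
    "virtio": "vd",
}


def target_prefix(bus: str) -> str:
    """Return the device-name prefix for a bus; fail loudly on a bus we do not handle."""
    try:
        return TARGET_PREFIX[bus]
    except KeyError:
        raise ValueError(f"unsupported disk bus: {bus!r}") from None


def can_collide(cdrom_bus: str, disk_bus: str) -> bool:
    """True when a CD-ROM and a disk draw their names from the same hdX/sdX/vdX pool."""
    return target_prefix(cdrom_bus) == target_prefix(disk_bus)


print(can_collide("ide", "scsi"))   # x86-style: IDE CD-ROM next to SCSI disks -> False
print(can_collide("scsi", "scsi"))  # PPC-style: SCSI CD-ROM next to SCSI disks -> True
```

Raising on an unknown bus, rather than silently falling through, makes a missing case such as "sata" visible as soon as it is used.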

Comment 5 Michal Skrivanek 2018-03-09 08:34:13 UTC
I would also want to ask about Ia98b398a5b6c33c092fec40335b8ad63505f08ce which is only on master, you added "if (device == null || !device.isManaged()) {" to writeDisks - after your change we're skipping the cdrom index skips.
the comment below should be changed too then, as it's only about the null case.
also, we're not sending any unmanaged disks? why?

anyway, shouldn't be relevant to the error, which should be reproducible with scsi cdrom(set in osinfo) and 2+ scsi disks

Comment 6 Arik 2018-03-11 09:31:27 UTC
(In reply to Michal Skrivanek from comment #4)
> Arik, btw,
> - cdDiskInterface in LibvirtVmXmlBuilder::writeDisks is not able to handle
> type "sata" which is a valid type (in osinfo, not in use by default)

Right, that's a known limitation - we've never set SATA interface [1].
This can easily be supported though.

> - switch(diskInterface) case VirtIO doesn't need to handle CDs as there is
> no virtio-block CDROM possible, but that switch doesn't have a "sata" case
> either.

Note that CDs are written elsewhere [2].
This code only handles hard disks, and the index handling you see there is meant to skip the index of the CD-ROM device.
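The index skipping described here can be sketched as follows. This is a hypothetical Python helper, not the engine's Java code: it converts a zero-based index into a libvirt-style suffix (a..z, then aa, ab, ...) and hands out disk names while skipping any index already reserved for a CD-ROM. Reserving index 2 (sdc) in the example is only an assumption chosen to match the name in the libvirt error.

```python
def index_to_suffix(index: int) -> str:
    """0 -> 'a', 25 -> 'z', 26 -> 'aa' (bijective base-26, as in sda..sdz, sdaa)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def assign_disk_names(disk_count: int, prefix: str = "sd", reserved=frozenset()) -> list:
    """Name disk_count disks in order, skipping indices reserved for other devices."""
    names = []
    index = 0
    while len(names) < disk_count:
        if index not in reserved:
            names.append(prefix + index_to_suffix(index))
        index += 1
    return names


print(assign_disk_names(4))                # ['sda', 'sdb', 'sdc', 'sdd']: sdc clashes with the CD
print(assign_disk_names(4, reserved={2}))  # ['sda', 'sdb', 'sdd', 'sde']
```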

[1] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/storage/DiskInterface.java
[2] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L1921

Comment 7 Arik 2018-03-11 11:51:19 UTC
(In reply to Michal Skrivanek from comment #5)
> I would also want to ask about Ia98b398a5b6c33c092fec40335b8ad63505f08ce
> which is only on master, you added "if (device == null ||
> !device.isManaged()) {" to writeDisks - after your change we're skipping the
> cdrom index skips.

Right, no need to execute that code when the device is not being written at the end.

> the comment below should be changed too then, as it's only about the null
> case.

Yes, recent changes made this comment redundant. Need to clean up that part.

> also, we're not sending any unmanaged disks? why?

Yes, we still don't have all the information we need on unmanaged disks.
Now that we have the domain XML at hand on the engine side, we may restore all the information we need but I guess the benefit of that would not be that high. Normally, we don't have unmanaged disks for a VM that was started by the engine. The only benefit I can think of is that it will enable us to start external VMs.

> 
> anyway, shouldn't be relevant to the error, which should be reproducible
> with scsi cdrom(set in osinfo) and 2+ scsi disks

Posted a patch to fix the name (so we won't have an overlap between the CD and a disk), but that's not enough - the current logic of address assignment for SCSI devices ignores CDs - not sure if the previous logic supported it. Moving to Sharon to investigate.

Comment 8 Sharon Gratch 2018-03-12 16:18:47 UTC
> Posted a patch to fix the name (so we won't have an overlap between the CD
> and a disk), but that's not enough - the current logic of address assignment
> for SCSI devices ignores CDs - not sure if the previous logic supported it.
> Moving to Sharon to investigate.

Regarding SCSI CDs' address assignment:
- There is an address-overlap issue for SCSI CDs, but the same problem also exists in 4.1, so it is not a regression: address assignment for SCSI disks ignores the address assigned to SCSI CDs, so addresses may overlap.

- Another issue to fix in 4.2 is to set the address for non-payload CDs the same way as in 4.1 (the index should be 1 instead of 2).
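The point that disk address assignment must treat the CD-ROM's address as taken can be sketched the same way. This Python sketch is hypothetical: it models a drive address as a (controller, unit) pair, assumes bus and target are always 0, and uses an arbitrary units_per_controller limit rather than any real controller's capacity.

```python
from typing import Iterable, List, Set, Tuple

# (controller, unit); bus and target are assumed to be 0 in this sketch.
Address = Tuple[int, int]


def allocate_scsi_addresses(count: int, occupied: Iterable[Address],
                            controllers: int = 1,
                            units_per_controller: int = 8) -> List[Address]:
    """Hand out count free addresses, never reusing one already in occupied."""
    taken: Set[Address] = set(occupied)
    result: List[Address] = []
    for controller in range(controllers):
        for unit in range(units_per_controller):
            if len(result) == count:
                return result
            addr = (controller, unit)
            if addr not in taken:
                taken.add(addr)
                result.append(addr)
    if len(result) < count:
        raise RuntimeError("not enough free SCSI addresses")
    return result


cdrom = (0, 1)  # hypothetical address already assigned to the SCSI CD-ROM
print(allocate_scsi_addresses(3, occupied=[cdrom]))  # [(0, 0), (0, 2), (0, 3)]
```

Passing the CD-ROM's address in as occupied is what keeps a disk from landing on it; an allocator that only looks at other disks would happily reuse (0, 1).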

Comment 9 Michal Skrivanek 2018-03-13 15:24:43 UTC
please also add SATA, should be trivial (perhaps refactor and choose if to use DiskInterface or string (strings are nicer;-). Thanks

Comment 10 Arik 2018-03-13 15:33:11 UTC
(In reply to Michal Skrivanek from comment #9)
> please also add SATA, should be trivial (perhaps refactor and choose if to
> use DiskInterface or string (strings are nicer;-). Thanks

SATA is only enabled for q35, so it is 4.3 material, right?

Comment 11 Michal Skrivanek 2018-03-13 15:36:19 UTC
q35 is, but a small change to fix the class, or to unify the use of class vs. string in LibvirtXmlBuilder, should be easy to do now.

Comment 12 Israel Pinto 2018-04-22 14:18:03 UTC
Verified with:
Engine version: 4.2.3-0.1.el7
Host:
OS Version: RHEL - 7.5 - 8.el7
Kernel Version: 3.10.0 - 862.el7.ppc64le
KVM Version: 2.10.0 - 21.el7_5.2
LIBVIRT Version: libvirt-3.9.0-14.el7_5.3
VDSM Version: vdsm-4.20.26-1.el7ev

Ran the following steps:
1. Create VM with OS
2. Start VM, add snapshot with memory
3. Stop VM, add snapshot without memory
4. Start VM
5. Start VM with snapshots
6. Stop VM, Start in pause mode
7. Stop VM, Start VM
8. (VM is UP) Hot plug: Disk, Nic 
9. (VM is UP) Hot plug: CPU, memory

All steps passed.

Comment 13 Sandro Bonazzola 2018-05-10 06:26:53 UTC
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

