Created attachment 1405914
relevant engine, vdsm log

Description of problem:
Seen on a PPC environment. Several automation tests start a VM with a bootable disk plus several other disks with filesystems, then stop the VM, perform an operation such as creating snapshots or moving a disk from one SD to another SD, and start the VM a second time. The second start fails.

Engine error:
2018-03-07 19:11:45,631+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-0) [] EVENT_ID: VM_DOWN_ERROR(119), VM copy_disk_vm_iscsi is down with error. Exit message: XML error: target 'sdc' duplicated for disk sources '/rhev/data-center/mnt/blockSD/9d2e9fdb-2af9-4291-a669-3501563adb2a/images/4721eeb0-6fe2-45a5-9daa-8bbf5cf54f09/efd0bc8c-95b5-4171-a5f6-88dd57004d79' and '/rhev/data-center/mnt/blockSD/9d2e9fdb-2af9-4291-a669-3501563adb2a/images/4721eeb0-6fe2-45a5-9daa-8bbf5cf54f09/efd0bc8c-95b5-4171-a5f6-88dd57004d79'.

VDSM error:
2018-03-07 19:11:40,700+0200 ERROR (vm/642f82fe) [virt.vm] (vmId='642f82fe-8118-4faa-8d94-c489264dc1e1') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.2.2-0.1.el7.noarch
vdsm-4.20.20-1.el7ev.ppc64le

How reproducible:
High; it occurred several times while running the copy disk and live/cold merge test cases.

Steps to Reproduce:
One of the scenarios, for example:
Test Setup 1: Creating VM vm_TestCase18894_0720273583
Test Setup 2: Starting VM vm_TestCase18894_0720273583
Test Setup 3: [class] Start VM vm_TestCase18894_0720273583 with {'wait_for_ip': True, 'pause': False, 'use_cloud_init': False, 'timeout': 600,
Test Setup 4: Creating disks with filesystem and attach to VM
Test Step 5: Creating files on vm's 'vm_TestCase18894_0720273583' disks
Test Step 6: Creating file /mount-point_0720301020/test_file_0
Test Step 7: Creating file /mount-point_0720303314/test_file_0
Test Step 8: Creating file /mount-point_0720305527/test_file_0
Test Step 9: Creating file /mount-point_0720311764/test_file_0
Test Step 10: Creating file test_file_0
Test Step 11: Creating file test_file_0
Test Step 12: Creating snapshot of vm vm_TestCase18894_0720273583
Test Step 13: Before snapshot: 7 volumes
Test Step 14: Stop vm vm_TestCase18894_0720273583 with {'async': 'true'}
Test Step 15: Adding new snapshot to vm vm_TestCase18894_0720273583
Test Step 16: Add snapshot to VM vm_TestCase18894_0720273583 with
Test Step 17: Start VM FOR THE 2nd TIME!!!

Actual results:
Starting the VM fails.

Expected results:

Additional info:
The issue did not occur on a non-PPC environment. It occurred via automation in the following test cases.
Tests affected:
1) Copy disk - test_same_domain_same_alias[iscsi]
2) TestCase18894.test_basic_snapshot_deletion[iscsi]
3) TestCase18894.test_basic_snapshot_deletion[nfs]
4) TestCase6038.test_basic_snapshot_deletion[iscsi]
5) test_live_merge.TestCase6052.test_basic_snapshot_deletion_with_io[iscsi]
6) test_live_merge.TestCase6052.test_basic_snapshot_deletion_with_io[nfs]
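A quick way to confirm this kind of failure is to scan the domain XML for target names that are used by more than one disk element. The sketch below is only an illustration: the function name duplicated_targets and the sample XML (including its paths) are made up for the example and are not taken from the attached logs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicated_targets(domain_xml):
    """Return {target dev: [device kinds]} for targets used by 2+ disks."""
    root = ET.fromstring(domain_xml)
    seen = defaultdict(list)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None:
            continue
        # The 'device' attribute is 'disk', 'cdrom', etc.
        seen[target.get("dev")].append(disk.get("device", "disk"))
    return {dev: kinds for dev, kinds in seen.items() if len(kinds) > 1}


# Hypothetical domain XML: a SCSI CD-ROM and the third SCSI disk both use 'sdc'.
SAMPLE = """<domain type='kvm'>
  <devices>
    <disk type='file' device='cdrom'>
      <target dev='sdc' bus='scsi'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/hypothetical/disk0'/>
      <target dev='sda' bus='scsi'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/hypothetical/disk1'/>
      <target dev='sdb' bus='scsi'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/hypothetical/disk2'/>
      <target dev='sdc' bus='scsi'/>
    </disk>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(duplicated_targets(SAMPLE))
```

Running it against the XML that the engine sends to the host would show which devices share a target name before libvirt rejects the definition.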
Seems the scsi cd disk allocation doesn't work correctly and overlaps with disks
(In reply to Michal Skrivanek from comment #1)
> Seems the scsi cd disk allocation doesn't work correctly and overlaps with
> disks

And is it PPC only?
Yes, we still use IDE CD-ROM on x86
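To illustrate why the bus matters, here is a rough Python sketch of how a target name is commonly derived from a bus prefix and an index. It assumes the CD-ROM sits at index 2 and that disks are numbered from 0 without skipping anything; both are assumptions made to show the failure mode, not a copy of the engine's logic. The helper names target_name and cd_collides are hypothetical.

```python
import string


def target_name(prefix, index):
    """Build a name such as 'sda' (index 0) or 'sdaa' (index 26)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    n = index
    while True:
        letters = string.ascii_lowercase[n % 26] + letters
        n = n // 26 - 1
        if n < 0:
            break
    return prefix + letters


def cd_collides(cd_prefix, disk_prefix, disk_count, cd_index=2):
    """True if the CD name equals one of the naively numbered disk names."""
    cd = target_name(cd_prefix, cd_index)
    disks = [target_name(disk_prefix, i) for i in range(disk_count)]
    return cd in disks


if __name__ == "__main__":
    # IDE CD ('hdc') next to three SCSI disks: no clash.
    print(cd_collides("hd", "sd", 3))
    # SCSI CD ('sdc') next to three SCSI disks: the third disk is also 'sdc'.
    print(cd_collides("sd", "sd", 3))
```

With an IDE CD-ROM the prefixes differ, so the names can never overlap; with a SCSI CD-ROM the CD and the disks share the 'sd' namespace and the disk numbering has to step over the CD's slot.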
Arik, btw,
- cdDiskInterface in LibvirtVmXmlBuilder::writeDisks is not able to handle type "sata" which is a valid type (in osinfo, not in use by default)
- switch(diskInterface) case VirtIO doesn't need to handle CDs as there is no virtio-block CDROM possible, but that switch doesn't have a "sata" case either.
I would also want to ask about Ia98b398a5b6c33c092fec40335b8ad63505f08ce which is only on master, you added "if (device == null || !device.isManaged()) {" to writeDisks - after your change we're skipping the cdrom index skips.

the comment below should be changed too then, as it's only about the null case.

also, we're not sending any unmanaged disks? why?

anyway, shouldn't be relevant to the error, which should be reproducible with scsi cdrom(set in osinfo) and 2+ scsi disks
(In reply to Michal Skrivanek from comment #4)
> Arik, btw,
> - cdDiskInterface in LibvirtVmXmlBuilder::writeDisks is not able to handle
> type "sata" which is a valid type (in osinfo, not in use by default)

Right, that's a known limitation - we've never set the SATA interface [1]. This can easily be supported though.

> - switch(diskInterface) case VirtIO doesn't need to handle CDs as there is
> no virtio-block CDROM possible, but that switch doesn't have a "sata" case
> either.

Note that CDs are written elsewhere [2]. This code only handles hard disks, and the index handling you see there is meant to skip the index of the CD-ROM device.

[1] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/storage/DiskInterface.java
[2] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L1921
(In reply to Michal Skrivanek from comment #5)
> I would also want to ask about Ia98b398a5b6c33c092fec40335b8ad63505f08ce
> which is only on master, you added "if (device == null ||
> !device.isManaged()) {" to writeDisks - after your change we're skipping the
> cdrom index skips.

Right, there is no need to execute that code when the device is not going to be written in the end.

> the comment below should be changed too then, as it's only about the null
> case.

Yes, recent changes made this comment redundant. That part needs to be cleaned up.

> also, we're not sending any unmanaged disks? why?

Yes, we still don't have all the information we need on unmanaged disks. Now that we have the domain XML at hand on the engine side, we may restore all the information we need, but I guess the benefit of that would not be that high. Normally, we don't have unmanaged disks for a VM that was started by the engine. The only benefit I can think of is that it will enable us to start external VMs.

> anyway, shouldn't be relevant to the error, which should be reproducible
> with scsi cdrom(set in osinfo) and 2+ scsi disks

Posted a patch to fix the name (so we won't have an overlap between the CD and a disk), but that's not enough - the current logic of address assignment for SCSI devices ignores CDs - not sure if the previous logic supported it. Moving to Sharon to investigate.
> Posted a patch to fix the name (so we won't have an overlap between the CD
> and a disk), but that's not enough - the current logic of address assignment
> for SCSI devices ignore CDs - not sure if the previous logic supported it.
> Moving to Sharon to investigate.

Regarding address assignment for SCSI CDs:
- There is an address overlap issue for SCSI CDs, but the same problem also exists in 4.1, so it is not a regression. That is, address assignment for SCSI disks ignores the address assigned to SCSI CDs, so addresses might overlap.
- Another issue to fix in 4.2 is setting the address for non-payload CDs the same as in 4.1 (index should be set to 1 instead of 2).
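The overlap described above goes away if disk numbering treats the slots taken by CDs as reserved. The following Python sketch shows one simple way to do that; it is an illustration under stated assumptions, not the patch that was posted. The function name assign_indices, the disk identifiers and the reserved index are all hypothetical.

```python
def assign_indices(disk_ids, reserved):
    """Give each disk the next free index, skipping reserved ones.

    disk_ids: disks in the order they should be numbered.
    reserved: indices already taken, e.g. by SCSI CD-ROM devices.
    """
    reserved = set(reserved)
    assignment = {}
    index = 0
    for disk_id in disk_ids:
        # Step over any slot that a CD (or other device) already owns.
        while index in reserved:
            index += 1
        assignment[disk_id] = index
        index += 1
    return assignment


if __name__ == "__main__":
    # Hypothetical case: one CD holds index 2, three disks need slots.
    print(assign_indices(["boot", "data1", "data2"], reserved={2}))
```

The same idea applies whether the index is later turned into a target name or into the unit of a drive address, as long as both the CD and the disks draw from one shared set of used slots.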
Please also add SATA, it should be trivial (perhaps refactor and choose whether to use DiskInterface or string; strings are nicer ;-)). Thanks
(In reply to Michal Skrivanek from comment #9)
> please also add SATA, should be trivial (perhaps refactor and choose if to
> use DiskInterface or string (strings are nicer;-). Thanks

SATA is only enabled for q35, so it is 4.3 material, right?
q35 is, but a small change to fix the class, or to unify the usage of class or string in LibvirtVmXmlBuilder, should be easy to do now.
Verified with:
Engine version: 4.2.3-0.1.el7
Host:
OS Version: RHEL - 7.5 - 8.el7
Kernel Version: 3.10.0 - 862.el7.ppc64le
KVM Version: 2.10.0 - 21.el7_5.2
LIBVIRT Version: libvirt-3.9.0-14.el7_5.3
VDSM Version: vdsm-4.20.26-1.el7ev

Ran the following steps:
1. Create VM with OS
2. Start VM, add snapshot with memory
3. Stop VM, add snapshot without memory
4. Start VM
5. Start VM with snapshots
6. Stop VM, start in pause mode
7. Stop VM, start VM
8. (VM is UP) Hot plug: disk, NIC
9. (VM is UP) Hot plug: CPU, memory

All steps passed.
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.