Bug 1036111

Summary: Malformed libvirt XML is causing Storage Live Migration failure.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Amador Pahim <asegundo>
Component: vdsm
Assignee: Amador Pahim <asegundo>
Status: CLOSED UPSTREAM
QA Contact: Gadi Ickowicz <gickowic>
Severity: medium
Docs Contact:
Priority: medium
Version: unspecified
CC: acanan, amureini, asegundo, bazulay, hateya, iheim, lpeer, nlevinki, sbonazzo, scohen, sgotliv, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: 3.4.0
Hardware: All
OS: Linux
Whiteboard: storage
Fixed In Version: ovirt-3.4.0-beta3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-31 15:01:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 959705, 1034856
Bug Blocks:

Description Amador Pahim 2013-11-29 13:06:13 UTC
Description of problem:
A virtual machine with its disk on an NFS storage domain and a Direct LUN attached was started without the snapshot="no" attribute in the Direct LUN section of the libvirt XML:

<disk type="block" device="disk">
 <driver name="qemu" type="raw" cache="none"/>
 <source dev="/dev/directlun/360010040993ef4e86b34209c00000000-afaec78d-6c53-4ef0-9aeb-927b006259b7"/>
 <target dev="vdb" bus="virtio"/>
 <alias name="virtio-disk1"/>
 <address type="pci" domain="0x0000" bus="0x00" slot="0x06" function="0x0"/>
</disk>

The resulting error in vdsm.log when Storage Live Migration (SLM) is executed:

  libvirtError: unsupported configuration: source for disk 'vdb' is not a regular file; refusing to generate external snapshot name
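
To check a running VM for this condition before attempting SLM, the domain XML can be scanned for block disks that lack snapshot="no". The following is only a minimal sketch: the function name, the sample XML, and the LUN path in it are made up for illustration and are not taken from vdsm.

```python
from xml.dom import minidom


def block_disks_missing_snapshot_no(domain_xml):
    """Return the target dev names of block disks without snapshot="no"."""
    dom = minidom.parseString(domain_xml)
    missing = []
    for disk in dom.getElementsByTagName('disk'):
        if disk.getAttribute('type') != 'block':
            continue
        # getAttribute() returns '' when the attribute is absent.
        if disk.getAttribute('snapshot') == 'no':
            continue
        targets = disk.getElementsByTagName('target')
        name = targets[0].getAttribute('dev') if targets else '(unknown)'
        missing.append(name)
    return missing


# Hypothetical domain XML: one file disk (NFS) and one Direct LUN.
SAMPLE_XML = """
<domain type="kvm">
  <devices>
    <disk type="file" device="disk" snapshot="no">
      <source file="/example/nfs/image"/>
      <target dev="vda" bus="virtio"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/directlun/EXAMPLE-LUN"/>
      <target dev="vdb" bus="virtio"/>
    </disk>
  </devices>
</domain>
"""

if __name__ == '__main__':
    print(block_disks_missing_snapshot_no(SAMPLE_XML))
```

The XML to feed in could come from "virsh dumpxml" on the hypervisor; any disk name reported here would be expected to trigger the error above.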


Version-Release number of selected component (if applicable):
vdsm 4.10.2-25.1.el6ev

How reproducible:
Partially.

Steps to Reproduce:

A hack in the vdsm code is needed to reproduce the effect (SLM failure); the cause of the malformed libvirt XML itself is unknown.

Reproducing the effect of a malformed libvirt XML:
1. Create a VM with a disk in an NFS Storage Domain.
2. Attach an iSCSI Direct LUN to it.
3. On the hypervisor, edit /usr/share/vdsm/libvirtvm.py:

Remove (line 1164):
        diskelem.setAttribute('snapshot', 'no')

Add in its place:
        if deviceType == 'file':
            diskelem.setAttribute('snapshot', 'no')


4. Restart the vdsmd service.
5. Start the virtual machine.

Actual results:
A VM whose Direct LUN section of the libvirt XML lacks snapshot="no" always fails Storage Live Migration.

Expected results:
- Identify the root cause of the malformed XML.
- Do not create such a configuration again. Libvirt cannot prevent it, so we need to handle it in some way.
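
One possible way to "handle it" would be to normalize the device XML before the VM is started, adding snapshot="no" to any block disk that does not carry a snapshot attribute at all. This is a sketch of that idea only; it is not the fix that was merged upstream, and the function name and sample XML are hypothetical.

```python
from xml.dom import minidom


def add_missing_snapshot_no(domain_xml):
    """Add snapshot="no" to block disks that have no snapshot attribute.

    Returns the updated XML string and the list of target devs changed.
    Disks that already set snapshot explicitly are left untouched.
    """
    dom = minidom.parseString(domain_xml)
    changed = []
    for disk in dom.getElementsByTagName('disk'):
        if disk.getAttribute('type') != 'block':
            continue
        if disk.hasAttribute('snapshot'):
            continue
        disk.setAttribute('snapshot', 'no')
        targets = disk.getElementsByTagName('target')
        changed.append(targets[0].getAttribute('dev') if targets else '(unknown)')
    return dom.documentElement.toxml(), changed


# Hypothetical input: a Direct LUN added without the snapshot attribute.
SAMPLE_DOMAIN = """
<domain type="kvm">
  <devices>
    <disk type="block" device="disk">
      <source dev="/dev/mapper/EXAMPLE-LUN"/>
      <target dev="vdb" bus="virtio"/>
    </disk>
  </devices>
</domain>
"""

if __name__ == '__main__':
    new_xml, changed = add_missing_snapshot_no(SAMPLE_DOMAIN)
    print(changed)
    print(new_xml)
```

Running the function a second time on its own output changes nothing, so it would be safe to apply after any hook has modified the XML.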

Additional info:
- Due to the SLM failure, the user will hit https://bugzilla.redhat.com/show_bug.cgi?id=1034856

Comment 3 Ayal Baron 2013-12-01 12:00:52 UTC
Is there a hook adding this device, or was it added properly from the UI?
The code has set snapshot=no for all devices since 2011, so I'm not sure how this could happen without external intervention.

Comment 4 Amador Pahim 2013-12-02 12:03:13 UTC
Ayal, thank you for the input.
In fact, there is a hook, based on the upstream vdsm-hook-directlun, that adds the Direct LUN, and it is buggy:

 def createDiskElement(domxml, devpath, lunid, options):
    '''
    <disk device="disk" type="block">
        <source dev="/dev/mapper/lunid"/>
        <target bus="virtio" dev="vda"/>
        <driver cache="none" error_policy="stop" name="qemu" type="raw"/>
    </disk>
    '''

    disk = domxml.createElement('disk')
    disk.setAttribute('device', 'disk')
    disk.setAttribute('type', 'block')

      ^^^^^^^^^^^^^^^^^ 


It should have a disk.setAttribute('snapshot', 'no') to avoid this issue.
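
For illustration, a corrected element builder could look roughly like the sketch below. It is not the hook's actual code: only the first three statements match the excerpt above. The function name and parameters are hypothetical, and the child elements are reconstructed from the docstring.

```python
from xml.dom import minidom


def create_block_disk_element(domxml, source_dev, target_dev):
    """Build a <disk type="block"> element that is excluded from snapshots.

    domxml is an xml.dom.minidom Document; source_dev and target_dev are
    hypothetical parameters used here in place of the hook's arguments.
    """
    disk = domxml.createElement('disk')
    disk.setAttribute('device', 'disk')
    disk.setAttribute('type', 'block')
    # The attribute the buggy hook did not set:
    disk.setAttribute('snapshot', 'no')

    source = domxml.createElement('source')
    source.setAttribute('dev', source_dev)
    disk.appendChild(source)

    target = domxml.createElement('target')
    target.setAttribute('bus', 'virtio')
    target.setAttribute('dev', target_dev)
    disk.appendChild(target)

    driver = domxml.createElement('driver')
    driver.setAttribute('cache', 'none')
    driver.setAttribute('error_policy', 'stop')
    driver.setAttribute('name', 'qemu')
    driver.setAttribute('type', 'raw')
    disk.appendChild(driver)
    return disk


if __name__ == '__main__':
    doc = minidom.Document()
    elem = create_block_disk_element(doc, '/dev/mapper/EXAMPLE-LUN', 'vdb')
    print(elem.toprettyxml(indent='  '))
```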

Comment 7 Amador Pahim 2013-12-11 12:47:55 UTC
Merged upstream as 8869a5e5b5bcd219c82b9b92807bb95032d72d15