Bug 1325791

Summary: Live storage migration fails with ResourceUnavailable error
Product: [oVirt] vdsm
Component: Core
Version: 4.18.0
Status: CLOSED WORKSFORME
Severity: high
Priority: unspecified
Reporter: Nir Soffer <nsoffer>
Assignee: Nir Soffer <nsoffer>
QA Contact: Aharon Canan <acanan>
CC: amureini, bugs, nsoffer, tnisan, ylavi
Target Milestone: ovirt-3.6.7
Target Release: ---
Hardware: Unspecified
OS: Unspecified
oVirt Team: Storage
Doc Type: Bug Fix
Type: Bug
Flags: amureini: ovirt-3.6.z?
       rule-engine: planning_ack?
       rule-engine: devel_ack?
       rule-engine: testing_ack?
Last Closed: 2016-05-02 11:35:02 UTC

Description Nir Soffer 2016-04-11 08:28:04 UTC
Description of problem:

Live storage migration fails when engine calls VM.diskReplicateFinish.

In vdsm.log:

jsonrpc.Executor/2::DEBUG::2016-04-11 10:15:35,946::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.diskReplicateFinish' in bridge with {u'dstDisk': {u'device': u'disk', u'poolID': u'591475db-6fa9-455d-9c05-7f6e30fb06d5', u'volumeID': u'ef2f51d9-2f06-4ff2-816b-85611b68863a', u'domainID': u'f283d64e-b27c-414f-871b-a7a24ca608cb', u'imageID': u'feebf8ab-cc21-497d-903a-be59b79105a5'}, u'vmID': u'2bc754e2-1896-4193-a4ae-b9096caa935a', u'srcDisk': {u'device': u'disk', u'poolID': u'591475db-6fa9-455d-9c05-7f6e30fb06d5', u'volumeID': u'ef2f51d9-2f06-4ff2-816b-85611b68863a', u'domainID': u'f283d64e-b27c-414f-871b-a7a24ca608cb', u'imageID': u'feebf8ab-cc21-497d-903a-be59b79105a5'}}

(No more logs appear in vdsm after this call; that is a separate issue.)

In engine log:

2016-04-11 10:12:32,998 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-8-thread-7) [220a7abc] Command 'VmReplicateDiskFinishVDSCommand(HostName = voodoo6, VmReplicateDiskParameters:{runAsync='true', hostId='89887385-48b3-4648-b397-f2078519a61e', vmId='2bc754e2-1896-4193-a4ae-b9096caa935a'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40

In libvirt we see:

virsh # list
Please enter your authentication name: vdsm@ovirt
Please enter your password: 
 Id    Name                           State
----------------------------------------------------
 4     from-temp-iscsi                running

virsh # dumpxml 4
<domain type='kvm' id='4'>
  <name>from-temp-iscsi</name>
  <uuid>2bc754e2-1896-4193-a4ae-b9096caa935a</uuid>
  <metadata xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
    <ovirt:qos/>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4294967296</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='1'>16</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <numatune>
    <memory mode='interleave' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>oVirt</entry>
      <entry name='product'>oVirt Node</entry>
      <entry name='version'>7.2-9.el7</entry>
      <entry name='serial'>77A14806-F954-4D2A-AEA6-C098A4469927</entry>
      <entry name='uuid'>2bc754e2-1896-4193-a4ae-b9096caa935a</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
    <topology sockets='16' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <serial></serial>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/591475db-6fa9-455d-9c05-7f6e30fb06d5/f283d64e-b27c-414f-871b-a7a24ca608cb/images/feebf8ab-cc21-497d-903a-be59b79105a5/ef2f51d9-2f06-4ff2-816b-85611b68863a'/>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/591475db-6fa9-455d-9c05-7f6e30fb06d5/f283d64e-b27c-414f-871b-a7a24ca608cb/images/feebf8ab-cc21-497d-903a-be59b79105a5/../feebf8ab-cc21-497d-903a-be59b79105a5/900b3b77-fb1e-4b9d-8688-2678a00c3444'/>
        <backingStore/>
      </backingStore>
      <mirror type='file' file='/rhev/data-center/591475db-6fa9-455d-9c05-7f6e30fb06d5/54546265-cd38-476d-95f2-3a8b274ed152/images/feebf8ab-cc21-497d-903a-be59b79105a5/ef2f51d9-2f06-4ff2-816b-85611b68863a' format='qcow2' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source file='/rhev/data-center/591475db-6fa9-455d-9c05-7f6e30fb06d5/54546265-cd38-476d-95f2-3a8b274ed152/images/feebf8ab-cc21-497d-903a-be59b79105a5/ef2f51d9-2f06-4ff2-816b-85611b68863a'>
          <seclabel model='selinux' labelskip='yes'/>
        </source>
      </mirror>
      <target dev='vda' bus='virtio'/>
      <serial>feebf8ab-cc21-497d-903a-be59b79105a5</serial>
      <boot order='1'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='virtio-serial' index='0' ports='16'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:1a:4a:16:01:55'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/2bc754e2-1896-4193-a4ae-b9096caa935a.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/2bc754e2-1896-4193-a4ae-b9096caa935a.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
      <alias name='channel2'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' port='5900' tlsPort='5901' autoport='yes' listen='10.35.0.110' passwdValidTo='2016-04-11T07:09:36' connected='disconnect'>
      <listen type='network' address='10.35.0.110' network='vdsm-ovirtmgmt'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='32768' vgamem='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='none'>
      <alias name='balloon0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c363,c924</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c363,c924</imagelabel>
  </seclabel>
</domain>

virsh # blockjob 4 vda --raw
 type=Block Copy
 bandwidth=0
 cur=1478950912
 end=1478950912

Note that cur equals end here, and the mirror element above reports ready='yes', so the job did reach mirroring. Yet this check in vdsm/virt/vm.py is what failed:

        # Checking if we reached the replication mode ("mirroring" in libvirt
        # and qemu terms)
        if blkJobInfo['cur'] != blkJobInfo['end']:
            return response.error('unavail')
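
The 'unavail' error maps to code 40 ("Resource unavailable"), matching the engine log above. A minimal standalone sketch (not vdsm code; the domain and drive names are taken from the virsh session above) performing the same query with libvirt-python:

    # Standalone sketch using libvirt-python, replicating the vdsm check.
    # 'from-temp-iscsi' and 'vda' come from the virsh session above.
    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('from-temp-iscsi')

    # blockJobInfo returns an empty dict when no job is active, otherwise a
    # dict with 'type', 'bandwidth', 'cur' and 'end' (the same data shown by
    # virsh blockjob).
    info = dom.blockJobInfo('vda', 0)
    if not info or info['cur'] != info['end']:
        print("not mirroring yet - vdsm would return 'unavail' (code 40)")
    else:
        print("mirroring reached - safe to finish the replication")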

Possible causes:
- a libvirt bug (a changed API that vdsm depends on)
- an engine bug (not handling ResourceUnavailable)

Version-Release number of selected component (if applicable):
- ovirt-engine-3.6.0.3-1.fc22.noarch
- vdsm master 770f6d251616f3e0d18242a48f286b1d37d68149
- libvirt-daemon-1.2.17-13.el7_2.4.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Perform live storage migration

Actual results:
Migration fails, libvirt block jobs left running

Expected results:
block job aborted, migration succeeds

Additional info:

The issue appeared after adding the rhel-7.z repo to the host and updating the host.
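
If a dangling block job is left behind, it can be cancelled by hand so the VM keeps using the source volume (a manual cleanup sketch; only safe when nothing is about to pivot to the mirror):

    # Manual cleanup sketch: cancel the dangling copy job. flags=0 cancels
    # without pivoting, so the VM keeps using the source volume.
    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('from-temp-iscsi')
    dom.blockJobAbort('vda', 0)

The virsh equivalent is: blockjob from-temp-iscsi vda --abort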

Todo:

- add logs showing libvirt response
- check with more recent engine
- check with older libvirt

Comment 1 Nir Soffer 2016-04-11 16:49:26 UTC
Reproducing is tricky. I improved the relevant logging and tried to reproduce
the failure again to understand what we get from libvirt, but I cannot
reproduce it.
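
Since the bad state is transient, one way to catch it is to poll the job info during a replication (a debugging sketch, assuming it runs on the host while the VM is replicating):

    # Debugging sketch: poll libvirt once a second and print the raw job
    # info, so a transient cur != end window shows up in the trace.
    import time
    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('from-temp-iscsi')
    while True:
        info = dom.blockJobInfo('vda', 0)
        print(time.strftime('%H:%M:%S'), info)
        if info and info.get('cur') == info.get('end'):
            break
        time.sleep(1)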

Comment 2 Tal Nisan 2016-04-14 08:27:33 UTC
Nir, what are you doing with this bug? You wrote it's always reproducible and then you could not reproduce it.

Comment 3 Nir Soffer 2016-04-14 08:34:54 UTC
(In reply to Tal Nisan from comment #2)
> Nir, what are you doing with this bug? You wrote it's always reproducible
> and then you could not reproduce it.

I could reproduce it 2-3 times, and then I could not. We need more time to 
investigate this.

Comment 4 Nir Soffer 2016-04-20 20:42:22 UTC
Tested again with a fresh install of ovirt-engine (3.6.4.1-1.el7.centos) and
updated RHEL 7.2 hosts running vdsm master + https://gerrit.ovirt.org/56110.

Cannot reproduce. The attached patch improves logging, so if this happens
again we will have more info on the failure.

Comment 5 Yaniv Lavi 2016-05-02 11:35:02 UTC
Closing as works for me; please reopen if this happens again.