Bug 1572895

Summary: Live storage migration's diskReplicateFinish sometimes fails with 'Resource unavailable'
Product: [oVirt] vdsm Reporter: Elad <ebenahar>
Component: CoreAssignee: Benny Zlotnik <bzlotnik>
Status: CLOSED DUPLICATE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: medium    
Version: 4.20.23CC: bugs, tnisan
Target Milestone: ovirt-4.3.1Keywords: Automation
Target Release: ---Flags: ylavi: ovirt-4.3+
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-01 16:00:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Elad 2018-04-28 12:59:19 UTC
Created attachment 1428065 [details]
logs

Description of problem:
A live storage migration operation sometimes fails with 'Resource unavailable' on vdsm.


Version-Release number of selected component (if applicable):
vdsm-4.20.27-1.el7ev.x86_64
libvirt-3.9.0-14.el7_5.3.x86_64
qemu-img-rhev-2.10.0-21.el7_5.2.x86_64
glusterfs-3.8.4-54.6.el7.x86_64
sanlock-3.6.0-1.el7.x86_64
RHEL7.5
kernel 3.10.0-862.el7.x86_64
ovirt-engine-4.2.3.3-0.1.el7.noarch

How reproducible:


Steps to Reproduce:
1. Create a VM and attatch 4 disks to it reside on iSCSI
2. Start the VM
3. Migrate all disks to different type storage domains (NFS or Gluster)

Actual results:

2018-04-27 21:44:16,971+0300 INFO  (jsonrpc/1) [api.virt] START diskReplicateFinish(srcDisk={u'device': u'disk', u'poolID': u'b21d3b46-4a16-11e8-9dfd-00163e7be000', u'volumeID': u'728b3222-45f4-483a-adc2-e56cbc7
37750', u'domainID': u'3b804599-2145-411f-961c-8027c4b3bd82', u'imageID': u'47379013-8220-4fb6-a49d-3a00781ee41f'}, dstDisk={u'device': u'disk', u'poolID': u'b21d3b46-4a16-11e8-9dfd-00163e7be000', u'volumeID': u
'728b3222-45f4-483a-adc2-e56cbc737750', u'domainID': u'd39cefc8-0138-4960-997d-f8ed132c618a', u'imageID': u'47379013-8220-4fb6-a49d-3a00781ee41f'}) from=::ffff:10.46.16.190,57308, flow_id=disks_syncAction_661ec4
31-2cfe-44f9, vmId=432dd185-08ef-4afc-a9e6-daecf8f370ee (api:46)
2018-04-27 21:44:16,973+0300 ERROR (jsonrpc/1) [virt.vm] (vmId='432dd185-08ef-4afc-a9e6-daecf8f370ee') Replication job unfinished (drive: 'vda', srcDisk: {u'device': u'disk', u'poolID': u'b21d3b46-4a16-11e8-9dfd-00163e7be000', u'volumeID': u'728b3222-45f4-483a-adc2-e56cbc737750', u'domainID': u'3b804599-2145-411f-961c-8027c4b3bd82', u'imageID': u'47379013-8220-4fb6-a49d-3a00781ee41f'}, job: {'end': 786432L, 'bandwidth': 0L, 'type': 2, 'cur': 655360L}) (vm:4685)
2018-04-27 21:44:16,973+0300 INFO  (jsonrpc/1) [api.virt] FINISH diskReplicateFinish return={'status': {'message': 'Resource unavailable', 'code': 40}} from=::ffff:10.46.16.190,57308, flow_id=disks_syncAction_661ec431-2cfe-44f9, vmId=432dd185-08ef-4afc-a9e6-daecf8f370ee (api:52)


engine.log:

2018-04-27 21:44:16,992+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [disks_syncAction_661ec431-2cfe-44f9] Failed VmReplicateDiskFinish (Disk '47379013-8220-4fb6-a49d-3a00781ee41f', VM '432dd185-08ef-4afc-a9e6-daecf8f370ee')
2018-04-27 21:44:16,992+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [disks_syncAction_661ec431-2cfe-44f9] Command 'LiveMigrateDisk' id: 'dfbea590-34f1-4b76-ab5d-e943725fa6b0' with children [46f74fa8-4c9f-4e6c-9acb-6e25d821c2ff, dad80a51-316a-4e32-86ff-156086962ade, 1b492bda-ef5a-4b46-84d1-1fec60e856c7] failed when attempting to perform the next operation, marking as 'ACTIVE'
2018-04-27 21:44:16,993+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [disks_syncAction_661ec431-2cfe-44f9] EngineException: Resource unavailable (Failed with error unavail and code 40): org.ovirt.engine.core.common.errors.EngineException: EngineException: Resource unavailable (Failed with error unavail and code 40)
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.replicateDiskFinish(LiveMigrateDiskCommand.java:406) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.completeLiveMigration(LiveMigrateDiskCommand.java:378) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:242) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:68) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:145) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:106) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_171]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_171]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_171]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)

2018-04-27 21:44:16,996+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [disks_syncAction_661ec431-2cfe-44f9] Command 'LiveMigrateDisk' id: 'dfbea590-34f1-4b76-ab5d-e943725fa6b0' child commands '[46f74fa8-4c9f-4e6c-9acb-6e25d821c2ff, dad80a51-316a-4e32-86ff-156086962ade, 1b492bda-ef5a-4b46-84d1-1fec60e856c7]' executions were completed, status 'FAILED'



Expected results:
Live storage migration should succeed

Additional info:
libvirt.log is unavailable for this test execution.

Comment 1 Benny Zlotnik 2018-04-29 12:18:12 UTC
Looks like: https://bugzilla.redhat.com/show_bug.cgi?id=1312909

Comment 2 Sandro Bonazzola 2019-01-28 09:36:39 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 3 Tal Nisan 2019-03-01 16:00:21 UTC

*** This bug has been marked as a duplicate of bug 1312909 ***