Bug 1465539

Summary: Auto Generated snapshots failed to remove when live migrating disks concurrently
Product: [oVirt] ovirt-engine Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.StorageAssignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.1.4CC: amureini, bugs, eshenitz, mkalinin, ratamir
Target Milestone: ovirt-4.1.5Flags: rule-engine: ovirt-4.1+
Target Release: 4.1.5.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-23 08:06:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1470127    
Bug Blocks:    
Attachments:
Description Flags
engine and vdsm logs
none
snapshot_remove_failed_logs
none
engine and vdsm logs none

Description Eyal Shenitzky 2017-06-27 15:06:50 UTC
Created attachment 1292372 [details]
engine and vdsm logs

Description of problem:

After migrating VM disks concurrently, Auto Generated snapshots of the live storage migration disks failed to remove.

Only one of the snapshots remove successfully.

Version-Release number of selected component (if applicable):
Engine - 4.2.0-0.0.master.20170624071949.gitff9611b.el7.centos
VDSM - 4.20.1-66.git228c7be.el7.centos.x86_64

Also, occur on:

Engine - 4.1.3.5-0.1.el7
VDSM - 4.19.20-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with 3 disks [A,B,C] of 10GB
2. Run the VM
3. Select to migrate disk A to different storage domain.
4. When the engine finishes creating the Auto-generated-snapshot for live storage migration, select to move disk B to different storage domain.
5. When the engine finishes creating the Auto-generated-snapshot for the second live storage migration, select to move disk C to different storage domain.

Actual results:
Live storage migration succeed but only one of the snapshots removed

Expected results:
Live storage migration succeeds, all Auto-generated-snapshots should remove.


Additional info:
Engine and VDSM logs attached

Comment 1 Eyal Shenitzky 2017-06-29 13:04:08 UTC
Created attachment 1292875 [details]
snapshot_remove_failed_logs

Comment 2 Eyal Shenitzky 2017-06-29 13:05:36 UTC
In some cases, 'Auto-generated for Live Storage Migration' failed to remove
Due to - 

2017-06-29 15:58:59,586+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler6) [disks_syncAction_6e619e6b-c15c-434b] Failed invoking ca
llback end method 'onFailed' for command '96124e81-f3bb-4217-b6b3-33f2ef04d188' with exception 'null', the callback is marked for end method retries

Attached new logs.

Comment 3 Eyal Shenitzky 2017-07-31 11:51:29 UTC
The following exception has occurred when running verification for the following scenario:

Steps to Reproduce:
1. Create a VM with 3 disks [A,B,C] of 10GB
2. Run the VM
3. Select to migrate disk A to different storage domain.
4. When the engine finishes creating the Auto-generated-snapshot for live storage migration, select to move disk B to different storage domain.
5. When the engine finishes creating the Auto-generated-snapshot for the second live storage migration, select to move disk C to different storage domain.

Engine logs:

2017-07-31 14:25:27,751+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [78b1604b-a6c4-40ff-96c7-b27fbd96cc9f] Command 'MergeVDSComma
nd(HostName = host_mixed_2, MergeVDSCommandParameters:{runAsync='true', hostId='46625bae-5eb7-492f-a7b4-026be13a2d2e', vmId='76d78845-a256-4417-811d-20039a229ce4', storageP
oolId='4f3d521b-4b6c-403a-ac0f-976ec9d952ad', storageDomainId='32af6829-6b8b-4abb-87d2-44ec1cbf63a4', imageGroupId='20644d1f-3964-49d8-bd03-a7890afec3ca', imageId='36521fc6
-e2b1-4579-bfAuto-generated for Live Storage Migration69-03e14c8a65ce', baseImageId='ab5663da-a328-4ac4-ac1e-05b5e5b50686', topImageId='36521fc6-e2b1-4579-bf69-03e14c8a65ce', bandwidth='0'})' execution failed: VDS
GenericException: VDSErrorException: Failed to MergeVDS, error = internal error: failed to query the maximum written offset of block device 'vda', code = -32603
2017-07-31 14:25:27,751+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [78b1604b-a6c4-40ff-96c7-b27fbd96cc9f] FINISH, MergeVDSComman
d, log id: 2d53ea6
2017-07-31 14:25:27,751+03 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-3) [78b1604b-a6c4-40ff-96c7-b27fbd96cc9f] Engine exception thrown while sending mer
ge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSError
Exception: Failed to MergeVDS, error = internal error: failed to query the maximum written offset of block device 'vda', code = -32603 (Failed with error unexpected and cod
e 16)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2166) [bll.jar:]
        at org.ovirt.engine.core.bll.MergeCommand.executeCommand(MergeCommand.java:45) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:137) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:]
        at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:511) [bll.jar:]
at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:756) [bll.jar:]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
        at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
        at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
        at org.ovirt.engine.core.bll.interceptors.CorrelationIdTrackerInterceptor.aroundInvoke(CorrelationIdTrackerInterceptor.java:13) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor93.invoke(Unknown Source) [:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
        at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptor.java:89)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:73) [weld-core-impl.jar:2.3.3.Final-redhat-1]
        at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:83) [wildfly-weld-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) [wildfly-ee-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
        at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:263) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:243) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:43) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:66) [wildfly-ejb3-7.0.4.GA-redhat-2.jar:7.0.4.GA-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:64)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
        at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:636)
        at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:61)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
        at org.jboss.invocation.PrivilegedWithCombinerInterceptor.processInvocation(PrivilegedWithCombinerInterceptor.java:80)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:198)at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:185)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:73)
        at org.ovirt.engine.core.bll.interfaces.BackendCommandObjectsHandler$$$view3.runAction(Unknown Source) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandExecutor.executeCommand(CommandExecutor.java:57) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandExecutor.lambda$executeAsyncCommand$0(CommandExecutor.java:46) [bll.jar:]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
Caused by: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = internal error: failed to query the maximum written offset of block device 'vda', code = -32603
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createDefaultConcreteException(VdsBrokerCommand.java:76) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.createException(BrokerCommandBase.java:222) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:192) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand.executeVdsBrokerCommand(MergeVDSCommand.java:26) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:111) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:407) [vdsbroker.jar:]
        ... 88 more
2017-07-31 14:25:27,936+03 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-3) [78b1604b-a6c4-40ff-96c7-b27fbd96cc9f] Transaction rolled-back for command 'org.ovirt.engine.core.bll.MergeCommand'.

Actual Results:

The VM entered to 'pause' mode and failed to remove the 'Auto-generated for Live Storage Migration' snapshot.

Test with the following versions:
Engine - 4.1.5-0.1.el7
VDSM - 4.19.25-1.el7ev.x86_64

How reproducible:
100%

Comment 4 Ala Hino 2017-07-31 12:54:53 UTC
Logs please

Comment 5 Ala Hino 2017-07-31 13:46:26 UTC
Are you testing on RHEL 7.4?

Please send vdsm, libvirt and qemu versions.

Comment 6 Ala Hino 2017-07-31 13:50:07 UTC
Please note that in BZ 1464002 we encountered the following error:

libvirtError: internal error: failed to query the maximum written offset of block device 'sda'

This is related to RHEL 7.4.

Comment 7 Eyal Shenitzky 2017-08-01 08:18:23 UTC
Created attachment 1307455 [details]
engine and vdsm logs

Comment 8 Eyal Shenitzky 2017-08-01 08:20:28 UTC
libvirt - 3.2.0-14.el7.x86_64
qemu - 2.6.0-28.el7.10.1.x86_64

Comment 9 Allon Mureinik 2017-08-01 14:13:18 UTC
(In reply to Eyal Shenitzky from comment #8)
> libvirt - 3.2.0-14.el7.x86_64
> qemu - 2.6.0-28.el7.10.1.x86_64

These are too old - we need libvirt-3.2.0-14.el7_4.2.

Let's retest this on top of RHEL 7.4 with 0day

Comment 10 Elad 2017-08-15 11:48:52 UTC
Executed the following scenario:

1. Create a VM with 3 disks [A,B,C] of 10GB
2. Run the VM
3. Select to migrate disk A to different storage domain.
4. When the engine finishes creating the Auto-generated-snapshot for live storage migration, select to move disk B to different storage domain.
5. When the engine finishes creating the Auto-generated-snapshot for the second live storage migration, select to move disk C to different storage domain.

Live merge took place for each of the 3 snapshots and succeeded.



Used:
rhevm-4.1.5.2-0.1.el7.noarch
vdsm-4.19.27-1.el7ev.x86_64
libvirt-3.2.0-14.el7.x86_64