Bug 1392248

Summary: NFS Live Storage migration: Could not remove all image volumes
Product: [oVirt] vdsm Reporter: Markus Stockhausen <mst>
Component: CoreAssignee: Dan Kenigsberg <danken>
Status: CLOSED NOTABUG QA Contact: Aharon Canan <acanan>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.17.32CC: amureini, bugs, mst, nsoffer, ylavi
Target Milestone: ovirt-4.0.6Flags: amureini: ovirt-4.0.z?
amureini: planning_ack?
amureini: devel_ack?
amureini: testing_ack?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-08 09:57:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
vdsm.log
none
engine.log none

Description Markus Stockhausen 2016-11-06 20:37:46 UTC
Created attachment 1217834 [details]
vdsm.log

Description of problem:

NFS Live storage migrations produces error "Could not remove all image volumes"


Version-Release number of selected component (if applicable):

OVirt 4.0.5
VDSM 4.17.32 (Ovirt 3.6 node)


How reproducible:

would say 100% (occured twice today)


Steps to Reproduce:
1. Start single disk live migration between to nfs storage domains

Actual results:

migration succeeds but log gives above error message at the end. The named directory is missing for NFS storage after migration. 

Expected results:

Message should not occur

Additional info:

SPM seems to be relocated directly after the error.

Comment 1 Markus Stockhausen 2016-11-06 20:38:15 UTC
SORRY! Ovirt is 4.0.4

Comment 2 Markus Stockhausen 2016-11-06 20:42:05 UTC
Created attachment 1217835 [details]
engine.log

Comment 3 Allon Mureinik 2016-11-07 09:07:50 UTC
Relevant error in vdsm.log:

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1544, in deleteImage
    pool.deleteImage(dom, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1982, in deleteImage
    domain.deleteImage(domain.sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/fileSD.py", line 407, in deleteImage
    raise se.ImageDeleteError("%s %s" % (imgUUID, str(e)))
ImageDeleteError: Could not remove all image's volumes: (u'e9edf93f-f20b-4ccb-b298-45b4fa8da820 [Errno 39] Directory not empty',)

From the engine log we can see that there are volumes with the "_backup" suffix added to them that seem to muck things up:

2016-11-06 21:19:32,372 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (DefaultQuartzScheduler3) [74729fc2] Command 'DeleteImageGroupVDSCommand( DeleteImageGroupVDSCommandParameters:{runAsync='true', storagePoolId='94ed7a19-fade-4bd6-83f2-2cbb2f730b95', ignoreFailoverLimit='false', storageDomainId='c86d8409-4dd6-4e30-86dc-b5175a7ceb86', imageGroupId='e9edf93f-f20b-4ccb-b298-45b4fa8da820', postZeros='false', forceDelete='false'})' execution failed: IRSGenericException: IRSErrorException: Could not remove all image's volumes: (u'e9edf93f-f20b-4ccb-b298-45b4fa8da820 [Errno 39] Directory not empty',)
2016-11-06 21:19:32,372 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (DefaultQuartzScheduler3) [74729fc2] FINISH, DeleteImageGroupVDSCommand, log id: 2e0a458e
2016-11-06 21:19:32,375 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImagesListVDSCommand] (DefaultQuartzScheduler3) [74729fc2] START, GetImagesListVDSCommand( GetImagesListVDSCommandParameters:{runAsync='true', storagePoolId='94ed7a19-fade-4bd6-83f2-2cbb2f730b95', ignoreFailoverLimit='false', sdUUID='c86d8409-4dd6-4e30-86dc-b5175a7ceb86'}), log id: 3b2ad90c
2016-11-06 21:19:32,524 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler3) [74729fc2] ERROR, GetImagesListVDSCommand( GetImagesListVDSCommandParameters:{runAsync='true', storagePoolId='94ed7a19-fade-4bd6-83f2-2cbb2f730b95', ignoreFailoverLimit='false', sdUUID='c86d8409-4dd6-4e30-86dc-b5175a7ceb86'}), exception: For input string: "d75a16fec85a_backup", log id: 3b2ad90c
2016-11-06 21:19:32,524 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler3) [74729fc2] Exception: java.lang.NumberFormatException: For input string: "d75a16fec85a_backup"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) [rt.jar:1.8.0_102]
	at java.lang.Long.parseLong(Long.java:589) [rt.jar:1.8.0_102]
	at java.lang.Long.valueOf(Long.java:776) [rt.jar:1.8.0_102]
	at java.lang.Long.decode(Long.java:928) [rt.jar:1.8.0_102]
	at java.util.UUID.fromString(UUID.java:206) [rt.jar:1.8.0_102]
	at org.ovirt.engine.core.compat.Guid.<init>(Guid.java:73) [compat.jar:]
	at org.ovirt.engine.core.vdsbroker.irsbroker.GetImagesListVDSCommand.executeIrsBrokerCommand(GetImagesListVDSCommand.java:23) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.lambda$executeVDSCommand$0(IrsBrokerCommand.java:161) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData.runInControlledConcurrency(IrsProxyData.java:248) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:158) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:451) [vdsbroker.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2171) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.StorageHandlingCommandBase.runVdsCommand(StorageHandlingCommandBase.java:657) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand.isImageRemovedFromStorage(RemoveImageCommand.java:108) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand.executeCommand(RemoveImageCommand.java:89) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1305) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1447) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2075) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:166) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1490) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:398) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:493) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:475) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:670) [bll.jar:]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_102]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
	at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
	at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
	at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:73) [weld-core-impl-2.3.2.Final.jar:2.3.2.Final]
	at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:83) [wildfly-weld-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) [wildfly-ee-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:263) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:243) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:43) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:66) [wildfly-ejb3-10.0.0.Final.jar:10.0.0.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:64)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
	at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:636)
	at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:61)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
	at org.jboss.invocation.PrivilegedWithCombinerInterceptor.processInvocation(PrivilegedWithCombinerInterceptor.java:80)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:195)
	at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:185)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:73)
	at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view3.runInternalAction(Unknown Source) [bll.jar:]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_102]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
	at org.jboss.weld.util.reflection.Reflections.invokeAndUnwrap(Reflections.java:436) [weld-core-impl-2.3.2.Final.jar:2.3.2.Final]
	at org.jboss.weld.bean.proxy.EnterpriseBeanProxyMethodHandler.invoke(EnterpriseBeanProxyMethodHandler.java:127) [weld-core-impl-2.3.2.Final.jar:2.3.2.Final]
	at org.jboss.weld.bean.proxy.EnterpriseTargetBeanInstance.invoke(EnterpriseTargetBeanInstance.java:56) [weld-core-impl-2.3.2.Final.jar:2.3.2.Final]
	at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100) [weld-core-impl-2.3.2.Final.jar:2.3.2.Final]
	at org.ovirt.engine.core.bll.BackendCommandObjectsHandler$BackendInternal$BackendLocal$2049259618$Proxy$_$$_Weld$EnterpriseProxy$.runInternalAction(Unknown Source) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInternalAction(CommandBase.java:2471) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInternalActionWithTasksContext(CommandBase.java:2496) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInternalActionWithTasksContext(CommandBase.java:2491) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.lsm.LiveStorageMigrationHelper.removeImage(LiveStorageMigrationHelper.java:23) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:104) [bll.jar:]
	at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:28) [bll.jar:]
	at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:64) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:112) [bll.jar:]
	at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) [:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
	at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
	at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_102]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_102]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_102]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_102]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_102]


Nir - any idea where these volumes come from?

Comment 4 Markus Stockhausen 2016-11-07 10:15:45 UTC
Snapshots produced some Quite Strange errors in the past. So wie sometimes Backup images folders with Suffix _backup.

That should not produce SPM relocation

Comment 5 Nir Soffer 2016-11-07 17:22:55 UTC
(In reply to Markus Stockhausen from comment #4)
> Snapshots produced some Quite Strange errors in the past. So wie sometimes
> Backup images folders with Suffix _backup.
> 
> That should not produce SPM relocation

In a perfect world, yes. In the current system you cannot add arbitrary files
inside storage domain, in particular no in an image directory.

Please move images/volumes backup files elsewhere, this will probably fix your
failing live storage migration.

Comment 6 Markus Stockhausen 2016-11-07 20:15:10 UTC
Thanks for the carification. Removing the backup folders fixed our problem. Nevertheless wouldnt it make sense to ignore folders that do not match the UUID scheme?

Comment 7 Nir Soffer 2016-11-07 20:28:03 UTC
(In reply to Markus Stockhausen from comment #6)
> Thanks for the carification. Removing the backup folders fixed our problem.
> Nevertheless wouldnt it make sense to ignore folders that do not match the
> UUID scheme?

When we delete an image directory, we delete:

- the volume data file (uuid)
- the volume metadata file (uuid.meta)
- the volume lease file (uuid.lease)

We assume that there are no other files in the image directory, and the directory
must be empty after removing all files.

We remove any other files in this directory but this means we will delete your
backups. Maybe the best thing would be to warn that the directory is not empty
without failing the request.

Comment 8 Yaniv Lavi 2016-11-08 09:57:49 UTC
The storage domains are not a API you can use and change. We have API for backup if you want to backup VMs. Copying files within the storage domain is not something we can support. I'm closing this bug. Please clone the VM, export it to a export domain or back it up using the backup API, if you need to maintain a copy of the VM.