Bug 1508844 - ExtendImageTicketVDSCommand might get stuck if host is restarted during image transfer
Summary: ExtendImageTicketVDSCommand might get stuck if host is restarted during image...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.2.0
: ---
Assignee: Daniel Erez
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-02 11:02 UTC by Yaniv Kaul
Modified: 2017-12-20 10:56 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 10:56:11 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 83662 0 master MERGED core: TransferImage - pause on extend failure 2020-08-26 17:24:38 UTC

Description Yaniv Kaul 2017-11-02 11:02:07 UTC
Description of problem:
I had an image transfer (Upload) going on, which I forgot about.
I updated a host (which moves a host to maint., upgrades it and reboots it).

1. Perhaps we should prevent maintenance mode while transferring an image via that host?
2. The below exception repeated itself endlessly.

Version-Release number of selected component (if applicable):
4.2.0

2017-11-02 06:56:53,062-04 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ExtendImageTicketVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-61) [334317bf-fef3-4ada-8499-6e0487426e2e] FINISH, ExtendImageTicketVDSCommand, log id: 59fadebd
2017-11-02 06:56:53,062-04 DEBUG [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] (EE-ManagedThreadFactory-engineScheduled-Thread-61) [334317bf-fef3-4ada-8499-6e0487426e2e] method: runVdsCommand, params: [ExtendImageTicket, ExtendImageTicketVDSCommandParameters:{hostId='c5406a7c-6637-4d5f-8a8d-48fce13d358c', ticketId='4ce63c9f-b9b2-4336-a6a3-101621753109', timeout='300'}], timeElapsed: 3002ms
2017-11-02 06:56:53,062-04 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-61) [334317bf-fef3-4ada-8499-6e0487426e2e] Failed to extend image transfer session for ticket '4ce63c9f-b9b2-4336-a6a3-101621753109': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.NoRouteToHostException: No route to host (Failed with error VDS_NETWORK_ERROR and code 5022)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.extendImageTransferSession(TransferImageCommand.java:565) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.handleTransferring(TransferImageCommand.java:295) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.executeStateHandler(TransferImageCommand.java:163) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.proceedCommandExecution(TransferImageCommand.java:144) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:145) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:106) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_151]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:]


...
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [rt.jar:1.8.0_151]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) [rt.jar:1.8.0_151]
        at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:116) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:160) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:93) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.extend_image_ticket(JsonRpcVdsServer.java:1832) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.ExtendImageTicketVDSCommand.executeVdsBrokerCommand(ExtendImageTicketVDSCommand.java:12) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:112) [vdsbroker.jar:]

Comment 1 Avihai 2017-11-15 14:33:31 UTC
Hi Daniel , 

I moved hosts to maintenance & rebooted the host during upload -> the disk went to 'Paused by System' state & the transfer does not continue - is this the expected behavior?

Comment 2 Avihai 2017-11-15 14:37:59 UTC
I also got the following exception in the engine & upload can not be resumed:

2017-11-15 16:35:57,055+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [13893784-f38d-4e15-a016-0160804106e9] Failed to stop image transfer session for ticket '2910890b-77a8-419d-9a65-da8ebbc0f554': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = Image daemon request failed: u"status=404, code=404, title=Not Found, explanation=The resource could not be found., detail=No such ticket u'2910890b-77a8-419d-9a65-da8ebbc0f554', reason=Not Found", code = 482 (Failed with error ImageDeamonError and code 482)
	at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.stopImageTransferSession(TransferImageCommand.java:639) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.handlePaused(TransferImageCommand.java:441) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.handlePausedSystem(TransferImageCommand.java:344) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.executeStateHandler(TransferImageCommand.java:162) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommand.proceedCommandExecution(TransferImageCommand.java:140) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:145) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:106) [bll.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_151]
	at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:]
	at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent-1.0.jar:]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
	at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
	at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
Caused by: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = Image daemon request failed: u"status=404, code=404, title=Not Found, explanation=The resource could not be found., detail=No such ticket u'2910890b-77a8-419d-9a65-da8ebbc0f554', reason=Not Found", code = 482
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createDefaultConcreteException(VdsBrokerCommand.java:81) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.createException(BrokerCommandBase.java:223) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:193) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand.executeVdsBrokerCommand(RemoveImageTicketVDSCommand.java:13) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:112) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:387) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand$$super(Unknown Source) [vdsbroker.jar:]
	at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) [:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_151]
	at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:49) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.jboss.weld.interceptor.proxy.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:77) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.ovirt.engine.core.common.di.interceptor.LoggingInterceptor.apply(LoggingInterceptor.java:12) [common.jar:]
	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) [:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_151]
	at org.jboss.weld.interceptor.reader.SimpleInterceptorInvocation$SimpleMethodInvocation.invoke(SimpleInterceptorInvocation.java:73) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeAroundInvoke(InterceptorMethodHandler.java:84) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeInterception(InterceptorMethodHandler.java:72) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.invoke(InterceptorMethodHandler.java:56) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final]
	at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
	... 18 more

Comment 3 Avihai 2017-11-15 14:40:56 UTC
Created attachment 1352675 [details]
engine/vdsm/imageio logs

Added engine/vdsm/imageio logs

Comment 4 Daniel Erez 2017-11-15 14:43:23 UTC
This behavior is by design. We move the transfer to paused state on error. 
I.e. we attempt to extend for a couple of times, and move the transfer to pause on failure. The exception is also expected since the tickets are removed from the daemon on restart (tickets are saves in memory).

Comment 5 Avihai 2017-11-15 15:01:52 UTC
(In reply to Daniel Erez from comment #4)
> This behavior is by design. We move the transfer to paused state on error. 
> I.e. we attempt to extend for a couple of times, and move the transfer to
> pause on failure. The exception is also expected since the tickets are
> removed from the daemon on restart (tickets are saves in memory).

What about the fact that I can not resume/cancel the upload after host is up?

I still get the following error all the time now :
VDSM host_mixed_1 command RemoveImageTicketVDS failed: Image daemon request failed: u"status=404, code=404, title=Not Found, explanation=The resource could not be found., detail=No such ticket u'2910890b-77a8-419d-9a65-da8ebbc0f554', reason=Not Found"

Comment 6 Daniel Erez 2017-11-15 15:18:48 UTC
(In reply to Avihai from comment #5)
> (In reply to Daniel Erez from comment #4)
> > This behavior is by design. We move the transfer to paused state on error. 
> > I.e. we attempt to extend for a couple of times, and move the transfer to
> > pause on failure. The exception is also expected since the tickets are
> > removed from the daemon on restart (tickets are saves in memory).
> 
> What about the fact that I can not resume/cancel the upload after host is up?
> 
> I still get the following error all the time now :
> VDSM host_mixed_1 command RemoveImageTicketVDS failed: Image daemon request
> failed: u"status=404, code=404, title=Not Found, explanation=The resource
> could not be found., detail=No such ticket
> u'2910890b-77a8-419d-9a65-da8ebbc0f554', reason=Not Found"

Does it happen when trying to cancel the upload? If so, can you please open a separate bug on it.

Comment 7 Avihai 2017-11-19 08:24:32 UTC
Verified on ovirt-engine-4.2.0-0.0.master.20171114111003.git7aa1b91.el7.centos.noarch .

will open a new bug on cancel/resume upload after host reboot.

Comment 8 Sandro Bonazzola 2017-12-20 10:56:11 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.