Bug 923366
| Summary: | [engine-backend] engine keeps requesting vdsm to create snapshot when the creation fails (live snapshot) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> |
| Component: | ovirt-engine | Assignee: | Maor <mlipchuk> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.2.0 | CC: | abaron, acathrow, amureini, dron, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, scohen, sgrinber, yeylon, ykaul |
| Target Milestone: | --- | Flags: | abaron: Triaged+ |
| Target Release: | 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | sf12 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Would like to clarify expected results: the overall user experience is bad, as the engine rapidly retries creating the snapshot. Putting aside the fact that the host is down, it is flooding the events. I'm not familiar with this behaviour and it looks like a regression. DEV, please check that and add a flag if needed.

The VDSM log is empty. Can you please reattach it?

What type of solution would be best? There are a few options:

1) On any exception from VDSM in the end command, simply end the command with failure. That will cause the live snapshot to fail most of the time, which will leave the user working on the old snapshot.

2) Use a fixed number of retries, for example 3, and if the last one fails, the engine fails the action. This solution would be good as part of the infrastructure layer for all async operations; that way the number of times a user wants to retry the operation could also be made configurable.

3) Only remove the flood in the event log, but keep trying to live-snapshot the machine.

I will appreciate other suggestions as well.

(In reply to comment #3)
> 2) Use a fixed number of retries, for example 3, and if the last one fails,
> the engine fails the action. [...]

What would happen to the snapshot (on the SPM) when the action fails?

(In reply to comment #4)
> What would happen to the snapshot (on the SPM) when the action fails?

When the action fails on the SPM, the create-snapshot operation will not end, but the events will not come at the same frequency as on an HSM, since the SPM will not poll the tasks unless it is non-responsive.

Created attachment 713137 [details]
vdsm log
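Option 2 from the discussion above (a fixed retry budget instead of endless retries) could look roughly like the following sketch. This is purely illustrative and not the engine's actual code; the class name, `runWithRetries`, the `MAX_RETRIES` value, and the `Supplier` callback are all assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical sketch of option 2: retry a VDSM call a fixed number of
// times, then fail the action instead of retrying forever.
public class BoundedRetry {
    static final int MAX_RETRIES = 3; // assumed configurable value

    // Returns true if the call succeeded within the budget; false once
    // the budget is exhausted, at which point the engine would end the
    // command with failure and emit a single audit-log event.
    public static boolean runWithRetries(Supplier<Boolean> vdsmCall) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                if (Boolean.TRUE.equals(vdsmCall.get())) {
                    return true;
                }
            } catch (RuntimeException e) {
                // e.g. VDSNetworkException: java.net.ConnectException
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A host that is down keeps refusing connections:
        boolean ok = runWithRetries(() -> {
            throw new RuntimeException("Connection refused");
        });
        System.out.println(ok ? "snapshot created" : "action failed after retries");
    }
}
```

The key design point is that the flood stops after a bounded number of attempts, and the failure surfaces to the user exactly once.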
(In reply to comment #5)
> When the action fails on the SPM, the create-snapshot operation will not
> end, but the events will not come at the same frequency as on an HSM [...]

That is not what I meant. The engine sends 'snapshot' to the VM (on an HSM) and vdsm dies in the middle. Assuming we have X retries, if all of them fail, what would the rollback behaviour be?

(In reply to comment #7)
> Assuming we have X retries, if all of them fail, what would the rollback
> behaviour be?

There will be no specific rollback, but the engine can print the user the following event log, USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE: Failed to create live snapshot '${SnapshotName}' for VM '${VmName}'. VM restart is recommended

The bug also reproduces when trying to create a snapshot of a VM that runs on the SPM.

(In reply to comment #9)
> The bug also reproduces when trying to create a snapshot of a VM that runs
> on the SPM.

Elad, can you please add VDSM and engine logs for that? I suspect there will be far fewer events in this scenario.

Created attachment 713218 [details]
on SPM

The relevant log lines start at 12:59
Created attachment 713219 [details]
on engine
The relevant log lines start at 12:59
(In reply to comment #10)
> Elad, can you please add VDSM and engine logs for that? I suspect there
> will be far fewer events in this scenario.

Thanks for the logs. It seems the reason is that when you stop VDSM on the SPM, the engine re-elects another host as the SPM instead. The new SPM keeps polling the tasks and tries to run the snapshot on the HSM that was previously the SPM. This scenario only reproduces when there are two hosts in the same DC.

After discussing it with Barak and Ayal, it was decided that on any failure the engine will print an audit log as described in comment 8, and will finish the operation.

(In reply to comment #14)
> After discussing it with Barak and Ayal, it was decided that on any failure
> the engine will print an audit log as described in comment 8, and will
> finish the operation.

By "any failure" we mean any VDS command failure, such as the Connection refused error described in the bug.

Pushed upstream at http://gerrit.ovirt.org/#/c/13262/

Tested on RHEVM-3.2 - SF12:
rhevm-3.2.0-10.17.master.el6ev.noarch
rhevm-backend-3.2.0-10.17.master.el6ev.noarch
vdsm-4.10.2-13.0.el6ev.x86_64
libvirt-0.10.2-18.el6_4.2.x86_64

The engine performs a roll-back when snapshot creation fails during a restart of vdsmd:

```
2013-04-04 15:15:20,703 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-3-thread-47) [43fdbcc3] START, SnapshotVDSCommand(HostName = nott-vds2, HostId = 8fed1a90-79f3-4c6e-b26b-1f5a9e2cc8e8, vmId=72f0ff63-60e1-4131-ab12-9993708502c3), log id: 477c736a
2013-04-04 15:15:20,706 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-3-thread-47) [43fdbcc3] FINISH, SnapshotVDSCommand, log id: 477c736a
2013-04-04 15:15:20,706 WARN [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-3-thread-47) [43fdbcc3] Wasnt able to live snpashot due to error: VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused. VM will still be configured to the new created snapshot
```

3.2 has been released
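The behaviour that was ultimately chosen (fail once, emit a single audit-log event, and finish the operation) can be sketched roughly as follows. The class and method names here are invented for illustration and are not the actual patch from the gerrit change above; the real change lives in the engine's live-snapshot end phase.

```java
// Illustrative sketch of the chosen fix: on a VDS command failure while
// ending the live-snapshot command, emit one audit-log message and end
// the command instead of retrying indefinitely.
public class LiveSnapshotEndSketch {
    public static String endLiveSnapshot(Runnable snapshotVdsCommand,
                                         String snapshotName, String vmName) {
        try {
            snapshotVdsCommand.run(); // the SnapshotVDSCommand call
            return "OK";
        } catch (RuntimeException e) { // e.g. VDSNetworkException
            // One USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE event,
            // then the command ends; no further retries are scheduled.
            return "Failed to create live snapshot '" + snapshotName
                    + "' for VM '" + vmName + "'. VM restart is recommended";
        }
    }

    public static void main(String[] args) {
        System.out.println(endLiveSnapshot(() -> {
            throw new RuntimeException("Connection refused");
        }, "snap1", "vm1"));
    }
}
```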
Created attachment 712793 [details]
vdsm+rhevm logs

Description of problem:
When the vdsmd service is stopped while the engine is sending a request to vdsm to create a snapshot and the creation fails, the engine keeps requesting vdsm to create the snapshot indefinitely.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-11.0.el6ev.x86_64
rhevm-backend-3.2.0-10.14.beta1.el6ev.noarch
libvirt-0.10.2-18.el6_4.2.x86_64

How reproducible:
100%

Steps to Reproduce:
On a 2-host setup with 1 iSCSI domain:
1. Create a VM with 3 disks.
2. Make sure that the VM is on an HSM.
3. Create a snapshot of the VM and immediately stop the vdsmd service on the HSM.

Actual results:
The engine keeps sending the create-snapshot command to vdsm while the vdsm service is down on the host.

Expected results:
The engine shouldn't send the create-snapshot command to vdsm while the vdsm service is down on the host.

Additional info: see logs attached

on engine:

```
2013-03-19 18:12:14,020 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-3-thread-48) [78571fb0] START, SnapshotVDSCommand(HostName = nott-vds2, HostId = 4a4cb868-e9e7-4192-b790-83defa5dd1bd, vmId=307dd290-8dba-47d4-b8dd-4d777a52821c), log id: ece236a
2013-03-19 18:12:14,024 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-3-thread-48) [78571fb0] FINISH, SnapshotVDSCommand, log id: ece236a
2013-03-19 18:12:14,024 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-3-thread-48) [78571fb0] transaction rolled back
2013-03-19 18:12:14,026 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-3-thread-48) [78571fb0] transaction rolled back
2013-03-19 18:12:14,037 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-48) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type CreateAllSnapshotsFromVm threw an exception: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:169) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:1591) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand$2.runInTransaction(CreateAllSnapshotsFromVmCommand.java:226) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand$2.runInTransaction(CreateAllSnapshotsFromVmCommand.java:221) [engine-bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:166) [engine-utils.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:108) [engine-utils.jar:]
    at org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand.performLiveSnapshot(CreateAllSnapshotsFromVmCommand.java:221) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand$1.runInTransaction(CreateAllSnapshotsFromVmCommand.java:178) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand$1.runInTransaction(CreateAllSnapshotsFromVmCommand.java:166) [engine-bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [engine-utils.jar:]
    at org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand.endVmCommand(CreateAllSnapshotsFromVmCommand.java:166) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.VmCommand.endSuccessfully(VmCommand.java:224) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.internalEndSuccessfully(CommandBase.java:536) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:483) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1511) [engine-bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [engine-utils.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:141) [engine-utils.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:110) [engine-utils.jar:]
    at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:429) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.Backend.endAction(Backend.java:379) [engine-bll.jar:]
    at sun.reflect.GeneratedMethodAccessor450.invoke(Unknown Source) [:1.7.0_09-icedtea]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea]
    at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea]
    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.ovirt.engine.core.utils.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:11) [engine-utils.jar:]
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) [:1.7.0_09-icedtea]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea]
    at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea]
    at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:123) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:209) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:361) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:192) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:42) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:181) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
    at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view9.endAction(Unknown Source) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:137) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:25) [engine-bll.jar:]
    at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:106) [engine-bll.jar:]
    at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [engine-utils.jar:]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09-icedtea]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09-icedtea]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09-icedtea]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_09-icedtea]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_09-icedtea]
    at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09-icedtea]
```