Bug 1293644

Summary: commands with mixed children types (CoCo/AsyncTasks) don't converge
Product: [oVirt] ovirt-engine Reporter: Daniel Erez <derez>
Component: BLL.Infra Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.0 CC: acanan, amureini, bugs, gklein, lsurette, omachace, pstehlik, rbalakri, Rhev-m-bugs, srevivo, stirabos, tnisan, ykaul, ylavi
Target Milestone: ovirt-3.6.5 Flags: rule-engine: ovirt-3.6.z+
rule-engine: exception+
ylavi: planning_ack+
tnisan: devel_ack+
pstehlik: testing_ack+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-21 14:42:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine.log none

Description Daniel Erez 2015-12-22 14:25:43 UTC
Description of problem:
Commands that have both CoCo child commands and async tasks don't converge, i.e. they neither complete successfully nor fail.

Version-Release number of selected component (if applicable):
3.6

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an Image disk and a Cinder disk.
2. Create a snapshot.

Actual results:
The action hangs indefinitely.

Expected results:
The action should complete, either successfully or with a failure.

Additional info:
The issue is already solved on master by:
* https://gerrit.ovirt.org/#/c/47489/
* https://gerrit.ovirt.org/#/c/43971/
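To make "converge" concrete for readers less familiar with the engine, here is a minimal Java sketch of a parent command whose children are of mixed kinds. Everything in it (ConvergenceSketch, ChildKind, Status, parentStatus) is hypothetical and assumed for illustration; it is not the oVirt engine code and not the fix from the Gerrit changes listed above. It assumes the Cinder disk is handled by a CoCo child command and the image disk by an async task.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; these types are NOT part of the
// oVirt engine API. A parent command owns children of two assumed kinds:
// CoCo-managed child commands and async tasks.
public class ConvergenceSketch {

    enum ChildKind { COCO_COMMAND, ASYNC_TASK }

    enum Status { RUNNING, SUCCEEDED, FAILED }

    static final class Child {
        final ChildKind kind;
        final Status status;

        Child(ChildKind kind, Status status) {
            this.kind = kind;
            this.status = status;
        }
    }

    // The parent converges once every child, regardless of its kind, is in a
    // terminal state: FAILED if any child failed, SUCCEEDED otherwise.
    // While any child is still RUNNING the parent keeps polling, so a child
    // whose terminal status never reaches the parent keeps it RUNNING forever,
    // matching the symptom in this report (the actual root cause is not
    // stated here).
    static Status parentStatus(List<Child> children) {
        boolean anyFailed = false;
        for (Child child : children) {
            if (child.status == Status.RUNNING) {
                return Status.RUNNING;
            }
            if (child.status == Status.FAILED) {
                anyFailed = true;
            }
        }
        return anyFailed ? Status.FAILED : Status.SUCCEEDED;
    }

    public static void main(String[] args) {
        // Snapshot of a VM with an image disk (async task) and a Cinder disk (CoCo child).
        List<Child> children = Arrays.asList(
                new Child(ChildKind.ASYNC_TASK, Status.SUCCEEDED),
                new Child(ChildKind.COCO_COMMAND, Status.RUNNING));
        System.out.println(parentStatus(children)); // prints RUNNING
    }
}
```

The design choice in the sketch is that the convergence check treats both child kinds uniformly; a parent that only learns about one kind's completion would poll indefinitely, much like the repeated "Waiting on child command" log lines.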

Comment 2 Ondra Machacek 2016-02-22 17:42:37 UTC
After creating the snapshot, I can see the following in the log:

2016-02-22 19:33:52,906 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler_Worker-42) [31c311e0] Waiting on child command id: '836f8d38-39cd-449a-ad18-311f576b20f0' type:'CreateCinderSnapshot' of 'CreateAllCinderSnapshots' (id: '8f5e6783-cac4-497e-8ca2-68ad9e182fc3') to complete
2016-02-22 19:34:03,017 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler_Worker-81) [31c311e0] Waiting on child command id: '836f8d38-39cd-449a-ad18-311f576b20f0' type:'CreateCinderSnapshot' of 'CreateAllCinderSnapshots' (id: '8f5e6783-cac4-497e-8ca2-68ad9e182fc3') to complete
2016-02-22 19:34:13,110 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler_Worker-22) [31c311e0] Waiting on child command id: '836f8d38-39cd-449a-ad18-311f576b20f0' type:'CreateCinderSnapshot' of 'CreateAllCinderSnapshots' (id: '8f5e6783-cac4-497e-8ca2-68ad9e182fc3') to complete
2016-02-22 19:34:23,282 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler_Worker-69) [31c311e0] Waiting on child command id: '836f8d38-39cd-449a-ad18-311f576b20f0' type:'CreateCinderSnapshot' of 'CreateAllCinderSnapshots' (id: '8f5e6783-cac4-497e-8ca2-68ad9e182fc3') to complete

Comment 3 Red Hat Bugzilla Rules Engine 2016-02-22 17:42:44 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 4 Moti Asayag 2016-02-24 07:30:43 UTC
Moving back to Daniel Erez who proposed the solution for this bug.

Comment 5 Daniel Erez 2016-02-24 07:45:44 UTC
Hi Ondra,

* In which build did you reproduce the issue?
* Is it the exact same issue?
* Can you attach the full logs?

Thanks

Comment 6 Daniel Erez 2016-03-20 11:51:08 UTC
(In reply to Daniel Erez from comment #5)
> Hi Ondra,
> 
> * In which build did you reproduce the issue?
> * Is it the exact same issue?
> * Can you attach the full logs?
> 
> Thanks

Hi Pavel,

Can you please provide information regarding the aforementioned questions?

Thanks!

Comment 7 Ondra Machacek 2016-03-21 10:18:24 UTC
Hi Daniel,

I am very sorry for the late reply; I missed the needinfo request.
Today I retested with the latest build ('3.6.4-1') and the snapshot creation completed successfully, so I am moving this to VERIFIED. Sorry once again.

Comment 8 Ondra Machacek 2016-03-21 10:24:11 UTC
Created attachment 1138527 [details]
engine.log

Oh, I take that back: it looks like everything is fine, but it's not.
I am unsure whether it's the exact same issue. The snapshot and disk statuses are OK, but the task didn't finish correctly. I can't run the VM, and in the log I see:

2016-03-21 12:17:27,826 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-48) [5c47a1af] Command 'org.ovirt.engine.core.bll.RunVmCommand' failed: null
2016-03-21 12:17:27,826 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-48) [5c47a1af] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.storage.CinderBroker.updateConnectionInfoForDisk(CinderBroker.java:230) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase.updateCinderDisksConnections(RunVmCommandBase.java:286) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.runVm(RunVmCommand.java:258) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.perform(RunVmCommand.java:435) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommand.executeVmCommand(RunVmCommand.java:362) [bll.jar:]
        at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1215) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1359) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1982) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1396) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:378) [bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:202) [bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:170) [bll.jar:]
        at org.ovirt.engine.core.bll.SortedMultipleActionsRunnerBase.runCommands(SortedMultipleActionsRunnerBase.java:20) [bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:179) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:89) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_85]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_85]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_85]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_85]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_85]
2016-03-21 12:17:27,872 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-48) [5c47a1af] Correlation ID: 5c47a1af, Job ID: ad82dc56-889b-409e-abcd-62fb86857275, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm (User: admin@internal).
2016-03-21 12:17:28,944 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler_Worker-76) [5f5ccbc6] Waiting on child command id: 'edf0e7eb-1982-4ba4-9c3f-120306519f44' type:'CreateCinderSnapshot' of 'CreateAllCinderSnapshots' (id: '22173ea0-254a-4b94-ac87-d27ccca85243') to complete

Comment 9 Daniel Erez 2016-03-21 12:58:19 UTC
Hi Ondra,

* Did you reproduce the exact same scenario?
* Can you please attach vdsm logs as well?

Thanks!

Comment 10 Ondra Machacek 2016-03-21 14:49:20 UTC
Yes, I reproduced exactly the same scenario, but it turns out it's an issue with my OpenStack setup.
I tried with a different Cinder instance and it worked. Sorry for the confusion; I am closing this bz.