Description of problem:
When creating a VM pool, the tasks get stuck in the "started" state and never end (not even after several days or after an engine restart).

Version-Release number of selected component (if applicable):
rhevm.noarch 3.6.0-0.1000.7.a03a5cd.master.el6ev
art.noarch 1.0.6-1.6v36

How reproducible:
100%

Steps to Reproduce:
1. Create a VM pool.
2. Look at the VM pool tasks.

Actual results:
The task never ends.

Expected results:
The tasks are supposed to end.

Additional info:
The automation build: http://jenkins-ci.eng.lab.tlv.redhat.com/view/0%20Unstable%203.6/job/rhevm_3.6_el6-engine_el7-host_automation_coretools_two_hosts_restapi_vms_nfs_rest_factory/407/
Created attachment 1014190 [details] the tasks print screen
Created attachment 1014191 [details] the engine and hosts logs
*** Bug 1215955 has been marked as a duplicate of this bug. ***
*** Bug 1247506 has been marked as a duplicate of this bug. ***
This bug was verified in 3.6.0-2 but was broken again in 3.6.0-3 (vdsm-4.17.0-1054.git562e711.el7.noarch)
Reproduced again on 3.6.0-0.0.master.20150804111407.git122a3a0.el6, attaching relevant part of log (just created a new pool).
Created attachment 1062514 [details] new_pool_engine_log
I could not see any issue in the add pool commands flow that would cause this: all the steps are finished, but the job is still open. So I think it's not the same issue as in the original report, just the same result.

There also seems to be a related exception in the log:

015-08-13 18:35:17,262 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-8-thread-13) [307c57f6] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'AddVm', executionIndex: '0'
2015-08-13 18:35:17,279 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (org.ovirt.thread.pool-8-thread-18) [fd52c13] transaction rolled back
2015-08-13 18:35:17,279 ERROR [org.ovirt.engine.core.bll.job.ExecutionHandler] (org.ovirt.thread.pool-8-thread-18) [fd52c13] Exception: javax.persistence.EntityNotFoundException: Unable to find org.ovirt.engine.core.common.job.Step with id f8b23151-8e0d-434b-ba41-321dcf3687c8

Oved, can someone from infra take a look at this?
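For anyone triaging similar logs, here is a minimal sketch that groups engine.log lines by the bracketed token that follows the thread name. It assumes that token identifies the flow a line belongs to and that lines follow the layout seen in the excerpt above. The names `LINE_RE` and `group_by_flow` are made up for this example and are not part of any oVirt tooling.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above:
# <date> <time> <LEVEL> [<logger>] (<thread>) [<flow id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<flow>[^\]]*)\]\s+(?P<msg>.*)$"
)


def group_by_flow(lines):
    """Return {flow id: [(level, message), ...]} for lines matching LINE_RE."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines in any other layout are skipped
            groups[match.group("flow")].append(
                (match.group("level"), match.group("msg"))
            )
    return dict(groups)


# The three lines quoted in the comment above, verbatim.
sample = [
    "015-08-13 18:35:17,262 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] "
    "(org.ovirt.thread.pool-8-thread-13) [307c57f6] CommandAsyncTask::endCommandAction "
    "[within thread] context: Attempting to endAction 'AddVm', executionIndex: '0'",
    "2015-08-13 18:35:17,279 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] "
    "(org.ovirt.thread.pool-8-thread-18) [fd52c13] transaction rolled back",
    "2015-08-13 18:35:17,279 ERROR [org.ovirt.engine.core.bll.job.ExecutionHandler] "
    "(org.ovirt.thread.pool-8-thread-18) [fd52c13] Exception: "
    "javax.persistence.EntityNotFoundException: Unable to find "
    "org.ovirt.engine.core.common.job.Step with id f8b23151-8e0d-434b-ba41-321dcf3687c8",
]

groups = group_by_flow(sample)
for flow, entries in groups.items():
    print(flow, [level for level, _ in entries])
```

Grouped this way, the rollback and the EntityNotFoundException fall under one token, while the 'AddVm' endAction line falls under another.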
Sure, Liran, can you take a look?
Also, Shira, can you test with latest 3.6.0 release?
Oved, I was able to reproduce this (easily) on latest master as well. Just create a VM pool with 3 VMs, using a template that has a disk. I could not reproduce this with a template that has no disks, which makes me believe there is a race around the tasks/jobs infrastructure (how a job is marked as finished). In my testing, I got the exception mentioned in comment 8 only in some of the cases.
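To make the suspected kind of race concrete, here is a toy model. It is not engine code and does not claim to be the actual root cause. The `Job` class and its methods are hypothetical. Two step-completion handlers each take a snapshot of the remaining open steps before either one writes, so both see another step still open and neither marks the job finished. That reproduces the symptom "all the steps are finished but the job is still open". The atomic variant does the update and the check under one lock.

```python
import threading


class Job:
    """Toy job that should be finished once every step has ended."""

    def __init__(self, step_ids):
        self.open_steps = set(step_ids)
        self.finished = False
        self._lock = threading.Lock()

    # Racy variant, split in two phases so the bad interleaving can be forced.
    def read_remaining(self, step_id):
        # Phase 1: snapshot of the steps still open once this one ends.
        return self.open_steps - {step_id}

    def write_back(self, step_id, remaining_snapshot):
        # Phase 2: end the step, then decide based on the (possibly stale) snapshot.
        self.open_steps.discard(step_id)
        if not remaining_snapshot:
            self.finished = True

    # Safe variant: update and check happen atomically.
    def end_step_atomic(self, step_id):
        with self._lock:
            self.open_steps.discard(step_id)
            if not self.open_steps:
                self.finished = True


def racy_run():
    job = Job(["a", "b"])
    snap_a = job.read_remaining("a")  # still sees "b" open
    snap_b = job.read_remaining("b")  # still sees "a" open
    job.write_back("a", snap_a)
    job.write_back("b", snap_b)
    return job


def atomic_run():
    job = Job(["a", "b"])
    threads = [
        threading.Thread(target=job.end_step_atomic, args=(step,))
        for step in ("a", "b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return job


racy = racy_run()
safe = atomic_run()
print("racy:  ", racy.open_steps, racy.finished)
print("atomic:", safe.open_steps, safe.finished)
```

The racy run uses an explicit interleaving rather than real threads so the outcome is deterministic. In a real system the same ordering would only happen some of the time, which would fit with the issue showing up only when several disk-copy tasks complete close together.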
Reproduced this in downstream as well: Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.11.master.el6
*** Bug 1256030 has been marked as a duplicate of this bug. ***
Not sure if this bz should handle also task for allocation vm from pool, but this task is stuck as well in 3.6.0-12.
(In reply to Ondra Machacek from comment #14)
> Not sure if this bz should handle also task for allocation vm from pool, but
> this task is stuck as well in 3.6.0-12.

I'd treat it as a separate bug, since there is another known issue with RemovingVmPool, which is also stuck and is not related to the job/step mechanism. So that issue should be examined and handled separately.
(In reply to Moti Asayag from comment #15)
> (In reply to Ondra Machacek from comment #14)
> > Not sure if this bz should handle also task for allocation vm from pool, but
> > this task is stuck as well in 3.6.0-12.
>
> I'd treat it as a separate bug, since there is other known issue of
> RemovingVmPool which is also stuck and not related to the job/step mechanism.
> So that issue should be examine and handled separately.

Changing the title to reflect the right scope.
Verified on: Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.18.el6
*** Bug 1304663 has been marked as a duplicate of this bug. ***
Shmuel, can you please take a look? Sefi reports that it happens again on 3.6.3 and I see it also on master
Separate bug 1310426 was created for this issue.