Created attachment 901402 [details]
async_tasks

Description of problem:
When a large number of templates is being created, the task info is kept in the psql table. If the procedure is interrupted by a vdsmd restart, the operation fails, but the tasks remain in the table (see image).

engine=# SELECT task_id FROM async_tasks;
               task_id
--------------------------------------
 0a52b2ac-a060-4ccf-8975-822832f2aa69
 05d95bee-0afd-40dc-80f3-6bdab279a0a1
 a20acce1-02fd-4bef-ac3a-598b2349f441
 fffa8ef2-d89e-4f29-9a3b-28933d6daeac
 7ca2550a-9e91-416b-adcb-8af2ca4eb648
 8c08e5b2-71cf-43b8-a672-4d8b44829242
 be0fc12b-a4c4-4c0e-bdb0-2b5358698bb4
(7 rows)

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.21.el6ev.noarch
vdsm-4.14.7-3.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create 7 VMs + (2X disks) each
2. Create templates from all VMs at the same time
3. Restart the vdsm daemon

Actual results:
Operation fails, async_tasks are not cleared

Expected results:
Operation should fail, async_tasks should be cleared

Additional info:
Please attach engine.log.
Created attachment 902496 [details]
vdsm+engine logs

As requested. Please pay attention to the tasks' start times: some of the uncleared tasks are from older tries.

engine=# SELECT action_type,task_id,started_at FROM async_tasks;
 action_type |               task_id                |         started_at
-------------+--------------------------------------+----------------------------
         211 | 2cf0e4c6-9e08-4892-ae63-99a8bc96a570 | 2014-06-05 14:00:05.694+03
         211 | 53db65ed-3e93-4fe2-9b06-ec64df17fa4a | 2014-06-05 14:00:05.847+03
         211 | 73e9cc84-2ce4-4b6d-bf54-3a75485c173f | 2014-06-05 14:00:09.31+03
         211 | 7f59fb97-c596-4311-90bd-1e322e885b1d | 2014-06-05 14:00:07.362+03
         211 | 3e0ce088-b2d2-463f-a936-dbf021753c29 | 2014-06-05 14:00:11.167+03
         211 | 0248ece2-db44-41d7-aff6-9e2187ee86cc | 2014-06-05 14:00:13.402+03
         211 | 0a52b2ac-a060-4ccf-8975-822832f2aa69 | 2014-06-02 11:30:15.506+03
         211 | 05d95bee-0afd-40dc-80f3-6bdab279a0a1 | 2014-06-02 11:30:15.505+03
         211 | a20acce1-02fd-4bef-ac3a-598b2349f441 | 2014-06-02 11:30:15.593+03
         211 | fffa8ef2-d89e-4f29-9a3b-28933d6daeac | 2014-06-02 11:30:15.739+03
         211 | 7ca2550a-9e91-416b-adcb-8af2ca4eb648 | 2014-06-02 11:30:16.31+03
         211 | 8c08e5b2-71cf-43b8-a672-4d8b44829242 | 2014-06-02 11:30:16.972+03
         211 | be0fc12b-a4c4-4c0e-bdb0-2b5358698bb4 | 2014-06-02 11:30:18.432+03
The tasks are for remove-image operations on non-existing images:

AddVmTemplate fails => trying to end with failure CreateImageTemplate => call RemoveImage => DeleteImageGroupVDSCommand returns an error that the image doesn't exist

IIUC, in this case there is no task in VDSM for the RemoveImage, thus the task in the engine will not be removed.
(In reply to Arik from comment #3)
> IIUC, in this case there is no task in VDSM for the RemoveImage, thus the
> task in the engine will not be removed.

If this is true, it's either a misuse of the existing infra or a bug in the said infra - in any event, it should be fixed.
The problem here is that a task placeholder is persisted in RemoveImage, while specific errors returned by vdsm during task creation, such as ImageDoesNotExistInDomainError, are still treated as success even though no task has been created. In that case, the task placeholder is never cleared from the async tasks table.

We need to clear the placeholders at the end of each execution regardless of its success (each flow should decide whether it succeeded or not) to avoid this issue in more flows.

We can look into removing the placeholder in RemoveImage anyway (placeholders are less useful when only one task is created), although we might use it for other benefits, so I prefer to leave it there for the time being.
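To illustrate the proposed behaviour, here is a minimal Python sketch of "always clear the placeholder at the end of execution". It is not the engine code (which is Java) and not the actual patch; AsyncTaskTable, run_command and ImageDoesNotExistError are hypothetical names made up for this example, and the table is just an in-memory dict.

```python
import uuid


class ImageDoesNotExistError(Exception):
    """Hypothetical stand-in for a VDSM 'image does not exist' error."""


class AsyncTaskTable:
    """Hypothetical in-memory model of the async_tasks table."""

    def __init__(self):
        # task_id -> vdsm_task_id, or None while the row is only a placeholder
        self.rows = {}

    def add_placeholder(self):
        task_id = uuid.uuid4()
        self.rows[task_id] = None
        return task_id

    def attach_vdsm_task(self, task_id, vdsm_task_id):
        self.rows[task_id] = vdsm_task_id

    def clear_placeholders(self, task_ids):
        # Remove only rows that never got a real VDSM task attached.
        for task_id in task_ids:
            if self.rows.get(task_id) is None:
                self.rows.pop(task_id, None)


def run_command(table, create_vdsm_task):
    """Run one flow; the placeholder is cleaned up whatever the outcome."""
    placeholder = table.add_placeholder()
    try:
        try:
            vdsm_task_id = create_vdsm_task()
        except ImageDoesNotExistError:
            # The flow decides this still counts as success: nothing to remove.
            return True
        table.attach_vdsm_task(placeholder, vdsm_task_id)
        return True
    finally:
        # Runs on success, on "tolerated" errors, and on unexpected failures.
        table.clear_placeholders([placeholder])
```

With this shape, a flow that treats a VDSM error as success can no longer leave a placeholder row behind, while rows that did get a real VDSM task stay in the table until that task is handled.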
Oved/Ravi - the provided patch handles this on an infra level - your feedback would be appreciated.
Is this fixed on oVirt beta.2 as well, or should we verify only on the downstream build?
As long as it's not a downstream-only fix (which it isn't, as you can see the fix was done on upstream oVirt), you can continue to verify on the upstream beta. The downstream build was very initial and doesn't contain all components; it was done mostly for build purposes.
Created attachment 922091 [details]
vdsm+engine logs

Bug reproduced on beta.2

engine=# SELECT task_id,action_type,status,vdsm_task_id from async_tasks;
               task_id                | action_type | status |             vdsm_task_id
--------------------------------------+-------------+--------+--------------------------------------
 7b55facf-f955-487c-975f-b6a64a98ec80 |         201 |      2 | 00000000-0000-0000-0000-000000000000
 23df9612-b7f6-45a3-b21f-0ea82c26bce7 |         201 |      2 | 00000000-0000-0000-0000-000000000000
 8b1ea5f2-38ea-4135-8e11-15c75adaf521 |         201 |      2 | 00000000-0000-0000-0000-000000000000
(3 rows)
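For anyone triaging similar output, a rough Python sketch for picking out rows whose vdsm_task_id is the all-zero UUID. It assumes, and this is only an interpretation not confirmed in this bug, that an all-zero vdsm_task_id means no VDSM task was ever attached to the row. The function name find_unlinked_tasks is hypothetical; the rows are copied from the query result above.

```python
import uuid

# The all-zero UUID seen in the vdsm_task_id column above.
NIL_UUID = uuid.UUID(int=0)


def find_unlinked_tasks(rows):
    """Return the task_ids of rows whose vdsm_task_id is the all-zero UUID.

    Each row is (task_id, action_type, status, vdsm_task_id), as strings/ints.
    """
    return [
        task_id
        for task_id, _action_type, _status, vdsm_task_id in rows
        if uuid.UUID(vdsm_task_id) == NIL_UUID
    ]


# Rows copied from the psql output in this comment.
rows = [
    ("7b55facf-f955-487c-975f-b6a64a98ec80", 201, 2,
     "00000000-0000-0000-0000-000000000000"),
    ("23df9612-b7f6-45a3-b21f-0ea82c26bce7", 201, 2,
     "00000000-0000-0000-0000-000000000000"),
    ("8b1ea5f2-38ea-4135-8e11-15c75adaf521", 201, 2,
     "00000000-0000-0000-0000-000000000000"),
]

for task_id in find_unlinked_tasks(rows):
    print(task_id)
```

Under that assumption, all three rows above would be reported.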
After further investigation, the psql async_tasks table isn't cleared because of different bugs, which do not affect this one. We opened BZ #1126204 and BZ #1126205 to track the current behavior. Moving this bug to be verified on beta.2.
RHEV-M 3.5.0 has been released, closing this bug.