Bug 1598996
Summary: | deleteVolume task is not cleared after a failed LSM due to power off VM during snapshot creation | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Elad <ebenahar>
Component: | BLL.Storage | Assignee: | Benny Zlotnik <bzlotnik>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 4.2.4.1 | CC: | bugs, ebenahar, ybenshim
Target Milestone: | ovirt-4.2.7 | Keywords: | Automation
Target Release: | --- | Flags: | rule-engine: ovirt-4.2+
Hardware: | x86_64 | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | ovirt-engine-4.2.7 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-11-02 14:31:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description Elad 2018-07-07 17:07:36 UTC
Do we have full logs for this?

Created attachment 1459190 [details]
logs

Verified.
Versions:
ovirt-engine-4.2.6.3
vdsm-4.20.37-2

Moving back to ASSIGNED because this bug seems to still exist.

Tested using:
ovirt-engine-4.2.6.3-1.el7.noarch
vdsm-4.20.37-3.git924eec4.el7.x86_64

While running the specified test case (5967), I found that there is a "deleteVolume" task that is not being cleared:

[root@storage-ge16-vdsm3 ~]# vdsm-client Host getAllTasks
{
    "48a1e000-9eff-48a0-9bf5-9152978854a2": {
        "verb": "deleteVolume",
        "code": 0,
        "state": "finished",
        "tag": "spm",
        "result": "",
        "message": "1 jobs completed successfully",
        "id": "48a1e000-9eff-48a0-9bf5-9152978854a2"
    }
}

According to the VDSM log, this task started at 2018-08-22 10:38:28:

2018-08-22 10:38:28,982+0300 DEBUG (jsonrpc/1) [storage.TaskManager.Task] (Task='48a1e000-9eff-48a0-9bf5-9152978854a2') moving from state preparing -> state preparing (task:602)
2018-08-22 10:38:28,983+0300 INFO (jsonrpc/1) [vdsm.api] START deleteVolume(sdUUID='affb03c6-52c8-4355-be7d-aa8d91452601', spUUID='493b429f-d032-476a-9dad-bb18a4ef4151', imgUUID='8e617cf0-0693-412e-bcc5-9eb5fe2037cb', volumes=['91ceae38-9a63-45c3-9091-9cf2ba819e3b'], postZero='true', force='false', discard=True) from=::ffff:10.35.163.251,41464, flow_id=disks_syncAction_77f49a65-13b4-465c, task_id=48a1e000-9eff-48a0-9bf5-9152978854a2 (api:46)

and it is still in the logs at 2018-08-22 11:10:45:

2018-08-22 11:10:45,270+0300 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='3e035f0d-d425-4dfb-ba1f-d9fefede9b4a') finished: {'allTasksStatus': {'48a1e000-9eff-48a0-9bf5-9152978854a2': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '48a1e000-9eff-48a0-9bf5-9152978854a2'}}} (task:1201)

Additional info:
When I tried to deactivate this host (the SPM), I got this error in the engine log (which makes sense):

2018-08-22 11:10:45,291+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-36) [49f1eb8e-2a86-4371-bf85-c64974dc5a69] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_3', pool id '493b429f-d032-476a-9dad-bb18a4ef4151' as there are uncleared tasks 'Task '48a1e000-9eff-48a0-9bf5-9152978854a2', status 'finished''
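[Editor's note: to make the failure mode above easier to follow, here is a minimal, purely illustrative Java sketch. It is not the ovirt-engine implementation and all class and method names in it are hypothetical; it only models the ordering the logs suggest: the engine waits for the SPM task to finish, runs the parent command's endAction, and clears the task on VDSM only afterwards. If endAction throws, the clear step never runs, the finished task stays in getAllTasks, and SpmStop refuses to stop the SPM.]

```java
// Illustrative sketch only -- not the ovirt-engine code. All names are made up.
import java.util.UUID;

public class AsyncTaskFlowSketch {

    interface Vdsm {
        String getTaskState(UUID taskId);   // e.g. "finished"
        void clearTask(UUID taskId);        // analogous to clearing an SPM task
    }

    interface ParentCommand {
        void endAction();                   // CreateSnapshot's endAction threw an NPE here
    }

    static void onTaskPolled(Vdsm vdsm, ParentCommand command, UUID taskId) {
        if (!"finished".equals(vdsm.getTaskState(taskId))) {
            return;                         // task still running, poll again later
        }
        try {
            command.endAction();            // throws -> the next line never executes
            vdsm.clearTask(taskId);         // skipped, so the finished task stays on the SPM
        } catch (RuntimeException e) {
            // The leftover task keeps showing up in "vdsm-client Host getAllTasks"
            // and SpmStopVDSCommand refuses to deactivate the host.
            System.err.println("endAction failed, task " + taskId + " not cleared: " + e);
        }
    }
}
```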
Target release should be set once a package build is known to fix an issue. Since this bug was not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Created attachment 1477831 [details]
logs
Yosi, does this happen often? Were there runs in which the task was cleared successfully?

Looks like the culprit is this NPE:

2018-08-22 10:38:44,139+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-198283) [] [within thread]: endAction for action type CreateSnapshot threw an exception.: java.lang.NullPointerException
at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:]
at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_181]
at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)

(In reply to Benny Zlotnik from comment #8)
> Yosi, does this happen often? Were there runs in which the task was cleared
> successfully?

It happens in most cases; I did not see a run of this test case in which it did not occur.
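[Editor's note: for illustration only, a hedged sketch of the failure pattern the trace points at: some lookup inside an endAction() path returns null after the VM powered off during snapshot creation has torn the flow down, and the unguarded dereference aborts endAction() before the task is cleared. The bug report does not say which reference is actually null at CoCoAsyncTaskHelper.java:351, so every name below is hypothetical and the null check shown is only one possible shape of a fix, not the change shipped in ovirt-engine-4.2.7.]

```java
// Hypothetical illustration of the NPE pattern -- not the real CoCoAsyncTaskHelper
// code and not the actual fix.
import java.util.Map;
import java.util.UUID;

class EndActionNpeSketch {

    /** Stand-in for whatever state endAction() looks up for the ended command. */
    static class CommandState {
        boolean succeeded;
    }

    private final Map<UUID, CommandState> commandStates;

    EndActionNpeSketch(Map<UUID, CommandState> commandStates) {
        this.commandStates = commandStates;
    }

    void endAction(UUID commandId) {
        CommandState state = commandStates.get(commandId);
        if (state == null) {
            // The command's state is already gone (e.g. the VM was powered off
            // mid-snapshot); returning here lets the caller still clear the task.
            return;
        }
        if (state.succeeded) {
            // ... end the command normally
        }
        // Without the null check above, dereferencing 'state' would throw the
        // NullPointerException seen in the trace, aborting endAction() before
        // the deleteVolume task is ever cleared on the SPM.
    }
}
```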
deleteImage task is being cleared when LSM fails due to VM power off.

deleteImage task starts:

2018-10-10 17:34:32,992+0300 DEBUG (jsonrpc/6) [storage.TaskManager.Task] (Task='97a1bac7-b21f-49ed-98f2-687257b60f6f') moving from state preparing -> state preparing (task:602)
2018-10-10 17:34:32,993+0300 INFO (jsonrpc/6) [vdsm.api] START deleteImage(sdUUID=u'2d633c7e-e417-4b4d-8b30-e973f1d6d870', spUUID=u'fb339a72-b298-451b-9a13-6554de8ef545', imgUUID=u'edc9bd45-2014-45a8-ab8d-d7c8d83b2f2b', postZero=u'false', force=u'false', discard=False) from=::ffff:10.35.161.127,51832, flow_id=vms_delete_fe902a07-01e2-4ee9, task_id=97a1bac7-b21f-49ed-98f2-687257b60f6f (api:46)

And it is cleared right after:

2018-10-10 17:34:39,476+0300 INFO (jsonrpc/1) [vdsm.api] START clearTask(taskID=u'97a1bac7-b21f-49ed-98f2-687257b60f6f', spUUID=None, options=None) from=::ffff:10.35.161.127,51820, flow_id=vms_delete_fe902a07-01e2-4ee9, task_id=22a4ce37-b806-4def-b7fe-4a44eb4f9f3a (api:46)
2018-10-10 17:34:39,476+0300 DEBUG (jsonrpc/1) [storage.TaskManager] Entry. taskID: 97a1bac7-b21f-49ed-98f2-687257b60f6f (taskManager:172)

=============
Used:
rhvm-4.2.7.2-0.1.el7ev.noarch
vdsm-4.20.42-1.el7ev.x86_64

This bugzilla is included in the oVirt 4.2.7 release, published on November 2nd 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.