Description of problem:
Adding Disk task is stuck on Executing (Creating Volume) in engine, vdsm host (SPM) shows the task as finished

Version-Release number of selected component (if applicable):
3.6.1-0.2.el6

How reproducible:
2 such tasks showed up in one run

Steps to Reproduce:
In my case, I executed all the storage API Tier 2 cases (for iSCSI, NFS and GlusterFS)

Actual results:
2 tasks are showing up in progress on the engine while they show up as finished in the vdsm host

Expected results:
The task state should be in sync between the engine and the vdsm hosts

Additional info:
Here's the output from the SPM host:

[root@lynx09 ~]# vdsClient -s 0 getAllTasks
7425d4d5-4727-4d36-ae39-3eda492472eb :
	 verb = createVolume
	 code = 0
	 state = finished
	 tag = spm
	 result = {'uuid': '2e5d8baa-8347-43c0-9038-5fee759ccbc8'}
	 message = 1 jobs completed successfully
	 id = 7425d4d5-4727-4d36-ae39-3eda492472eb
8e09d7f8-08b3-4977-aab2-f4b54b969575 :
	 verb = createVolume
	 code = 0
	 state = finished
	 tag = spm
	 result = {'uuid': '9cf56973-5821-4ac2-849e-830718db4085'}
	 message = 1 jobs completed successfully
	 id = 8e09d7f8-08b3-4977-aab2-f4b54b969575

See attached logs including the engine's db dump
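For anyone who hits this again, a rough way to spot the mismatch is to parse the getAllTasks output from the SPM and compare it with the task IDs the engine still shows as running. The following is only a sketch under assumptions: it assumes the output has one "key = value" field per line under each task ID header, and the names parse_all_tasks, stuck_on_engine and engine_running_ids are hypothetical helpers made up for illustration, not part of vdsm or the engine. How the engine-side IDs are collected (UI, REST API or DB) is left open.

```python
import re

# Sample text copied from the getAllTasks output in this report.
SAMPLE = """\
7425d4d5-4727-4d36-ae39-3eda492472eb :
    verb = createVolume
    code = 0
    state = finished
    tag = spm
    result = {'uuid': '2e5d8baa-8347-43c0-9038-5fee759ccbc8'}
    message = 1 jobs completed successfully
    id = 7425d4d5-4727-4d36-ae39-3eda492472eb
8e09d7f8-08b3-4977-aab2-f4b54b969575 :
    verb = createVolume
    code = 0
    state = finished
    tag = spm
    result = {'uuid': '9cf56973-5821-4ac2-849e-830718db4085'}
    message = 1 jobs completed successfully
    id = 8e09d7f8-08b3-4977-aab2-f4b54b969575
"""

UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_all_tasks(text):
    """Parse getAllTasks-style text into {task_id: {field: value}}.

    Assumes a "<uuid> :" header line followed by "key = value" lines.
    """
    tasks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = line.rstrip(":").strip()
        if line.endswith(":") and UUID_RE.match(header):
            # Start of a new task section.
            current = tasks.setdefault(header, {})
        elif current is not None and " = " in line:
            # Split on the first " = " only, so values may contain "=".
            key, value = line.split(" = ", 1)
            current[key.strip()] = value.strip()
    return tasks


def stuck_on_engine(vdsm_tasks, engine_running_ids):
    """Return IDs that are finished on the host but still running on the engine.

    engine_running_ids is a hypothetical set of vdsm task IDs that the
    engine still reports as in progress.
    """
    return sorted(
        task_id
        for task_id, fields in vdsm_tasks.items()
        if fields.get("state") == "finished" and task_id in engine_running_ids
    )


if __name__ == "__main__":
    tasks = parse_all_tasks(SAMPLE)
    # Pretend the engine still lists both tasks as executing.
    for task_id in stuck_on_engine(tasks, set(tasks)):
        print(task_id, tasks[task_id]["verb"], tasks[task_id]["state"])
```

In practice the text would come from running vdsClient -s 0 getAllTasks on the SPM host instead of the embedded sample.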
Please provide logs. Also, for how long does it remain that way?
In addition, under "How reproducible" you need to specify whether it happens in EVERY run. If it happened once and you can't reproduce it, then it doesn't reproduce often.
Created attachment 1101468
Engine and vdsm logs plus engine DB dump
Oved, please find the attachment containing the logs and DB dump. It stayed this way for over 24 hours; I had to clean it up manually so I could use the environment for further tests. I ran into 2 such zombie tasks (within an hour) in one 23-hour run. The last time we hit this was a few months back.
(In reply to Gilad Lazarovich from comment #4)
> Oved, please find the attachment containing the logs and DB dump. It stayed
> this way for over 24 hours, I had to manually clean it up so I can use the
> environment for further tests. I ran into 2 such zombie tasks (within in
> hour) in one 23 hour run. The last time we hit this was a few months back.

So if it reproduces once every few months, then I'm removing the automation blocker, and severity. Ravi - can you look at the logs and see what you can find?
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Also removing the regression flag. If it happens this rarely, it may be a race that also occurred in earlier versions.
Gilad - please contact Ravi directly in case this reproduces. We didn't see anything suspicious in the logs. Currently targeting 4.0, since without a live reproduction we can't do much here.
Please re-open if this reproduces.