1506373 – VM disk left in LOCKED state when added. Tasks seem to have completed on SPM host but not cleared on the engine side.

Keywords:
Status:	CLOSED DUPLICATE of bug 1429534
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	4.0.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	ovirt-4.1.8
Target Release:	---
Assignee:	Fred Rolland
QA Contact:	Elad
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported:	2017-10-25 19:39 UTC by Bimal Chollera
Modified:	2021-05-01 16:54 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-11-01 10:04:56 UTC
oVirt Team:	Storage
Target Upstream Version:
Embargoed:

Description Bimal Chollera 2017-10-25 19:39:24 UTC

Description of problem:
VM disk left in LOCKED state when added.

All the tasks are completed and cleared on the SPM host, but on the RHV Engine side the jobs remain in STARTED state and the "VAR__ACTION__ADD", "VAR__TYPE__DISK" entity remains in ACTIVE state even though the volumes have been created.

Version-Release number of selected component (if applicable):

ovirt-engine-4.0.7.4-0.1.el7ev.noarch

Red Hat Enterprise Linux Server release 7.3 (Maipo)
vdsm-4.18.15.3-1.el7ev.x86_64

How reproducible:
On the end user's system

Steps to Reproduce:
1. Create a VM.
2. Add disks to the VM.

Actual results:

VM disks are in LOCKED state.

Expected results:

VM disks shouldn't be in LOCKED state.

Additional info:

Comment 5 Allon Mureinik 2017-10-26 15:20:28 UTC

The task is most definitely cleared:

2017-10-20 17:35:04,592 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-12) [f8f8d7f] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'ddd1a8aa-5ce1-4408-82f3-cad1dceb54a8'
2017-10-20 17:35:04,593 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-12) [f8f8d7f] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{runAsync='true', storagePoolId='586f12f5-02b5-0050-02f4-0000000001e0', ignoreFailoverLimit='false', taskId='ddd1a8aa-5ce1-4408-82f3-cad1dceb54a8'}), log id: 69482302
2017-10-20 17:35:04,593 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-12) [f8f8d7f] START, HSMClearTaskVDSCommand(HostName = hosted_engine_1, HSMTaskGuidBaseVDSCommandParameters:{runAsync='true', hostId='b0672110-7f4d-467f-a71a-5524cddc0307', taskId='ddd1a8aa-5ce1-4408-82f3-cad1dceb54a8'}), log id: 441d99f0
2017-10-20 17:35:05,608 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-12) [f8f8d7f] BaseAsyncTask::removeTaskFromDB: Removed task 'ddd1a8aa-5ce1-4408-82f3-cad1dceb54a8' from DataBase

The fact that the disk remains locked is definitely not OK, of course. Need to take a deeper look.
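
The same check can be repeated for any task ID by searching the engine log for the removeTaskFromDB line quoted above. The following is only a sketch: the function name task_was_removed is made up for illustration, and the log file path has to be supplied by the caller (the usual location, /var/log/ovirt-engine/engine.log, is an assumption and is not stated in this report).

```shell
#!/bin/sh
# task_was_removed LOG_FILE TASK_ID   (hypothetical helper, not part of oVirt)
# Succeeds (exit 0) if LOG_FILE has a line that contains both TASK_ID and the
# "removeTaskFromDB: Removed task" message shown in the log excerpt above.
task_was_removed() {
    awk -v id="$2" '
        index($0, id) && index($0, "removeTaskFromDB: Removed task") { found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, `task_was_removed /var/log/ovirt-engine/engine.log ddd1a8aa-5ce1-4408-82f3-cad1dceb54a8 && echo cleared` would print "cleared" for the task above, assuming the log lives at that path.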

Comment 11 Fred Rolland 2017-10-30 13:23:35 UTC

Hi,

It looks like the issue is the same as:
https://bugzilla.redhat.com/show_bug.cgi?id=1429534

It is fixed in 4.1 with this patch:
https://gerrit.ovirt.org/#/c/74326/

Can I get thread dumps from the engine to make sure it is the same issue?

The simplest method is to run "kill -3 jboss-process-id",
where the jboss-process-id can be obtained with "ps -ef | grep jboss".
The output is redirected to /var/log/ovirt-engine/console.log and should be
kept aside, since it will get erased on next reboot.

Please take 5 thread dumps with 3 seconds interval between each.
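
The procedure above could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function name take_thread_dumps and its arguments are made up for illustration, and it assumes the process ID has already been found with ps as described. The defaults (5 dumps, 3 seconds apart) come from the request above.

```shell
#!/bin/sh
# take_thread_dumps PID [COUNT] [INTERVAL_SECONDS]   (hypothetical helper)
# Sends SIGQUIT (kill -3) to the given JVM process COUNT times. Each signal
# makes the JVM print a thread dump to its standard output, which for the
# engine ends up in /var/log/ovirt-engine/console.log as noted above.
take_thread_dumps() {
    pid=$1
    count=${2:-5}
    interval=${3:-3}
    i=1
    while [ "$i" -le "$count" ]; do
        kill -3 "$pid" || return 1
        # Wait between dumps, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

Usage would be `take_thread_dumps <jboss-process-id>`, followed by copying /var/log/ovirt-engine/console.log to a safe location before the engine is restarted.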


BTW, restarting the engine should temporarily fix the issue. You can try to restart the engine and try to add a disk and see if it succeeds.
Please take the thread dumps before restarting the engine.

Comment 12 Marina Kalinin 2017-10-30 15:30:42 UTC

Thanks, Freddy.
Bimal, I created this KCS article to have this process documented in general:
https://access.redhat.com/solutions/3227681

Comment 17 Fred Rolland 2017-11-01 08:45:19 UTC

Hi,

Thank you for the information.
After looking at the thread dump, it looks like it is the same issue as
https://bugzilla.redhat.com/show_bug.cgi?id=1429534

As expected, CommandCallbacksPoller is not among the threads, which explains why the Add Disk command did not finish (same symptom as BZ1429534).
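
A quick way to repeat this check on a saved dump is to count the lines that mention the poller. This is only a sketch under assumptions: the helper name count_thread_mentions is hypothetical, and it assumes the string "CommandCallbacksPoller" shows up somewhere in the dump text (for example in a stack frame) whenever the poller thread is alive.

```shell
#!/bin/sh
# count_thread_mentions DUMP_FILE NAME   (hypothetical helper)
# Prints the number of lines in DUMP_FILE that contain NAME as a fixed string.
# A count of 0 matches the symptom described above.
count_thread_mentions() {
    status=0
    n=$(grep -cF -- "$2" "$1") || status=$?
    # grep exits 1 when nothing matched (count 0); above 1 is a real error.
    if [ "$status" -gt 1 ]; then
        return "$status"
    fi
    printf '%s\n' "$n"
}
```

For example, `count_thread_mentions console.log CommandCallbacksPoller` run against the copy of console.log that was kept aside.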

I suggest marking this one as a duplicate of BZ #1429534.

Did the customer try to restart the engine and add a disk? Did it work?

Comment 18 Allon Mureinik 2017-11-01 10:04:56 UTC

(In reply to Fred Rolland from comment #17)
> Hi,
> 
> Thank you for the information.
> After looking at the thread dump, it looks like it is the same issue as
> https://bugzilla.redhat.com/show_bug.cgi?id=1429534
> 
> As expected, CommandCallbacksPoller is not among the threads, which explains
> why the Add Disk command did not finish (same symptom as BZ1429534).
> 
> I suggest marking this one as a duplicate of BZ #1429534.
Agreed.
I'm closing this BZ, as there's no development action here. We'll continue to monitor the customer case and offer assistance with manual recovery there if required.

*** This bug has been marked as a duplicate of bug 1429534 ***

Comment 20 Elad 2018-08-02 08:12:23 UTC

DUP bug 1429534 has qe_test_coverage+ so set this one to -