Bug 1256786 - LSM fails with Locked disk during failed snapshot create
Summary: LSM fails with Locked disk during failed snapshot create
Keywords:
Status: CLOSED DUPLICATE of bug 1251956
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.0
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 3.6.0
Assignee: Daniel Erez
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2015-08-25 12:44 UTC by Kevin Alon Goldblatt
Modified: 2016-03-10 06:21 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-09-02 13:10:47 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
engine, vdsm, server logs (526.76 KB, application/x-gzip)
2015-08-25 12:47 UTC, Kevin Alon Goldblatt
engine, vdsm, server logs, screen shots (1.38 MB, application/x-gzip)
2015-08-25 14:54 UTC, Kevin Alon Goldblatt

Description Kevin Alon Goldblatt 2015-08-25 12:44:47 UTC
Description of problem:
Killed the qemu process of a VM during an LSM (live storage migration) operation. The create snapshot operation fails and the disk remains locked.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.11.master.el6.noarch
vdsm-4.17.2-1.el7ev.noarch

How reproducible:
Ran this once

Steps to Reproduce:
1. Create a VM with 2 disks
2. Create a Snapshot
3. Add 4 new disks
4. Start the VM
5. Start LSM of a preallocated block disk to another block domain
6. After the disk is locked and while the create snapshot operation is running, kill the qemu process of the VM (a sketch of this step follows the list) - the LSM operation does not complete and the disk remains locked
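
A minimal sketch of step 6, assuming root access on the host running the VM and that the qemu-kvm command line carries the VM name (as it does for VMs started through vdsm/libvirt); the VM name and the process-matching pattern are assumptions for illustration:

import subprocess

VM_NAME = "vm1"  # assumption: the VM under test

def kill_qemu(vm_name: str) -> None:
    # pgrep -f matches against the full command line; qemu-kvm processes
    # started by libvirt include the VM name in their arguments.
    result = subprocess.run(["pgrep", "-f", f"qemu-kvm.*{vm_name}"],
                            capture_output=True, text=True, check=False)
    for pid in result.stdout.split():
        subprocess.run(["kill", "-9", pid], check=False)

if __name__ == "__main__":
    kill_qemu(VM_NAME)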

Actual results:
The LSM operation does not complete and the disk remains locked.

Expected results:
The LSM operation should fail gracefully and all locks on the disks should be removed.
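
One way to verify the expected behaviour is to poll the disk after the failed LSM and confirm it does not stay locked. The sketch below assumes the RHEV-M 3.6 REST API exposes the disk under /api/disks/{id} and reports its lock state in a status/state field; the endpoint layout, field names, engine URL, and credentials are assumptions, not taken from the logs above.

import time
import requests

ENGINE = "https://rhevm.example.com"              # hypothetical engine FQDN
AUTH = ("admin@internal", "password")             # hypothetical credentials
DISK_ID = "c00771e2-8787-4c1f-9d52-c88b97c0ab69"  # vm1_Disk3 from the log above

def disk_state() -> str:
    # Assumed v3 REST endpoint; adjust to the actual API reference.
    r = requests.get(f"{ENGINE}/api/disks/{DISK_ID}", auth=AUTH,
                     verify=False, headers={"Accept": "application/json"})
    r.raise_for_status()
    return r.json().get("status", {}).get("state", "unknown")

for _ in range(30):                  # poll for up to ~5 minutes
    state = disk_state()
    print("disk state:", state)
    if state != "locked":
        break
    time.sleep(10)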

Additional info:
2015-08-25 13:35:01,798 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-14) [] Setting new tasks map. The map contains now 0 tasks
2015-08-25 13:35:01,798 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-14) [] Cleared all tasks of pool 'ed6cff1c-7104-409f-bb18-53fb9d29e01d'.
2015-08-25 13:36:04,311 INFO  [org.ovirt.engine.core.bll.MoveDisksCommand] (ajp-/127.0.0.1:8702-1) [d644176] Running command: MoveDisksCommand internal: false. Entities affected :  ID: c00771e2-8787-4c1f-9d52-c88b97c0ab69 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2015-08-25 13:36:04,375 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-1) [d644176] Lock Acquired to object 'EngineLock:{exclusiveLocks='[c00771e2-8787-4c1f-9d52-c88b97c0ab69=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm1_Disk3>]', sharedLocks='[52a7cbd6-07ec-4130-96a4-a8dad76b2fcb=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 13:36:04,649 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-27) [d644176] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected :  ID: c00771e2-8787-4c1f-9d52-c88b97c0ab69 Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-08-25 13:36:04,955 WARN  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-27) [50c8009c] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_DISKS_LOCKED,$diskAliases vm1_Disk3
2015-08-25 13:36:04,997 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-27) [50c8009c] Lock freed to object 'EngineLock:{exclusiveLocks='[c00771e2-8787-4c1f-9d52-c88b97c0ab69=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm1_Disk3>]', sharedLocks='[52a7cbd6-07ec-4130-96a4-a8dad76b2fcb=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 13:36:42,817 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-13) [] START, DestroyVDSCommand(HostName = level1_vm01, DestroyVmVDSCommandParameters:{runAsync='true', hostId='22e9edd5-fa6a-49b0-b875-625143058955', vmId='52a7cbd6-07ec-4130-96a4-a8dad76b2fcb', force='false', secondsToWait='0', gracefully='false', reason=''}), log id: 5c26681
2015-08-25 13:36:42,821 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-13) [] FINISH, DestroyVDSCommand, log id: 5c26681
2015-08-25 13:36:42,850 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 is down. Exit message: User shut down from within the guest
2015-08-25 13:36:42,854 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-13) [] VM '52a7cbd6-07ec-4130-96a4-a8dad76b2fcb(vm1) is running in db and not running in VDS 'level1_vm01'
2015-08-25 13:36:42,873 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (org.ovirt.thread.pool-7-thread-35) [fc63110] Running command: ProcessDownVmCommand internal: true.
2015-08-25 14:09:26,638 INFO  [org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet] (ajp-/127.0.0.1:8702-7) [] Successfully read CSH mapping file '/usr/share/doc/rhevm-doc/manual/en-US/csh.conf.d/webadmin/10-webadmin-en-US.json'

Comment 1 Kevin Alon Goldblatt 2015-08-25 12:47:05 UTC
Created attachment 1066840 [details]
engine, vdsm, server logs

Comment 2 Kevin Alon Goldblatt 2015-08-25 14:39:58 UTC
Ran an additional test, simplifying the scenario:

1. Create a VM
2. Create a preallocated block disk
3. Start the VM
4. Start the LSM of the preallocated block disk to another block domain >>>> The snapshot fails to be created and the LSM does not continue


Engine.log:
--------------
2015-08-25 16:04:10,453 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM 'aa8200ba-0696-4882-b0d7-b0c778288e0f'(vm3) moved from 'PoweringUp' --> 'Up'
2015-08-25 16:04:10,466 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-3) [] Correlation ID: 67417339, Job ID: 05ca91d8-5692-40b8-8f59-ea7285335d98, Call Stack: null, Custom Event ID: -1, Message: VM vm3 started on Host level1_vm01
2015-08-25 16:14:13,530 INFO  [org.ovirt.engine.core.bll.MoveDisksCommand] (ajp-/127.0.0.1:8702-2) [16b12391] Running command: MoveDisksCommand internal: false. Entities affected :  ID: de2a274d-604f-40c5-8cac-a4d4445e561f Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2015-08-25 16:14:13,667 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-2) [16b12391] Lock Acquired to object 'EngineLock:{exclusiveLocks='[de2a274d-604f-40c5-8cac-a4d4445e561f=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm3_Disk1>]', sharedLocks='[aa8200ba-0696-4882-b0d7-b0c778288e0f=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 16:14:13,961 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-2) [16b12391] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected :  ID: de2a274d-604f-40c5-8cac-a4d4445e561f Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-08-25 16:14:14,070 WARN  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-2) [2df09800] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_DISKS_LOCKED,$diskAliases vm3_Disk1
2015-08-25 16:14:14,088 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-2) [2df09800] Lock freed to object 'EngineLock:{exclusiveLocks='[de2a274d-604f-40c5-8cac-a4d4445e561f=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm3_Disk1>]', sharedLocks='[aa8200ba-0696-4882-b0d7-b0c778288e0f=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 16:33:50,798 INFO  [org.ovirt.engine.core.bll.OvfDataUpdater] (DefaultQuartzScheduler_Worker-71) [] Attempting to update VMs/Templates Ovf.
2015-08-25 16:33:50,801 INFO  [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-71) [39ee79e7] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected :  ID: 8f5d5cf4-f455-4b6f-a2cc-a0d23b96ddd4 Type: StoragePool

Comment 3 Kevin Alon Goldblatt 2015-08-25 14:54:18 UTC
Created attachment 1066896 [details]
engine, vdsm, server logs, screen shots

Added new logs for the simplified scenario.

NOTE: THERE IS A 3-HOUR DIFFERENCE BETWEEN THE ENGINE AND HOST LOGS

Engine: 17:53:27 IDT 2015
Host:   14:53:48 IDT 2015
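
A small helper sketch for lining the two attached logs up; it simply applies the 3-hour offset noted above (the timestamp format matches the log lines quoted in this bug):

from datetime import datetime, timedelta

OFFSET = timedelta(hours=3)
FMT = "%Y-%m-%d %H:%M:%S"

def host_to_engine(ts: str) -> str:
    # Shift a host (vdsm) log timestamp by +3 hours into engine-log time.
    return (datetime.strptime(ts, FMT) + OFFSET).strftime(FMT)

# e.g. host "2015-08-25 14:53:48" corresponds to "2015-08-25 17:53:48" in engine time
print(host_to_engine("2015-08-25 14:53:48"))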

Comment 4 Allon Mureinik 2015-08-25 17:48:23 UTC
Daniel - this sounds like the same issue as bug 1251956.
Can you take a look please?

Comment 5 Kevin Alon Goldblatt 2015-09-02 11:16:21 UTC
I have re-run the scenario with the latest build and the problem no longer occurs. It seems this issue was fixed in the new version.

I used the following versions to test this scenario:

rhevm-3.6.0-0.12.master.el6.noarch running on RHEL 6.7
vdsm-4.17.3-1.el7ev.noarch running on RHEL 7.2

Comment 6 Allon Mureinik 2015-09-02 13:10:47 UTC
(In reply to Kevin Alon Goldblatt from comment #5)
> I have re-run the scenario with the latest build and the problem no longer
> occurs. It seems this issue was fixed in the new version.
This indeed seems to be a duplicate of bug 1251956, closing.

*** This bug has been marked as a duplicate of bug 1251956 ***

