Description of problem:
Killed the qemu process of a VM during an LSM (live storage migration) operation. The create snapshot operation fails and the disk remains locked.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.11.master.el6.noarch
vdsm-4.17.2-1.el7ev.noarch

How reproducible:
Ran this once

Steps to Reproduce:
1. Create a VM with 2 disks
2. Create a snapshot
3. Add 4 new disks
4. Start the VM
5. Start LSM of a preallocated block disk to another block domain
6. After the disk is locked and during the create snapshot operation, kill the qemu process of the VM

Actual results:
The LSM operation does not complete and the disk remains locked.

Expected results:
The LSM operation should fail gracefully and all locks on disks should be removed.

Additional info:
2015-08-25 13:35:01,798 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-14) [] Setting new tasks map. The map contains now 0 tasks
2015-08-25 13:35:01,798 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-14) [] Cleared all tasks of pool 'ed6cff1c-7104-409f-bb18-53fb9d29e01d'.
2015-08-25 13:36:04,311 INFO [org.ovirt.engine.core.bll.MoveDisksCommand] (ajp-/127.0.0.1:8702-1) [d644176] Running command: MoveDisksCommand internal: false. Entities affected : ID: c00771e2-8787-4c1f-9d52-c88b97c0ab69 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2015-08-25 13:36:04,375 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-1) [d644176] Lock Acquired to object 'EngineLock:{exclusiveLocks='[c00771e2-8787-4c1f-9d52-c88b97c0ab69=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm1_Disk3>]', sharedLocks='[52a7cbd6-07ec-4130-96a4-a8dad76b2fcb=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 13:36:04,649 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-27) [d644176] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected : ID: c00771e2-8787-4c1f-9d52-c88b97c0ab69 Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-08-25 13:36:04,955 WARN [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-27) [50c8009c] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal.
Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_DISKS_LOCKED,$diskAliases vm1_Disk3
2015-08-25 13:36:04,997 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-27) [50c8009c] Lock freed to object 'EngineLock:{exclusiveLocks='[c00771e2-8787-4c1f-9d52-c88b97c0ab69=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm1_Disk3>]', sharedLocks='[52a7cbd6-07ec-4130-96a4-a8dad76b2fcb=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 13:36:42,817 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-13) [] START, DestroyVDSCommand(HostName = level1_vm01, DestroyVmVDSCommandParameters:{runAsync='true', hostId='22e9edd5-fa6a-49b0-b875-625143058955', vmId='52a7cbd6-07ec-4130-96a4-a8dad76b2fcb', force='false', secondsToWait='0', gracefully='false', reason=''}), log id: 5c26681
2015-08-25 13:36:42,821 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-13) [] FINISH, DestroyVDSCommand, log id: 5c26681
2015-08-25 13:36:42,850 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 is down. Exit message: User shut down from within the guest
2015-08-25 13:36:42,854 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-13) [] VM '52a7cbd6-07ec-4130-96a4-a8dad76b2fcb(vm1) is running in db and not running in VDS 'level1_vm01'
2015-08-25 13:36:42,873 INFO [org.ovirt.engine.core.bll.ProcessDownVmCommand] (org.ovirt.thread.pool-7-thread-35) [fc63110] Running command: ProcessDownVmCommand internal: true.
2015-08-25 14:09:26,638 INFO [org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet] (ajp-/127.0.0.1:8702-7) [] Successfully read CSH mapping file '/usr/share/doc/rhevm-doc/manual/en-US/csh.conf.d/webadmin/10-webadmin-en-US.json'
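For step 6, a minimal sketch of how the qemu process of the VM can be located and killed on the host. It assumes the guest name appears on the qemu-kvm command line via the -name option (which is how the guest is normally started); the helper name and the hard-coded "vm1" default are illustrative only, not part of the product or the test tooling.

#!/usr/bin/env python
# Illustrative helper for step 6: find the qemu-kvm process backing a given
# VM on the host and kill it abruptly with SIGKILL.
import os
import signal
import subprocess
import sys

def kill_qemu_for_vm(vm_name):
    # pgrep -f matches the pattern against the full command line; the guest
    # name is assumed to be passed to qemu-kvm through its -name option.
    result = subprocess.run(["pgrep", "-f", "qemu-kvm.*-name.*%s" % vm_name],
                            capture_output=True, text=True)
    pids = [int(p) for p in result.stdout.split()]
    if not pids:
        sys.exit("no qemu-kvm process found for VM '%s'" % vm_name)
    for pid in pids:
        os.kill(pid, signal.SIGKILL)  # abrupt kill, simulating a qemu crash
        print("killed qemu-kvm pid %d of VM '%s'" % (pid, vm_name))

if __name__ == "__main__":
    kill_qemu_for_vm(sys.argv[1] if len(sys.argv) > 1 else "vm1")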
Created attachment 1066840 [details] engine, vdsm, server logs
Ran an additional test simplifying the scenario:
1. Create a VM
2. Create a preallocated block disk
3. Start the VM
4. Start the LSM of the preallocated block disk to another block domain

>>>> The snapshot fails to be created and the LSM does not continue

Engine.log:
--------------
2015-08-25 16:04:10,453 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM 'aa8200ba-0696-4882-b0d7-b0c778288e0f'(vm3) moved from 'PoweringUp' --> 'Up'
2015-08-25 16:04:10,466 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-3) [] Correlation ID: 67417339, Job ID: 05ca91d8-5692-40b8-8f59-ea7285335d98, Call Stack: null, Custom Event ID: -1, Message: VM vm3 started on Host level1_vm01
2015-08-25 16:14:13,530 INFO [org.ovirt.engine.core.bll.MoveDisksCommand] (ajp-/127.0.0.1:8702-2) [16b12391] Running command: MoveDisksCommand internal: false. Entities affected : ID: de2a274d-604f-40c5-8cac-a4d4445e561f Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2015-08-25 16:14:13,667 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-2) [16b12391] Lock Acquired to object 'EngineLock:{exclusiveLocks='[de2a274d-604f-40c5-8cac-a4d4445e561f=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm3_Disk1>]', sharedLocks='[aa8200ba-0696-4882-b0d7-b0c778288e0f=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 16:14:13,961 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-2) [16b12391] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected : ID: de2a274d-604f-40c5-8cac-a4d4445e561f Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-08-25 16:14:14,070 WARN [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-2) [2df09800] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_DISKS_LOCKED,$diskAliases vm3_Disk1
2015-08-25 16:14:14,088 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-2) [2df09800] Lock freed to object 'EngineLock:{exclusiveLocks='[de2a274d-604f-40c5-8cac-a4d4445e561f=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vm3_Disk1>]', sharedLocks='[aa8200ba-0696-4882-b0d7-b0c778288e0f=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2015-08-25 16:33:50,798 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (DefaultQuartzScheduler_Worker-71) [] Attempting to update VMs/Templates Ovf.
2015-08-25 16:33:50,801 INFO [org.ovirt.engine.core.bll.ProcessOvfUpdateForStoragePoolCommand] (DefaultQuartzScheduler_Worker-71) [39ee79e7] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: 8f5d5cf4-f455-4b6f-a2cc-a0d23b96ddd4 Type: StoragePool
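To verify whether the disk lock is ever released after the failed CreateAllSnapshotsFromVm, one can poll the disk status through the engine REST API. This is a sketch only: the engine URL and credentials are placeholders, and the /api/disks/<id> path and <status><state> element are assumptions based on the RHEV 3.6 (v3) REST API, so verify them against the installed API before relying on it.

# Sketch: poll the status of the migrated disk until it leaves the "locked" state.
import time
import xml.etree.ElementTree as ET

import requests

ENGINE = "https://engine.example.com"               # placeholder
AUTH = ("admin@internal", "password")               # placeholder
DISK_ID = "de2a274d-604f-40c5-8cac-a4d4445e561f"    # disk id from the log above

def disk_state():
    resp = requests.get("%s/api/disks/%s" % (ENGINE, DISK_ID),
                        auth=AUTH, verify=False)
    resp.raise_for_status()
    state = ET.fromstring(resp.content).find("./status/state")
    return state.text if state is not None else "unknown"

for _ in range(30):          # poll for roughly 5 minutes
    print(disk_state())
    time.sleep(10)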
Created attachment 1066896 [details]
engine, vdsm, server logs, screen shots

Added new logs for the simplified scenario.

NOTE: there is a 3-hour difference between the engine and host logs.
Engine: 17:53:27 IDT 2015
Host: 14:53:48 IDT 2015
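When correlating the attached logs it may help to shift the host (vdsm) timestamps by the 3-hour offset noted above. A minimal sketch, assuming both logs use the standard "YYYY-MM-DD HH:MM:SS,mmm" timestamp prefix; the input/output file names are placeholders:

# Sketch: rewrite vdsm.log timestamps into engine-clock time (host is 3h behind).
from datetime import datetime, timedelta
import re

OFFSET = timedelta(hours=3)   # host clock is 3 hours behind the engine clock
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

def shift_line(line):
    m = STAMP.match(line)
    if not m:
        return line           # continuation lines have no timestamp prefix
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    shifted = (ts + OFFSET).strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
    return STAMP.sub(shifted, line, count=1)

with open("vdsm.log") as src, open("vdsm.engine-time.log", "w") as dst:
    for line in src:
        dst.write(shift_line(line))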
Daniel - this sounds like the same issue as bug 1251956. Can you take a look please?
I have re-run the scenario with the latest build and the problem no longer occurs. It seems this issue was fixed in the new version.

I used the following versions to test this scenario:
rhevm-3.6.0-0.12.master.el6.noarch running on RHEL 6.7
vdsm-4.17.3-1.el7ev.noarch running on RHEL 7.2
(In reply to Kevin Alon Goldblatt from comment #5)
> I have re-run the scenario with the latest build and the problem no longer
> occurs. It seems this issue was fixed in the new version.

This indeed seems to be a duplicate of bug 1251956, closing.

*** This bug has been marked as a duplicate of bug 1251956 ***