Description of problem:
Snapshot delete on HSM fails - works fine on SPM

Version-Release number of selected component (if applicable):
vdsm-4.18.999-759.git435a852.el7.centos.x86_64 SPM
vdsm-4.18.17-1.el7ev.x86_64 HSM
rhevm-4.0.6-0.1.el7ev.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Create vm with disk on SPM and run
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Actual results:
Snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Expected results:
Snapshot is deleted and the snapshot link is removed in /rhev/data-center

Additional info:

SPM

Before
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│   └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

With snapshots
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│   ├── 79463314-d607-4f93-82e9-372c84697c1f -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/79463314-d607-4f93-82e9-372c84697c1f
│   └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

After deleting snapshot
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│   └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

HSM

Before
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│   └── 1ac5ad49-3e63-454a-954b-6f074acd4ed6 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/1ac5ad49-3e63-454a-954b-6f074acd4ed6

With snapshots
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│   ├── 1ac5ad49-3e63-454a-954b-6f074acd4ed6 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/1ac5ad49-3e63-454a-954b-6f074acd4ed6
│   └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

After deleting snapshots
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│   ├── 1ac5ad49-3e63-454a-954b-6f074acd4ed6 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/1ac5ad49-3e63-454a-954b-6f074acd4ed6
│   └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036
        ^^^^^^ SHOULD HAVE BEEN DELETED ^^^^^^

From engine.log
-----------------------------------------------------------------
2016-11-24 16:42:40,181 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [] Correlation ID: 5e2a8d2d, Job ID: f03fe7bf-9e3a-433b-8356-8c11d6bd9cb1, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'snapper1' deletion for VM 'bbbb' was initiated by admin@internal-authz.
2016-11-24 16:42:40,184 INFO  [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-3) [21f25490] Running command: RemoveDiskCommand internal: true. Entities affected : ID: aab7e02f-229f-4bf3-8942-634858e873f5 Type: DiskAction group DELETE_DISK with role type USER
2016-11-24 16:42:40,191 INFO  [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (pool-5-thread-3) [7ea6c4ea] Running command: RemoveImageCommand internal: true. Entities affected : ID: 38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e Type: Storage
2016-11-24 16:42:40,228 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-5-thread-3) [7ea6c4ea] START, DeleteImageGroupVDSCommand( DeleteImageGroupVDSCommandParameters:{runAsync='true', storagePoolId='f30058ff-5799-45b2-a272-22b2d198aa16', ignoreFailoverLimit='false', storageDomainId='38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', imageGroupId='aab7e02f-229f-4bf3-8942-634858e873f5', postZeros='false', forceDelete='false'}), log id: 223961f5
Created attachment 1225888 [details]
server, vdsm, engine.log

Added logs
Umm... I'm thinking you got the wrong bug as 1321018 is a review request...
I see quite a few of these in the HSM's log, not sure what to make of them ATM:

periodic/1::ERROR::2016-11-24 18:04:48,962::executor::232::Executor::(_execute_task) Unhandled exception in Task(callable=<BlockjobMonitor vm=20a5e29e-5576-4e2b-8901-4147bfba13ab at 0x2f8fe50>, timeout=7.5)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 230, in _execute_task
    task.callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 302, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 373, in _execute
    self._vm.updateVmJobs()
  File "/usr/share/vdsm/virt/vm.py", line 4506, in updateVmJobs
    self._vmJobs = self.queryBlockJobs()
  File "/usr/share/vdsm/virt/vm.py", line 4529, in queryBlockJobs
    drive = self._findDriveByUUIDs(storedJob['disk'])
  File "/usr/share/vdsm/virt/vm.py", line 3248, in _findDriveByUUIDs
    raise LookupError("No such drive: '%s'" % drive)
LookupError: No such drive: '{'domainID': '38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', 'imageID': '2df48dca-c4d1-431d-81e6-08676f7a469f', 'volumeID': '1ac5ad49-3e63-454a-954b-6f074acd4ed6', 'poolID': 'f30058ff-5799-45b2-a272-22b2d198aa16'}'
(In reply to Allon Mureinik from comment #3)
> I see quite a few of these in the HSM's log, not sure what to make of them
> ATM:
>
> periodic/1::ERROR::2016-11-24
> 18:04:48,962::executor::232::Executor::(_execute_task) Unhandled exception
> in Task(callable=<BlockjobMonitor vm=20a5e29e-5576-4e2b-8901-4147bfba13ab at
> 0x2f8fe50>, timeout=7.5)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 230, in
> _execute_task
>     task.callable()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 302,
> in __call__
>     self._execute()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 373,
> in _execute
>     self._vm.updateVmJobs()
>   File "/usr/share/vdsm/virt/vm.py", line 4506, in updateVmJobs
>     self._vmJobs = self.queryBlockJobs()
>   File "/usr/share/vdsm/virt/vm.py", line 4529, in queryBlockJobs
>     drive = self._findDriveByUUIDs(storedJob['disk'])
>   File "/usr/share/vdsm/virt/vm.py", line 3248, in _findDriveByUUIDs
>     raise LookupError("No such drive: '%s'" % drive)
> LookupError: No such drive: '{'domainID':
> '38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', 'imageID':
> '2df48dca-c4d1-431d-81e6-08676f7a469f', 'volumeID':
> '1ac5ad49-3e63-454a-954b-6f074acd4ed6', 'poolID':
> 'f30058ff-5799-45b2-a272-22b2d198aa16'}'

This is related, but should not be the root cause.

This task periodically updates the block jobs info to be reported to Engine using getVmStats() and friends. This error means that the block job tracking information is stale. This is bad news, but it is consistent with the reported state.
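The LookupError above is raised while matching a stored block job's disk description (a dict of domain/image/volume/pool UUIDs) against the VM's current drive list. A simplified, hypothetical reimplementation of that lookup (not the actual vdsm code in vm.py) shows how stale tracking info produces exactly this error:

```python
# Hypothetical sketch of the lookup that fails in vdsm's _findDriveByUUIDs:
# each stored block job remembers its disk as a dict of UUIDs, and the
# periodic BlockjobMonitor tries to match it against the VM's drives.

def find_drive_by_uuids(drives, disk):
    """Return the drive whose (domainID, imageID, volumeID, poolID) match the
    stored job's disk dict; raise LookupError when the drive is gone, e.g.
    because the tracked volume was already merged away by a snapshot delete."""
    keys = ("domainID", "imageID", "volumeID", "poolID")
    for drive in drives:
        if all(drive.get(k) == disk.get(k) for k in keys):
            return drive
    raise LookupError("No such drive: '%s'" % disk)
```

Under this model, a block job record that still names the deleted snapshot volume (1ac5ad49-... in the traceback) can never match the live drive list again, so the monitor keeps throwing on every periodic run until the stale job record is cleaned up.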
Can you please include more of the logs? Even at the beginning of the current logs the error is already happening.
Added a new set of logs.

NB! To help find the scenario in engine's logs, it is marked with:
echo "testone_start" >> engine.log
echo "testone_end" >> engine.log

NB! To help find the scenario in the SPM and HSM logs, they are marked with:
echo "testone_start" >> vdsm.log
echo "testone_end" >> vdsm.log

Verified again with the following code:
---------------------------------------------------
vdsm-4.18.999-759.git435a852.el7.centos.x86_64 SPM
vdsm-4.18.17-1.el7ev.x86_64 HSM
ovirt-engine-4.0.6-0.1.el7ev.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Create vm with disk on SPM and run
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Actual results:
Snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Expected results:

Additional info:

HSM

Before
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│   └── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462

With snapshots
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│   ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
│   └── 93d39a43-7d81-40dd-9060-f33857d7bfd7 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/93d39a43-7d81-40dd-9060-f33857d7bfd7

After deleting snapshots
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│   ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
│   └── 93d39a43-7d81-40dd-9060-f33857d7bfd7 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/93d39a43-7d81-40dd-9060-f33857d7bfd7
        ^^^^^^ SHOULD HAVE BEEN DELETED ^^^^^^

From Engine.log
-------------------------------
ternal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2016-12-06 19:34:10,181 INFO  [org.ovirt.engine.core.bll.RemoveMemoryVolumesCommand] (pool-5-thread-7) [1489e1f1] Running command: RemoveMemoryVolumesCommand internal: true.
2016-12-06 19:34:10,217 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [] Correlation ID: 5cab2dba, Job ID: 9056c6db-b868-4807-8776-ba8c48183a0b, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'sns_hms_testone' deletion for VM 'testone' was initiated by admin@internal-authz.
2016-12-06 19:34:10,254 INFO  [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-7) [65a94a76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[40c3733e-b0b4-4f72-aa09-a903dacba9cd=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_REMOVED$DiskName snapshot_memory>]', sharedLocks='null'}'
2016-12-06 19:34:10,273 INFO  [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-7) [65a94a76] Running command: RemoveDiskCommand internal: true. Entities affected : ID: 40c3733e-b0b4-4f72-aa09-a903dacba9cd Type: DiskAction group DELETE_DISK with role type USER
Created attachment 1228662 [details]
server, vdsm, engine.log

Added new logs
This will only be included in v4.18.19 and onwards, moving back to MODIFIED
(In reply to Tal Nisan from comment #8)
> This will only be included in v4.18.19 and onwards, moving back to MODIFIED

Tal, 4.18.20 has been released in 4.0.6 RC4
Wasn't it supposed to be released today? It was moved to ON_QA while QE had 4.18.18 as the latest version.
Verified with the following version:
----------------------------------------------
vdsm-4.18.20-1.el7ev.x86_64 SPM
vdsm-4.18.20-1.el7ev.x86_64 HSM
rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible:
Every time

Verified with the following scenario:
----------------------------------------------
Steps to Reproduce:
1. Create vm with disk on SPM and run
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot successfully deleted and links deleted

Actual results:
Snapshot is deleted and link removed in /rhev/data-center