Bug 1399688
| Summary: | Snapshot delete on HSM fails | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Kevin Alon Goldblatt <kgoldbla> |
| Component: | Core | Assignee: | Ala Hino <ahino> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Kevin Alon Goldblatt <kgoldbla> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.18.15.2 | CC: | amureini, bugs, fromani, gklein, hobbes1069, kgoldbla, michal.skrivanek, tnisan, ylavi |
| Target Milestone: | ovirt-4.0.6 | Keywords: | Regression |
| Target Release: | 4.18.20 | Flags: | rule-engine: ovirt-4.0.z+, rule-engine: blocker+ |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-18 07:27:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1321018 | | |
| Attachments: | | | |
Created attachment 1225888 [details]
server, vdsm, engine.log
Added logs
Umm... I'm thinking you got the wrong bug, as 1321018 is a review request...

I see quite a few of these in the HSM's log, not sure what to make of them ATM:
periodic/1::ERROR::2016-11-24 18:04:48,962::executor::232::Executor::(_execute_task) Unhandled exception in Task(callable=<BlockjobMonitor vm=20a5e29e-5576-4e2b-8901-4147bfba13ab at 0x2f8fe50>, timeout=7.5)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 230, in _execute_task
task.callable()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 302, in __call__
self._execute()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 373, in _execute
self._vm.updateVmJobs()
File "/usr/share/vdsm/virt/vm.py", line 4506, in updateVmJobs
self._vmJobs = self.queryBlockJobs()
File "/usr/share/vdsm/virt/vm.py", line 4529, in queryBlockJobs
drive = self._findDriveByUUIDs(storedJob['disk'])
File "/usr/share/vdsm/virt/vm.py", line 3248, in _findDriveByUUIDs
raise LookupError("No such drive: '%s'" % drive)
LookupError: No such drive: '{'domainID': '38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', 'imageID': '2df48dca-c4d1-431d-81e6-08676f7a469f', 'volumeID': '1ac5ad49-3e63-454a-954b-6f074acd4ed6', 'poolID': 'f30058ff-5799-45b2-a272-22b2d198aa16'}'
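For illustration, a minimal Python sketch (hypothetical class and method names, simplified from the vdsm call chain in the traceback above) of how a periodic block-job monitor ends up raising this LookupError: the persisted job still references a drive identified by its domain/image/volume UUIDs, but that drive is no longer attached to the VM.

```python
# Simplified sketch of the BlockjobMonitor -> queryBlockJobs -> _findDriveByUUIDs
# path seen in the traceback; names are illustrative, not the actual vdsm code.

class Drive(object):
    def __init__(self, domainID, imageID, volumeID, poolID):
        self.domainID = domainID
        self.imageID = imageID
        self.volumeID = volumeID
        self.poolID = poolID


class BlockJobTracker(object):
    def __init__(self, drives, stored_jobs):
        self._drives = drives      # drives currently attached to the VM
        self._jobs = stored_jobs   # persisted block-job tracking info

    def _find_drive_by_uuids(self, disk):
        # Match the drive by its domain/image/volume UUIDs, as _findDriveByUUIDs does.
        for drive in self._drives:
            if (drive.domainID == disk['domainID'] and
                    drive.imageID == disk['imageID'] and
                    drive.volumeID == disk['volumeID']):
                return drive
        raise LookupError("No such drive: '%s'" % disk)

    def query_block_jobs(self):
        # Runs periodically; if the stored tracking info is stale (the job's
        # drive is gone), the lookup raises and the executor logs it as an
        # unhandled exception in the BlockjobMonitor task.
        jobs = {}
        for job_id, job in self._jobs.items():
            drive = self._find_drive_by_uuids(job['disk'])
            jobs[job_id] = {'drive': drive, 'jobType': job.get('jobType')}
        return jobs
```

With a drive list that no longer contains the job's disk, query_block_jobs() raises the same "No such drive" error, which the executor then reports exactly as in the log above.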
(In reply to Allon Mureinik from comment #3)
> I see quite a few of these in the HSM's log, not sure what to make of them ATM:
>
> periodic/1::ERROR::2016-11-24 18:04:48,962::executor::232::Executor::(_execute_task) Unhandled exception in Task(callable=<BlockjobMonitor vm=20a5e29e-5576-4e2b-8901-4147bfba13ab at 0x2f8fe50>, timeout=7.5)
> [...]
> LookupError: No such drive: '{'domainID': '38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', 'imageID': '2df48dca-c4d1-431d-81e6-08676f7a469f', 'volumeID': '1ac5ad49-3e63-454a-954b-6f074acd4ed6', 'poolID': 'f30058ff-5799-45b2-a272-22b2d198aa16'}'

This is related, but should not be the root cause. This task periodically updates the block-job info that is reported to Engine via getVmStats() and friends. The error means that the block-job tracking information is stale. That is bad news, but it is compatible with the reported state.

Can you please include more of the logs? Even at the beginning of the attached log the error is already happening.

Added a new set of logs:
NB! To make the scenario easy to find in the engine log, it is bracketed with:
echo "testone_start" >> engine.log
echo "testone_end" >> engine.log
NB! To make the scenario easy to find in the SPM and HSM logs, it is bracketed with:
echo "testone_start" >> vdsm.log
echo "testone_end" >> vdsm.log
Verified again with the following versions:
---------------------------------------------------
vdsm-4.18.999-759.git435a852.el7.centos.x86_64 SPM
vdsm-4.18.17-1.el7ev.x86_64 HSM
ovirt-engine-4.0.6-0.1.el7ev.noarch
How reproducible:
Every time
Steps to Reproduce (an SDK-scripted equivalent is sketched after the steps):
1. Create vm with disk on SPM and run
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center
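For reference, the same create/delete flow can be driven programmatically; a minimal sketch using the oVirt Python SDK (ovirtsdk4). The VM name and snapshot description mirror the ones in the engine.log excerpt below; the engine URL, credentials, and CA file are placeholders, not taken from this setup.

```python
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details - adjust to the environment under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=testone')[0]
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

# Step 6: create a snapshot while the VM runs on the HSM host.
snapshot = snapshots_service.add(types.Snapshot(description='sns_hms_testone'))
snapshot_service = snapshots_service.snapshot_service(snapshot.id)

# Wait until the snapshot leaves the LOCKED state before deleting it.
while snapshot_service.get().snapshot_status != types.SnapshotStatus.OK:
    time.sleep(5)

# Step 7: delete the snapshot; on an affected build it stays LOCKED and the
# volume symlink under /rhev/data-center is left behind.
snapshot_service.remove()

connection.close()
```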
Actual results:
snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center
Expected results:
The snapshot should be deleted and its symlink removed from /rhev/data-center, as happens on the SPM
Additional info:
HSM
Before
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│ │ │ ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
With snapshots
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│ │ │ ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
│ │ │ └── 93d39a43-7d81-40dd-9060-f33857d7bfd7 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/93d39a43-7d81-40dd-9060-f33857d7bfd7
After deleting snapshots
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│ │ │ ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
│ │ │ └── 93d39a43-7d81-40dd-9060-f33857d7bfd7 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/93d39a43-7d81-40dd-9060-f33857d7bfd7
^^^^^^SHOULD HAVE BEEN DELETED^^^^^^^
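To make the leftover-link check repeatable, a small sketch that lists the volume symlinks of an image directory and flags dangling ones. The IDs are taken from the listing above; the directory layout under /rhev/data-center is an assumption and may need adjusting to the actual mount structure.

```python
import os

# IDs from the tree listing above; path layout is an assumption.
POOL_ID = 'f30058ff-5799-45b2-a272-22b2d198aa16'
DOMAIN_ID = '38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e'
IMAGE_ID = 'adabaeb2-09c7-407c-a93d-5144f52b2a4f'
IMAGE_DIR = os.path.join('/rhev/data-center', POOL_ID, DOMAIN_ID, 'images', IMAGE_ID)


def list_volume_links(image_dir):
    """Print every volume symlink in the image directory and whether it still
    resolves; links left behind by a failed snapshot delete show up here, and
    links whose LV was already removed are reported as dangling."""
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if os.path.islink(path):
            state = 'ok' if os.path.exists(path) else 'DANGLING'
            print('%s -> %s (%s)' % (name, os.readlink(path), state))


if __name__ == '__main__':
    list_volume_links(IMAGE_DIR)
```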
From Engine.log
-------------------------------
ternal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2016-12-06 19:34:10,181 INFO [org.ovirt.engine.core.bll.RemoveMemoryVolumesCommand] (pool-5-thread-7) [1489e1f1] Running command: RemoveMemoryVolumesCommand internal: true.
2016-12-06 19:34:10,217 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [] Correlation ID: 5cab2dba, Job ID: 9056c6db-b868-4807-8776-ba8c48183a0b, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'sns_hms_testone' deletion for VM 'testone' was initiated by admin@internal-authz.
2016-12-06 19:34:10,254 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-7) [65a94a76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[40c3733e-b0b4-4f72-aa09-a903dacba9cd=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_REMOVED$DiskName snapshot_memory>]', sharedLocks='null'}'
2016-12-06 19:34:10,273 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-7) [65a94a76] Running command: RemoveDiskCommand internal: true. Entities affected : ID: 40c3733e-b0b4-4f72-aa09-a903dacba9cd Type: DiskAction group DELETE_DISK with role type USER
Created attachment 1228662 [details]
server, vdsm, engine.log
added new logs
This will only be included in v4.18.19 and onwards, moving back to MODIFIED

(In reply to Tal Nisan from comment #8)
> This will only be included in v4.18.19 and onwards, moving back to MODIFIED

Tal, 4.18.20 has been released in 4.0.6 RC4. Wasn't it supposed to be released today? It was moved to ON_QA and QE had 4.18.18 as the latest version.

Verified with the following versions:
----------------------------------------------
vdsm-4.18.20-1.el7ev.x86_64 SPM
vdsm-4.18.20-1.el7ev.x86_64 HSM
rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible:
Every time

Verified with the following scenario:
----------------------------------------------
Steps to Reproduce:
1. Create vm with disk on SPM and run
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot successfully deleted and links deleted

Actual results:
snapshot is deleted and link removed in /rhev/data-center
Description of problem:
Snapshot delete on HSM fails - works fine on SPM

Version-Release number of selected component (if applicable):
vdsm-4.18.999-759.git435a852.el7.centos.x86_64 SPM
vdsm-4.18.17-1.el7ev.x86_64 HSM
rhevm-4.0.6-0.1.el7ev.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Create vm with disk on SPM and run
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Actual results:
snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Expected results:
The snapshot should be deleted and its symlink removed from /rhev/data-center, as happens on the SPM

Additional info:

SPM
Before
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│ │ │ └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

With snapshots
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│ │ │ ├── 79463314-d607-4f93-82e9-372c84697c1f -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/79463314-d607-4f93-82e9-372c84697c1f
│ │ │ └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

After deleting snapshot
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│ │ │ └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

HSM
Before
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│ │ │ ├── 1ac5ad49-3e63-454a-954b-6f074acd4ed6 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/1ac5ad49-3e63-454a-954b-6f074acd4ed6

With snapshots
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│ │ │ ├── 1ac5ad49-3e63-454a-954b-6f074acd4ed6 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/1ac5ad49-3e63-454a-954b-6f074acd4ed6
│ │ │ └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036

After deleting snapshots
├── 2df48dca-c4d1-431d-81e6-08676f7a469f
│ │ │ ├── 1ac5ad49-3e63-454a-954b-6f074acd4ed6 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/1ac5ad49-3e63-454a-954b-6f074acd4ed6
│ │ │ └── a0a14c70-2b3b-453d-a94d-9da790b7f036 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/a0a14c70-2b3b-453d-a94d-9da790b7f036
^^^^^^SHOULD HAVE BEEN DELETED^^^^^^^

From engine.log
-----------------------------------------------------------------
2016-11-24 16:42:40,181 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [] Correlation ID: 5e2a8d2d, Job ID: f03fe7bf-9e3a-433b-8356-8c11d6bd9cb1, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'snapper1' deletion for VM 'bbbb' was initiated by admin@internal-authz.
2016-11-24 16:42:40,184 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-3) [21f25490] Running command: RemoveDiskCommand internal: true. Entities affected : ID: aab7e02f-229f-4bf3-8942-634858e873f5 Type: DiskAction group DELETE_DISK with role type USER
2016-11-24 16:42:40,191 INFO [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (pool-5-thread-3) [7ea6c4ea] Running command: RemoveImageCommand internal: true. Entities affected : ID: 38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e Type: Storage
2016-11-24 16:42:40,228 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-5-thread-3) [7ea6c4ea] START, DeleteImageGroupVDSCommand( DeleteImageGroupVDSCommandParameters:{runAsync='true', storagePoolId='f30058ff-5799-45b2-a272-22b2d198aa16', ignoreFailoverLimit='false', storageDomainId='38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', imageGroupId='aab7e02f-229f-4bf3-8942-634858e873f5', postZeros='false', forceDelete='false'}), log id: 223961f5