Bug 1399688
| Field | Value |
| --- | --- |
| Summary | Snapshot delete on HSM fails |
| Product | [oVirt] vdsm |
| Component | Core |
| Version | 4.18.15.2 |
| Status | CLOSED CURRENTRELEASE |
| Severity | urgent |
| Priority | unspecified |
| Keywords | Regression |
| Target Milestone | ovirt-4.0.6 |
| Target Release | 4.18.20 |
| Hardware | x86_64 |
| OS | Unspecified |
| Whiteboard | storage |
| Reporter | Kevin Alon Goldblatt <kgoldbla> |
| Assignee | Ala Hino <ahino> |
| QA Contact | Kevin Alon Goldblatt <kgoldbla> |
| CC | amureini, bugs, fromani, gklein, hobbes1069, kgoldbla, michal.skrivanek, tnisan, ylavi |
| Flags | rule-engine: ovirt-4.0.z+, rule-engine: blocker+ |
| oVirt Team | Storage |
| Type | Bug |
| Bug Blocks | 1321018 |
| Last Closed | 2017-01-18 07:27:21 UTC |
Description
Kevin Alon Goldblatt
2016-11-29 14:57:24 UTC
Created attachment 1225888 [details]
server, vdsm, engine.log
Added logs
Umm... I'm thinking you got the wrong bug, as 1321018 is a review request...

I see quite a few of these in the HSM's log, not sure what to make of them ATM:

periodic/1::ERROR::2016-11-24 18:04:48,962::executor::232::Executor::(_execute_task) Unhandled exception in Task(callable=<BlockjobMonitor vm=20a5e29e-5576-4e2b-8901-4147bfba13ab at 0x2f8fe50>, timeout=7.5)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 230, in _execute_task
    task.callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 302, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 373, in _execute
    self._vm.updateVmJobs()
  File "/usr/share/vdsm/virt/vm.py", line 4506, in updateVmJobs
    self._vmJobs = self.queryBlockJobs()
  File "/usr/share/vdsm/virt/vm.py", line 4529, in queryBlockJobs
    drive = self._findDriveByUUIDs(storedJob['disk'])
  File "/usr/share/vdsm/virt/vm.py", line 3248, in _findDriveByUUIDs
    raise LookupError("No such drive: '%s'" % drive)
LookupError: No such drive: '{'domainID': '38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e', 'imageID': '2df48dca-c4d1-431d-81e6-08676f7a469f', 'volumeID': '1ac5ad49-3e63-454a-954b-6f074acd4ed6', 'poolID': 'f30058ff-5799-45b2-a272-22b2d198aa16'}'

(In reply to Allon Mureinik from comment #3)
> I see quite a few of these in the HSM's log, not sure what to make of them
> ATM:
> [traceback snipped]

This is related, but should not be the root cause. This task periodically updates the block job info that is reported to Engine via getVmStats() and friends. This error means that the block job tracking information is stale. That is bad news, but it is compatible with the reported state.

Can you please include more of the logs? Even at the beginning of the attached log the error is already happening.
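For context on that traceback: queryBlockJobs() matches each tracked block job's stored disk UUIDs against the VM's current drive list, and raises LookupError when nothing matches. A minimal sketch of that lookup, reconstructed from the traceback alone (the actual vdsm implementation differs in detail):

```python
def find_drive_by_uuids(drives, disk):
    """Return the VM drive matching a block job's stored disk identifiers.

    'disk' is the dict stored with the job, e.g.
    {'domainID': ..., 'imageID': ..., 'volumeID': ..., 'poolID': ...}.
    Raises LookupError when no current drive matches -- which is what
    happens here once the block job tracking information goes stale.
    """
    for drive in drives:
        if (drive.get("domainID") == disk["domainID"] and
                drive.get("imageID") == disk["imageID"] and
                drive.get("volumeID") == disk["volumeID"]):
            return drive
    raise LookupError("No such drive: '%s'" % disk)
```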
Added a new set of logs.

NB! To help search for the scenario in the engine log, it was bracketed with:
echo "testone_start" >> engine.log
echo "testone_end" >> engine.log

NB! To help search for the scenario in the SPM and HSM logs, it was bracketed with:
echo "testone_start" >> vdsm.log
echo "testone_end" >> vdsm.log

Verified again with the following code:
---------------------------------------------------
vdsm-4.18.999-759.git435a852.el7.centos.x86_64 SPM
vdsm-4.18.17-1.el7ev.x86_64 HSM
ovirt-engine-4.0.6-0.1.el7ev.noarch

How reproducible: Every time

Steps to Reproduce:
1. Create vm with disk on SPM and run it
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Actual results:
Snapshot remains LOCKED and snapshot link is NOT deleted in /rhev/data-center

Expected results:

Additional info:

HSM before:
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│   ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462

With snapshots:
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│   ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
│   └── 93d39a43-7d81-40dd-9060-f33857d7bfd7 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/93d39a43-7d81-40dd-9060-f33857d7bfd7

After deleting snapshots:
├── adabaeb2-09c7-407c-a93d-5144f52b2a4f
│   ├── 7ee69716-54b1-4be1-828b-fbc6760a8462 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/7ee69716-54b1-4be1-828b-fbc6760a8462
│   └── 93d39a43-7d81-40dd-9060-f33857d7bfd7 -> /dev/38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/93d39a43-7d81-40dd-9060-f33857d7bfd7
        ^^^^^^ SHOULD HAVE BEEN DELETED ^^^^^^

From Engine.log
-------------------------------
[...]ternal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2016-12-06 19:34:10,181 INFO [org.ovirt.engine.core.bll.RemoveMemoryVolumesCommand] (pool-5-thread-7) [1489e1f1] Running command: RemoveMemoryVolumesCommand internal: true.
2016-12-06 19:34:10,217 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [] Correlation ID: 5cab2dba, Job ID: 9056c6db-b868-4807-8776-ba8c48183a0b, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'sns_hms_testone' deletion for VM 'testone' was initiated by admin@internal-authz.
2016-12-06 19:34:10,254 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-7) [65a94a76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[40c3733e-b0b4-4f72-aa09-a903dacba9cd=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_REMOVED$DiskName snapshot_memory>]', sharedLocks='null'}'
2016-12-06 19:34:10,273 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (pool-5-thread-7) [65a94a76] Running command: RemoveDiskCommand internal: true. Entities affected : ID: 40c3733e-b0b4-4f72-aa09-a903dacba9cd Type: DiskAction group DELETE_DISK with role type USER

Created attachment 1228662 [details]
server, vdsm, engine.log

Added new logs.
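Since the scenario boundaries were echoed into the logs as testone_start/testone_end markers (see above), a small helper along these lines can pull out just the marked section. This is an illustration, not part of the attached logs:

```python
def scenario_lines(log_path, start="testone_start", end="testone_end"):
    """Yield only the log lines between the start and end markers."""
    inside = False
    with open(log_path) as log:
        for line in log:
            if start in line:
                inside = True
            elif end in line:
                return
            elif inside:
                yield line.rstrip("\n")

# Example: print just the scenario section of the engine log.
for line in scenario_lines("engine.log"):
    print(line)
```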
This will only be included in v4.18.19 and onwards, moving back to MODIFIED.

(In reply to Tal Nisan from comment #8)
> This will only be included in v4.18.19 and onwards, moving back to MODIFIED

Tal, 4.18.20 has been released in 4.0.6 RC4. Wasn't it supposed to be released today? It was moved to ON_QA, and QE had 4.18.18 as the latest version.

Verified with the following version:
----------------------------------------------
vdsm-4.18.20-1.el7ev.x86_64 SPM
vdsm-4.18.20-1.el7ev.x86_64 HSM
rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible: Every time

Verified with the following scenario:
----------------------------------------------
Steps to Reproduce:
1. Create vm with disk on SPM and run it
2. Create snapshot
3. Delete snapshot - snapshot link is deleted in /rhev/data-center
4. Stop vm
5. Start the vm on HSM
6. Create a new snapshot
7. Delete the snapshot - snapshot successfully deleted and links deleted

Actual results:
Snapshot is deleted and link removed in /rhev/data-center
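Verifying the fix boils down to checking that the merged volume's symlink disappears from the image directory under /rhev/data-center. A minimal sketch, using a hypothetical image path modeled on the listings above (the real path depends on the pool/domain layout):

```python
import os

# Hypothetical path: /rhev/data-center/<pool>/<domain>/images/<image>,
# with UUIDs taken from the directory listings quoted in this bug.
image_dir = ("/rhev/data-center/f30058ff-5799-45b2-a272-22b2d198aa16/"
             "38b9ece5-6e9b-47c8-8f4c-3edc3e20d72e/images/"
             "adabaeb2-09c7-407c-a93d-5144f52b2a4f")

# After a successful snapshot delete, the merged volume's link
# (93d39a43-... in this bug) should no longer be present.
links = sorted(os.listdir(image_dir))
print(links)
assert "93d39a43-7d81-40dd-9060-f33857d7bfd7" not in links
```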