Bug 1207290
| Summary: | [engine-backend] Live merge failure (VM with disks on block and file) after a successful merge | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> |
| Component: | ovirt-engine | Assignee: | Adam Litke <alitke> |
| Status: | CLOSED DUPLICATE | QA Contact: | Aharon Canan <acanan> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.1 | CC: | acanan, alitke, amureini, ecohen, gklein, lpeer, lsurette, rbalakri, Rhev-m-bugs, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.5.1 | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-31 20:18:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Reproduced again. It seems to happen while trying to merge the last created snapshot, after all the newer snapshots were merged. Adam - this looks like a dup of an issue you're already handling, no? Can you please take a look?

When looking at the vdsm log I see the following:

```
Thread-5396::INFO::2015-03-30 17:55:17,974::vm::6089::vm.Vm::(tryPivot) vmId=`42829cb9-9d04-4ef6-8719-c0079abee6df`::Requesting pivot to complete active layer commit (job 3a4ce665-7836-490f-9020-780c609ea9c1)
Thread-5396::INFO::2015-03-30 17:55:18,205::vm::6101::vm.Vm::(tryPivot) vmId=`42829cb9-9d04-4ef6-8719-c0079abee6df`::Pivot completed (job 3a4ce665-7836-490f-9020-780c609ea9c1)
Thread-5396::INFO::2015-03-30 17:55:18,205::vm::6108::vm.Vm::(run) vmId=`42829cb9-9d04-4ef6-8719-c0079abee6df`::Synchronizing volume chain after live merge (job 3a4ce665-7836-490f-9020-780c609ea9c1)
Thread-5396::DEBUG::2015-03-30 17:55:18,363::vm::5979::vm.Vm::(_syncVolumeChain) vmId=`42829cb9-9d04-4ef6-8719-c0079abee6df`::vdsm chain: [u'7fe5061b-d4d3-47a7-813e-2693eb3fce2e', u'ed77872b-b305-4812-9107-25c190c57354'], libvirt chain: [u'7fe5061b-d4d3-47a7-813e-2693eb3fce2e', u'ed77872b-b305-4812-9107-25c190c57354']
```

Merge job 3a4ce665-7836-490f-9020-780c609ea9c1 was an active layer commit, and _syncVolumeChain tells us that after the pivot the same two volUUIDs still exist in the chain reported to us by libvirt. This is definitely the race described in bug 1207808.

*** This bug has been marked as a duplicate of bug 1207808 ***
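To make the failure condition concrete: after an active layer commit pivots successfully, the merged-out leaf volume should disappear from the chain libvirt reports, so vdsm's chain and libvirt's chain should differ. A minimal sketch of that check (a hypothetical helper, not vdsm source, using the UUIDs from the log above):

```python
# Hypothetical sketch, not vdsm code: after a pivot completes, libvirt's
# reported chain should be shorter than vdsm's recorded chain. If the two
# chains are still identical, the pivot result is not yet visible -- the
# race described in bug 1207808.
def merge_removed_leaf(vdsm_chain, libvirt_chain):
    """Return True if the live merge visibly shortened libvirt's chain."""
    return libvirt_chain != vdsm_chain and len(libvirt_chain) < len(vdsm_chain)

vdsm_chain = ['7fe5061b-d4d3-47a7-813e-2693eb3fce2e',
              'ed77872b-b305-4812-9107-25c190c57354']
# Chain libvirt reported after the pivot in the log above -- unchanged:
libvirt_chain = list(vdsm_chain)
print(merge_removed_leaf(vdsm_chain, libvirt_chain))  # prints False
```

A `False` here corresponds exactly to the log line where both chains list the same two volUUIDs, which is why `_syncVolumeChain` concludes the merge did not take effect.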
Created attachment 1008557 [details]
logs from engine, vdsm, images table from db, pg dump, lvm output and /rhev/data-center tree

Description of problem:
I tried to live delete a snapshot of a VM that had 4 disks, 2 of them located on an NFS domain and 2 on an FC domain. This was a second live merge, after the first had succeeded. This second attempt failed in engine. Looking at the snapshot overview for these domains, I saw that the snapshot disks located on the FC domain were removed successfully, while the snapshot disks located on the NFS domain were in status 'Illegal'.

Version-Release number of selected component (if applicable):
rhev 3.5.1 vt14.1
rhel7.1
vdsm-4.16.12.1-3.el7ev.x86_64
libvirt-daemon-1.2.8-16.el7_1.2.x86_64
qemu-kvm-rhev-2.1.2-23.el7_1.1.x86_64

How reproducible:
Tested once

Steps to Reproduce:
1. Created a VM with a disk from the FC domain attached and installed an OS. Attached 3 more disks, 2 from the NFS domain and 1 from the FC domain.
2. Created 2 snapshots of the VM with all the disks.
3. Successfully live removed the first created snapshot.
4. Tried to live remove the second created snapshot.

Actual results:
The snapshot removal is reported as failed in engine:

```
2015-03-30 17:56:38,129 INFO  [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-41) [10c9ee97] All Live Merge child commands have completed, status FAILED
2015-03-30 17:56:48,809 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (DefaultQuartzScheduler_Worker-59) [10c9ee97] Ending command with failure: org.ovirt.engine.core.bll.RemoveSnapshotCommand
2015-03-30 17:56:48,881 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-59) [10c9ee97] Correlation ID: 10c9ee97, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot '2' for VM 'vm-2'.
```
After the failure, the 2 snapshot disks attached to the VM and residing on the NFS domain were reported as 'Illegal', while the 2 snapshot disks attached to the VM and residing on the FC domain no longer existed; they were successfully removed.

Expected results:
Live merge should succeed.

Additional info:
Attached: logs from engine, vdsm, images table from db, pg dump, lvm output and /rhev/data-center tree.