Bug 891610
| Summary: | 3.2 - engine: live snapshot fails due to race on multiple move of disks (live storage migration) | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Yeela Kaplan <ykaplan> |
| Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Dafna Ron <dron> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.2.0 | CC: | abaron, bazulay, cpelland, hateya, iheim, lpeer, oourfali, scohen, ykaul |
| Target Milestone: | --- | ||
| Target Release: | 3.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | storage | ||
| Fixed In Version: | vdsm-4.10.2-4.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 876558 | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 915537 | ||

*** Bug 891609 has been marked as a duplicate of this bug. ***

Verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64.

3.2 has been released.

Description of problem:
createVolume has not finished on the SPM when prepareVolume is sent to the HSM, so prepareVolume fails because the volume does not exist yet.

Version-Release number of selected component (if applicable):
si24.1

How reproducible:
100%

Steps to Reproduce:
1. Create 3 iSCSI domains.
2. Create a template with a 15GB thin-provisioned disk and an OS installed, then create 20 pool VMs on one domain and 2 more VMs as clones on a second domain.
3. Run the VMs on 2 hosts and have them write (opening Explorer in the VMs is enough; no heavy writing is needed).
4. From the VMs tab -> Disks, move each VM's disks to the 3rd domain.

Actual results:
Some of the VMs fail to create the live snapshot because prepareVolume was sent to the HSM before createVolume finished creating the volume.

Expected results:
prepareVolume should not be sent before createVolume is confirmed to have completed successfully.

Additional info:
Logs: many tasks were running at the same time and we have already debugged this, so here is the relevant information.

In the SPM log, createVolume is on Thread-4416; the task is 11b6930d-5f8c-435e-9de4-46c6cc34684f.
prepareVolume for the same action in the HSM log is on Thread-4275.

This is the error for the volume that failed:
VolumeMetadataReadError: Error while processing volume meta data: ('missing offset tag on volume 710f7022-29ae-40cd-9b41-3a8a22fd8cc1',)

Engine log. This is the SnapshotVDSCommand for the VM:
2012-11-14 12:15:36,360 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) START, SnapshotVDSCommand(HostName = gold-vdsd, HostId = 2d81a26a-2c20-11e2-aeab-001a4a169741, vmId=3d393cd1-666e-4283-a2b6-ce99e74656f4), log id: 57fa069c

And here is the failure in the engine log:
2012-11-14 12:15:47,105 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) FINISH, SnapshotVDSCommand, log id: 57fa069c
2012-11-14 12:15:47,105 ERROR [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-4-thread-23) Wasnt able to live snpashot due to error: VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, rolling back.
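
To illustrate the expected behaviour (do not send prepareVolume until createVolume is confirmed complete), here is a minimal Python sketch of the ordering. It is not the actual vdsm or engine fix. The callables `create_volume`, `get_status` and `prepare_volume`, the status dictionary layout, and the `TaskFailed` exception are all hypothetical stand-ins for the real SPM/HSM calls.

```python
import time


class TaskFailed(Exception):
    """Raised when the (hypothetical) SPM task finishes without success."""


def wait_for_task(get_status, task_id, timeout=60.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(task_id) until the task is finished.

    get_status is an assumed callable returning a dict such as
    {"state": "running"} or {"state": "finished", "result": "success"}.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(task_id)
        if status["state"] == "finished":
            if status.get("result") != "success":
                raise TaskFailed(task_id)
            return status
        if clock() >= deadline:
            raise TimeoutError("task %s did not finish in time" % task_id)
        sleep(interval)


def create_then_prepare(create_volume, get_status, prepare_volume,
                        **wait_kwargs):
    """Start createVolume, wait for it to succeed, only then prepare."""
    task_id = create_volume()          # assumed to return the SPM task id
    wait_for_task(get_status, task_id, **wait_kwargs)
    return prepare_volume()            # safe: the volume now exists
```

With this ordering, a prepare request can never reach the HSM while the create task is still running; a failed or timed-out create raises instead of falling through to the snapshot step.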