Bug 1447437
| Summary: | "No such drive" error after pivot for one disk causes the engine to fail live merge for other disks | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Gordon Watson <gwatson> |
| Component: | ovirt-engine | Assignee: | Ala Hino <ahino> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Kevin Alon Goldblatt <kgoldbla> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.0.7 | CC: | ahino, amureini, gwatson, kgoldbla, kshukla, lsurette, ratamir, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul, ylavi |
| Target Milestone: | ovirt-4.2.0 | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1456437 (view as bug list) | Environment: | |
| Last Closed: | 2017-06-25 09:05:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1438850 | | |
| Bug Blocks: | 1456437 | | |
| Attachments: | server, engine, vdsm logs (attachment 1289641) | | |
Verified with the following code:
-------------------------------------------------------------
ovirt-engine-4.2.0-0.0.master.20170612192318.gitf773263.el7.centos.noarch
vdsm-4.20.0-1049.gite64d80e.el7.centos.x86_64

Verified with the following scenario:
------------------------------------------------------------
1. Create VM2 with 5 disks, thin and preallocated, block and NFS
2. Create 3 snapshots: snapper1, snapper2, snapper3
3. Start VM2
4. Delete snap14_2; a few seconds later, restart vdsm on the host. The delete fails
5. Try deleting the same snapshot snap14_2 >>>>> Live Merge fails and the snapshot remains locked after vdsm restarts

Actual result:
----------------------
Live Merge fails and the snapshot remains locked after vdsm restarts

Expected result:
----------------------
Once VDSM restarts, the snapshot should not be locked and should allow another delete operation

Moving to ASSIGNED

(In reply to Kevin Alon Goldblatt from comment #18)
> Verified with the following code:

Correction!! The scenario was as follows:

> -------------------------------------------------------------
> ovirt-engine-4.2.0-0.0.master.20170612192318.gitf773263.el7.centos.noarch
> vdsm-4.20.0-1049.gite64d80e.el7.centos.x86_64
>
> Verified with the following scenario:
> ------------------------------------------------------------
> 1. Create VM2 with 5 disks, thin and preallocated, block and NFS
> 2. Create 3 snapshots: snapper1, snapper2, snapper3
> 3. Start VM2
> 4. Delete snapper2; a few seconds later, restart vdsm on the host. The delete fails
> 5. Try deleting the same snapshot snapper2 >>>>> Live Merge fails and the snapshot remains locked after vdsm restarts
>
> Actual result:
> ----------------------
> Live Merge fails and the snapshot remains locked after vdsm restarts
>
> Expected result:
> ----------------------
> Once VDSM restarts, the snapshot should not be locked and should allow another delete operation
>
> Moving to ASSIGNED

Created attachment 1289641 [details]
server, engine, vdsm logs
adding logs
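As an aside, steps 4–5 of the scenario above (deleting a snapshot of a running VM, which is what triggers the live merge) can be scripted with the Python ovirt-engine-sdk4. This is a minimal sketch only; the engine URL, credentials, and the VM/snapshot names are placeholders for this environment:

```python
# Minimal sketch of scripting steps 4-5 above with the Python
# ovirt-engine-sdk4. The engine URL, credentials, and the VM/snapshot
# names are placeholders for this environment.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # lab setup only; pass ca_file=... in production
)
try:
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=vm2')[0]
    snapshots_service = vms_service.vm_service(vm.id).snapshots_service()
    for snap in snapshots_service.list():
        if snap.description == 'snapper2':
            # Removing a snapshot while the VM is running triggers the
            # live merge flow (RemoveSnapshotSingleDiskLive) on the engine.
            snapshots_service.snapshot_service(snap.id).remove()
finally:
    connection.close()
```

Removing the snapshot while the VM is up is what makes the engine run the RemoveSnapshotSingleDiskLive flow seen in the engine log below.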
Engine.log:
-----------------
```
2017-06-20 15:28:24,321+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Snapshot 'snapper2' deletion for VM 'vm2' was initiated by admin@internal-authz.
2017-06-20 15:28:24,424+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
2017-06-20 15:28:24,544+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
2017-06-20 15:28:24,594+03 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-6) [194425b9-62e1-4e8f-8283-61b991d62b7d] Running command: MergeExtendCommand internal: true. Entities affected : ID: ab217695-7e38-4684-90c7-9c356b941caa Type: Storage
2017-06-20 15:28:24,595+03 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-6) [194425b9-62e1-4e8f-8283-61b991d62b7d] Base and top image sizes are the same; no image size update required
2017-06-20 15:28:24,683+03 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Running command: MergeExtendCommand internal: true. Entities affected : ID: ab217695-7e38-4684-90c7-9c356b941caa Type: Storage
2017-06-20 15:28:24,683+03 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Base and top image sizes are the same; no image size update required
2017-06-20 15:28:24,687+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
2017-06-20 15:28:24,773+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
.
RESTARTED VDSM HERE!!!
.
2017-06-20 15:28:56,738+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (DefaultQuartzScheduler3) [194425b9-62e1-4e8f-8283-61b991d62b7d] Command 'RemoveSnapshotSingleDiskLive' (id: 'a85586e7-7c19-4b58-b49b-dd01d9750f0f') waiting on child command id: '0621d23d-3727-4342-a93a-7ad877d976a6' type:'MergeStatus' to complete
```

VDSM.LOG:
--------------------------
```
2017-06-20 15:28:26,734+0300 INFO (jsonrpc/7) [virt.vm] (vmId='f6737775-556c-41e1-a28d-57bf4aa68870') Starting merge with jobUUID=u'fbe8a053-d8dc-4513-bef1-c772572daf3d', original chain=f19841f6-54ba-4cfd-9a20-af63194f4f70 < e4825cca-0ad1-4f5f-9eb3-401ccce36259 < b0980700-ecc2-456d-989e-6f6c24c949f9 < d1fb4a1e-57e6-4088-b1ff-40fab92f1209 < e98beb7e-211d-4e02-b97b-7a7c4c62d2a9 (top), disk='sda', base='sda[3]', top='sda[2]', bandwidth=0, flags=8 (vm:5086)
.
.
.
```

Kevin,
The original bug describes a use case where the engine log shows:

```
2017-04-25 19:58:21,515 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [53366d34] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=No such drive: '{'domainID': u'02a218cc-de64-4911-8d98-4128f6937c02', 'imageID': u'e6582210-11d1-4ddd-b9d2-
```

That may be caused by the following error seen in the VDSM log:

```
libvirtError: block copy still active: disk 'vda' not ready for pivot yet
```
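For context on this error: during an active-layer live merge, VDSM asks libvirt to commit the active image into its base and then pivots the disk onto the base once the block job has synchronized. A pivot requested before the job is ready raises exactly this libvirtError. Below is a minimal sketch of that sequence with libvirt-python; the domain name is a placeholder and the readiness check is a simplified assumption, not VDSM's actual code:

```python
# Sketch of the commit-then-pivot sequence behind an active-layer live
# merge, using libvirt-python. The domain name is a placeholder, and the
# readiness check is simplified; VDSM's real logic is more involved.
import time
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('vm2')  # placeholder domain name
disk = 'sda'

# Start an active block commit: merge the top (active) image into its base.
dom.blockCommit(disk, None, None, 0,
                libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE)

# Pivoting before the block job has fully synchronized is what raises
# "block copy still active: disk '...' not ready for pivot yet".
while True:
    info = dom.blockJobInfo(disk, 0)
    if info and info['end'] > 0 and info['cur'] == info['end']:
        break  # job synchronized; pivot is now allowed
    time.sleep(1)

# Switch the disk to the committed base image.
dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
```

The ordering this enforces, pivot only after the job reports completion, is the invariant the error message above reports as violated.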
In the logs you attached, I see neither of these errors.
We may be hitting a different issue; if so, let's open a dedicated bug for it and provide a detailed, minimal set of steps to reproduce.
As this bug depends on BZ 1438850, which is already in MODIFIED, and as we believe the root cause of the two BZs is the same (live merge fails with "libvirtError: block copy still active: disk not ready for pivot yet"), I am closing this bug. No additional info is needed here.
Description of problem:

During a live merge, after the pivot had completed for one of four disks, a LookupError ("No such drive") occurred while checking the block job, but the merge then completed successfully. However, the merges for two of the other disks failed on the engine with "No such drive", even though that error was raised for a different disk. Those two merges completed successfully on the host. As a result, the original leaf volumes for those two disks were never removed.

Version-Release number of selected component (if applicable):

RHV 4.0.7
RHVH 4.0-7.1
vdsm-4.18.24-3.el7
libvirt-2.0.0-10.el7_3.5

How reproducible:

Not reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:

Two "No such drive" LookupErrors occurred while checking a block job after the pivot had completed, causing the engine to "fail" the merges for two other disks.

Expected results:

Even if a LookupError occurs while checking the block job for one disk, the engine should not fail the live merges of the other disks (see the sketch below).

Additional info:
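To make the expected behaviour concrete, here is a hypothetical sketch of per-disk error isolation, where a "No such drive" lookup failure on one disk is handled locally instead of failing the sibling merges. This is written in Python for illustration only; it is not the actual ovirt-engine code (which is Java), and all names and the simulated failure are invented:

```python
# Hypothetical sketch of per-disk error isolation during a multi-disk live
# merge. Not the actual ovirt-engine code (which is Java); the names and
# the simulation below are illustrative only.

class NoSuchDrive(LookupError):
    """The queried drive is gone, e.g. already pivoted away."""

def poll_block_job(vm, disk):
    """Placeholder block-job status query; 'sdb' simulates a vanished drive."""
    if disk == 'sdb':
        raise NoSuchDrive("No such drive: %s" % disk)
    return 'RUNNING'

def poll_all_merges(vm, disks):
    """Poll each disk's merge independently: a missing drive marks only
    that disk's job as finished instead of failing the sibling merges."""
    status = {}
    for disk in disks:
        try:
            status[disk] = poll_block_job(vm, disk)
        except NoSuchDrive:
            # After a successful pivot the drive may already be absent from
            # the block job list; treat that as completion for this disk.
            status[disk] = 'FINISHED'
    return status

print(poll_all_merges('vm2', ['sda', 'sdb', 'sdc', 'sdd']))
# -> {'sda': 'RUNNING', 'sdb': 'FINISHED', 'sdc': 'RUNNING', 'sdd': 'RUNNING'}
```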