Bug 1679355
| Summary: | Left over volume after failed disk move | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | bill.james <bill.james> |
| Component: | Gluster | Assignee: | Yaniv Kaul <ykaul> |
| Status: | CLOSED DUPLICATE | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | --- | CC: | bugs, sabose, tnisan |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-09-04 12:57:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Gluster | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Attachments: | |
Moved some more disks tonight: 15 worked fine, 1 failed. The main thing I need is to know how to clean up failed disks so I can try the move again.

Since I haven't heard anything, I did some experimenting. I mounted the destination gluster volume via NFS and removed the previous failed move (37db52be-89bb-4867-9854-97f215ecd3a2, awsnms), then tried the disk move again. It failed.

engine.log:
2019-02-27 14:06:40,304-08 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [a26c9ab5-425c-4d0d-8cfb-af5be48cb877] Command 'VmReplicateDiskFinishVDSCommand(HostName = ovirt9.j2noc.com, VmReplicateDiskParameters:{runAsync='true', hostId='ad1b0f7f-99b1-48eb-b113-acbc57ec280b', vmId='6cc68abc-4263-4b5f-81ab-c967fd4169e2', storagePoolId='00000001-0001-0001-0001-0000000002c5', srcStorageDomainId='22df0943-c131-4ed8-ba9c-05923afcf8e3', targetStorageDomainId='22df0943-c131-4ed8-ba9c-05923afcf8e3', imageGroupId='37db52be-89bb-4867-9854-97f215ecd3a2', imageId='426bd122-eb9c-4c2f-ac1f-229a0e207aec'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive replication error, code = 55
vdsm.log:
2019-02-27 14:06:39,116-0800 ERROR (jsonrpc/6) [virt.vm] (vmId='6cc68abc-4263-4b5f-81ab-c967fd4169e2') Replication job not found (drive: 'vda', srcDisk: {u'device': u'disk', u'poolID': u'00000001-0001-0001-0001-0000000002c5', u'volumeID': u'426bd122-eb9c-4c2f-ac1f-229a0e207aec', u'domainID': u'22df0943-c131-4ed8-ba9c-05923afcf8e3', u'imageID': u'37db52be-89bb-4867-9854-97f215ecd3a2'}, job: {}) (vm:3828)
2019-02-27 14:06:39,128-0800 INFO (jsonrpc/6) [vdsm.api] FINISH diskReplicateFinish return={'status': {'message': 'Drive replication error', 'code': 55}} from=::ffff:10.144.110.101,52116, flow_id=a26c9ab5-425c-4d0d-8cfb-af5be48cb877 (api:52)
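As an aside (my reading, not from the report): the "Replication job not found ... job: {}" message indicates that libvirt reported no active block-copy job on the drive at the moment the engine asked VDSM to finish the replication. A minimal sketch of that kind of check with libvirt-python, using an assumed VM name:

```python
# Illustrative only -- not VDSM code. Checks whether a block-copy job is
# still active on a drive; an empty result matches the "job: {}" above.
import libvirt

VM_NAME = "awsnms"   # assumed VM name, taken from the comment above
DRIVE = "vda"        # drive named in the log

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName(VM_NAME)

# blockJobInfo() returns an empty dict when no job exists on the drive.
job = dom.blockJobInfo(DRIVE, 0)
if not job:
    print("No active replication (block-copy) job on %s" % DRIVE)
else:
    print("Job type %(type)s: %(cur)s / %(end)s bytes copied" % job)
conn.close()
```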
I don't know why it's marking the volume ILLEGAL:
2019-02-27 14:06:56,473-0800 INFO (merge/6cc68abc) [vdsm.api] START imageSyncVolumeChain(sdUUID=u'22df0943-c131-4ed8-ba9c-05923afcf8e3', imgUUID=u'37db52be-89bb-4867-9854-97f215ecd3a2', volUUID=u'426bd122-eb9c-4c2f-ac1f-229a0e207aec', newChain=[u'11c58d32-be15-42aa-b782-657ca1510ccc']) from=internal, task_id=70a9f22b-0f6f-4855-804e-2fb2912d8436 (api:46)
2019-02-27 14:06:56,554-0800 INFO (merge/6cc68abc) [storage.Image] Current chain=11c58d32-be15-42aa-b782-657ca1510ccc < 426bd122-eb9c-4c2f-ac1f-229a0e207aec (top) (image:1266)
2019-02-27 14:06:56,554-0800 INFO (merge/6cc68abc) [storage.Image] Unlinking subchain: [u'426bd122-eb9c-4c2f-ac1f-229a0e207aec'] (image:1276)
2019-02-27 14:06:56,570-0800 INFO (merge/6cc68abc) [storage.Image] Leaf volume 426bd122-eb9c-4c2f-ac1f-229a0e207aec is being removed from the chain. Marking it ILLEGAL to prevent data corruption (image:1284)
2019-02-27 14:06:56,570-0800 INFO (merge/6cc68abc) [storage.VolumeManifest] sdUUID=22df0943-c131-4ed8-ba9c-05923afcf8e3 imgUUID=37db52be-89bb-4867-9854-97f215ecd3a2 volUUID = 426bd122-eb9c-4c2f-ac1f-229a0e207aec legality = ILLEGAL (volume:398)
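My reading of the imageSyncVolumeChain messages above (a sketch, not VDSM source): the requested newChain contains only the base volume, so the current leaf falls into the subchain to unlink, and it is flagged ILLEGAL before removal so that a half-deleted volume can never be mistaken for valid data. Roughly:

```python
# Rough sketch of the chain trim implied by the log above, using its UUIDs.
current_chain = [
    "11c58d32-be15-42aa-b782-657ca1510ccc",   # base
    "426bd122-eb9c-4c2f-ac1f-229a0e207aec",   # top / leaf
]
new_chain = ["11c58d32-be15-42aa-b782-657ca1510ccc"]

# Volumes present now but absent from the requested chain get unlinked.
subchain_to_unlink = [vol for vol in current_chain if vol not in new_chain]
leaf = current_chain[-1]

print("Unlinking subchain:", subchain_to_unlink)
if leaf in subchain_to_unlink:
    # Matches "Marking it ILLEGAL to prevent data corruption" in the log.
    print("Leaf %s would be marked ILLEGAL before removal" % leaf)
```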
I'll try again later tonight with the VM shut down.
Moving the disk with the VM down worked fine. Is there some kind of limitation where oVirt can't move a disk image over, say, 15 GB while the VM is live, or over a 1 Gb network connection? I tried moving a few more disks on live systems; all the ones over 20 GB failed after trying for between 40 and 70 minutes.

Is your network saturated during the move?
Tal, do you have anything to add?

(In reply to Sahina Bose from comment #4)
> Is your network saturated during the move?
> Tal, do you have anything to add?

No, most likely it's exactly that. A similar bug exists also on non-Gluster storage - bug 1520546.

*** This bug has been marked as a duplicate of bug 1520546 ***
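To put rough numbers on the network-saturation explanation (back-of-the-envelope math with assumed rates, not measurements from this bug): a 1 Gbit/s link tops out around 125 MB/s, and a live disk copy only converges if the effective copy rate stays ahead of the guest's write rate, which can stretch a 20 GB disk into the 40-70 minute range seen here, or prevent it from finishing at all:

```python
# Back-of-the-envelope convergence check for a live disk copy.
# The effective copy rate and guest write rate are assumptions,
# not values measured in this bug.
disk_gb = 20.0
link_mb_s = 125.0          # ~1 Gbit/s theoretical ceiling
effective_copy_mb_s = 8.0  # assumed share of a saturated link
guest_write_mb_s = 5.0     # assumed sustained writes inside the VM

ideal_minutes = disk_gb * 1024 / link_mb_s / 60
net_rate = effective_copy_mb_s - guest_write_mb_s

print("Ideal copy time on an idle 1G link: %.1f min" % ideal_minutes)
if net_rate <= 0:
    print("Copy never converges: guest writes outpace the mirror")
else:
    print("Rough copy time on a saturated link: %.0f min"
          % (disk_gb * 1024 / net_rate / 60))
```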
Created attachment 1536879 [details]
engine.log and vdsm logs from each gluster node.

Description of problem:
I tried moving some disks from one gluster volume to another. 8 worked, 6 failed. I can't retry the move because ovirt says:

2019-02-14 21:36:49,450-08 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler4) [2d9789d1] BaseAsyncTask::logEndTaskFailure: Task '2a0e703b-0239-41f8-a920-50c1ae096590' (Parent Command 'CreateImagePlaceholder', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Volume already exists: ('d33e8048-a4b4-4b85-bf44-20be65b854f2',)'

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.8.2-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
vdsm-4.19.43-1.el7.centos.x86_64

How reproducible:
Not sure; 8 worked, 6 failed. Not sure why some failed.

Steps to Reproduce:
1. Create a VM with a disk volume in a Gluster storage domain.
2. Move the disk to a different Gluster storage domain while the VM is running.

Actual results:
2019-02-14 21:34:03,079-08 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [7adfa09d-d0d6-4478-9a6a-c505535e325b] Command 'LiveMigrateVmDisks' id: '4e1628ad-3396-486d-ae69-5702a0173e6f' child commands '[6e15830b-3fb6-42a6-bb36-3251f8fd8c25, 2984f152-666f-4862-bbf2-a36ce7bd1985, 15606702-7708-4bac-8039-6c63005518e4]' executions were completed, status 'FAILED'

Expected results:
Successful move of disk.

Additional info:
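Related to the "Volume already exists" failure and the NFS cleanup described in the comments: a cautious sketch (the mount point is an assumption; the UUIDs are the ones from the logs above) that only locates and lists a leftover image directory on the destination domain, leaving any actual deletion as a deliberate manual step:

```python
# Cautious helper: locate (but do not delete) a leftover image directory
# on a mounted storage domain. The directory layout <sdUUID>/images/<imgUUID>
# follows what oVirt uses under a file domain mount; the mount point itself
# is an assumption for this sketch.
import os

MOUNT = "/mnt/dest-gluster"                            # assumed NFS mount of the destination volume
SD_UUID = "22df0943-c131-4ed8-ba9c-05923afcf8e3"       # storage domain from the logs
IMG_UUID = "37db52be-89bb-4867-9854-97f215ecd3a2"      # failed disk's image group

img_dir = os.path.join(MOUNT, SD_UUID, "images", IMG_UUID)

if os.path.isdir(img_dir):
    print("Leftover image found:", img_dir)
    for name in sorted(os.listdir(img_dir)):
        path = os.path.join(img_dir, name)
        print("  %s (%d bytes)" % (name, os.path.getsize(path)))
    # Removal is deliberately left manual, e.g. only after confirming the
    # engine no longer references these volume UUIDs anywhere.
else:
    print("No leftover image directory at", img_dir)
```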