Bug 1958032
Summary: Live Storage Migration fails because replication filled the destination volume before extension.

Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 4.4.6
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Target Milestone: ovirt-4.5.1
Keywords: ZStream
Fixed In Version: ovirt-engine-4.5.1.2
oVirt Team: Storage
Type: Bug
Last Closed: 2022-07-14 12:54:30 UTC

Reporter: Germano Veit Michel <gveitmic>
Assignee: Pavel Bar <pbar>
QA Contact: sshmulev
CC: aefrat, ahadas, apinnick, bzlotnik, dfodor, eshames, lsurette, mavital, nsoffer, pbar, sfishbai, srevivo, ycui

Doc Type: Bug Fix
Doc Text: Previously, live storage migration could fail if the destination volume filled up before it was extended. In the current release, the initial size of the destination volume is larger and the extension is no longer required.
Description
Germano Veit Michel
2021-05-07 04:53:51 UTC
Nir, this should be fixed in 4.4.5 right?

(In reply to Eyal Shenitzky from comment #2)
> Nir, this should be fixed in 4.4.5 right?

The "reproducer" above with the sleep(30) was done on 4.4.6; the extension request and the replication run in parallel. I think you might be talking about BZ1796415, which is a similar problem in the merge command; here it is the replicate command.

(In reply to Eyal Shenitzky from comment #2)
> Nir, this should be fixed in 4.4.5 right?

No, this is not fixed yet. I agree with Germano that this is similar to the issue we had in live merge. We are doing the initial extend after the operation has started, and the operation can fail with ENOSPC before the volume is extended.

But in this flow we have a much simpler solution - engine should create the target volume with initial_size=3g so we don't need the initial extend. Vdsm should be changed to extend only if the volume was not extended yet (e.g. when working with an older engine).

(In reply to Avihai from comment #7)
> After discussion with Arik this bug might be resolved by
> https://bugzilla.redhat.com/show_bug.cgi?id=2051997.
> Can you confirm?

The changes we did for bug 2051997 are to use a bigger chunk size and extend earlier. This will help to extend the volumes during the mirror, but unfortunately it will not help with the first extend.

According to Germano's report, the mirror failed in 2.5 seconds:

2021-05-07 14:29:38,501+1000 Disk replication started
2021-05-07 14:29:41,101+1000 Disk replication failed

Extend takes 2-6 seconds in normal conditions, or more if the host is overloaded with I/O.
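The race described above can be illustrated with back-of-the-envelope arithmetic. This is a hypothetical sketch: the free-space and write-rate figures below are illustrative assumptions; only the 2-6 second extend latency comes from the observations above.

```python
# Rough sketch: how long a guest can write before a small destination
# volume hits ENOSPC, compared with the time an extend request takes.
# The free-space value and write rates are illustrative assumptions.

GiB = 1024 ** 3
MiB = 1024 ** 2

def seconds_until_enospc(free_bytes, write_rate_bytes_per_s):
    """Time until the volume fills at a sustained write rate."""
    return free_bytes / write_rate_bytes_per_s

# Assume ~0.5 GiB of free space left in the freshly created volume.
free = 0.5 * GiB

for rate_mib in (100, 200, 400):
    t = seconds_until_enospc(free, rate_mib * MiB)
    print(f"{rate_mib} MiB/s -> ENOSPC after {t:.2f}s")

# At 400 MiB/s the volume fills in ~1.3s - well inside the 2-6 second
# window an extend normally needs, so the mirror fails before the
# extend can complete.
```

This matches the report: the mirror failed 2.5 seconds after starting, which is at the low end of normal extend latency.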
The issue seems to be that when the mirror fails with ENOSPC, the job fails instead of pausing the VM:

"event": "BLOCK_JOB_COMPLETED", "data": {"device": "copy-sda-libvirt-3-format", "len": 842399744, "offset": 267714560, "speed": 0, "type": "mirror", "error": "No space left on device"}

Maybe it is possible to change libvirt/qemu so the VM is paused and the mirror job continues when the guest is unpaused, or maybe a new API can resume the mirror job. But getting such fixes into RHEL 8 is unlikely.

I think the best way to avoid this issue is to do the first extend on the engine side - engine already measures the top volume, and creates the disk with an initial size:

    requiredSize += SizeConverter.BYTES_IN_GB;
    return Math.min(requiredSize, sourceImage.getSize());

When the source volume is a new snapshot, the required size is ~0 and the actual initial size is 1g - like a new snapshot created by vdsm. This is of course wrong for current vdsm, and should be 2.5g.

When vdsm is doing live storage migration, it doubles the chunk size, so the first extend will extend the volume to:

    2.5g + 2 * 2.5g -> 7.5g

To avoid the need to do the first extend in vdsm, engine will allocate this size for the new volume. To avoid a double extend in vdsm, engine will send a new argument to vdsm:

    extend: false

Old vdsm will ignore this flag and try to extend the volume before the mirror. New vdsm will skip the initial extension since the volumes are already extended. The new behavior will be used only for cluster version >= 4.7.

(In reply to Shir Fishbain from comment #6)
> Nir, please provide a clear verification scenario

Germano "reproduced" the issue by adding a 30 second sleep in vdsm. This is not a valid way to reproduce the real issue.

To reproduce, you need:
- fast storage (e.g. flash based iSCSI or FC storage)
- if using iSCSI, a fast network to storage (e.g. 10g)
- a VM that writes a lot of data to storage during the merge; we can use this script from bug 2051997:
  https://bugzilla-attachments.redhat.com/attachment.cgi?id=1862576

Steps to reproduce:
1. Create a new 50g thin disk
2. Start the writing script in the VM, writing to the new disk
3. Start live storage migration to another storage domain on the same server

The live merge is likely to fail when starting to mirror the disk to the new storage (see comment 0 for example failure logs). You can repeat the live storage migration several times.

With the current system, creating the 1g destination volume, writing more than 100 MiB/s during live storage migration is likely to fail. I think you need to repeat the test multiple times, increasing the write rate, until you find the limits of the current system. When this is fixed, the limits should be similar to the limits of writing to a thin disk without pausing.

> Is it possible to reproduce this bug?

It should be reproducible based on the code.

Vdsm PR posted; this is the easy part, but it must be done first so vdsm in oVirt 4.5 always supports the option to skip the initial extend. The engine changes may be harder, and if we cannot get them into 4.5.0, they can be added in 4.5.1.

On the engine side, we need:

1. Add a new configuration for volume chunk size, duplicating vdsm's irs:volume_utilization_chunk_mb. Duplicating the configuration is bad, but until the configuration is moved to engine, we have no other way to use the right size for creating new volumes.

2. When creating the snapshot during live storage migration, create the snapshot with an initial size of 3 * irs:volume_utilization_chunk_mb. Currently the snapshot is created with the default size of 1 * irs:volume_utilization_chunk_mb.

3. When creating the destination volume as part of cloning the image structure, create the active volume at the same initial size as the active volume in the source (3 * chunk size).

4. Pass needExtend=false when calling VM.diskReplicateStart.
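The sizing rule in steps 2 and 3 can be sketched as follows. This is an illustrative sketch only, not engine code: the function name is hypothetical, and the 2560 MiB default chunk is an assumption based on the 2.5g chunk size discussed above.

```python
# Sketch of the engine-side initial-size rule described above:
# allocate 3 * volume_utilization_chunk_mb up front, capped at the
# disk's virtual size. Names and defaults are assumptions, not the
# actual engine implementation.

MiB = 1024 ** 2

def initial_cow_size_bytes(virtual_size_bytes, chunk_mb=2560):
    """Initial allocation for the destination COW volume.

    3 chunks = the 1-chunk snapshot size plus the doubled chunk that
    vdsm would otherwise add with a first extend (2.5g + 2 * 2.5g).
    """
    initial = 3 * chunk_mb * MiB
    # Never allocate more than the disk's virtual size.
    return min(initial, virtual_size_bytes)

# A 30 GiB thin disk gets a 7.5 GiB initial allocation, matching the
# imageInitialSizeInBytes='8053063680' seen in the verification logs.
print(initial_cow_size_bytes(30 * 1024 ** 3))
```

With this allocation already in place, vdsm has no first extend to perform, which is why needExtend=false can be passed in step 4.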
(In reply to Nir Soffer from comment #8)
> Old vdsm will ignore this flag and try to extend the volume before the
> mirror.
> New vdsm will skip the initial extension since the volumes are already
> extended.
>
> The new behavior will be used only for cluster version >= 4.7.

Makes sense, but if that new flow only happens on cluster version >= 4.7, then we don't care about old VDSM.

(In reply to Nir Soffer from comment #10)
> On engine side, we need:
>
> 1. Add new configuration for volume chunk size, duplicating vdsm
> irs:volume_utilization_chunk_mb. Duplicating the configuration
> is bad but until the configuration is moved to engine, we have
> no other way to use the right size for creating new volumes.
>
> 2. When creating the snapshot during live storage migration, create
> the snapshot with initial size of 3 * irs:volume_utilization_chunk_mb.
> Currently the snapshot is created with default size of
> 1 * irs:volume_utilization_chunk_mb.
>
> 3. When creating the destination volume as part of cloning the image
> structure, create the active volume at the same initial size as the
> active volume in the source (3 * chunk size).
>
> 4. Pass needExtend=false when calling VM.diskReplicateStart.

IIRC the initial size of snapshots is only set on block storage, but that's fine because this issue can only happen on block storage, right?

(In reply to Arik from comment #11)
> IIRC the initial size of snapshots is only set on block storage, but
> that's fine because this issue can only happen on block storage, right?

Oh wait, can we migrate a disk from file-storage to block-storage?

(In reply to Arik from comment #12)
> Oh wait, can we migrate a disk from file-storage to block-storage?

Sure, but in this case the initial size of the snapshot does not matter, since the initial size is used only on block storage. The same applies to migrating from block to file - in this case the source snapshot needs to use the bigger initial size, but the destination snapshot size does not matter. On file storage the snapshot can be created without an initial size, and even if engine sends the argument, it is ignored by vdsm.

*** Bug 1993839 has been marked as a duplicate of this bug. ***

Verified.

Versions:
vdsm-4.50.1.3-1.el8ev
ovirt-engine-4.5.1.2-0.11.el8ev

Could reproduce it on a non-fixed version (engine-4.4.10.7-0.4.el8ev):

# VDSM:
2022-06-21 09:22:41,577+0000 ERROR (jsonrpc/1) [virt.vm] (vmId='9bf54187-017a-4dc5-91cd-43c514b20e1c') Replication job unfinished (drive: 'sda', srcDisk: {'imageID': '121514a0-6e33-4a8a-9285-76c0bd78434f', 'poolID': '98214c50-85b4-4c14-bd90-83c668aa553a', 'volumeID': 'a1146f5d-269e-4ada-96f1-7f3b4580f42c', 'device': 'disk', 'domainID': '7e30eb41-8018-4ea6-b02c-37326beb89b1'}, job: {'type': 2, 'bandwidth': 0, 'cur': 1794113536, 'end': 2778726400}) (vm:4499)

# Engine:
2022-06-21 09:22:41,580Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-93) [2f7fecf9-fd37-44d9-829b-1525dd54e22a] Failed in 'VmReplicateDiskFinishVDS' method
2022-06-21 09:22:41,594Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-93) [2f7fecf9-fd37-44d9-829b-1525dd54e22a] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command VmReplicateDiskFinishVDS failed: Resource unavailable
2022-06-21 09:22:41,601Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-93) [2f7fecf9-fd37-44d9-829b-1525dd54e22a] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='1934590a-e3da-4c30-9fcf-e21223baa468', vmId='9bf54187-017a-4dc5-91cd-43c514b20e1c', storagePoolId='98214c50-85b4-4c14-bd90-83c668aa553a', srcStorageDomainId='7e30eb41-8018-4ea6-b02c-37326beb89b1', targetStorageDomainId='3f2141c0-b836-4922-8d35-ca2152184708', imageGroupId='121514a0-6e33-4a8a-9285-76c0bd78434f', imageId='a1146f5d-269e-4ada-96f1-7f3b4580f42c'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40

# 'needExtend' does not exist in this version, in either vdsm or engine.

On the fixed version there were no errors in the VDSM or engine logs.

Engine:

# imageInitialSizeInBytes='8053063680':
2022-06-21 12:45:04,239+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-49) [1326c67f-9b5e-4665-9574-5a5be8860229] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='02f48b00-bb6e-42af-885e-19f0675db775', ignoreFailoverLimit='false', storageDomainId='8c939c62-7d98-4e65-bbd8-d52f7ea469c1', imageGroupId='c1a1f9ca-1c15-48a3-8382-76be9e638d65', imageSizeInBytes='32212254720', volumeFormat='COW', newImageId='d1bfbb37-f05f-455d-baf2-25586313c420', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='8053063680', imageId='23aa16b7-30c2-4796-b42c-5f1de8250b9f', sourceImageGroupId='c1a1f9ca-1c15-48a3-8382-76be9e638d65', shouldAddBitmaps='false', legal='true', sequenceNumber='2', bitmap='null'}), log id: 727da77c

# needExtend='false':
2022-06-21 12:45:40,982+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-72) [1326c67f-9b5e-4665-9574-5a5be8860229] START, VmReplicateDiskStartVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='06b8c39e-ad71-47c5-baae-0898cdfe2ed1', vmId='c38cd2be-6d12-46e0-9688-d7e40d6fb8e6', storagePoolId='02f48b00-bb6e-42af-885e-19f0675db775', srcStorageDomainId='8c939c62-7d98-4e65-bbd8-d52f7ea469c1', targetStorageDomainId='947e4aa9-85a6-4b28-8f10-dfbcb08e665c', imageGroupId='c1a1f9ca-1c15-48a3-8382-76be9e638d65', imageId='d1bfbb37-f05f-455d-baf2-25586313c420', diskType='null', needExtend='false'}), log id: 779863e7

VDSM:

# initialSize='8053063680':
2022-06-21 12:45:04,244+0300 INFO (jsonrpc/6) [vdsm.api] START createVolume(sdUUID='8c939c62-7d98-4e65-bbd8-d52f7ea469c1', spUUID='02f48b00-bb6e-42af-885e-19f0675db775', imgUUID='c1a1f9ca-1c15-48a3-8382-76be9e638d65', size='32212254720', volFormat=4, preallocate=2, diskType='DATA', volUUID='d1bfbb37-f05f-455d-baf2-25586313c420', desc='', srcImgUUID='c1a1f9ca-1c15-48a3-8382-76be9e638d65', srcVolUUID='23aa16b7-30c2-4796-b42c-5f1de8250b9f', initialSize='8053063680', addBitmaps=False, legal=True, sequence=2, bitmap=None) from=::ffff:10.46.12.196,36674, flow_id=1326c67f-9b5e-4665-9574-5a5be8860229, task_id=b3ea4d03-64a2-4ef4-9b78-8eee6307f7cf (api:48)

# needExtend=False:
2022-06-21 12:45:40,987+0300 INFO (jsonrpc/7) [api.virt] START diskReplicateStart(srcDisk={'imageID': 'c1a1f9ca-1c15-48a3-8382-76be9e638d65', 'poolID': '02f48b00-bb6e-42af-885e-19f0675db775', 'volumeID': 'd1bfbb37-f05f-455d-baf2-25586313c420', 'device': 'disk', 'domainID': '8c939c62-7d98-4e65-bbd8-d52f7ea469c1'}, dstDisk={'imageID': 'c1a1f9ca-1c15-48a3-8382-76be9e638d65', 'poolID': '02f48b00-bb6e-42af-885e-19f0675db775', 'volumeID': 'd1bfbb37-f05f-455d-baf2-25586313c420', 'device': 'disk', 'domainID': '947e4aa9-85a6-4b28-8f10-dfbcb08e665c'}, needExtend=False) from=::ffff:10.46.12.196,39640, flow_id=1326c67f-9b5e-4665-9574-5a5be8860229, vmId=c38cd2be-6d12-46e0-9688-d7e40d6fb8e6 (api:48)

Checked flows for:
- ISCSI -> ISCSI
- NFS -> ISCSI

Thin provisioned disk (30G) and also a small one (1G).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555

Due to QE capacity, we are not going to cover this issue in our automation.