Description of problem:

This reminds me of BZ1796415, where the merge operation would fill the parent volume with data before it was extended, failing the merge. But here the problem is in Live Storage Migration.

First the code: VDSM starts replication to the destination volume on line 4304, and only later, at line 4317, asks for extension of that volume to match the source volume.

lib/vdsm/virt/vm.py

~~~
4267     def diskReplicateStart(self, srcDisk, dstDisk):
...
4296         try:
4297             diskType = replica.get("diskType")
4298             replica['path'] = self.cif.prepareVolumePath(replica)
4299             if diskType != replica["diskType"]:
4300                 # Disk type was detected or modified when preparing the volume.
4301                 # Persist it so migration can continue after vdsm crash.
4302                 self._updateDiskReplica(drive)
4303         try:
4304             self._startDriveReplication(drive)
...
4314         if drive.chunked or drive.replicaChunked:
4315             try:
4316                 capacity, alloc, physical = self.getExtendInfo(drive)
4317                 self.extendDriveVolume(drive, drive.volumeID, physical,
4318                                        capacity)
4319             except Exception:
4320                 self.log.exception("Initial extension request failed for %s",
4321                                    drive.name)
4322
4323         return {'status': doneCode}
~~~

If the storage is fast and the VM is writing quickly, this can lead to a failed LSM.

I can somewhat reproduce this on lower-end hardware by adding an artificial delay (30 s) before the extension. Adding it in the same function delays the return of ReplicateStart to the engine, but it does the job of allowing more time between replication start and extension, so that the problem shows up even on slower storage:

~~~
4314         time.sleep(30)
4315         if drive.chunked or drive.replicaChunked:
4316             try:
4317                 capacity, alloc, physical = self.getExtendInfo(drive)
4318                 self.extendDriveVolume(drive, drive.volumeID, physical,
4319                                        capacity)
4320             except Exception:
4321                 self.log.exception("Initial extension request failed for %s",
4322                                    drive.name)
4323
4324         return {'status': doneCode}
~~~

Now the logs:
1. Engine sends ReplicateStart:

~~~
2021-05-07 14:29:38,501+1000 INFO (jsonrpc/3) [api.virt] START diskReplicateStart(srcDisk={'imageID': '1bdd3140-acdf-4bbe-ab79-559a2af8b684', 'poolID': 'f280e0f2-adfa-11eb-97b9-5254000000ff', 'volumeID': '82422848-c417-438d-8df8-1ce059ae71db', 'device': 'disk', 'domainID': '7eb7cfce-ae07-4f41-91c4-b53e81973c90'}, dstDisk={'imageID': '1bdd3140-acdf-4bbe-ab79-559a2af8b684', 'poolID': 'f280e0f2-adfa-11eb-97b9-5254000000ff', 'volumeID': '82422848-c417-438d-8df8-1ce059ae71db', 'device': 'disk', 'domainID': '15cb0997-f933-49cf-a6b4-91b038f1715f'}) from=::ffff:192.168.100.253,43640, flow_id=6a3aac32-34be-445c-8900-9b9839c644a4, vmId=6c3b7f5c-31c1-48e2-be6d-a6614cdec12a (api:48)
~~~

2. VDSM processes that and starts qemu disk replication through libvirt, and the replication starts running:

~~~
2021-05-07 04:29:39.184+0000: 56383: debug : qemuProcessHandleJobStatusChange:1011 : job 'copy-sda-libvirt-3-format'(domain: 0x7fd750010070,CentOS) state changed to 'created'(1)
~~~

3. 2 seconds later it fails:

~~~
2021-05-07 14:29:41,101+1000 ERROR (libvirt/events) [virt.vm] (vmId='6c3b7f5c-31c1-48e2-be6d-a6614cdec12a') Block job (untracked) type COPY for drive sda has failed (vm:5667)
~~~

4. Because it ran out of space on the destination:

~~~
2021-05-07 04:29:40.900+0000: 56383: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"timestamp": {"seconds": 1620361780, "microseconds": 899090}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "copy-sda-libvirt-3-format", "len": 842399744, "offset": 267714560, "speed": 0, "type": "mirror", "error": "No space left on device"}}]
~~~

5. Extension request goes out, too late:
~~~
2021-05-07 14:30:09,221+1000 INFO (jsonrpc/3) [vdsm.api] START sendExtendMsg(spUUID='f280e0f2-adfa-11eb-97b9-5254000000ff', volDict={'domainID': '15cb0997-f933-49cf-a6b4-91b038f1715f', 'imageID': '1bdd3140-acdf-4bbe-ab79-559a2af8b684', 'name': 'sda', 'newSize': 4294967296, 'poolID': 'f280e0f2-adfa-11eb-97b9-5254000000ff', 'volumeID': '82422848-c417-438d-8df8-1ce059ae71db', 'clock': <Clock(total=0.00*, extend-replica=0.00*)>, 'callback': None}, newSize=4294967296, callbackFunc=<bound method Vm.__afterReplicaExtension of <vdsm.virt.vm.Vm object at 0x7f59c0025438>>) from=::ffff:192.168.100.253,43640, flow_id=6a3aac32-34be-445c-8900-9b9839c644a4, task_id=fc92ea4b-fa9c-464f-81c2-47178e9f1ab4 (api:48)
~~~

Version-Release number of selected component (if applicable):
vdsm-4.40.60.3-1.el8ev.x86_64

How reproducible:
* Only managed to do it with an artificial delay; my storage (single SSD) is not fast enough.

Steps to Reproduce:
1. Add the delay above or use some very fast storage
2. Make the VM write to disk quickly
3. Start LSM to block storage

Actual results:
* Fails under load

Expected results:
* Succeeds under load
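The timing race described above can be sketched with a toy simulation. All numbers here are assumptions for illustration (chunk size, extend round-trip latency, guest write rate), not measurements, and none of this is vdsm code:

```python
# Toy model of the LSM race: the destination volume starts with a
# 1-chunk allocation, the guest keeps writing, and extensions only
# start landing after the extend round-trip completes.

CHUNK = 1 * 1024**3           # assumed initial destination allocation (1 GiB)
EXTEND_DELAY_S = 4            # assumed extend round-trip time in seconds
WRITE_RATE = 300 * 1024**2    # assumed guest write rate, 300 MiB/s

def mirror_survives(initial_allocation, seconds):
    """Return False if the mirror hits ENOSPC before the extend lands."""
    allocated = initial_allocation
    written = 0
    for t in range(seconds):
        written += WRITE_RATE
        if written > allocated:
            return False      # qemu ends the job: "No space left on device"
        if t + 1 >= EXTEND_DELAY_S:
            allocated += 2 * CHUNK   # extensions finally start landing
    return True

print(mirror_survives(CHUNK, 10))       # current flow: ENOSPC in ~3 s
print(mirror_survives(3 * CHUNK, 10))   # pre-extended target: survives
```

With a 1-chunk head start the simulated mirror runs out of space before the first extension arrives; giving the target a bigger initial allocation removes the window entirely.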
Nir, this should be fixed in 4.4.5 right?
(In reply to Eyal Shenitzky from comment #2)
> Nir, this should be fixed in 4.4.5 right?

The "reproducer" above with the sleep(30) was done on 4.4.6; the extension request and the replication run in parallel there. I think you might be talking about BZ1796415, which is a similar problem in the merge command, while here it's the replicate command.
(In reply to Eyal Shenitzky from comment #2)
> Nir, this should be fixed in 4.4.5 right?

No, this is not fixed yet.

I agree with Germano that this is similar to the issue we had in live merge. We are doing the initial extend after the operation was started, and the operation can fail with ENOSPC before the volume was extended.

But in this flow we have a much simpler solution - the engine should create the target volume with initial_size=3g so we don't need the initial extend. Vdsm should change to extend only if the volume was not extended yet (e.g. when working with an older engine).
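The backward-compatible check could look roughly like this. This is a sketch, not the vdsm patch; the function name is hypothetical and the real logic works in terms of getExtendInfo results:

```python
GiB = 1024**3
INITIAL_SIZE = 3 * GiB   # the initial_size=3g suggested above

def needs_initial_extend(physical_size, capacity):
    """Extend only if the engine did not pre-allocate the target volume."""
    return physical_size < min(INITIAL_SIZE, capacity)

# Old engine creates the target at the default 1g -> vdsm still extends.
print(needs_initial_extend(1 * GiB, 50 * GiB))   # True
# New engine pre-allocates 3g -> the initial extend is skipped.
print(needs_initial_extend(3 * GiB, 50 * GiB))   # False
```

The `min(..., capacity)` cap handles small disks whose virtual size is below 3g, where the volume can never reach the full pre-allocation size.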
(In reply to Avihai from comment #7)
> After discussion with Arik this bug might be resolved by
> https://bugzilla.redhat.com/show_bug.cgi?id=2051997.
> Can you confirm?

The changes we did for bug 2051997 are to use a bigger chunk size and extend earlier. This will help to extend the volumes during the mirror, but unfortunately it will not help with the first extend.

According to Germano's report, the mirror failed in 2.5 seconds:

2021-05-07 14:29:38,501+1000  Disk replication started
2021-05-07 14:29:41,101+1000  Disk replication failed

Extend takes 2-6 seconds in normal conditions, or more if the host is overloaded with I/O.

The issue seems to be that when the mirror fails with ENOSPC, the job fails instead of pausing the VM:

~~~
"event": "BLOCK_JOB_COMPLETED", "data": {"device": "copy-sda-libvirt-3-format", "len": 842399744, "offset": 267714560, "speed": 0, "type": "mirror", "error": "No space left on device"}
~~~

Maybe it is possible to change libvirt/qemu so the VM is paused and the mirror job continues when the guest is unpaused, or maybe a new API can resume the mirror job. But getting such fixes into RHEL 8 is unlikely.

I think the best way to avoid this issue is to do the first extend on the engine side - the engine already measures the top volume and creates the disk with an initial size:

~~~
requiredSize += SizeConverter.BYTES_IN_GB;
return Math.min(requiredSize, sourceImage.getSize());
~~~

When the source volume is a new snapshot, the required size is ~0 and the actual initial size is 1g - like a new snapshot created by vdsm. This is of course wrong for current vdsm, and should be 2.5g.

During live storage migration, vdsm doubles the chunk size, so the first extend will extend the volume to:

2.5g + 2 * 2.5g -> 7.5g

To avoid the need to do the first extend in vdsm, the engine will allocate this size for the new volume.
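The size arithmetic above, written out as a sketch (function names are illustrative, not the actual engine code):

```python
GiB = 1024**3
CHUNK = 2.5 * GiB        # vdsm chunk size; doubled during LSM

def current_initial_size(required, virtual_size):
    # Today: measured size + 1 GiB, capped at the virtual size.
    return min(required + 1 * GiB, virtual_size)

def proposed_initial_size(required, virtual_size):
    # Proposed: what vdsm's first extend would have produced,
    # 2.5g + 2 * 2.5g, capped at the virtual size.
    return min(required + CHUNK + 2 * CHUNK, virtual_size)

# New snapshot during LSM: required (measured) size is ~0.
print(current_initial_size(0, 50 * GiB) // GiB)    # 1g today
print(proposed_initial_size(0, 50 * GiB) / GiB)    # 7.5g proposed
```

The proposed 7.5g for a fresh snapshot matches the imageInitialSizeInBytes='8053063680' seen later in the verification logs.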
To avoid a double extend in vdsm, the engine will send a new argument to vdsm:

    extend: false

Old vdsm will ignore this flag and try to extend the volume before the mirror. New vdsm will skip the initial extension since the volumes are already extended.

The new behavior will be used only for cluster version >= 4.7.
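A rough sketch of how the flag could be handled on the vdsm side (hypothetical names and callbacks; the real code lives in diskReplicateStart). The flag defaults to true so calls from an older engine, which never sends it, keep the extend behavior:

```python
def disk_replicate_start(drive, start_replication, extend, need_extend=True):
    # Old engine (no flag, defaults to True): extend before the mirror.
    if need_extend and (drive["chunked"] or drive["replicaChunked"]):
        extend(drive)
    # New engine (cluster >= 4.7) sends need_extend=False because the
    # destination volume was already created at the extended size.
    start_replication(drive)

calls = []
drive = {"chunked": True, "replicaChunked": False}
disk_replicate_start(drive, lambda d: calls.append("replicate"),
                     lambda d: calls.append("extend"))
print(calls)                      # ['extend', 'replicate']
calls.clear()
disk_replicate_start(drive, lambda d: calls.append("replicate"),
                     lambda d: calls.append("extend"), need_extend=False)
print(calls)                      # ['replicate']
```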
(In reply to Shir Fishbain from comment #6)
> Nir, please provide a clear verification scenario

Germano "reproduced" this by adding a 30 second sleep in vdsm. This is not a valid way to reproduce the real issue.

To reproduce, you need:
- fast storage (e.g. flash based iSCSI or FC storage)
- if using iSCSI, a fast network to storage (e.g. 10g)
- a VM that writes a lot of data to storage during the migration

We can use this script from bug 2051997:
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1862576

Steps to reproduce:
1. Create a new 50g thin disk
2. Start the writing script in the VM, writing to the new disk
3. Start live storage migration to another storage domain on the same server

The live storage migration is likely to fail when starting to mirror the disk to the new storage (see comment 0 for example failure logs). You can repeat the live storage migration several times.

With the current system, creating the 1g destination volume, writing more than 100 MiB/s during live storage migration is likely to fail. I think you need to repeat the test multiple times, increasing the write rate, until you find the limits of the current system. When this is fixed, the limits should be similar to the limits of writing to a thin disk without pausing.

> Is it possible to reproduce this bug?

It should be reproducible based on the code.
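The attached script is the reference workload. As a stand-in, something like this minimal writer generates a comparable steady write load inside the guest (the target path and rate here are assumptions, not taken from the attachment):

```python
import os
import time

def write_load(path, rate_mib=200, seconds=10):
    """Write roughly rate_mib MiB/s of incompressible data to path."""
    buf = os.urandom(1024 * 1024)          # 1 MiB of random data
    with open(path, "wb") as f:
        for _ in range(seconds):
            start = time.monotonic()
            for _ in range(rate_mib):
                f.write(buf)
            os.fsync(f.fileno())           # force the writes to storage
            # Pad each burst to about one second to approximate the rate.
            time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))

# Inside the guest, e.g. (device name is an assumption):
# write_load("/dev/sdb", rate_mib=200, seconds=120)
```

Random data keeps the writes incompressible so qemu cannot skip allocation; increase `rate_mib` between runs to probe the limits of the current system as described above.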
Vdsm PR posted. This is the easy part, but it must be done first so vdsm in oVirt 4.5 always supports the option to skip the initial extend.

The engine changes may be harder, and if we cannot get them into 4.5.0, they can be added in 4.5.1.

On the engine side, we need:

1. Add new configuration for volume chunk size, duplicating vdsm's irs:volume_utilization_chunk_mb. Duplicating the configuration is bad, but until the configuration is moved to the engine, we have no other way to use the right size for creating new volumes.

2. When creating the snapshot during live storage migration, create the snapshot with an initial size of 3 * irs:volume_utilization_chunk_mb. Currently the snapshot is created with a default size of 1 * irs:volume_utilization_chunk_mb.

3. When creating the destination volume as part of cloning the image structure, create the active volume at the same initial size as the active volume in the source (3 * chunk size).

4. Pass needExtend=false when calling VM.diskReplicateStart.
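The size computation steps 1-3 imply, as a sketch. The engine-side duplicate of the vdsm setting is hypothetical, and the 2560 MiB value is an assumption matching the 2.5g chunk mentioned in comment 8:

```python
MiB = 1024**2

# Step 1: engine-side duplicate of vdsm's irs:volume_utilization_chunk_mb
# (assumed value; the real setting lives in vdsm configuration).
volume_utilization_chunk_mb = 2560

def lsm_initial_size_bytes():
    # Steps 2 and 3: during LSM, both the source snapshot and the
    # destination active volume are created at 3 * chunk instead of
    # the default 1 * chunk.
    return 3 * volume_utilization_chunk_mb * MiB

print(lsm_initial_size_bytes())   # 8053063680 bytes == 7.5g
```

That 8053063680 figure is exactly the imageInitialSizeInBytes value visible in the verification logs below.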
(In reply to Nir Soffer from comment #8)
> Old vdsm will ignore this flag and try to extend the volume before the
> mirror.
> New vdsm will skip the initial extension since the volumes are already
> extended.
>
> The new behavior will be used only for cluster version >= 4.7.

Makes sense, but if that new flow only happens on cluster version >= 4.7 then we don't care about old VDSM.

(In reply to Nir Soffer from comment #10)
> On engine side, we need:
>
> 1. Add new configuration for volume chunk size, duplicating vdsm
> irs:volume_utilization_chunk_mb. Duplicating the configration
> is bad but until the configuration is moved to engine, we have
> no other way to use the right size for creating new volumes.
>
> 2. When creating the snapshot during live storage migration, create
> the snapshot with initial size of 3 * irs:volume_utilization_chunk_mb.
> currently the snapshot is created with default size of
> 1 * irs:volume_utilization_chunk_mb.
>
> 3. When creating the destination volume part of cloning image structure,
> create the active volume at the same initial size as the active volume
> in the source. (3 * chunk size).
>
> 4. Pass needExtend=false when calling VM.diskReplicateStart.

IIRC the initial size of snapshots is only set on block storage, but that's fine because this issue can only happen on block storage, right?
(In reply to Arik from comment #11)
> IIRC the initial size of snapshots is only set on block storage but that's
> fine because this issue can only happen on block storage, right?

Oh wait, can we migrate a disk from file-storage to block-storage?
(In reply to Arik from comment #12)
> (In reply to Arik from comment #11)
> > IIRC the initial size of snapshots is only set on block storage but that's
> > fine because this issue can only happen on block storage, right?
>
> Oh wait, can we migrate a disk from file-storage to block-storage?

Sure, but in this case the initial size of the snapshot does not matter, since initial size is used only on block storage.

The same applies to migrating from block to file - in this case the source snapshot needs to use the bigger initial size, but the destination snapshot size does not matter. On file storage the snapshot can be created without an initial size, and even if the engine sends the argument, it is ignored by vdsm.
*** Bug 1993839 has been marked as a duplicate of this bug. ***
Verified.

Versions:
vdsm-4.50.1.3-1.el8ev
ovirt-engine-4.5.1.2-0.11.el8ev

Could reproduce it on a non-fixed version (engine-4.4.10.7-0.4.el8ev):

# VDSM:
~~~
2022-06-21 09:22:41,577+0000 ERROR (jsonrpc/1) [virt.vm] (vmId='9bf54187-017a-4dc5-91cd-43c514b20e1c') Replication job unfinished (drive: 'sda', srcDisk: {'imageID': '121514a0-6e33-4a8a-9285-76c0bd78434f', 'poolID': '98214c50-85b4-4c14-bd90-83c668aa553a', 'volumeID': 'a1146f5d-269e-4ada-96f1-7f3b4580f42c', 'device': 'disk', 'domainID': '7e30eb41-8018-4ea6-b02c-37326beb89b1'}, job: {'type': 2, 'bandwidth': 0, 'cur': 1794113536, 'end': 2778726400}) (vm:4499)
~~~

# Engine:
~~~
2022-06-21 09:22:41,580Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-93) [2f7fecf9-fd37-44d9-829b-1525dd54e22a] Failed in 'VmReplicateDiskFinishVDS' method
2022-06-21 09:22:41,594Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-93) [2f7fecf9-fd37-44d9-829b-1525dd54e22a] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command VmReplicateDiskFinishVDS failed: Resource unavailable
2022-06-21 09:22:41,601Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-93) [2f7fecf9-fd37-44d9-829b-1525dd54e22a] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='1934590a-e3da-4c30-9fcf-e21223baa468', vmId='9bf54187-017a-4dc5-91cd-43c514b20e1c', storagePoolId='98214c50-85b4-4c14-bd90-83c668aa553a', srcStorageDomainId='7e30eb41-8018-4ea6-b02c-37326beb89b1', targetStorageDomainId='3f2141c0-b836-4922-8d35-ca2152184708', imageGroupId='121514a0-6e33-4a8a-9285-76c0bd78434f', imageId='a1146f5d-269e-4ada-96f1-7f3b4580f42c'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40
~~~

# 'needExtend' doesn't exist in this version on both vdsm and engine

On the fixed version there were no errors in VDSM or engine logs.

Engine:

# imageInitialSizeInBytes='8053063680':
~~~
2022-06-21 12:45:04,239+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-49) [1326c67f-9b5e-4665-9574-5a5be8860229] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='02f48b00-bb6e-42af-885e-19f0675db775', ignoreFailoverLimit='false', storageDomainId='8c939c62-7d98-4e65-bbd8-d52f7ea469c1', imageGroupId='c1a1f9ca-1c15-48a3-8382-76be9e638d65', imageSizeInBytes='32212254720', volumeFormat='COW', newImageId='d1bfbb37-f05f-455d-baf2-25586313c420', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='8053063680', imageId='23aa16b7-30c2-4796-b42c-5f1de8250b9f', sourceImageGroupId='c1a1f9ca-1c15-48a3-8382-76be9e638d65', shouldAddBitmaps='false', legal='true', sequenceNumber='2', bitmap='null'}), log id: 727da77c
~~~

# needExtend='false':
~~~
2022-06-21 12:45:40,982+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-72) [1326c67f-9b5e-4665-9574-5a5be8860229] START, VmReplicateDiskStartVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='06b8c39e-ad71-47c5-baae-0898cdfe2ed1', vmId='c38cd2be-6d12-46e0-9688-d7e40d6fb8e6', storagePoolId='02f48b00-bb6e-42af-885e-19f0675db775', srcStorageDomainId='8c939c62-7d98-4e65-bbd8-d52f7ea469c1', targetStorageDomainId='947e4aa9-85a6-4b28-8f10-dfbcb08e665c', imageGroupId='c1a1f9ca-1c15-48a3-8382-76be9e638d65', imageId='d1bfbb37-f05f-455d-baf2-25586313c420', diskType='null', needExtend='false'}), log id: 779863e7
~~~

VDSM:

# initialSize='8053063680':
~~~
2022-06-21 12:45:04,244+0300 INFO (jsonrpc/6) [vdsm.api] START createVolume(sdUUID='8c939c62-7d98-4e65-bbd8-d52f7ea469c1', spUUID='02f48b00-bb6e-42af-885e-19f0675db775', imgUUID='c1a1f9ca-1c15-48a3-8382-76be9e638d65', size='32212254720', volFormat=4, preallocate=2, diskType='DATA', volUUID='d1bfbb37-f05f-455d-baf2-25586313c420', desc='', srcImgUUID='c1a1f9ca-1c15-48a3-8382-76be9e638d65', srcVolUUID='23aa16b7-30c2-4796-b42c-5f1de8250b9f', initialSize='8053063680', addBitmaps=False, legal=True, sequence=2, bitmap=None) from=::ffff:10.46.12.196,36674, flow_id=1326c67f-9b5e-4665-9574-5a5be8860229, task_id=b3ea4d03-64a2-4ef4-9b78-8eee6307f7cf (api:48)
~~~

# needExtend='false':
~~~
2022-06-21 12:45:40,987+0300 INFO (jsonrpc/7) [api.virt] START diskReplicateStart(srcDisk={'imageID': 'c1a1f9ca-1c15-48a3-8382-76be9e638d65', 'poolID': '02f48b00-bb6e-42af-885e-19f0675db775', 'volumeID': 'd1bfbb37-f05f-455d-baf2-25586313c420', 'device': 'disk', 'domainID': '8c939c62-7d98-4e65-bbd8-d52f7ea469c1'}, dstDisk={'imageID': 'c1a1f9ca-1c15-48a3-8382-76be9e638d65', 'poolID': '02f48b00-bb6e-42af-885e-19f0675db775', 'volumeID': 'd1bfbb37-f05f-455d-baf2-25586313c420', 'device': 'disk', 'domainID': '947e4aa9-85a6-4b28-8f10-dfbcb08e665c'}, needExtend=False) from=::ffff:10.46.12.196,39640, flow_id=1326c67f-9b5e-4665-9574-5a5be8860229, vmId=c38cd2be-6d12-46e0-9688-d7e40d6fb8e6 (api:48)
~~~

Checked flows for:
- ISCSI -> ISCSI
- NFS -> ISCSI

Tested with a thin provisioned disk (30G) and also a small one (1G).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5555
Due to QE capacity, we are not going to cover this issue in our automation