Description of problem:

If a VM with a qcow2 disk is paused because the storage domain ran out of space and the user then tries to extend the disk, the extension fails during the "qemu-img resize" step, yet the engine reports "disk was successfully updated to 0 GB".

When the disk allocation approached the physical size of the backing LV, vdsm tried to extend the LV, which failed with the error below because the storage domain did not have enough free space:

===
2018-08-06 20:00:30,256+0530 ERROR (mailbox-spm/0) [storage.SPM.Messages.Extend] processRequest: Exception caught while trying to extend volume: 5083c58f-2f08-478e-9874-8761ac2d51c3 in domain: 0c975612-ce0a-49b7-9927-41d4c46538e7 (mailbox:172)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 166, in processRequest
    pool.extendVolume(volume['domainID'], volume['volumeID'], size)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1420, in extendVolume
    sdCache.produce(sdUUID).extendVolume(volumeUUID, size, isShuttingDown)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1564, in extendVolume
    lvm.extendLV(self.sdUUID, volumeUUID, size)  # , isShuttingDown)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1232, in extendLV
    % (vgName, lvName, free_extents, needed_extents))
VolumeGroupSizeError: Volume Group not big enough: ('Not enough free extents for extending LV 0c975612-ce0a-49b7-9927-41d4c46538e7/5083c58f-2f08-478e-9874-8761ac2d51c3 (free=4, needed=8)',)
===

The VM then went into paused mode with ENOSPC:

===
2018-08-06 20:00:13,825+0530 INFO (libvirt/events) [virt.vm] (vmId='0f7ccc40-35b3-4535-8845-d9a2641e51ba') abnormal vm stop device ua-96c03262-118e-4ea9-a277-054b07965f9d error enospc (vm:5075)
===

The VM was shut down and some disks were removed to free space in the storage domain. After that, an attempt to extend the disk by 20 GB failed with the error below:

===
2018-08-06 20:15:39,806+0530 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='f6c6b7b7-a627-4c26-81b7-97ae6b5fd9b7') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in updateVolumeSize
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 758, in updateVolumeSize
    qemuimg.resize(volPath, newSizeBytes, qemuImgFormat)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 333, in resize
    _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 409, in _run_cmd
    raise cmdutils.Error(cmd, rc, out, err)
Error: Command ['/usr/bin/qemu-img', 'resize', '-f', 'qcow2', u'/rhev/data-center/mnt/blockSD/0c975612-ce0a-49b7-9927-41d4c46538e7/images/96c03262-118e-4ea9-a277-054b07965f9d/5083c58f-2f08-478e-9874-8761ac2d51c3', '21474836480'] failed with rc=1 out='' err='qemu-img: Failed to grow the L1 table: No space left on device\n'
===

Since this is a qcow2 disk, the LV is not extended as part of the resize; however, the image may need more space in the LV to allocate new L1/refcount tables. Both the database and the engine volume metadata end up recording a size of 0 GB for the disk:
===
engine=# select log_time,message from audit_log where message like '%GlanceDisk%' order by log_time desc;
          log_time          |                                              message
----------------------------+----------------------------------------------------------------------------------------------------
 2018-08-06 10:56:19.048-04 | VM rhel_vm GlanceDisk-9a3800a disk was updated by admin@internal-authz.
 2018-08-06 10:56:18.999-04 | Size of the disk 'GlanceDisk-9a3800a' was successfully updated to 0 GB by admin@internal-authz.

engine=# select image_guid,size from images where image_guid = '5083c58f-2f08-478e-9874-8761ac2d51c3';
              image_guid              | size
--------------------------------------+------
 5083c58f-2f08-478e-9874-8761ac2d51c3 |    0
(1 row)

dd if=/dev/0c975612-ce0a-49b7-9927-41d4c46538e7/metadata bs=512 skip=5 count=1
DOMAIN=0c975612-ce0a-49b7-9927-41d4c46538e7
CTIME=1533565674
FORMAT=COW
DISKTYPE=DATA
LEGALITY=LEGAL
SIZE=0
VOLTYPE=LEAF
DESCRIPTION=
IMAGE=96c03262-118e-4ea9-a277-054b07965f9d
PUUID=99d38ffe-806a-4631-a28f-61196d6b5147
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF
===

This happens because the size is set to 0 before "qemuimg.resize" is called, and it is reset to the real image size (obtained from qemuimg.info) only if "qemuimg.resize" succeeds:

===
lib/vdsm/storage/hsm.py

756             # Uncommit the current size
757             volToExtend.setSize(0)
758             qemuimg.resize(volPath, newSizeBytes, qemuImgFormat)
759             roundedSizeBytes = qemuimg.info(volPath,
760                                             qemuImgFormat)['virtualsize']
761         finally:
762             volToExtend.teardown(sdUUID, volUUID)
763
764         volToExtend.setSize(
765             (roundedSizeBytes + SECTOR_SIZE - 1) / SECTOR_SIZE)
766
767         return dict(size=str(roundedSizeBytes))
===

The Administration Portal also shows the disk size as 0 GB.

Version-Release number of selected component (if applicable):
rhvm-4.2.3.5-0.1.el7.noarch
vdsm-4.20.27.1-1.el7ev.x86_64

How reproducible:
Observed in a customer environment running 4.1. I was able to reproduce this in my lab environment running 4.2 multiple times, although it is not always reproducible. It seems to depend on how much of the qcow2 disk is allocated versus the size of the LV: when the image is extended it may need to create new L1/refcount tables, and that fails if the LV does not have enough free space.

Steps to Reproduce:
1. Start a VM with a qcow2 disk.
2. Reduce the free space in the storage domain to about 1 GB by adding other preallocated disks.
3. Write data from inside the VM using dd; the VM will go into paused mode.
4. Shut the VM down, free up space on the storage domain, and try to extend the disk. It will likely fail with the error above.

Actual results:
Extending the disk fails and the size is set to 0 GB in the database and in the volume metadata.

Expected results:
We should detect that the image needs a larger backing LV and extend it accordingly, or capture the error correctly, reflect it in the manager portal, and never set the size to 0.

Additional info:
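For reference, the real virtual size can still be read back from qemu-img itself even while the volume metadata records SIZE=0. A rough sketch (assuming the volume's LV is active on this host and qemu-img supports JSON output):

===
# Rough check (not part of the fix): compare what qemu-img reports with the
# SIZE=0 recorded in the volume metadata above.
import json
import subprocess

VOL_PATH = ("/rhev/data-center/mnt/blockSD/0c975612-ce0a-49b7-9927-41d4c46538e7/"
            "images/96c03262-118e-4ea9-a277-054b07965f9d/"
            "5083c58f-2f08-478e-9874-8761ac2d51c3")

out = subprocess.check_output(["qemu-img", "info", "--output=json", VOL_PATH])
info = json.loads(out)

# "virtual-size" is in bytes; the metadata SIZE field above is in 512-byte
# sectors, so a correct record for this disk would be virtual-size / 512.
print("qemu-img virtual size: %d bytes" % info["virtual-size"])
===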
Nijin,

Can you please also supply the engine log?
(In reply to Eyal Shenitzky from comment #2)
> Nijin,
>
> Can you please also supply the engine log?

I was unable to reproduce the bug with the following steps:

1) Create a VM with 2 disks (OS, empty thin-provisioned block-based disk)
2) Create a large preallocated block-based disk that filled the storage domain (only 1 GB remained)
3) Run the VM
4) Start a dd operation (4 GB) to the empty thin-provisioned block-based disk
5) When the VM entered pause mode, shut down the VM
6) Remove the large preallocated disk
7) Select to extend the empty thin-provisioned block-based disk

Actual results:
Managed to extend the disk; the VM started and works well.

Am I missing something?
(In reply to Eyal Shenitzky from comment #3)
> (In reply to Eyal Shenitzky from comment #2)
> > Nijin,
> >
> > Can you please also supply the engine log?
>
> I was unable to reproduce the bug with the following steps:
>
> 1) Create a VM with 2 disks (OS, empty thin-provisioned block-based disk)
> 2) Create a large preallocated block-based disk that filled the storage domain (only 1 GB remained)
> 3) Run the VM
> 4) Start a dd operation (4 GB) to the empty thin-provisioned block-based disk
> 5) When the VM entered pause mode, shut down the VM
> 6) Remove the large preallocated disk
> 7) Select to extend the empty thin-provisioned block-based disk
>
> Actual results:
> Managed to extend the disk; the VM started and works well.
>
> Am I missing something?

I tried again and was able to reproduce it, following the same steps as yours:

1) Created a 3 GB thin-provisioned disk and attached it to the VM. The actual size is 1 GB and the virtual size is 3 GB.
2) Added a large preallocated disk to the storage domain, leaving less than 1 GB of free space.
3) Started writing data to the disk from the VM with "dd if=/dev/urandom bs=1M of=/dev/sdb". The VM went into paused mode and I powered it down.
4) Reclaimed the space in the storage domain by deleting the large preallocated disk.
5) Extended the thin-provisioned disk by 10 GB. This failed with the error below.

===
2018-09-05 11:54:37,908+0530 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH updateVolumeSize error=Command ['/usr/bin/qemu-img', 'resize', '-f', 'qcow2', u'/rhev/data-center/mnt/blockSD/c21260fa-bc27-4bd3-9841-99da4e2d0402/images/81209358-1712-4fd9-a2ff-d41c7f19b7c8/2849de7f-fc48-40bc-af2f-a62ed008e9ba', '13958643712'] failed with rc=1 out='' err='qemu-img: Failed to grow the L1 table: No space left on device\n' (dispatcher:86)

2018-09-05 04:52:41,629-04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3355) [] EVENT_ID: USER_EXTEND_DISK_SIZE_SUCCESS(371), Size of the disk 'nijin_bz1612978_Disk1' was successfully updated to 0 GB by admin@internal-authz.
===

I tried the steps above 4 times and reproduced the issue every time. The VM is still available in my environment, and extending this disk always fails with the error above. Please let me know if you need credentials to my test environment to have a look.

Attaching both the engine and vdsm logs.
Thanks, I managed to reproduce it as well.

When you try to extend the disk for the second time, does it work for you?
We need to understand why resize behaves this way. It looks like the author of this code wanted to keep the disk in an invalid state if qemu-img resize failed. I think this behavior exists in several places in vdsm.

This behavior looks useless when the resize failed because we don't have enough space, but I'm not sure that we get a useful error code from qemu-img about the failure, so we may not be able to detect this case.

Kevin, can we use an image safely after qemu-img resize fails? Can we detect errors meaning that the disk is corrupted and should not be used, as opposed to errors like ENOSPC which should be safe?
To be honest, I don't understand the whole scenario to begin with. If you get ENOSPC on the host and the VM is stopped, then you need to make space for the image file to grow, not extend the virtual disk size. Obviously, if you already got ENOSPC, there is no space left to store new data in the image, so increasing the virtual disk size will fail as well. What you really need to do is extend the LV the image is stored on.

Second, maybe even more importantly, the 'qemu-img resize' operation in this context isn't only misguided, but actively dangerous: The image is already opened read-write by the VM, and doing a 'qemu-img resize' on it sounds like a great way to corrupt it. For local files, image locking should prevent this in current QEMU versions, but for everything else we have to rely on the user (or RHV) not to do stupid things.

Finally, the size being reset to 0 seems to be a RHV-internal thing. I am pretty sure that 'qemu-img info' will still show the real size. As long as you didn't resume the VM afterwards (which has old metadata cached and would continue to use it), there are even chances that it's not completely corrupted. However, let me repeat that the possible corruption is not from the failed 'qemu-img resize' (this is always safe), but from opening the image read-write from two processes.

When QEMU detects image corruption in a qcow2 image, it flags it as such and refuses to open it again until you repair it. This will be visible in the log files with messages like "Marking image as corrupt: ..." (when first detecting corruption) or "qcow2: Image is corrupt; cannot be opened read/write" (when trying to open an image that is marked as corrupt). So QEMU already takes care that an image that shouldn't be used really isn't used.
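For reference, a rough way to check from the host whether that corrupt flag has been set (a sketch only; it assumes qemu-img info exposes the qcow2 format-specific 'corrupt' field in its JSON output, which may be absent on older images or qemu versions):

===
import json
import subprocess


def qcow2_marked_corrupt(path):
    out = subprocess.check_output(["qemu-img", "info", "--output=json", path])
    data = json.loads(out).get("format-specific", {}).get("data", {})
    # Treat a missing field as "not flagged".
    return bool(data.get("corrupt", False))
===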
Thanks for the quick reply Kevin!

We do not use qemu-img resize on a live image; when resizing a live image we use the proper libvirt API.

The current flow is:
- set the image size in RHV metadata to 0
- call qemu-img resize
- update the image size in RHV metadata to the actual size after the resize

Based on comment 8, qemu-img resize is always safe, and qemu already handles bad images, so I think we should change the resize flow to:
- call qemu-img resize
- update the image size in RHV metadata to the actual size

The same pattern exists when resizing a live image, where we also set the image size to 0 before the operation. We should fix that flow as well.
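A minimal sketch of that reordering, based on the hsm.py excerpt in the description (qemuimg and the volume object behave as in that excerpt; SECTOR_SIZE is defined here only for self-containment). This is meant to illustrate the proposed order, not the actual patch:

===
from vdsm.storage import qemuimg

SECTOR_SIZE = 512


def update_volume_size(volToExtend, volPath, newSizeBytes, qemuImgFormat,
                       sdUUID, volUUID):
    try:
        # Resize first; if this raises (e.g. ENOSPC on the LV), we never
        # touch the RHV metadata and the old size stays intact.
        qemuimg.resize(volPath, newSizeBytes, qemuImgFormat)
        roundedSizeBytes = qemuimg.info(volPath,
                                        qemuImgFormat)['virtualsize']
    finally:
        volToExtend.teardown(sdUUID, volUUID)

    # Commit the new size (in 512-byte sectors) only after a successful resize.
    volToExtend.setSize(
        (roundedSizeBytes + SECTOR_SIZE - 1) // SECTOR_SIZE)
    return dict(size=str(roundedSizeBytes))
===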
(In reply to Kevin Wolf from comment #8)
> To be honest, I don't understand the whole scenario to begin with. If you
> get ENOSPC on the host and the VM is stopped, then you need to make space
> for the image file to grow, not extend the virtual disk size. Obviously, if
> you already got ENOSPC, there is no space left to store new data in the
> image, so increasing the virtual disk size will fail as well. What you
> really need to do is extend the LV the image is stored on.
>

The scenario is that 'qemu-img resize' failed after performing the following steps:

1) Create a VM with a thin-provisioned block-based disk and run it
2) Create a new preallocated disk (not related to the VM) that fills the block-based storage domain
4) Start a dd operation to the empty thin-provisioned block-based disk -> this causes the VM to enter 'pause' mode
5) When the VM entered pause mode, shut down the VM
6) Remove the large preallocated disk -> make space for the disk resize
7) Select to extend the thin-provisioned block-based disk

The question here is why qemu-img resize is failing.

> Second, maybe even more importantly, the 'qemu-img resize' operation in this
> context isn't only misguided, but actively dangerous: The image is already
> opened read-write by the VM, and doing a 'qemu-img resize' on it sounds like
> a great way to corrupt it. For local files, image locking should prevent
> this in current QEMU versions, but for everything else we have to rely on
> the user (or RHV) not to do stupid things.
>
> Finally, the size being reset to 0 seems to be a RHV-internal thing. I am
> pretty sure that 'qemu-img info' will still show the real size. As long as
> you didn't resume the VM afterwards (which has old metadata cached and would
> continue to use it), there are even chances that it's not completely
> corrupted. However, let me repeat that the possible corruption is not from
> the failed 'qemu-img resize' (this is always safe), but from opening the
> image read-write from two processes.
>
> When QEMU detects image corruption in a qcow2 image, it flags it as such and
> refuses to open it again until you repair it. This will be visible in the
> log files with messages like "Marking image as corrupt: ..." (when first
> detecting corruption) or "qcow2: Image is corrupt; cannot be opened
> read/write" (when trying to open an image that is marked as corrupt). So
> QEMU already takes care that an image that shouldn't be used really isn't
> used.
(In reply to Eyal Shenitzky from comment #10)
> The question here is why qemu-img resize is failing.

If I understand the scenario correctly, it is failing because the block device on which the qcow2 image is stored is full, and you didn't resize that block device.
> If I understand the scenario correctly, it is failing because the block
> device on which the qcow2 image is stored is full, and you didn't resize
> that block device.

But in step 6 the preallocated disk that filled the storage was removed, so the resize should succeed.
Where did you resize the LV identified by /rhev/data-center/mnt/blockSD/0c975612-ce0a-49b7-9927-41d4c46538e7/images/96c03262-118e-4ea9-a277-054b07965f9d/5083c58f-2f08-478e-9874-8761ac2d51c3? I can't see this in any of your steps.

You can remove as many other things as you want; as long as the LV itself is full, it doesn't help.
Eyal, can you create a reproducer script that will make your question clearer?

You can create a block device using losetup, then create a vg and some lvs. Then simulate the flow using qemu-img on the lvs, for example with the sketch below.
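Something along these lines should do (a rough sketch only, run as root on a scratch host; device names, sizes, and the qemu-io fill step are placeholders, and depending on how much free space is left in the LV when the write fails it may or may not hit the exact "Failed to grow the L1 table" error):

===
# Rough reproducer sketch: tiny VG on a loop device, 1g LV, qcow2 filled
# until ENOSPC, then "qemu-img resize" without extending the LV first.
import subprocess


def run(*cmd):
    print("$ " + " ".join(cmd))
    return subprocess.check_output(cmd)


run("truncate", "-s", "2G", "/tmp/backing.img")
loop = run("losetup", "--find", "--show", "/tmp/backing.img").decode().strip()
run("vgcreate", "test-vg", loop)
run("lvcreate", "--name", "test-lv", "--size", "1g", "test-vg")

lv = "/dev/test-vg/test-lv"
run("qemu-img", "create", "-f", "qcow2", lv, "10g")

# Fill the image until the 1g LV runs out of space; the second write is
# expected to fail with ENOSPC, like the paused VM in the original report.
subprocess.call(["qemu-io", "-f", "qcow2",
                 "-c", "write 0 512m", "-c", "write 512m 512m", lv])

# Now try to grow the virtual size without extending the LV; this is the step
# that should fail like the updateVolumeSize flow does.
subprocess.call(["qemu-img", "resize", "-f", "qcow2", lv, "20g"])

# Cleanup (manual): lvremove/vgremove test-vg, losetup -d <loop>, rm backing.img
===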
I am not sure how that helps; maybe I am missing the meaning of resize. If the LV resides on a storage domain that has free space (e.g. a 10 GB storage domain and an 8 GB LV), the resize should succeed, and if the storage doesn't have the required space, the resize will fail.

So in our case, if I understand correctly, after we 'cleaned' the storage domain and reclaimed space from it, we should be able to resize the disk and there is no reason for 'qemu-img resize' to fail.
Please take Nir's advice and create a reproducer script that can clarify the exact steps you're making on the QEMU level.

Other than that, I can only repeat that 'qemu-img resize' changes the virtual disk size as seen by the guest, not anything related to host storage. So if you got an ENOSPC from the host, 'qemu-img resize' is not a solution for the ENOSPC condition. Instead, you need to make sure that QEMU can write more data to the host storage. In the typical case, this means that the LV needs to be extended to allow the image file to grow. It's also conceivable that you get ENOSPC from an (overcommitted) thin-provisioned LV; then you need to make space for that thin-provisioned LV to allocate new blocks.

What is common to both cases is that the storage is freed at the LVM layer on the host; QEMU is not involved and can only resume the VM after you have made more space available, so that the failed request can be retried.
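In other words, at the LVM layer the order would be something like the following sketch (placeholder names and sizes; in RHV the LV extension would normally be done through vdsm rather than by hand):

===
# Sketch: grow the LV first so the qcow2 file has room for new data and
# metadata, then grow the virtual disk size.
import subprocess

LV = "/dev/test-vg/test-lv"   # the LV backing the qcow2 volume

subprocess.check_call(["lvextend", "--size", "+2g", LV])
subprocess.check_call(["qemu-img", "resize", "-f", "qcow2", LV, "20g"])
===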
The bug was reproduced by the following steps:

1) Create a VM with a thin-provisioned block-based disk and run the VM.
2) Create a new preallocated disk (not related to the VM) that fills the block-based storage domain.
4) Start a dd operation to the empty thin-provisioned block-based disk.

For steps 1-4 I ran the automation test case TestNoSpaceLeftOnStorageDomain.

5) When the VM entered pause mode, shut down the VM.
6) Remove the large preallocated disk.
7) Extend the thin-provisioned block-based disk.

The fix was verified by checking that the error "USER_FAILED_UPDATE_VM_DISK" appears in the engine log:

2018-11-26 19:22:21,211+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-12681) [] EVENT_ID: USER_FAILED_UPDATE_VM_DISK(89), Failed to update VM vm_TestNoSpaceLeftOnStorag_2619094984 disk disk_TestNoSpaceLeftOnStorag_2619102434 (User: admin@internal-authz).

Following this bug, I opened a new bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1643112
Bug 1643112 - The disk size return to 0 GB after resizing in the second time and the extend operation failed.
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed: [Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1077