Created attachment 994728 [details]
vdsm and engine logs

Description of problem:
Writing 2GB with dd to a 2GB thin-provisioned disk (bs=1M count=2048) causes the actual disk size to grow to 3GB.

Version-Release number of selected component (if applicable):
vt13.11

How reproducible:
100%

Steps to Reproduce:
1. Create a 2GB thin-provisioned disk.
2. Write to it from the guest: dd if=/dev/urandom of=/dev/<disk_logical_name> bs=1M count=2048

Actual results:
The actual disk size grows to 3GB.

Expected results:
The actual disk size stays close to the 2GB virtual size.

Additional info:
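For anyone reproducing this on block storage, a quick way to see the over-allocation is to check the image LV size on the host after the dd completes. This is only a sketch; the VG is the storage domain and the LV is the image volume, and the names below are placeholders you need to look up for your setup:

  # refresh lvm metadata from the shared storage, then list LV sizes
  pvscan --cache
  lvs -o lv_name,lv_size <storage_domain_vg>

For the 2GB disk in this report, the LV shows up at about 3G.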
Nir, IIUC, https://gerrit.ovirt.org/#/c/38088/ should resolve this. Is this right?
(In reply to Allon Mureinik from comment #1)
> Nir, IIUC, https://gerrit.ovirt.org/#/c/38088/ should resolve this. Is this right?

Right, moving these patches to this bug.
This example in the description is not a bug but expected behavior of the system.

When working with qcow2 format, we need *more* space than the virtual size to allow the guest to use the entire virtual size of the disk.

For example, if we create a 2G disk, qemu may need up to 2.2G for storing 2G of data on the device. The actual amount of additional space is tricky to compute, and vdsm uses an estimate of 10% when computing the size of qcow2 volumes. Currently the vdsm extend chunk is 1G (or 2G during live storage migration), so the disk is extended to 3G.

If we limit the disk size to the virtual size (http://gerrit.ovirt.org/38088), and a vm is trying to fill up the disk, the vm will pause without a way to resume it, since qemu cannot complete the write operation.

The current code allows such a write to complete by extending the disk when the free space is below the configured watermark limit (default 512MB). We cannot change this behavior.

For images under 10G, we can optimize the allocation and allocate less than one chunk (1G), but this is a low priority change.

The real bug here is that in 3.5, disk extend is *unlimited*. This is not a problem under normal conditions, but it is a problem if the extend logic is broken, as seen in bug 1176673.

Moving to ASSIGNED since the suggested patches are incorrect. We need first to fix the limit in master before we can port the fix to 3.5.

Lowering severity as this is not an issue under normal conditions, and only a nice-to-have property.
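To make the numbers concrete, here is a rough sketch of the size estimate described above. The 10% overhead figure comes from this comment and the 128MiB extent rounding is mentioned later in this bug; the shell helper itself is only illustrative, not vdsm code:

  # estimated maximum qcow2 size for a 2G thin disk, assuming ~10% overhead
  # and rounding up to 128MiB lvm extents
  virtual_mb=$((2 * 1024))
  extent_mb=128
  estimated_mb=$((virtual_mb * 110 / 100))
  rounded_mb=$(( (estimated_mb + extent_mb - 1) / extent_mb * extent_mb ))
  echo "estimated maximum size: ${rounded_mb} MiB"   # 2304 MiB, about 2.25G

  # with the current 1G extend chunk, the same disk is instead extended 1G at a
  # time whenever free space drops below the 512MB watermark, which is how a
  # fully written 2G disk ends up at 3G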
The new patches should fix this issue correctly:

1. https://gerrit.ovirt.org/38178 changes the limit to allow up to 10% extra allocation (rounded up to the next lv extent).
2. https://gerrit.ovirt.org/38179 avoids the pointless extension requests that would otherwise be needed once the disk size is limited.
Testing:

Successful write:
1. Add a second 1G disk to a vm.
2. On the guest, run: dd if=/dev/zero of=/dev/vdb bs=8M count=128

The operation must succeed. The disk should be extended to about 1.12G; the vm should not pause.

Failed write:
1. Add a second 1G disk to a vm.
2. On the guest, run: dd if=/dev/zero of=/dev/vdb bs=8M count=129

The operation should fail in the guest with "No space left on device". The disk should be extended to about 1.12G; the vm should not pause.

To check the volume size, use lvm on the host:

  pvscan --cache
  lvs vgname/lvname

You can repeat both tests with a bigger disk (e.g. 8G), writing more data (count=1024). The volume will be extended up to about 9G.
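If you want to watch the volume grow while the guest is writing, a loop like this on the host should work (vgname/lvname are the same placeholders as above; this is just a suggestion, not part of the required test):

  # poll the LV size every few seconds during the test
  while true; do
      pvscan --cache > /dev/null
      lvs --noheadings -o lv_size vgname/lvname
      sleep 5
  done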
Nir, I removed the abandoned 3.5 backports. For 3.6, I see two patches on master that are merged. Is there anything else we're waiting for, or can this bug be moved to MODIFIED?
(In reply to Allon Mureinik from comment #7)
> Nir, I removed the abandoned 3.5 backports.
> For 3.6, I see two patches on master that are merged.
> Is there anything else we're waiting for, or can this bug be moved to MODIFIED?

I think we are done.
Verified on oVirt 3.6.0.3: the qcow2 volume does not get extended beyond the configured limit, though its final size is bigger than the actual size reported.
(In reply to Nir Soffer from comment #4)
> Currently the vdsm extend chunk is 1G (or 2G during live storage migration),
> so the disk is extended to 3G.

A 1GB extension is acceptable, but in my case (rhev-3.2) it's much larger: a virtual machine with a 25GB thin-provisioned OS disk shows an "actual size" of 58GB, without any snapshot. See the attachments.
Created attachment 1048665 [details] query
Created attachment 1048666 [details] no-snapshots
Created attachment 1048667 [details] single-disk
(In reply to Anand Nande from comment #10)
> A 1GB extension is acceptable, but in my case (rhev-3.2) it's much larger:
> a virtual machine with a 25GB thin-provisioned OS disk shows an "actual size"
> of 58GB, without any snapshot.

This is a known issue in versions before 3.6. In 3.6, we limit the extend size to 1.1 * virtual size.
Customer's question: Is there a way to reclaim this space?
(In reply to Anand Nande from comment #15)
> Customer's question: Is there a way to reclaim this space?

You should be able to reclaim this, since the vm cannot use more than the virtual size.

You can use lvm commands to shrink the lv to virtual size * 1.1 (lvm will round the value using 128MiB chunks). Assuming a virtual disk size of 25GiB, we should resize to 27.5GiB (rounding up to the next GiB for simplicity):

  lvreduce -L28G /dev/vgname/lvname

Note that no other host should access this vg while you make this change. The safest way to do this is to shut down all hosts that can access this storage domain. If you cannot allow downtime, you will have to stop the host running as SPM and stop engine (so it cannot elect a new SPM) before making this change.

Please have a good backup before doing this.
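Putting these steps together, a possible sequence looks like this. vgname/lvname are placeholders for the storage domain VG and the image LV, the 28G target assumes the 25G disk above, and this is a sketch of the procedure described in this comment, not a validated one:

  pvscan --cache                       # refresh lvm metadata from storage
  lvs vgname/lvname                    # confirm the current (inflated) size, e.g. 58G
  lvreduce -L 28G /dev/vgname/lvname   # shrink; lvreduce warns before reducing
  lvs vgname/lvname                    # verify the new size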
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0376.html