Description of problem:
On fast storage - XtreamIO - extending the actual disk size produces the error "VM has been paused due to no Storage space error."

How reproducible:
Always

Steps to Reproduce:
1. Start a single VM.
2. Run fio writing 10 files of 1 GB each, then delete them.

Actual results:
* Before I started fio, the actual disk size of the disk was 5 GB.
* At some point while fio is running, I get the error message: "VM has been paused due to no Storage space error."
* After several seconds the VM is back online.
* The actual disk size of the disk increases by 10 GB.
* The fio writing and deleting succeed.
* I repeated the fio test 4 times; each time I get the same results with the error, and the actual disk size reaches 45 GB.
* From the fifth run on, the error disappears and the actual disk size stabilizes at 47 GB.

Expected results:
No error should appear.

Additional info:
Topology: XtreamIO storage, 1 host, 1 engine, 1 VM

fio configuration (also tried with dd, with the same results):
[global]
rw=write
size=1g
directory=/home/fio_results
thread=10
unified_rw_reporting=1
group_reporting
iodepth=10

Error in the vdsm log:
libvirtEventLoop::INFO::2016-12-21 17:31:29,347::vm::4041::virt.vm::(onIOError) vmId=`99866c78-ede5-4707-86e4-12c1d627fa0c`::abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::INFO::2016-12-21 17:31:29,347::vm::4877::virt.vm::(_logGuestCpuStatus) vmId=`99866c78-ede5-4707-86e4-12c1d627fa0c`::CPU stopped: onIOError
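For completeness, here is a minimal Python sketch of the same workload (my own illustration, not the exact fio invocation used in the test): it writes 10 files of 1 GB each into the same guest directory and then deletes them, which is enough to push the thin-provisioned disk past its currently allocated size.

#!/usr/bin/env python3
# Hypothetical reproduction sketch (not the exact fio run): write 10 x 1 GB
# files inside the guest, fsync them, then delete them.
import os

TARGET_DIR = "/home/fio_results"     # same directory used in the fio job
FILE_COUNT = 10
FILE_SIZE = 1024 * 1024 * 1024       # 1 GiB per file
CHUNK = 4 * 1024 * 1024              # write in 4 MiB chunks

os.makedirs(TARGET_DIR, exist_ok=True)
buf = b"\xaa" * CHUNK

paths = []
for i in range(FILE_COUNT):
    path = os.path.join(TARGET_DIR, "file%02d" % i)
    paths.append(path)
    with open(path, "wb") as f:
        written = 0
        while written < FILE_SIZE:
            f.write(buf)
            written += CHUNK
        f.flush()
        os.fsync(f.fileno())         # make sure the data actually hits the disk

for path in paths:
    os.remove(path)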
The workaround is to change the relevant parameters of the watermark - when we look to extend and by how much we extend. An interesting idea is to check whether the storage is thinly provisioned - if it is, we can assume it is safe to extend more aggressively.
Yaniv - we probably should document it if it's not already documented in some KB.
Note: also, a bigger initial disk would probably have prevented this altogether (since the watermark would have been reached later, giving VDSM even more time to extend).
From : (which is tightly coupled with / a duplicate of this one):

[irs]
volume_utilization_percent = 50
volume_utilization_chunk_mb = 1024

In the case above, the user can:
1. Extend the initial image size (makes sense anyway - say, 10G, not 1G)
2. Change the above thresholds to lower and higher values, respectively.
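To make the effect of these two values concrete, here is a small sketch (my own illustration, not VDSM code), assuming the extension request is issued once the volume's free space drops below (100 - volume_utilization_percent) percent of volume_utilization_chunk_mb, and that each extension grows the volume by one chunk:

# Illustration only - not VDSM code. Assumes extension is requested once free
# space drops below (100 - volume_utilization_percent)% of the chunk size,
# and that each extension grows the volume by volume_utilization_chunk_mb.

def free_space_threshold_mb(utilization_percent, chunk_mb):
    """Free space (MB) below which an extension of the thin volume is requested."""
    return chunk_mb * (100 - utilization_percent) / 100.0

# Defaults: ask for more space when ~512 MB is left, grow by 1 GB per extend.
print(free_space_threshold_mb(50, 1024))   # 512.0

# Hypothetical lower percent / bigger chunk: ask earlier, grow by more each time.
print(free_space_threshold_mb(25, 2048))   # 1536.0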
Can you validate that the tweaking in comment 8 resolves your issue?
I changed the thresholds in vdsm.conf to:

[irs]
volume_utilization_percent = 15
volume_utilization_chunk_mb = 4048

The initial disk size was extended from 1 GB to 5 GB. After that I ran the load; there were 3 extends, so the disk size grew to 17 GB. The VM was not paused, so the configuration fixes the issue. BTW, I also found a relevant mailing-list thread about the issue: http://lists.ovirt.org/pipermail/users/2016-May/039803.html
Perhaps the initial values are sub-optimal. If we initially have 1 GB and the watermark is at 50%, it means we have at most ~500 MB to fill before we choke. Considering we may not even get the notification immediately, we have ~5 seconds at a 100 MB/sec write rate (which is not very fast!) until we run out of space and pause. I suggest we change at least the extend size to 2 GB or so, to give us room to breathe - I don't expect the initial extend to be very quick, since I don't expect a real-life scenario to grow so much right away (but perhaps I'm wrong - what if it's a data dump? If it's an OS installation, the decompression of installation files ensures the IO isn't that fast).
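To put numbers on that, a back-of-the-envelope sketch (my own, under the same watermark assumption as in the earlier illustration) of how long there is between the extension request and the volume filling up:

# Back-of-the-envelope: seconds between "extension requested" and "volume full",
# i.e. the window VDSM/SPM have to complete the extend before the VM pauses.
# Assumes the request fires exactly when free space hits the watermark.

def seconds_until_pause(utilization_percent, chunk_mb, write_mb_per_s):
    free_at_trigger_mb = chunk_mb * (100 - utilization_percent) / 100.0
    return free_at_trigger_mb / write_mb_per_s

print(seconds_until_pause(50, 1024, 100))   # defaults at 100 MB/s: ~5.1 s
print(seconds_until_pause(50, 1024, 500))   # fast all-flash writer at 500 MB/s: ~1.0 s
print(seconds_until_pause(50, 2048, 100))   # 2 GB chunk as suggested: ~10.2 s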
*** Bug 1461536 has been marked as a duplicate of this bug. ***
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
What was fixed/changed here? Did you change the defaults in vdsm.conf? Please provide clear validation instructions. (Also, I see in comment 10 from 2017 that the conf change was already validated back then.)