Bug 1408594
| Summary: | VM would pause if it writes faster than the time it takes to lvextend its LV (happens with qcow2 over fast block storage) | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | guy chen <guchen> |
| Component: | General | Assignee: | Benny Zlotnik <bzlotnik> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | mlehrer |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | --- | CC: | aefrat, bugs, dagur, ebenahar, frolland, guchen, lsurette, lsvaty, mlehrer, srevivo, tnisan, ycui |
| Target Milestone: | ovirt-4.3.3 | Keywords: | Performance |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.3+ |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | Please add the relevant info on fine tuning to the documentation | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-29 13:57:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The workaround is to change the relevant watermark parameters - WHEN we look to extend, and by how much we extend. An interesting idea is to check whether the storage is thinly provisioned - if it is, assume we can extend more aggressively. Yaniv - we should probably document this if it's not already documented in some KB.

Note: having a bigger initial disk would probably have prevented this altogether, since the watermark would have been reached later, giving VDSM even more time to extend.

From : (which is tightly coupled with / a duplicate of this one):

    [irs]
    volume_utilization_percent = 50
    volume_utilization_chunk_mb = 1024

In the case above, the user can:
1. Extend the initial image size (makes sense anyway - say, 10G, not 1G)
2. Change the above thresholds lower and higher, respectively.

Can you validate that the tweaking in comment 8 resolves your issue?

I changed the thresholds in vdsm.conf to:

    [irs]
    volume_utilization_percent = 15
    volume_utilization_chunk_mb = 4048

The initial disk size was extended from 1 GB to 5 GB. After that I ran the load; there were 3 extends, so the disk size grew to 17 GB. The VM was not paused, so the configuration fixes the issue. BTW, I also found a relevant mailing-list thread on this issue: http://lists.ovirt.org/pipermail/users/2016-May/039803.html

Perhaps the initial values are sub-optimal. If we initially have 1 GB and the watermark is at 50%, we have at most 500 MB to fill before we choke. Considering we may not even get the notification immediately, we have ~5 seconds at a 100 MB/sec write rate (which is not very fast!) until we run out of space and pause. I suggest we change at least the extend size to 2 GB or so, to give us room to breathe - I don't expect the initial extend to be very quick, but I also don't expect a real-life workload to grow that much right away (though perhaps I'm wrong - a data dump might; an OS installation decompresses its installation files, which keeps IO from being that fast).
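The watermark arithmetic discussed above can be sketched as follows. This is a minimal illustration, not VDSM code; the function name and parameters are hypothetical, chosen to mirror the vdsm.conf [irs] options:

```python
def seconds_until_pause(allocated_mb, utilization_percent, write_mb_s):
    """Time from crossing the extend watermark to filling the volume,
    i.e. roughly how long VDSM has to complete the lvextend before
    the guest hits ENOSPC and pauses."""
    # Free space remaining when the watermark fires
    headroom_mb = allocated_mb * (100 - utilization_percent) / 100.0
    return headroom_mb / write_mb_s

# Defaults from the report: 1 GB volume, 50% watermark, 100 MB/s writer
print(seconds_until_pause(1024, 50, 100))   # → 5.12
# Tweaked values from comment 10: 5 GB initial size, 15% watermark
print(seconds_until_pause(5120, 15, 100))   # → 43.52
```

The numbers match the reasoning in the comment above: with the defaults there are only ~5 seconds of headroom at 100 MB/s, while a larger initial size plus an earlier watermark gives the extend well over half a minute.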
*** Bug 1461536 has been marked as a duplicate of this bug. ***

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

What was fixed or changed here? Did you change the defaults in vdsm.conf? Please provide clear validation instructions. (Also, I see in comment 10 from 2017 that the conf change was already validated back then.)
Description of problem:
On fast storage (XtremIO), extending the actual disk size fails with the error "VM has been paused due to no Storage space error."

How reproducible: always

Steps to Reproduce:
1. Start a single VM
2. Run fio so it writes 10 files of 1 GB each, then deletes them

Actual results:
* Before I started fio, the actual disk size was 5 GB.
* When I start fio, at some point I get the error message: "VM has been paused due to no Storage space error."
* After several seconds the VM is back online.
* The actual disk size increases by 10 GB.
* The fio writes and deletes succeed.
* I repeated the fio test 4 times, each time with the same results and the same error, and the actual disk size reached 45 GB.
* From the fifth run on, the error disappears and the actual disk size stabilizes at 47 GB.

Expected results:
No error should appear.

Additional info:
Topology:
* storage - XtremIO
* 1 host
* 1 engine
* 1 VM

fio configuration (also tried with dd, with the same results):

    [global]
    rw=write
    size=1g
    directory=/home/fio_results
    thread=10
    unified_rw_reporting=1
    group_reporting
    iodepth=10

Error in the vdsm log:

    libvirtEventLoop::INFO::2016-12-21 17:31:29,347::vm::4041::virt.vm:onIOError) vmId=`99866c78-ede5-4707-86e4-12c1d627fa0c`::abnormal vm stop device virtio-disk0 error enospc
    libvirtEventLoop::INFO::2016-12-21 17:31:29,347::vm::4877::virt.vm:_logGuestCpuStatus) vmId=`99866c78-ede5-4707-86e4-12c1d627fa0c`::CPU stopped: onIOError
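The write-then-delete load above can be sketched without fio. This is an illustrative script, not taken from the report (the reporter notes dd showed the same pause); the function and its parameters are hypothetical, and the sizes here are scaled down - set files=10, size_mb=1024 to match the original 10 x 1 GB job:

```python
import os
import tempfile

def write_then_delete(directory, files=10, size_mb=1024):
    """Write `files` files of `size_mb` MB each, then delete them,
    mimicking the fio job above (default: 10 x 1 GB)."""
    block = b"\0" * (1024 * 1024)  # 1 MB of zeros
    for i in range(1, files + 1):
        path = os.path.join(directory, f"file{i}")
        with open(path, "wb") as f:
            for _ in range(size_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data down to the backing volume
    for i in range(1, files + 1):
        os.remove(os.path.join(directory, f"file{i}"))

# Scaled down for illustration; the reported load was files=10, size_mb=1024.
write_then_delete(tempfile.mkdtemp(), files=2, size_mb=4)
```

With a thin-provisioned qcow2 disk, a sustained burst like this outruns the LV extension and triggers the ENOSPC pause described above.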