Bug 1408594
| Summary: | VM would pause if it writes faster than the time it takes to lvextend its LV (happens with qcow2 over fast block storage) | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | guy chen <guchen> |
| Component: | General | Assignee: | Benny Zlotnik <bzlotnik> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | mlehrer |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | --- | CC: | aefrat, bugs, dagur, ebenahar, frolland, guchen, lsurette, lsvaty, mlehrer, srevivo, tnisan, ycui |
| Target Milestone: | ovirt-4.3.3 | Keywords: | Performance |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.3+ |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | Please add the relevant info on fine tuning to the documentation | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-29 13:57:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (guy chen, 2016-12-25 12:52:23 UTC)
The workaround is to change the relevant parameters of the watermark: WHEN we look to extend, and by how much we extend. An interesting idea is to check whether the storage is thinly provisioned; if it is, assume we can extend more aggressively. Yaniv, we should probably document this if it is not already documented in some KB.

Note: having a bigger initial disk would probably have prevented this altogether, since the watermark would have been reached later, giving VDSM even more time to extend.

From : (which is tightly coupled with / a duplicate of this one):

    [irs]
    volume_utilization_percent = 50
    volume_utilization_chunk_mb = 1024

In the case above, the user can:
1. Extend the initial image size (this makes sense anyway; say 10G, not 1G).
2. Change the above thresholds to lower and higher values, respectively.

Can you validate that the tweaking in comment 8 resolves your issue?

I changed the thresholds in vdsm.conf to:

    [irs]
    volume_utilization_percent = 15
    volume_utilization_chunk_mb = 4048

The initial disk size was extended to 5 GB from 1 GB. After that I ran the load; there were 3 extends, so the disk grew to 17 GB. The VM was not paused, so the configuration fixes the issue (a consolidated vdsm.conf sketch appears at the end of this section). BTW, I also found relevant historical correspondence about the issue: http://lists.ovirt.org/pipermail/users/2016-May/039803.html

Perhaps the initial values are sub-optimal. If we initially have 1 GB and the watermark is at 50%, we have at most 500 MB to fill before we choke. Considering we may not even get the notification immediately, that gives us ~5 seconds at a 100 MB/sec write rate (which is not very fast!) until we run out of space and pause; see the worked calculation at the end of this section. I suggest we change at least the extend size to 2 GB or so, to give us room to breathe. I don't expect the initial extend to be very quick, but I also don't expect a real-life scenario to grow so much right away (though perhaps I'm wrong: what if it's a data dump? If it's an OS installation, the decompression of installation files ensures I/O isn't that fast).

*** Bug 1461536 has been marked as a duplicate of this bug. ***

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

What was fixed or changed here? Did you change the defaults in vdsm.conf? Please provide clear validation instructions. (Also, I see in comment 10 from 2017 that the conf change was already validated back then.)
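For convenience, here is a consolidated sketch of the tuning validated in comment 10. The file path and restart step are assumptions on my part (stock vdsm.conf location on a systemd host), 4048 is the value as actually reported (4096 was likely intended), and these are site-local overrides, not shipped defaults:

    # /etc/vdsm/vdsm.conf (assumed stock path) -- local override, not a default.
    # Lower volume_utilization_percent -> the extend request fires earlier;
    # larger volume_utilization_chunk_mb -> each extend adds a bigger chunk.
    [irs]
    volume_utilization_percent = 15
    volume_utilization_chunk_mb = 4048   # as reported; 4096 was likely intended

    # Assumed activation step on a systemd host:
    #   systemctl restart vdsmd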
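To make the headroom arithmetic above concrete, a back-of-the-envelope Python sketch; the helper is hypothetical (not vdsm code) and assumes a steady guest write rate with no extend completing within the window:

    # Hypothetical helper, not vdsm code: seconds a guest writing at a
    # steady rate can continue before exhausting the remaining free space.
    def seconds_until_pause(free_mb: float, write_mb_per_s: float) -> float:
        return free_mb / write_mb_per_s

    # Numbers from the comment above: a 1 GB initial volume with a 50%
    # watermark leaves ~500 MB of headroom when the extend is requested.
    print(seconds_until_pause(500, 100))   # 5.0  -> ~5 s at 100 MB/s
    # The suggested 2 GB extend chunk roughly quadruples that window.
    print(seconds_until_pause(2000, 100))  # 20.0 -> ~20 s at 100 MB/s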