| Summary: | [lvm] performance degradation when executing lots of lvextend operations | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> |
| Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> |
| Status: | CLOSED CANTFIX | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.2 | CC: | abaron, agk, coughlan, cpelland, danken, ddumas, dwysocha, hateya, heinzm, iheim, jbrassow, lyarwood, mgoldboi, plyons, prajnoha, prockai, thornber, tvvcox, yeylon, ykaul, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | 6.2 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-06-28 17:38:46 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 756082, 773650, 773651, 773665, 773677, 773696, 960054 | | |
| Attachments: | lvm - scale logs (attachment 516277), lvm - script (attachment 516278) | | |
Created attachment 516278 [details]
lvm - script.
Once this patchset hits upstream, the situation should become much better: http://www.redhat.com/archives/lvm-devel/2011-July/msg00061.html. Some further improvement might still be needed, though.

But of course, pushing 18,000 extents to the kernel is not a cost-free operation.

Yes, if the extents being added are contiguous with the existing extents, that avoids adding a new segment and the metadata does not grow.

I think the extension by 1M is simply too small and happens too often; much bigger chunks need to be allocated, far less frequently. I could think of certain allocation strategies (a new configurable policy) to divide the available free extents into clusters to avoid quick fragmentation, but that would probably only make the test take longer until it hits the same heavily fragmented state of non-contiguous segments, so some online pvmove defrag daemon would need to clean up far too many segments (with such small 1M extents). Also, since rotational disks do not have the same performance over the whole surface, the allocation policy would probably be even more complex if some fairness is expected...

Could you please check whether the thin-provisioning target fits the task better?

(In reply to comment #13)
> I think the extension by 1M is simply too small and happens too often

In a real-world scenario we use 1024 MB extensions, but that pretty much guarantees that no two consecutive extents are allocated to the same LV. The number of extents LVM supports before slowing to a crawl is constant regardless of extent size, so a large environment reaching 18 TB would reach the same state. These numbers are not far-fetched.

Note that the problem was hit using RHEV (with the 1 GB extensions); Haim simply provided an easy way to simulate it without testing on a complex system.

> much bigger chunks need to be allocated, far less frequently [...] so some online
> pvmove defrag daemon would need to clean up far too many segments

Is there a utility that can run on the MD (metadata) and make it more efficient? Defrag of the data itself is of course out of the question when we're talking about 1 GB extensions.

> Also, since rotational disks do not have the same performance over the whole
> surface, the allocation policy would probably be even more complex if some
> fairness is expected...

We're assuming enterprise storage here, so that is not interesting.

> Could you please check whether the thin-provisioning target fits the task better?

a. It does not, as we work in a clustered environment.
b. It is not production-ready yet.

(In reply to comment #16)

I discussed this with Joe, and although in 6.3 we will be able to utilize the thin-p target, that would still not solve the problem. This is something we discussed before: a request for a different allocation policy in LVM2.
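The segment and metadata growth discussed above can be observed with stock LVM reporting commands. The following is only an illustrative sketch, not part of the bug report: the VG name vg_test is a placeholder, and the seg_count / vg_mda_size / vg_mda_free report fields are assumed to be available in the installed lvm2 version.

```bash
#!/bin/bash
# Sketch: show how fragmented the LVs are and how much of the metadata area
# is already consumed. "vg_test" is an assumed placeholder name.
VG=${1:-vg_test}

# One line per LV with its segment count; every non-contiguous lvextend adds
# another segment, and every segment is another entry rewritten into the VG
# metadata on each subsequent change.
lvs --noheadings -o lv_name,seg_count,lv_size "$VG" | sort -k2 -n -r | head -20

# Metadata area size vs. remaining free space in that area.
vgs --noheadings -o vg_name,lv_count,vg_mda_size,vg_mda_free "$VG"
```

As comment #13 notes, extending with an allocation that lands contiguous to the LV's last extent avoids creating a new segment; with hundreds of LVs being extended round-robin, however, that is rarely the case, which is exactly the fragmentation being discussed.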
Created attachment 516277 [details]
lvm - scale logs.

Description of problem:
We see a performance degradation in LVM when executing a large number of lvextend operations (18,000 overall). On each lvextend the metadata grows, which apparently affects performance; after 18K extends the metadata size is 3.6M.

At the beginning of the test each lvextend takes 2-3 seconds, but each iteration (a round of extending 500 LVs) adds another 1-2 seconds, so that by the end of the test each lvextend takes 16 seconds.

Also, at the end of each extend round I executed a set of commands that are vital for our hypervisor management system (vdsm):
- lvs -o +all
- vgs -o +all
- vgck
- /sbin/multipath

At the beginning of the test it took 8 seconds to run all of these commands (overall); after 36 rounds it takes 23 seconds.

Attached please find a reproducer script and logs showing the problem.

Test parameters:
- 1 PV (7T)
- 1 VG
- mdSize = 128m
- 500 LVs
- LV size = 1g, with 0.1k tags
- extend size = 1m

Test case:
- extend each LV 36 times (500 LVs * 36 extends = 18,000)

Script info:
- the script requires the path to the PV and the desired number of LVs to create.
- inside there is a configuration section covering LV size, metadata size, extend count, and so on.
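The attached reproducer script is not included in this report; the sketch below only approximates the test flow described above. The device path, VG name, and the 1m physical extent size are assumptions, and the ~0.1k tags from the original test are replaced by short placeholder tags.

```bash
#!/bin/bash
# Approximate reproducer sketch (not the attached script): create 500 LVs,
# then extend each of them by 1m for 36 rounds, timing each round and the
# vdsm-style reporting commands that follow it.
set -e
PV=${1:?"usage: $0 /dev/<pv-device>"}   # e.g. /dev/sdb (placeholder)
VG=vg_scale                             # assumed VG name
LV_COUNT=500
ROUNDS=36

pvcreate "$PV"
# 128m metadata area as in the test parameters; 1m extents are assumed so
# that each 1m extension maps to exactly one new extent.
vgcreate --metadatasize 128m -s 1m "$VG" "$PV"

for i in $(seq 1 "$LV_COUNT"); do
    # The original test also attaches ~0.1k of tags per LV to grow the
    # metadata; a short tag stands in for that here.
    lvcreate -L 1g -n "lv_$i" --addtag "scaletest_$i" "$VG"
done

for round in $(seq 1 "$ROUNDS"); do
    start=$(date +%s)
    for i in $(seq 1 "$LV_COUNT"); do
        lvextend -L +1m "$VG/lv_$i"
    done
    echo "round $round: 500 lvextends took $(( $(date +%s) - start ))s"

    # Commands vdsm runs between rounds; their runtime degrades as well.
    start=$(date +%s)
    lvs -o +all "$VG" > /dev/null
    vgs -o +all "$VG" > /dev/null
    vgck "$VG"
    /sbin/multipath > /dev/null
    echo "round $round: reporting commands took $(( $(date +%s) - start ))s"
done
```

The per-round timings printed by such a loop are what should show the degradation described above: a few seconds per round of extends at the start, climbing steadily as the metadata and segment counts grow.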