Bug 1322732
| Field | Value |
| --- | --- |
| Summary | [Scale] heketi-cli: Volume creation failed with error "metadata too large for circular buffer" |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | heketi |
| Version | rhgs-3.1 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | unspecified |
| Reporter | Neha <nerawat> |
| Assignee | Luis Pabón <lpabon> |
| QA Contact | Neha <nerawat> |
| CC | hchiramm, lpabon, madam, mliyazud, pprakash, rcyriac, sashinde, zkabelac |
| Keywords | ZStream |
| Target Release | RHGS Container Converged 1.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-08-04 04:50:26 UTC |
| Bug Blocks | 1332128 |
| Attachments | Heketi Logs (attachment 1156679) |
Description
Neha
2016-03-31 08:59:05 UTC
Added bug: https://github.com/heketi/heketi/issues/268

It looks like the default metadata size for PVs is too small: https://www.redhat.com/archives/linux-lvm/2010-November/msg00088.html . We may need to increase it to something like 64MB, which (I'm guessing here) should allow for roughly 2-3K LVs.

PR https://github.com/heketi/heketi/pull/275 is available. The change sets the metadata size to 128MB. We still need verification that this value is correct.

Checked with the #lvm community and they mentioned that RHEV/oVirt uses a 128M metadata size. This email confirms it, but I still haven't found the code: http://lists.ovirt.org/pipermail/users/2013-July/015435.html . I will move forward with this change using the 128M size.

Although this solution is used by oVirt, the latest LVM auto-resizes the thin-pool metadata; see thin_pool_autoextend_threshold in lvm.conf.

This is an example of mixing apples and oranges, so please spend a few minutes reading the man page (and yes, it's been me answering your question on freenode...):

- lvm2 metadata size is the preallocated buffer that stores the metadata for a volume group, and it is typically located at the front of the disk/device. For the case where you want to keep fewer than 5K LVs, an 8MB metadata size should be OK. It is essentially mandatory to get this size set correctly when you create/initialize your PV/VG; changing it later is not supported by the lvm2 tools and is non-trivial.
- thin-pool metadata is the space that keeps information about the thin volumes within a single thin pool. By default lvm2 targets 128MB, but it can easily be created bigger if you know up front that you need more. And yes, thin-pool metadata can be resized online and can be resized automatically when the threshold is reached.

Using thousands of active thin volumes from a single thin pool is not advised, although it would be interesting to get feedback on a workload comparison with native (non-thin) volumes.

Thanks Zdenek. Our model is as follows: on each single disk we create a PV, and then one VG on top of it. On the VG, we create a thin pool for each individual LV. Here is an example diagram:

    +-------------+   +-------------+
    |   Brick B   |   |   Brick A   |
    |     XFS     |   |     XFS     |
    +-----------------------------------+
    |    ThinP B      |     ThinP A     |
    +-----------------------------------+
    |                VG                 |
    +-----------------------------------+
    |                PV                 |
    +-----------------------------------+
    |               Disk                |
    +-----------------------------------+

In this case, do not forget to account for a far bigger metadata size. A single thin LV plus its thin pool leads to 4 LVs stored in the metadata, so the size grows much faster than if you were using just one linear LV. And as said earlier, when the number of LVs in the lvm2 metadata grows too large, processing gets noticeably slower (lvm2 is not really a database tool at the moment, and was never designed for 10K volumes or more). Archiving and backup in /etc/lvm/archive may also become noticeable. So:

- for lots of LVs: pvcreate --metadatasize
- for lots of thin LVs in a single thin pool: lvcreate --poolmetadatasize

Created attachment 1156679 [details]: Heketi Logs
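For illustration of the layout in the diagram above and of the two metadata knobs Zdenek describes, here is a minimal by-hand sketch of the provisioning flow. The device (/dev/sdX), the VG/LV names (vg_brick, tp_brick1, brick1), the mount point, and the pool/metadata sizes are all hypothetical; heketi runs the equivalent steps itself, so this only shows where --metadatasize and --poolmetadatasize apply:

```sh
# Hypothetical device and names; sizes are only to illustrate the two knobs.
DEV=/dev/sdX

# The PV metadata area must be sized at pvcreate time; it cannot be grown later.
pvcreate --metadatasize 128M --dataalignment 256K "$DEV"
vgcreate vg_brick "$DEV"

# One thin pool per brick; --poolmetadatasize controls the thin-pool metadata,
# which (unlike the PV metadata area) can also be grown online later.
lvcreate --size 100G --poolmetadatasize 1G --thinpool tp_brick1 vg_brick
lvcreate --thin --virtualsize 100G --name brick1 vg_brick/tp_brick1

# XFS brick on top of the thin LV, as in the diagram above.
mkfs.xfs /dev/vg_brick/brick1
mkdir -p /bricks/brick1
mount /dev/vg_brick/brick1 /bricks/brick1
```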
Prasanth, it seems the heketi rpm/image you are using doesn't have the fix for this issue; in the logs it is taking the default metadata size:

    pvcreate --dataalignment 256K /dev/vdd

After the fix, the metadata size is "128M" and I was able to create 256+ volumes with it:

    pvcreate --metadatasize=128M --dataalignment=256K /dev/sda

(In reply to Prasanth from comment #13)
> Created attachment 1156679 [details]
> Heketi Logs

Creating 183 thin pools + thin volumes means roughly 4 LVs in the metadata per pair. Let's approximate 3KB of lvm2 metadata space per 4 volumes; you easily run out of the ~500KB of usable metadata space you get with a 1M --metadatasize. You can check the used space at any point with 'vgcfgbackup', or by looking at the archived size in /etc/lvm/archive if archiving is enabled. (In this case I'd strongly advise disabling archiving and clearing the content of the /etc/lvm/archive directory.) For this number of LVs in a single VG you simply have to create big metadata at 'pvcreate' time, e.g.:

    pvcreate --metadatasize 64M /dev/sdX
    vgcreate vg /dev/sdX

Once the PV is 'pvcreated' you can't change the metadata size. There is surely no bug on the lvm2 side here; the default 1MB size is simply not big enough.

(In reply to Neha from comment #14)
> It seems the heketi rpm/image you are using doesn't have the fix for this issue; in the logs it is taking the default metadata size.
> pvcreate --dataalignment 256K /dev/vdd
> After the fix, the metadata size is "128M" and I was able to create 256+ volumes with it.
> pvcreate --metadatasize=128M --dataalignment=256K /dev/sda

It looks to me like the issue is different from the fix mentioned here. That said, in comment #11 Zdenek mentioned that if we are creating lots of thin LVs we need a bigger 'poolmetadatasize'. If I understand correctly, we are hitting that limit here.

This is fixed in the server since 1.0.3. This should be moved to ON_QA.

Able to create 260 volumes on a single PV using the current heketi image. Moving it to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1498.html
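As a side note on the "check the used space" advice above, here is a small sketch of how one might watch the VG metadata area fill up. The VG name (vg_brick) and output path are hypothetical; the report fields assume a reasonably recent LVM2:

```sh
# Show the size and remaining free space of the PV/VG metadata areas.
pvs -o +pv_mda_size,pv_mda_free
vgs -o +vg_mda_size,vg_mda_free vg_brick

# Dump the current VG metadata to a file; its size approximates how much of
# the preallocated metadata area one copy of the metadata consumes.
vgcfgbackup -f /tmp/vg_brick.meta vg_brick
ls -lh /tmp/vg_brick.meta
```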
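And for the suggestion to disable archiving when a VG carries this many LVs, a sketch of the relevant lvm.conf setting; treat this as an assumption to review on your own systems rather than a recommendation from this bug:

```sh
# In /etc/lvm/lvm.conf, backup section: turn off per-command metadata archives.
# backup {
#     archive = 0
# }

# Verify the effective setting and clear out old archives if desired.
lvmconfig backup/archive
rm -f /etc/lvm/archive/*.vg
```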