Bug 1322732

Summary: [Scale] heketi-cli: Volume creation failed with error "metadata too large for circular buffer"
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Neha <nerawat>
Component: heketi
Assignee: Luis Pabón <lpabon>
Status: CLOSED ERRATA
QA Contact: Neha <nerawat>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: hchiramm, lpabon, madam, mliyazud, pprakash, rcyriac, sashinde, zkabelac
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS Container Converged 1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-04 04:50:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1332128    
Attachments:
  Heketi Logs (flags: none)

Description Neha 2016-03-31 08:59:05 UTC
Description of problem:
Created around 150+ volumes using heketi-cli. After 150 volumes, creation fails due to the metadata size.

heketi-cli volume create -name=vol161 -size=100 -durability="replicate" -replica=2
Error: Process exited with: 5. Reason was:  ()


Heketi logs:

[sshexec] ERROR 2016/03/31 14:22:26 /builddir/build/BUILD/heketi-6563551111f7178b679866e85a0682325929d037/src/github.com/heketi/heketi/utils/ssh/ssh.go:155: Failed to run command [sudo lvcreate --poolmetadatasize 262144K -c 256K -L 52428800K -T vg_b2895b77ebe7ba3ba064544d15e54653/tp_99f7a9139e07e7f041afc4141016b4d4 -V 52428800K -n brick_99f7a9139e07e7f041afc4141016b4d4] on <node-4 ip>:22: Err[Process exited with: 5. Reason was:  ()]: Stdout []: Stderr [  VG vg_b2895b77ebe7ba3ba064544d15e54653 metadata too large for circular buffer
  Failed to write VG vg_b2895b77ebe7ba3ba064544d15e54653.
]
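
For reference, how full the VG metadata area is can be checked with the standard lvm2 reporting fields (a minimal sketch; the device and VG names are taken from this setup and may differ on other nodes):

pvs -o +pv_mda_size,pv_mda_free /dev/vdd        # size of the PV metadata area and how much of it is still free
vgs -o +vg_mda_size,vg_mda_free vg_b2895b77ebe7ba3ba064544d15e54653   # the same information per volume group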


Version-Release number of selected component (if applicable):
heketi-1.0.2-1

How reproducible:
After 150+ volumes

Steps to Reproduce:
Try to create volume using heketi-cli

Actual results:
Failing with "Error: Process exited with: 5. Reason was:  ()"

Expected results:
Volume creation should be successful.

Additional info:
Will add heketi logs and setup details.

Comment 4 Luis Pabón 2016-03-31 11:47:09 UTC
Added bug: https://github.com/heketi/heketi/issues/268

Comment 5 Luis Pabón 2016-03-31 12:34:09 UTC
It looks like the default metadatasize for PVs is too small:  https://www.redhat.com/archives/linux-lvm/2010-November/msg00088.html

We may need to increase it to something like 64MB, which (I'm guessing here) should allow for roughly 2-3K LVs.

Comment 6 Luis Pabón 2016-04-11 18:49:57 UTC
PR https://github.com/heketi/heketi/pull/275 is available.  The change sets the metadata size to 128MB.

We need to verify that this value is correct.

Comment 7 Luis Pabón 2016-04-11 19:13:09 UTC
Checked with the #lvm community and they mentioned that RHEV/oVirt uses a 128M metadata size. I have confirmation in this email, but still haven't found the code: http://lists.ovirt.org/pipermail/users/2013-July/015435.html. I will move forward with this change using the 128M size.

Comment 8 Luis Pabón 2016-04-11 19:22:38 UTC
Although this solution is used by oVirt, the latest LVM autoresizes the metadata. See thin_pool_autoextend_threshold in lvm.conf.
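
For reference, that autoextend behaviour is configured in lvm.conf and acted on by dmeventd monitoring; a minimal sketch with illustrative values (note that it applies to thin-pool space, not to the VG metadata area that overflowed in this bug):

# /etc/lvm/lvm.conf
activation {
    thin_pool_autoextend_threshold = 80    # start extending the pool when it is 80% full
    thin_pool_autoextend_percent = 20      # grow it by 20% of its current size each time
}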

Comment 9 Zdenek Kabelac 2016-04-12 10:57:52 UTC
This is an example of mixing apples and oranges.

So please spend a few minutes reading the man page (and yes, it was me answering your question on freenode...).

--

lvm2 metadata size - the preallocated buffer that stores the metadata for a volume group; it is typically located at the front of the disk/device.

So if you want to keep fewer than 5K LVs, an 8MB metadata size should be OK - and it is quite mandatory to get this size set properly when you create/initialize your PV/VG; changing this size later is not supported by the lvm2 tools and is non-trivial.

--

thin-pool metadata is the space that keeps information about the thin volumes within a single thin-pool - by default lvm2 targets 128MB - but it can easily be created bigger if you know up front that you need more.
And yes, thin-pool metadata can easily be resized online, and can be resized automatically when a threshold is reached.

Now - using thousands of active thin volumes from a single thin-pool is not advised, although it would be interesting to get feedback comparing such a workload with native (non-thin) volumes.
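
To make the distinction concrete, a minimal sketch with placeholder names and sizes: thin-pool metadata is chosen at creation time and can be grown online, while the lvm2 (VG) metadata area must be sized when the PV is created:

# thin-pool metadata: chosen at creation, resizable online
lvcreate -L 500G --poolmetadatasize 1G -T vg/tp_pool
lvextend --poolmetadatasize +1G vg/tp_pool

# lvm2 (VG) metadata area: fixed at pvcreate time
pvcreate --metadatasize 8M /dev/sdX
vgcreate vg /dev/sdX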

Comment 10 Luis Pabón 2016-04-12 13:03:02 UTC
Thanks Zdenek.
  Our model is as follows: On each single disk, we create a PV, and then one VG on top of it.  On the VG, we create a thin pool for each individual LV.  Here is an example diagram:

          +-----------+   +-----+         
          |           |   |Brick|         
          |  Brick B  |   |A    |         
          |  XFS      |   |XFS  |         
          +----------------------------+  
          |               |            |  
          |   ThinP B     |   ThinP A  |  
          |               |            |  
   +-----------------------------------+  
   |                                   |  
   |            VG                     |  
   +-----------------------------------+  
   |                                   |  
   |            PV                     |  
   +-----------------------------------+  
   |                                   |  
   |            Disk                   |  
   |                                   |  
   +-----------------------------------+
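
In command terms, a minimal sketch of this layering (illustrative device, names, and sizes; not the exact commands heketi issues):

pvcreate --dataalignment 256K /dev/sdX         # one PV per disk
vgcreate vg_<id> /dev/sdX                      # one VG on top of the PV
# one thin pool plus one thin LV (the brick) per gluster volume
lvcreate --poolmetadatasize 16384K -c 256K -L 102400K -T vg_<id>/tp_<id> -V 102400K -n brick_<id>
mkfs.xfs -i size=512 /dev/vg_<id>/brick_<id>   # format the brick, then mount it for gluster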

Comment 11 Zdenek Kabelac 2016-04-13 08:28:33 UTC
In this case, do not forget to account for a far bigger metadata size.

A single thin LV + thin-pool leads to 4 LVs stored in the metadata -
so the size grows much faster than if you were using just 1 linear LV.

And as said earlier - when the number of LVs in the lvm2 metadata grows too large, processing becomes noticeably slower (lvm2 is not really a database tool ATM,
and was never designed for 10K volumes or more...).

Also, archiving and backup in /etc/lvm/archive might become noticeable...


So for lots of LVs: pvcreate --metadatasize

For lots of thin LVs in a single thin-pool: lvcreate --poolmetadatasize
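
For example (a minimal sketch; the device and sizes are placeholders):

pvcreate --metadatasize 64M /dev/sdX               # room in the VG metadata area for many LVs
lvcreate -L 1T --poolmetadatasize 2G -T vg/pool    # room in the pool metadata for many thin LVs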

Comment 13 Prasanth 2016-05-12 13:24:05 UTC
Created attachment 1156679 [details]
Heketi Logs

Comment 14 Neha 2016-05-12 14:54:39 UTC
Prasanth,

It seems the heketi rpm/image you are using doesn't have the fix for this issue; the logs show it is using the default metadatasize:

pvcreate --dataalignment 256K /dev/vdd


After the fix, the metadatasize is "128M", and I was able to create 256+ volumes with it:

pvcreate --metadatasize=128M --dataalignment=256K /dev/sda

Comment 15 Zdenek Kabelac 2016-05-12 15:19:55 UTC
(In reply to Prasanth from comment #13)
> Created attachment 1156679 [details]
> Heketi Logs

Creating 183 thin-pools + thin-volumes - each one creates 4 LVs in the metadata!
Let's approximate ~3KB of lvm2 metadata space per 4 volumes: 183 * 3KB is roughly 550KB.
So you easily run out of the ~500KB of usable metadata space you get with a 1M --metadatasize.

You can check the used space at any point with 'vgcfgbackup', or by looking
at the archived size in /etc/lvm/archive if archiving is enabled.
(In this case I'd strongly advise disabling archiving and clearing even the
existing content of the /etc/lvm/archive directory.)
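
A minimal sketch of both checks (the VG name and paths are placeholders):

vgcfgbackup -f /tmp/vg_metadata.txt <vgname>   # dump the current VG metadata; the file size approximates usage
ls -lh /etc/lvm/archive/                       # size of archived metadata copies, if archiving is enabled
# archiving itself is toggled with "archive = 0/1" in the backup section of /etc/lvm/lvm.conf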


To use this number of LVs in a single VG, you simply have to create big metadata at 'pvcreate' time.

e.g.:  

pvcreate  --metadatasize 64M  /dev/sdX
vgcreate vg  /dev/sdX

Once the PV is 'pvcreated', you can't change the metadata size.

Surely no bug on the lvm2 side here; the default 1MB size is simply not big enough.

Comment 16 Humble Chirammal 2016-05-13 07:01:44 UTC
(In reply to Neha from comment #14)
> Prasanth ,
> 
> It seems the heketi rpm/image you are using doesn't have fix for this issue,
> in logs. Here its taking default metadatasize.
> 
> pvcreate --dataalignment 256K /dev/vdd
> 
> 
> After fix, metadatasize is "128M" and I was able to create 256+ volumes with
> it.
> 
> pvcreate --metadatasize=128M --dataalignment=256K /dev/sda

It looks to me that the issue is different from the fix mentioned here. That said, in comment #11 Zdenek mentioned that if we are creating lots of thin LVs we need a bigger 'poolmetadatasize'. If I understand correctly, we are hitting that limit here.

Comment 17 Luis Pabón 2016-05-24 18:55:39 UTC
This is fixed in the server since 1.0.3. This should be moved to ON_QA.

Comment 18 Luis Pabón 2016-05-24 18:56:15 UTC
.

Comment 23 Neha 2016-06-30 14:01:33 UTC
Able to create 260 volumes on a single PV using the current heketi image.

Moving it to verified.

Comment 25 errata-xmlrpc 2016-08-04 04:50:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1498.html