Bug 1403321 - thinpool taking too long to autoextend
Summary: thinpool taking too long to autoextend
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-09 16:08 UTC by Roman Bednář
Modified: 2017-01-03 08:24 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-03 08:24:45 UTC
Target Upstream Version:



Description Roman Bednář 2016-12-09 16:08:42 UTC
I noticed our thinpool autoextend tests are occasionally crashing. After further digging I found that the cause is the thin pool autoextension sometimes being slow. It can easily be observed when lvcreate is run in a loop: once autoextension starts happening while the loop is still running, 'lvcreate' starts failing as shown below.

This could be a performance issue, or our configuration may just need some tuning.
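
For reference, the knobs driving the autoextension are the activation settings in lvm.conf; the exact values used in this run are not pasted here, so the snippet below only illustrates which settings are involved (values are examples, not our configuration):

activation {
    # dmeventd monitoring must be enabled for autoextend to trigger at all
    monitoring = 1
    # autoextend once the pool is this % full (100 disables autoextend)
    thin_pool_autoextend_threshold = 70
    # grow the pool by this % of its current size on each autoextend
    thin_pool_autoextend_percent = 20
}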


++++++++++++++++++++++manual reproducer++++++++++++++++++++++

# lvs -a
  LV              VG            Attr       LSize   Pool Origin  Data%  Meta%  Move Log Cpy%Sync Convert
  POOL            snapper_thinp twi-aot---   1.20g              59.48  19.92                           
  [POOL_tdata]    snapper_thinp Twi-ao----   1.20g                                                     
  [POOL_tmeta]    snapper_thinp ewi-ao----   2.00m                                                     
  [lvol0_pmspare] snapper_thinp ewi-------   4.00m                    
  ....

Start creating LVs/snapshots in a loop (it might reproduce without the snapshot creation as well):

#for i in {101..500}; do lvcreate -ay -V 10M -T snapper_thinp/POOL -n virt$i && lvcreate -ay  -s snapper_thinp/virt$i -n snap$i;done
...
  Logical volume "virt116" created.
  Logical volume "snap116" created.
  Logical volume "virt117" created.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Logical volume "virt124" created.
  Logical volume "snap124" created.
  Logical volume "virt125" created.
...

The thin pool is actually extending correctly, but it produces the errors shown above while doing so.
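
The extension itself can be watched from a second terminal while the loop runs, for example with something along these lines (one possible invocation, not part of the test):

# while sleep 2; do lvs --noheadings -o lv_name,lv_size,data_percent,metadata_percent snapper_thinp/POOL; done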


++++++++++++++++++++++automated test++++++++++++++++++++++

SCENARIO - [verify_auto_extension_of_pool_meta]
Recreating VG and PVs to decrease PE size for smaller pool mda device
Create virt origin and snap volumes until the meta area is filled past the auto extend threshold
Enabling thin_pool_autoextend_threshold
Making pool volume
lvcreate  --thinpool POOL -L 1G --profile thin-performance --zero n --poolmetadatasize 2M snapper_thinp

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts


Making origin volume
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate  -V 1G -T snapper_thinp/POOL -n other1
lvcreate  -V 1G -T snapper_thinp/POOL -n other2
lvcreate  -V 1G -T snapper_thinp/POOL -n other3
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other4
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other5
Creating origin/snap number:
	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84   Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
couldn't create additional snap snap84

 <fail name="snapper_thinp_verify_auto_extension_of_pool_meta" id="lvm,snapper_thinp,verify_auto_extension_of_pool_meta,lvm_single" pid="12503" time="Fri Dec  9 14:38:10 2016" type="cmd" duration="207" ec="1" />
 PAN2: ALL STOP!!!
 ------------------- Summary ---------------------
 Testcase                                 Result    
 --------                                 ------    
 snapper_thinp_verify_auto_extension      FAIL  


NOTE:

1) So far I have not been able to reproduce this while running the same test with zeroing enabled; more runs are needed to verify this.

2) While running a sanity check (thin_check) on the thin metadata with zeroing enabled, a warning is printed:
   'WARNING: Pool zeroing and large 8.00 MiB chunk size slows down provisioning.'
   So I assume zeroing is not recommended in this case.
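
   If zeroing is wanted anyway, one option (not what the test currently does; sizes and values purely illustrative) might be to create the pool with zeroing enabled but an explicitly smaller chunk size instead of the one picked by the thin-performance profile, e.g.:

   # lvcreate --thinpool POOL -L 1G --zero y --chunksize 256k --poolmetadatasize 4M snapper_thinp

   Note that smaller chunks need more metadata, so the metadata size would likely have to be larger than the 2M used above.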



2.6.32-676.el6.x86_64

lvm2-2.02.143-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
lvm2-libs-2.02.143-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
lvm2-cluster-2.02.143-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
udev-147-2.73.el6_8.2    BUILT: Tue Aug 30 15:17:19 CEST 2016
device-mapper-1.02.117-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-libs-1.02.117-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-event-1.02.117-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-event-libs-1.02.117-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-10.el6    BUILT: Thu Nov 24 10:58:43 CET 2016

Comment 6 Roman Bednář 2016-12-16 14:55:07 UTC
I finally managed to reproduce it on 6.8, so it's really not a regression; we just missed it in previous releases because it happens quite rarely. Our regression runs could easily pass without us noticing.

But I have to agree with Marian on this: lvcreate commands should definitely wait and retry in this case, or at least visibly warn about what's going on.
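
For the test itself, a simple retry wrapper around the create step would probably be enough as a workaround (just a sketch, names taken from the reproducer above):

create_thin() {
    # retry a few times so a transient threshold rejection does not fail the run
    for attempt in 1 2 3 4 5; do
        lvcreate -ay -V 10M -T snapper_thinp/POOL -n "$1" && return 0
        sleep 2    # give dmeventd time to autoextend the pool
    done
    return 1
}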

Zeroing was not used in the tests, as already mentioned.

Moving to ASSIGNED as a TestBlocker, since we can't get through a regression run without a workaround for the test case. If you still prefer to treat this as an RFE, let me know and I'll create a new BZ if needed.

Comment 7 Zdenek Kabelac 2016-12-16 15:21:38 UTC
I do not agree.

The thin pool is configured and documented not to allow creating a thin volume when it is above the threshold. This inherently contains a race:

e.g. you write data, then run 'fstrim' and 'lvcreate' in parallel.

Depending on the timing you will see a different thin-pool occupancy, and the lvcreate operation will be either rejected or accepted.
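
Just to illustrate (mount point and volume names purely hypothetical):

# dd if=/dev/zero of=/mnt/thinfs/file bs=1M count=500   # pushes the pool past the threshold
# fstrim /mnt/thinfs &                                  # returns freed blocks to the pool asynchronously
# lvcreate -V 10M -T snapper_thinp/POOL -n racyvol      # accepted or rejected depending on the occupancy it samples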

The same applies to the speed of the resize operation.

Basically you are operating the thin pool right at the configured fullness border, and lvm2 may reject further filling of the thin pool.

lvm2 can hardly WAIT inside lvcreate while HOLDING the lock, since the resize clearly would not be able to take place; it would instead have to retry in a 'loop'??

The other option would be to implement an 'lvcreate' that resizes the thin pool itself, but that is another huge can of worms (e.g. what would the PVs given to lvcreate then be used for?).

For such a case it is fair and correct to reject the operation.

The admin should be aware that the pool is over the threshold and fix the workflow.
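
That typically means keeping the pool comfortably below the threshold before a burst of creations, e.g. by extending it (and its metadata) up front; sizes below are purely illustrative:

# lvextend -L +512M snapper_thinp/POOL
# lvextend --poolmetadatasize +2M snapper_thinp/POOL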


A possible improvement could be to support several different threshold levels: one for resize, one for rejecting lvcreate, one for pool shutdown...
But we are not there yet...

Comment 8 Zdenek Kabelac 2017-01-03 08:24:45 UTC
Closing this bug, as it's not a bug; it works as designed.

If there is a wish for new behavior, please open a new RFE and specify it.

(But IMHO it is not really worth the effort; there is always some race on this road. You would probably need to 'freeze' the whole system and do a deep check of the frozen state to give a 'definitive' allow/deny answer for lvcreate...)

