I noticed our thin pool autoextend tests are failing occasionally. After further digging I found that the cause is the thin pool autoextension sometimes being slow. It can be easily observed when lvcreate is run in a loop: once autoextension starts happening while the loop is still running, 'lvcreate' starts failing as shown below. This may be a performance issue, or maybe our configuration just needs some tuning.

++++++++++++++++++++++ manual reproducer ++++++++++++++++++++++

# lvs -a
  LV              VG            Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  POOL            snapper_thinp twi-aot--- 1.20g             59.48  19.92
  [POOL_tdata]    snapper_thinp Twi-ao---- 1.20g
  [POOL_tmeta]    snapper_thinp ewi-ao---- 2.00m
  [lvol0_pmspare] snapper_thinp ewi------- 4.00m
  ....

Start creating LVs/snapshots in a loop (it might reproduce without the snapshot creation as well):

# for i in {101..500}; do lvcreate -ay -V 10M -T snapper_thinp/POOL -n virt$i && lvcreate -ay -s snapper_thinp/virt$i -n snap$i; done
...
  Logical volume "virt116" created.
  Logical volume "snap116" created.
  Logical volume "virt117" created.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Logical volume "virt124" created.
  Logical volume "snap124" created.
  Logical volume "virt125" created.
...

The thin pool is actually extending correctly, but it produces the errors shown above while doing so.

++++++++++++++++++++++ automated test ++++++++++++++++++++++

SCENARIO - [verify_auto_extension_of_pool_meta]
Recreating VG and PVs to decrease PE size for smaller pool mda device
Create virt origin and snap volumes until the meta area is filled past the auto extend threshold
Enabling thin_pool_autoextend_threshold
Making pool volume
  lvcreate --thinpool POOL -L 1G --profile thin-performance --zero n --poolmetadatasize 2M snapper_thinp
Sanity checking pool device (POOL) metadata
  thin_check /dev/mapper/snapper_thinp-meta_swap
  examining superblock
  examining devices tree
  examining mapping tree
  checking space map counts
Making origin volume
  lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
  lvcreate -V 1G -T snapper_thinp/POOL -n other1
  lvcreate -V 1G -T snapper_thinp/POOL -n other2
  lvcreate -V 1G -T snapper_thinp/POOL -n other3
  lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
  lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other5
Creating origin/snap number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
couldn't create additional snap snap84
<fail name="snapper_thinp_verify_auto_extension_of_pool_meta" id="lvm,snapper_thinp,verify_auto_extension_of_pool_meta,lvm_single" pid="12503" time="Fri Dec 9 14:38:10 2016" type="cmd" duration="207" ec="1" />
PAN2: ALL STOP!!!
------------------- Summary ---------------------
Testcase                                    Result
--------                                    ------
snapper_thinp_verify_auto_extension         FAIL

NOTE:
1) So far I have not been able to reproduce this while running the same test with zeroing enabled; more runs are needed to verify this.
2) While running a sanity check (thin_check) on the thin metadata with zeroing enabled, a warning is printed: 'WARNING: Pool zeroing and large 8.00 MiB chunk size slows down provisioning.' So I assume zeroing is not recommended in this case.

2.6.32-676.el6.x86_64
lvm2-2.02.143-10.el6                             BUILT: Thu Nov 24 10:58:43 CET 2016
lvm2-libs-2.02.143-10.el6                        BUILT: Thu Nov 24 10:58:43 CET 2016
lvm2-cluster-2.02.143-10.el6                     BUILT: Thu Nov 24 10:58:43 CET 2016
udev-147-2.73.el6_8.2                            BUILT: Tue Aug 30 15:17:19 CEST 2016
device-mapper-1.02.117-10.el6                    BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-libs-1.02.117-10.el6               BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-event-1.02.117-10.el6              BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-event-libs-1.02.117-10.el6         BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6  BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-10.el6                          BUILT: Thu Nov 24 10:58:43 CET 2016
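For reference, the autoextension observed above is driven by dmeventd monitoring together with the activation settings in lvm.conf. A minimal sketch of the relevant section is below; the threshold and percent values are illustrative, not necessarily the ones used by the test:

# /etc/lvm/lvm.conf (excerpt, illustrative values)
activation {
        # dmeventd extends the pool once its usage crosses this percentage...
        thin_pool_autoextend_threshold = 70
        # ...growing it by this percentage of its current size.
        thin_pool_autoextend_percent = 20
}

Whether zeroing (and the 8.00 MiB chunk size mentioned in NOTE 2) is in effect on an existing pool can be checked with e.g. 'lvs -o lv_name,zero,chunk_size snapper_thinp/POOL'.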
I finally managed to reproduce it on 6.8, so it's really not a regression; we just missed it in previous releases because it happens quite rarely, and our regression runs could easily pass without us noticing. But I have to agree with Marian on this: lvcreate should definitely wait and retry in this case, or at least warn visibly about what is going on. Zeroing was not used in the tests, as already mentioned.

Moving to ASSIGNED as a TestBlocker, since we can't get through a regression run without a workaround for the test case. If you still prefer to treat this as an RFE, let me know and I'll create a new BZ if needed. A possible test-side workaround is sketched below.
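One possible test-side workaround, until/unless lvcreate learns to wait on its own, is a small retry wrapper around the lvcreate calls in the loop. A minimal sketch; the function name, retry count and sleep interval below are arbitrary:

# retry lvcreate a few times to give dmeventd time to finish the autoextend
lvcreate_retry() {
        local tries=10
        until lvcreate "$@"; do
                tries=$((tries - 1))
                [ "$tries" -gt 0 ] || return 1
                sleep 3
        done
}
lvcreate_retry -ay -V 10M -T snapper_thinp/POOL -n virt101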
I do not agree. The thin pool is configured and documented not to allow creation of a new thin volume when the pool is above the threshold.

This inherently HAS a race inside: i.e. you write data, then you run 'fstrim' and lvcreate in parallel. Depending on the timing you will see a different occupancy of the thin pool and either rejection or acceptance of the lvcreate operation. The same applies to the speed of the resize operation. Basically you are operating the thin pool at the configured fullness border, and lvm2 may reject further filling of the thin pool.

lvm2 could WAIT in lvcreate while HOLDING the lock - but since the resize clearly would not be able to take place then, it would need to retry in a 'loop'?? The other option would be to implement an 'lvcreate' that resizes the thin pool itself - but that is another huge cave of troubles (i.e. what would the PVs passed to lvcreate then be for?).

For such a case it is fair and correct to reject the operation. The admin should be aware that he is over the threshold and fix his workflow.

A possible improvement could be to support several different threshold levels - one for resize, one for rejecting lvcreate, one for pool shutdown.... But we are not there yet....
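If the current behavior is kept, a script that knowingly operates near the threshold can at least poll the pool occupancy and let dmeventd extend it before issuing the next lvcreate. A rough sketch, assuming the pool from the reproducer; note it is still inherently racy, as described above:

# wait until the pool is back below the configured autoextend threshold
threshold=$(lvm dumpconfig activation/thin_pool_autoextend_threshold | cut -d= -f2)
while usage=$(lvs --noheadings -o data_percent snapper_thinp/POOL | tr -d ' ') && \
      [ "${usage%%.*}" -ge "$threshold" ]; do
        sleep 1
done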
Closing this bug - it's not a bug, it works as designed. If there is a wish for new behavior, please open a new RFE and specify it there. (But IMHO it's not really worth the effort - there is always some race on this road; you would probably need to 'freeze' the whole system and do a deep test on the frozen state to give a 'definitive' answer for allowing/denying lvcreate....)