I noticed our thin pool autoextend tests are failing occasionally. After further digging I found that the cause is the thin pool autoextension sometimes being slow. It can be easily observed when lvcreate is run in a loop: once autoextension starts happening while the loop is still running, 'lvcreate' starts failing as shown below. This may be a performance issue, or maybe our configuration just needs some tuning.

++++++++++++++++++++++ manual reproducer ++++++++++++++++++++++

# lvs -a
  LV              VG            Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  POOL            snapper_thinp twi-aot--- 1.20g             59.48  19.92
  [POOL_tdata]    snapper_thinp Twi-ao---- 1.20g
  [POOL_tmeta]    snapper_thinp ewi-ao---- 2.00m
  [lvol0_pmspare] snapper_thinp ewi------- 4.00m
  ....

Start creating LVs/snapshots in a loop (it might reproduce without the snapshot creation as well):

# for i in {101..500}; do lvcreate -ay -V 10M -T snapper_thinp/POOL -n virt$i && lvcreate -ay -s snapper_thinp/virt$i -n snap$i; done
...
  Logical volume "virt116" created.
  Logical volume "snap116" created.
  Logical volume "virt117" created.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
  Logical volume "virt124" created.
  Logical volume "snap124" created.
  Logical volume "virt125" created.
...

The thin pool is actually extending correctly, but it produces the errors shown above while doing so.

++++++++++++++++++++++ automated test ++++++++++++++++++++++

SCENARIO - [verify_auto_extension_of_pool_meta]
Recreating VG and PVs to decrease PE size for smaller pool mda device
Create virt origin and snap volumes until the meta area is filled past the auto extend threshold
Enabling thin_pool_autoextend_threshold
Making pool volume
  lvcreate --thinpool POOL -L 1G --profile thin-performance --zero n --poolmetadatasize 2M snapper_thinp
Sanity checking pool device (POOL) metadata
  thin_check /dev/mapper/snapper_thinp-meta_swap
  examining superblock
  examining devices tree
  examining mapping tree
  checking space map counts
Making origin volume
  lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
  lvcreate -V 1G -T snapper_thinp/POOL -n other1
  lvcreate -V 1G -T snapper_thinp/POOL -n other2
  lvcreate -V 1G -T snapper_thinp/POOL -n other3
  lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
  lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other5
Creating origin/snap number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
couldn't create additional snap snap84
<fail name="snapper_thinp_verify_auto_extension_of_pool_meta" id="lvm,snapper_thinp,verify_auto_extension_of_pool_meta,lvm_single" pid="12503" time="Fri Dec 9 14:38:10 2016" type="cmd" duration="207" ec="1" />
PAN2: ALL STOP!!!
------------------- Summary ---------------------
Testcase                                    Result
--------                                    ------
snapper_thinp_verify_auto_extension         FAIL

NOTE:
1) So far I have not been able to reproduce this while running the same test with zeroing enabled; more runs are needed to verify this.
2) While running a sanity check (thin_check) on the thin metadata with zeroing enabled, a warning is printed: 'WARNING: Pool zeroing and large 8.00 MiB chunk size slows down provisioning.' So I assume zeroing is not recommended in this case.

2.6.32-676.el6.x86_64
lvm2-2.02.143-10.el6                             BUILT: Thu Nov 24 10:58:43 CET 2016
lvm2-libs-2.02.143-10.el6                        BUILT: Thu Nov 24 10:58:43 CET 2016
lvm2-cluster-2.02.143-10.el6                     BUILT: Thu Nov 24 10:58:43 CET 2016
udev-147-2.73.el6_8.2                            BUILT: Tue Aug 30 15:17:19 CEST 2016
device-mapper-1.02.117-10.el6                    BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-libs-1.02.117-10.el6               BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-event-1.02.117-10.el6              BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-event-libs-1.02.117-10.el6         BUILT: Thu Nov 24 10:58:43 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6  BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-10.el6                          BUILT: Thu Nov 24 10:58:43 CET 2016
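For reference, the autoextension observed above is driven by dmeventd monitoring together with the activation settings in lvm.conf. A minimal sketch of the relevant section is below; the threshold and percent values are illustrative, not necessarily the ones used by the test:

# /etc/lvm/lvm.conf (excerpt, illustrative values)
activation {
        # dmeventd extends the pool once its usage crosses this percentage...
        thin_pool_autoextend_threshold = 70
        # ...growing it by this percentage of its current size.
        thin_pool_autoextend_percent = 20
}

Whether zeroing (and the 8.00 MiB chunk size mentioned in NOTE 2) is in effect on an existing pool can be checked with e.g. 'lvs -o lv_name,zero,chunk_size snapper_thinp/POOL'.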
I finally managed to reproduce it on 6.8, so it's really not a regression; we just missed it in previous releases because it happens quite rarely, and our regression runs could easily pass without us noticing. But I have to agree with Marian on this: lvcreate should definitely wait and retry in this case, or at least warn visibly about what is going on. Zeroing was not used in the tests, as already mentioned.

Moving to ASSIGNED as a TestBlocker, since we can't get through a regression run without a workaround for the test case. If you still prefer to treat this as an RFE, let me know and I'll create a new BZ if needed. A possible test-side workaround is sketched below.
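One possible test-side workaround, until/unless lvcreate learns to wait on its own, is a small retry wrapper around the lvcreate calls in the loop. A minimal sketch; the function name, retry count and sleep interval below are arbitrary:

# retry lvcreate a few times to give dmeventd time to finish the autoextend
lvcreate_retry() {
        local tries=10
        until lvcreate "$@"; do
                tries=$((tries - 1))
                [ "$tries" -gt 0 ] || return 1
                sleep 3
        done
}
lvcreate_retry -ay -V 10M -T snapper_thinp/POOL -n virt101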
I do not agree. The thin pool is configured and documented not to allow creation of a new thin volume when the pool is above the threshold.

This inherently HAS a race inside: i.e. you write data, then you run 'fstrim' and lvcreate in parallel. Depending on the timing you will see a different occupancy of the thin pool and either rejection or acceptance of the lvcreate operation. The same applies to the speed of the resize operation. Basically you are operating the thin pool at the configured fullness border, and lvm2 may reject further filling of the thin pool.

lvm2 could WAIT in lvcreate while HOLDING the lock - but since the resize clearly would not be able to take place then, it would need to retry in a 'loop'?? The other option would be to implement an 'lvcreate' that resizes the thin pool itself - but that is another huge cave of troubles (i.e. what would the PVs passed to lvcreate then be for?).

For such a case it is fair and correct to reject the operation. The admin should be aware that he is over the threshold and fix his workflow.

A possible improvement could be to support several different threshold levels - one for resize, one for rejecting lvcreate, one for pool shutdown.... But we are not there yet....
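If the current behavior is kept, a script that knowingly operates near the threshold can at least poll the pool occupancy and let dmeventd extend it before issuing the next lvcreate. A rough sketch, assuming the pool from the reproducer; note it is still inherently racy, as described above:

# wait until the pool is back below the configured autoextend threshold
threshold=$(lvm dumpconfig activation/thin_pool_autoextend_threshold | cut -d= -f2)
while usage=$(lvs --noheadings -o data_percent snapper_thinp/POOL | tr -d ' ') && \
      [ "${usage%%.*}" -ge "$threshold" ]; do
        sleep 1
done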
Closing this bug - it's not a bug, it works as designed. If there is a wish for new behavior, please open a new RFE and specify it there. (But IMHO it's not really worth the effort - there is always some race on this road; you would probably need to 'freeze' the whole system and do a deep test on the frozen state to give a 'definitive' answer for allowing/denying lvcreate....)