Red Hat Bugzilla – Bug 1274676
Resize of full thin-pool loses data
Last modified: 2016-11-04 00:11:42 EDT
Description of problem:
The lvm2 design should allow resizing a thin-pool that becomes 100% full, but it does not handle this correctly. The resize is left to 'dmeventd', which notices the pool is full and runs 'lvextend --use-policies'. This resize operation, however, performs a 'suspend' with 'flush', which cannot complete since the pool is already full. The dmeventd process then sends itself an ALRM signal, releases itself from the suspend, and exits silently; meanwhile the pool's 60s timeout relieves the pressure by erroring all in-flight operations, so the next resize retry succeeds. The pool then appears resized and working, so the problem of losing the buffer-cached writes is quite well hidden.

Version-Release number of selected component (if applicable):
lvm2 2.02.132

How reproducible:

Steps to Reproduce:
1. fill thin-pool 100% with writes
2. check until pool is resized
3. retest writes

Actual results:
Writes queued while the pool was full are errored and lost.

Expected results:
No data loss; queued writes complete once the pool has been extended.

Additional info:
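For illustration, a minimal reproduction along the lines of the steps above might look like the sketch below. The device name /dev/sdX and all sizes are assumptions, not values from this report; it also assumes activation/thin_pool_autoextend_threshold is set below 100 in lvm.conf so that dmeventd's policy ('lvextend --use-policies') actually fires.

  # Sketch only: /dev/sdX and sizes are illustrative assumptions.
  vgcreate vg /dev/sdX
  lvcreate -L8M -V16M -n thin vg/pool   # 8M pool, over-provisioned 16M thin LV

  # 1. fill thin-pool 100% with writes
  dd if=/dev/zero of=/dev/vg/thin bs=1M count=16 oflag=direct

  # 2. check until pool is resized (dmeventd should have extended it)
  lvs -o lv_name,lv_size,data_percent vg

  # 3. retest writes and verify earlier data survived the resize
  dd if=/dev/zero of=/dev/vg/thin bs=1M count=1 seek=8 oflag=direct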
Attaching a 'minimized' patchset which avoids the use of 'flush' with 'suspend' when resizing a thin-pool. It is worth noting here that if the 'extension' of the thin-pool data LV is big enough to absorb the whole 'flush', data will not be lost. Also providing an example (it fits into the lvm2 test suite) of how to test the issue.
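As a rough dmsetup-level illustration of the distinction the patchset relies on (the device name vg-pool-tpool is an assumption; actual names depend on the VG/LV names):

  # Plain suspend flushes outstanding I/O first; on a 100% full pool the
  # flush cannot complete, and the queued writes are errored by the
  # pool's 60s timeout instead.
  dmsetup suspend vg-pool-tpool

  # Suspend without flush, as the resize path should use: queued writes
  # stay queued and can complete once the data LV has been extended.
  dmsetup suspend --noflush vg-pool-tpool
  dmsetup resume vg-pool-tpool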
Created attachment 1097317 [details] https://www.redhat.com/archives/lvm-devel/2015-October/msg00142.html
Created attachment 1097318 [details] https://www.redhat.com/archives/lvm-devel/2015-October/msg00156.html
Created attachment 1097319 [details] https://www.redhat.com/archives/lvm-devel/2015-October/msg00143.html
Created attachment 1097320 [details] https://www.redhat.com/archives/lvm-devel/2015-October/msg00144.html
Created attachment 1097321 [details] Testing script (for lvm2 test suite): tests data loss during writes to a pool which needs a resize while lvextend happens in parallel (e.g. via dmeventd).
This bug was accidentally moved from POST to MODIFIED by an error in automation; please contact mmccune@redhat.com with any questions.
Marking verified. No data loss occurred while extending a 100% full thin pool.

# bash -x test.sh
+ vg=vg
+ lv=test_lv
+ vgcreate -s 512K vg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
  Volume group "vg" successfully created
+ lvcreate -L1M -V2M -n test_lv vg/pool
  Using default stripesize 64.00 KiB.
  WARNING: Sum of all thin volume sizes (2.00 MiB) exceeds the size of thin pool vg/pool (1.00 MiB)!
  For thin pool auto extension activation/thin_pool_autoextend_threshold should be below 100.
  Logical volume "test_lv" created.
+ seq 0 315465
+ cut -f 1 -d ' '
+ tee MD5
+ md5sum 2M
0ebb1b10c6b3e4ff62d1f2350af86ffb
+ sleep .1
+ dd if=2M of=/dev/mapper/vg-test_lv bs=512K conv=fdatasync
+ lvs -a vg
  LV              VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] vg ewi-------  2.00m
  pool            vg twi-aotzD-  1.00m             100.00 1.95
  [pool_tdata]    vg Twi-ao----  1.00m
  [pool_tmeta]    vg ewi-ao----  2.00m
  test_lv         vg Vwi-aotz--  2.00m pool        50.00
+ lvextend -L+512k vg/pool
  WARNING: Sum of all thin volume sizes (2.00 MiB) exceeds the size of thin pools (1.50 MiB)!
  For thin pool auto extension activation/thin_pool_autoextend_threshold should be below 100.
  Size of logical volume vg/pool_tdata changed from 1.00 MiB (2 extents) to 1.50 MiB (3 extents).
  Logical volume vg/pool_tdata successfully resized.
+ lvextend -L+512k vg/pool
  Size of logical volume vg/pool_tdata changed from 1.50 MiB (3 extents) to 2.00 MiB (4 extents).
4+0 records in
4+0 records out
2097152 bytes (2.1 MB) copied, 0.838726 s, 2.5 MB/s
  Logical volume vg/pool_tdata successfully resized.
+ wait
+ cat log
+ lvs -a vg
  LV              VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] vg ewi-------  2.00m
  pool            vg twi-aotz--  2.00m             100.00 1.95
  [pool_tdata]    vg Twi-ao----  2.00m
  [pool_tmeta]    vg ewi-ao----  2.00m
  test_lv         vg Vwi-a-tz--  2.00m pool        100.00
+ dd if=/dev/mapper/vg-test_lv of=2M-2 iflag=direct
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB) copied, 2.13775 s, 981 kB/s
+ cut -f 1 -d ' '
+ tee MD5-2
+ md5sum 2M-2
0ebb1b10c6b3e4ff62d1f2350af86ffb
+ diff MD5 MD5-2
+ vgremove -f vg
  Logical volume "test_lv" successfully removed
  Logical volume "pool" successfully removed
  Volume group "vg" successfully removed

3.10.0-505.el7.x86_64

lvm2-2.02.165-2.el7                          BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-libs-2.02.165-2.el7                     BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-cluster-2.02.165-2.el7                  BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-1.02.134-2.el7                 BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-libs-1.02.134-2.el7            BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-1.02.134-2.el7           BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-libs-1.02.134-2.el7      BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.165-2.el7                       BUILT: Wed Sep 14 16:01:43 CEST 2016
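For reference, a rough reconstruction of what test.sh above might look like, based on the -x trace (the log redirection, the exact command ordering around the background dd, and other details are assumptions; the attached test script is authoritative):

  #!/bin/bash
  # Rough reconstruction from the -x trace; details may differ from the
  # attached test script.
  vg=vg
  lv=test_lv

  vgcreate -s 512K $vg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
  lvcreate -L1M -V2M -n $lv $vg/pool

  # Build a 2MiB file of known content and record its checksum.
  seq 0 315465 > 2M
  md5sum 2M | cut -f 1 -d ' ' | tee MD5

  # Start the write that fills the 1M pool and blocks, then extend the
  # pool in parallel (standing in for dmeventd's auto-extension).
  dd if=2M of=/dev/mapper/$vg-$lv bs=512K conv=fdatasync &> log &
  sleep .1              # let dd fill the pool to 100% and block
  lvs -a $vg
  lvextend -L+512k $vg/pool
  lvextend -L+512k $vg/pool
  wait
  cat log
  lvs -a $vg

  # Read the volume back and verify nothing was lost during the resize.
  dd if=/dev/mapper/$vg-$lv of=2M-2 iflag=direct
  md5sum 2M-2 | cut -f 1 -d ' ' | tee MD5-2
  diff MD5 MD5-2

  vgremove -f $vg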
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1445.html