Bug 1065051
| Summary: | A few races/bugs in the thinp code compromise the ability to successfully resize a full thin-pool | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Zdenek Kabelac <zkabelac> |
| Component: | kernel | Assignee: | Mike Snitzer <msnitzer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | yanfu,wang <yanwang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.0 | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, yanwang, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-110.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-13 12:41:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Zdenek Kabelac
2014-02-13 19:01:00 UTC
Created attachment 863191 [details]
Patch 1
First patch, suggesting a partial fix for the problem.
Created attachment 863192 [details]
Patch 2
Second patch, proposed by Mike.
When both patches are applied, the problem seems to be slightly different: lvm2 is now able to resize the pool's data device without write errors being generated during an ongoing write to a thin volume.

Here again are the steps to reproduce the problem without even using dmeventd - just multiple terminals are needed.

Check that lvm.conf has the autoextend threshold for the pool disabled (or monitoring disabled) and filters out dm devices (accept the PV device and reject everything else):

```
thin_pool_autoextend_threshold = 100
monitoring = 0
filter = ["a/loop/", "r/.*/"]
```

Create a pool and a thin volume:

```
lvcreate -L10 -V20 -T vg/pool
```

Start a 20MB dd write, which will block:

```
dd if=/dev/zero of=/dev/vg/lvol1 bs=1M
```

Now the pool should be blocked and awaiting resize. In the second terminal:

```
lvextend -L+20 vg/pool
```

This should unblock the pool and let the write finish. During this I obtain this kernel log:

```
[21144.209733] device-mapper: thin: 253:3: reached low water mark for data device: sending event.
[21144.210063] device-mapper: thin: 253:3: no free data space available.
[21144.210066] device-mapper: thin: 253:3: switching pool to read-only mode
[21144.218431] bio: create slab <bio-0> at 0
[21177.928424] bio: create slab <bio-0> at 0
[21177.929101] device-mapper: thin: 253:3: growing the data device from 160 to 480 blocks
[21177.929172] device-mapper: thin: 253:3: switching pool to write mode
[21177.992551] device-mapper: thin: 253:3: switching pool to read-only mode
[21178.001530] bio: create slab <bio-0> at 0
```

As can be seen, the pool seems to stay in read-only mode.

Now when I try to remove thin volume lvol1, I get an error when the delete message is passed to the thin pool:

```
[21269.301661] device-mapper: space map common: dm_tm_shadow_block() failed
[21269.301671] device-mapper: space map common: dm_tm_shadow_block() failed
[21269.301674] device-mapper: space map metadata: unable to allocate new metadata block
[21269.301677] device-mapper: thin: Deletion of thin device 1 failed.
```
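As an aside, the final mode of each pool can be extracted mechanically from a log like the one above. Here is a small illustrative helper (not part of the bug's tooling; the message format is taken from the kernel log lines quoted here) that keeps the last "switching pool to ... mode" transition per pool device:

```python
import re

# Matches lines like:
#   device-mapper: thin: 253:3: switching pool to read-only mode
# (format taken from the kernel log quoted above)
SWITCH_RE = re.compile(
    r"device-mapper: thin: (\d+:\d+): switching pool to ([\w-]+) mode")

def final_pool_modes(log_lines):
    """Return {pool device: last reported mode} for a kernel log."""
    modes = {}
    for line in log_lines:
        m = SWITCH_RE.search(line)
        if m:
            modes[m.group(1)] = m.group(2)  # the last transition wins
    return modes

log = [
    "[21144.210066] device-mapper: thin: 253:3: switching pool to read-only mode",
    "[21177.929172] device-mapper: thin: 253:3: switching pool to write mode",
    "[21177.992551] device-mapper: thin: 253:3: switching pool to read-only mode",
]
print(final_pool_modes(log))  # {'253:3': 'read-only'}
```

Run against the log above, it reports the pool finishing in read-only mode, which is exactly the unexpected state being debugged here.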
Maybe there needs to be a special new command passed to the thin-pool for resize? But that would also make the driver incompatible with the previous behavior, where the pool simply waited for the resize and then continued normal operation.

(In reply to Zdenek Kabelac from comment #4)
> When both patches are applied - problem seems to be slightly different.
> Now lvm2 is capable to resize the data device for pool without getting write
> errors generated during on going write to thin volume.
>
> Here are again step to repeat the problem without even using dmeventd - just
> multiple terminals are needed.

Thanks, I'll try to reproduce today.

> As could be seen - pool seems to stay in read-only mode

Yeah, considering there is no indication of an error that would cause the transition back to read-only, this is weird.

(In reply to Mike Snitzer from comment #5)
> <snip>
> Yeah, considering there is no indication of error that would cause the
> transition back to read-only, this is weird.

lvextend of the pool is suspending and resuming the pool multiple times during resize:

```
#libdm-deptree.c:2476 Loading stec-pool-tpool table (253:3)
#libdm-deptree.c:2420 Adding target to (253:3): 0 65536 thin-pool 253:1 253:2 128 0 0
#ioctl/libdm-iface.c:1750 dm table (253:3) OF [16384] (*1)
#ioctl/libdm-iface.c:1750 dm reload (253:3) NF [16384] (*1)
#libdm-deptree.c:2528 Table size changed from 24576 to 65536 for stec-pool-tpool (253:3).
#libdm-deptree.c:1263 Resuming stec-pool-tpool (253:3)
#libdm-common.c:2154 Udev cookie 0xd4db319 (semid 1736707) incremented to 3
#libdm-common.c:2395 Udev cookie 0xd4db319 (semid 1736707) assigned to RESUME task(5) with flags DISABLE_SUBSYSTEM_RULES DISABLE_DISK_RULES DISABLE_OTHER_RULES DISABLE_LIBRARY_FALLBACK (0x2e)
#ioctl/libdm-iface.c:1750 dm resume (253:3) NF [16384] (*1)
#libdm-common.c:1352 stec-pool-tpool: Stacking NODE_ADD (253,3) 0:6 0660 [trust_udev]
#libdm-common.c:1362 stec-pool-tpool: Stacking NODE_READ_AHEAD 256 (flags=1)
```

and later:

```
#libdm-deptree.c:1314 Suspending stec-pool-tpool (253:3) with device flush
#ioctl/libdm-iface.c:1750 dm suspend (253:3) NFS [16384] (*1)
...
#libdm-deptree.c:2476 Loading stec-pool-tpool table (253:3)
#libdm-deptree.c:2420 Adding target to (253:3): 0 65536 thin-pool 253:1 253:2 128 0 0
#ioctl/libdm-iface.c:1750 dm table (253:3) OF [16384] (*1)
#libdm-deptree.c:2511 Suppressed stec-pool-tpool (253:3) identical table reload.
...
#libdm-deptree.c:1263 Resuming stec-pool-tpool (253:3)
#libdm-common.c:2154 Udev cookie 0xd4db319 (semid 1736707) incremented to 4
#libdm-common.c:2395 Udev cookie 0xd4db319 (semid 1736707) assigned to RESUME task(5) with flags DISABLE_SUBSYSTEM_RULES DISABLE_DISK_RULES DISABLE_OTHER_RULES DISABLE_LIBRARY_FALLBACK (0x2e)
#ioctl/libdm-iface.c:1750 dm resume (253:3) NF [16384] (*1)
#libdm-common.c:1352 stec-pool-tpool: Stacking NODE_ADD (253,3) 0:6 0660 [trust_udev]
#libdm-common.c:1362 stec-pool-tpool: Stacking NODE_READ_AHEAD 256 (flags=1)
#libdm-common.c:225 Suspended device counter reduced to 1
```

Not sure why lvextend is doing the second suspend/resume (probably some bug in libdm-deptree?) but regardless, the table isn't changing so it certainly shouldn't be causing the pool to transition to read-only mode.

(In reply to Mike Snitzer from comment #6)
> Not sure why lvextend is doing the second suspend/resume (probably some bug
> in libdm-deptree?) but regardless.. the table isn't changing so it certainly
> shouldn't be causing the pool to transition to read-only mode.

Using the device-mapper-test-suite -- which doesn't use lvm -- I've been able to confirm that a suspend+resume of a read-write pool, at the end of our "resize_io" test, will cause it to transition to read-only.

Subtree suspend/resume is how the whole of lvm currently operates, mostly to advertise on the top-level node that something underneath is suspended, so you can avoid opening a device you know you would sleep on (though it's not really race-free...). Anyway, I guess it shouldn't cause trouble - the operation is just slightly slower.

Created attachment 863524 [details]
Patch 3
This patch fixes the unexpected pool mode transition on table reload.
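For reference, whether a thin-pool is in write or read-only mode is visible from userspace in its `dmsetup status` line, where a `rw`/`ro` flag follows the held-metadata-root field. Below is a minimal parser sketch, assuming the status layout documented for dm-thin (`<transaction id> <used meta>/<total meta> <used data>/<total data> <held root> ro|rw ...`); the sample status line is illustrative, not taken from this bug:

```python
def pool_mode(status_line: str) -> str:
    """Extract the rw/ro mode flag from a dm thin-pool status line.

    Assumed field layout (from the dm-thin documentation):
      <transaction id> <used meta>/<total meta> <used data>/<total data>
      <held metadata root> ro|rw ...
    """
    fields = status_line.split()
    if len(fields) < 5:
        raise ValueError("unexpected thin-pool status line: %r" % status_line)
    mode = fields[4]
    if mode not in ("rw", "ro"):
        raise ValueError("unexpected mode field: %r" % mode)
    return mode

# Illustrative status line for a pool stuck read-only after filling up:
print(pool_mode("2 141/4161600 480/480 - ro discard_passdown"))  # ro
```

In a verification loop this could be fed the output of `dmsetup status <pool>` after a suspend+resume: a fixed kernel should keep reporting `rw`.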
FYI, the latest fixes are all available in the 'devel' branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git

You can browse the changes here:
https://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel

I'll be pulling a subset (or all) of these thinp changes into linux-dm.git (as fixes for 3.14) once I've had a chance to coordinate with Joe.

(In reply to Mike Snitzer from comment #7)
> Using the device-mapper-test-suite -- which doesn't use lvm -- I've been
> able to confirm that a suspend+resume of a read-write pool, at the end of
> our "resize_io" test, will cause it to transition to read-only.

Hi Mike,
I'm reproducing the issue using the device-mapper-test-suite; below is my test result. Could you confirm whether this failure is the expected one, and how you confirmed that a suspend+resume of a read-write pool transitions it to read-only, as you said above?

```
# dmtest run --suite thin-provisioning -n resize_io --profile spindle
Loaded suite thin-provisioning
Started
test_resize_io(PoolResizeTests): F

Finished in 7.896977168 seconds.

  1) Failure:
test_resize_io(PoolResizeTests)
    [/usr/local/rvm/gems/ruby-1.9.3-p484/gems/rspec-expectations-2.14.5/lib/rspec/expectations/fail_with.rb:32:in `fail_with'
     /usr/local/rvm/gems/ruby-1.9.3-p484/gems/rspec-expectations-2.14.5/lib/rspec/expectations/handler.rb:36:in `handle_matcher'
     /usr/local/rvm/gems/ruby-1.9.3-p484/gems/rspec-expectations-2.14.5/lib/rspec/expectations/syntax.rb:53:in `should'
     /root/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/pool_resize_tests.rb:91:in `block in resize_io_many'
     /root/device-mapper-test-suite/lib/dmtest/pool-stack.rb:33:in `call'
     /root/device-mapper-test-suite/lib/dmtest/pool-stack.rb:33:in `block in activate'
     /root/device-mapper-test-suite/lib/dmtest/prelude.rb:6:in `bracket'
     /root/device-mapper-test-suite/lib/dmtest/device-mapper/lexical_operators.rb:12:in `with_dev'
     /root/device-mapper-test-suite/lib/dmtest/pool-stack.rb:31:in `activate'
     /root/device-mapper-test-suite/lib/dmtest/thinp-mixin.rb:125:in `with_standard_pool'
     /root/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/pool_resize_tests.rb:61:in `resize_io_many'
     /root/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/pool_resize_tests.rb:99:in `test_resize_io']:
expected: false value
got: true

1 tests, 0 assertions, 1 failures, 0 errors
```

```
# lvs tsvg
  LV         VG   Attr       LSize Pool       Origin Data%  Move Log Cpy%Sync Convert
  lvol0      tsvg -wi------- 8.00m
  mythinpool tsvg twi-a-tz-- 4.88g                   20.49
  thinlv1    tsvg Vwi-a-tz-- 1.00g mythinpool         0.07
  thinlv2    tsvg Vwi-a-tz-- 4.00g mythinpool        25.00

# cat ~/.dmtest/config
profile :spindle do
  metadata_dev '/dev/tsvg/thinlv1'
  data_dev '/dev/tsvg/thinlv2'
end
```

(In reply to yanfu,wang from comment #11)
> <snip>
> /root/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/
> pool_resize_tests.rb:99:in `test_resize_io']:
> expected: false value
> got: true

The failed assertion is:

```
status.options[:read_only].should be_false
```

Meaning, the pool is in read-only mode after the suspend+resume. A kernel with the fix wouldn't transition the pool to read-only.

FYI, the 'devel' branch that I referenced in comment #10 doesn't exist any more -- it was replaced by the 'dm-3.14-fixes' branch of the snitzer/linux.git repo.

(In reply to Mike Snitzer from comment #12)
> The failed assertion is:
> status.options[:read_only].should be_false

Hi Mike,
It seems I'm running into an unexpected failure; could you help me see where I'm going wrong? I'm using https://github.com/jthornber/device-mapper-test-suite.git and testing on kernel 3.10.0-75.el7.x86_64 to reproduce. My setup is shown below:

```
# modprobe scsi-debug dev_size_mb=6000 lbpu=1 lbpws10=1
# pvcreate /dev/sdb
# vgcreate tsvg /dev/sdb
# lvcreate -L 5000M -T tsvg/mythinpool
# lvcreate -V1G -T tsvg/mythinpool -n thinlv1
# lvcreate -V4G -T tsvg/mythinpool -n thinlv2
```

Edit ~/.dmtest/config:

```
profile :spindle do
  metadata_dev '/dev/tsvg/thinlv1'
  data_dev '/dev/tsvg/thinlv2'
end
```

Then I run 'dmtest run --suite thin-provisioning -n resize_io --profile spindle' and get the failure shown in comment #11. How do I get the expected failure, 'status.options[:read_only].should be_false'? Thanks in advance.

(In reply to yanfu,wang from comment #13)
> run 'dmtest run --suite thin-provisioning -n resize_io --profile spindle'
> and get the failure like comment #11.
>
> How to get the expected failure: status.options[:read_only].should be_false?

That is _not_ the expected failure. 'status.options[:read_only].should be_false' is the ruby code that causes the failed assertion error. The "expected: false value" error you're seeing is what I'd expect from a kernel that isn't fixed.

Do you need any additional info from me? The "resize_io" test is one of the tests we have in the device-mapper-test-suite that validates that resizing the data volume works as expected.

We also have new tests that were developed to exercise new aspects of the kernel fixes (these fixes will be posted to rhkernel-list for RHEL7 Snap11 kernel inclusion).

(In reply to Mike Snitzer from comment #15)
> Do you need any additional info from me? The "resize_io" test is one of the
> tests we have in the device-mapper-test-suite that validates resising the
> data volume works as expected.

Thanks for the reply, Mike; I will try testing again per the above comments.

set qa_ack+

Patch(es) available on kernel-3.10.0-108.el7

Patch(es) available on kernel-3.10.0-109.el7, not 108 as previously stated

Due to issues in 309, please wait to test these patches in kernel-3.10.0-110.el7 or later

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days