Bug 1970719 - lvmcache_label_scan() is not protected by a lock, an exception may occur when reading metadata, causing command execution failure
Summary: lvmcache_label_scan() is not protected by a lock, an exception may occur when...
Keywords:
Status: POST
Alias: None
Product: LVM and device-mapper
Classification: Community
Component: lvm2
Version: unspecified
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: David Teigland
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-06-11 01:37 UTC by Wu Guanghao
Modified: 2024-04-27 05:52 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
pm-rhel: lvm-technical-solution?
pm-rhel: lvm-test-coverage?


Attachments (Terms of Use)

Description Wu Guanghao 2021-06-11 01:37:18 UTC
Description of problem:

If the execution of lvmcache_label_scan() fails, which will cause the vgid to be unable to obtain normally, then the re-reading will not be triggered in vg_read(), causing the execution of this command to fail .

Version-Release number of selected component (if applicable):


How reproducible:

Execute this script, it can be reproduced within two hours under normal conditions

test.sh
    #!/bin/bash
    yum install -y libcgroup
    modprobe brd rd_nr=2 rd_size=1572864 max_part=0
    vgname=brd_vg
    pvcreate /dev/ram0 /dev/ram1
    vgcreate $vgname /dev/ram0 /dev/ram1
    lvcreate -y --wipesignatures y -n thinpool $vgname -l 90%VG
    lvcreate -y --wipesignatures y -n thinpoolmeta $vgname -l 1%VG
    lvconvert -y --zero n -c 512K --thinpool $vgname/thinpool --poolmetadata $vgname/thinpoolmeta
    basedir=/sys/fs/cgroup/blkio/brd_test
    mkdir $basedir
    
    function device_test()
    {
            while true
            do
                    i=0
                    while [ $i -le 500 ]
                    do
                            lvcreate -V1M -T $vgname/thinpool -n lv$i &
                            ((i=i+1))
                    done
                    wait
                    sleep 1
    
                    i=0
                    while [ $i -le 500 ]
                    do
                            test -e /dev/disk/by-id/dm-name-$vgname-lv$i || exit 1
                            test -e /dev/mapper/$vgname-lv$i || exit 1
                            test -e /dev/$vgname/lv$i || exit 1
                            ((i=i+1))
                    done
    
                    i=0
                    while [ $i -le 500 ]
                    do
                            lvremove -f $vgname/lv$i &
                            ((i=i+1))
                    done
                    wait
                    sleep 1
    
                    i=0
                    while [ $i -le 500 ]
                    do
                            test -e /dev/disk/by-id/dm-name-$vgname-lv$i && exit 1
                            test -e /dev/mapper/$vgname-lv$i && exit 1
                            test -e /dev/$vgname/lv$i && exit 1
                            ((i=i+1))
                    done
            done
    }
    
    device_test

Steps to Reproduce:
1.
2.
3.

Actual results:

2.03.12 Error log:
143682:Jun 10 16:40:38 lvm[708341]: /dev/ram0: Checksum error at offset 203776
143683:Jun 10 16:40:38 lvm[708341]: WARNING: invalid metadata text from /dev/ram0 at 203776.
143684:Jun 10 16:40:38 lvm[708341]: WARNING: metadata on /dev/ram0 at 203776 has invalid summary for VG.
143685:Jun 10 16:40:38 lvm[708341]: WARNING: bad metadata text on /dev/ram0 in mda1
143686:Jun 10 16:40:38 lvm[708341]: WARNING: scanning /dev/ram0 mda1 failed to read metadata summary.
143687:Jun 10 16:40:38 lvm[708341]: WARNING: repair VG metadata on /dev/ram0 with vgck --updatemetadata.
143688:Jun 10 16:40:38 lvm[708341]: WARNING: scan failed to get metadata summary from /dev/ram0 PVID elQQtAsePg5vfZA2DuMV4exqYhvvRdrq
143689:Jun 10 16:40:38 lvm[708341]: /dev/ram1: Checksum error at offset 508416
143690:Jun 10 16:40:38 lvm[708341]: WARNING: invalid metadata text from /dev/ram1 at 508416.
143691:Jun 10 16:40:38 lvm[708341]: WARNING: metadata on /dev/ram1 at 508416 has invalid summary for VG.
143692:Jun 10 16:40:38 lvm[708341]: WARNING: bad metadata text on /dev/ram1 in mda1
143693:Jun 10 16:40:38 lvm[708341]: WARNING: scanning /dev/ram1 mda1 failed to read metadata summary.
143694:Jun 10 16:40:38 lvm[708341]: WARNING: repair VG metadata on /dev/ram1 with vgck --updatemetadata.
143695:Jun 10 16:40:38 lvm[708341]: WARNING: scan failed to get metadata summary from /dev/ram1 PVID mLoVSnWPN1IqsHh3eXMh5QCmWyHsBwDi
143901:Jun 10 16:41:15 lvm[708341]: Volume group "brd_vg" not found
143902:Jun 10 16:41:15 lvm[708341]: Cannot process volume group brd_vg


Expected results:

The script executes normally without exiting abnormally

Additional info:

Comment 1 Zdenek Kabelac 2021-06-11 11:38:35 UTC
So your test case seems to be this - you've queue hundreds of lvm commands and occasionally it happens the one 'scanning command'  is reading  ring-buffer while several other sequentially locked commands manage to overwrite whole ringbuffer which is ~1MiB by default and such metadata for 500LV takes around 200KiB - so there are just 5 commands to reach ring around - so depending on kernel scheduler skill it may happen we will obtain already 'invalid' metadata in scanning phase.

Could you try to use much bigger metadata area in your case (i.e. 10MiB) - that should further eliminate this race chance ?

We need to think should be the best approach in this case...

Comment 4 David Teigland 2021-06-14 21:34:48 UTC
There are two commits to address this issue on this devel branch:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-scan-errors-1

(Further changes may appear on branches dev-dct-scan-errors-<N>)

The first commit is a locking optimization that will probably avoid many of the issues described above, but only applies to commands that use a specific VG.  This commit will need some further work if we decide that it's useful enough to keep.

The second commit applies to all commands and has them begin using the global lock prior to scanning when the metadata becomes large enough a percent of the metadata area.  This should reduce the chance of problems even further, but the percentage may need to be adjusted (currently the extra locking begins when metadata uses 25% of the metadata area.)

I'd like to know if either or both of these commits helps the test case.  Thanks

Comment 5 David Teigland 2021-06-17 19:11:53 UTC
Studying the logs from reproducing this myself, I'm not seeing metadata wrap when the problem appears, so I'm not sure we've identified a root cause yet (or perhaps there's another.)

In the test logs:

lvremove1 writes seqno 7083 to ram1 at offset 226304 size 159774 (ondisk 230400 size 160256)
lvremove2 writes seqno 7084 to ram1 at offset 386560 size 159751 (ondisk 390656 size 160256)
lvremove3 writes seqno 7085 to ram1 at offset 546816 size 159454 (ondisk 550912 size 159744)

lvremove-BAD reads seqno 7084 from ram0 as was written by lvremove2, but lvremove-BAD gets a checksum error reading ram1, where we expect it to also see seqno 7084.
The debugging from lvremove-BAD shows that it's reading metadata from ram1 at the same location that lvremove2 wrote to, but apparently not getting the same metadata.

When writing vg metadata, lvm first writes the new metadata (7084), then it writes the mda_header with the new metadata location (386560) in the header precommit field, and then it writes the mda_header again with the new metadata location (386560) in the header commit field.  So, if lvmremove-BAD sees the same new metadata location (386560) in the mda_header commit field, then that metadata should be seen on disk.  I will next extend debugging to the bcache layer to see if it's performing disk io as expected.

lvremove1:
17:09:58.976457 lvremove[1694867] format_text/format-text.c:693  VG wvg seqno 7083 metadata write to /dev/ram0 mda_start 4096 mda_size 1044480 mda_last 1048575
17:09:58.976464 lvremove[1694867] format_text/format-text.c:435  VG wvg 7083 new metadata start align from 654149 to 654336 (+187)
17:09:58.976470 lvremove[1694867] format_text/format-text.c:809  VG wvg 7083 metadata area location old start 494080 last 654148 size 160069 wrap 0
17:09:58.976475 lvremove[1694867] format_text/format-text.c:816  VG wvg 7083 metadata area location new start 654336 last 814109 size 159774 wrap 0
17:09:58.976481 lvremove[1694867] format_text/format-text.c:901  VG wvg 7083 metadata disk location start 658432 size 159774 last 818205
17:09:58.976486 lvremove[1694867] format_text/format-text.c:939  VG wvg 7083 metadata last align from 818205 to 818687 (+482)
17:09:58.976494 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:58.976500 lvremove[1694867] format_text/format-text.c:988  VG wvg 7083 metadata write at 658432 size 160256 (wrap 0)
17:09:58.977804 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:58.977822 lvremove[1694867] format_text/format-text.c:693  VG wvg seqno 7083 metadata write to /dev/ram1 mda_start 4096 mda_size 1044480 mda_last 1048575
17:09:58.977829 lvremove[1694867] format_text/format-text.c:435  VG wvg 7083 new metadata start align from 226117 to 226304 (+187)
17:09:58.977835 lvremove[1694867] format_text/format-text.c:809  VG wvg 7083 metadata area location old start 66048 last 226116 size 160069 wrap 0
17:09:58.977840 lvremove[1694867] format_text/format-text.c:816  VG wvg 7083 metadata area location new start 226304 last 386077 size 159774 wrap 0
17:09:58.977849 lvremove[1694867] format_text/format-text.c:901  VG wvg 7083 metadata disk location start 230400 size 159774 last 390173
17:09:58.977853 lvremove[1694867] format_text/format-text.c:939  VG wvg 7083 metadata last align from 390173 to 390655 (+482)
17:09:58.977860 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:58.977864 lvremove[1694867] format_text/format-text.c:988  VG wvg 7083 metadata write at 230400 size 160256 (wrap 0)
17:09:58.979136 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:58.979152 lvremove[1694867] format_text/format-text.c:1148  VG wvg metadata precommit seq 7083 on /dev/ram0 mda header at 4096 (used)
17:09:58.979157 lvremove[1694867] format_text/format-text.c:1155  VG wvg metadata precommit slot0 offset 494080 size 160069 slot1 offset 654336 size 159774
17:09:58.979168 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:58.979174 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:58.979187 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:58.979199 lvremove[1694867] format_text/format-text.c:1148  VG wvg metadata precommit seq 7083 on /dev/ram1 mda header at 4096 (used)
17:09:58.979205 lvremove[1694867] format_text/format-text.c:1155  VG wvg metadata precommit slot0 offset 66048 size 160069 slot1 offset 226304 size 159774
17:09:58.979214 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:58.979220 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:58.979235 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:58.979244 lvremove[1694867] format_text/format-text.c:1161  VG wvg metadata commit seq 7083 on /dev/ram0 mda header at 4096 (used)
17:09:58.979249 lvremove[1694867] format_text/format-text.c:1168  VG wvg metadata commit slot0 offset 654336 size 159774 slot1 offset 0 size 0
17:09:58.979257 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:58.979264 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:58.979279 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:58.979290 lvremove[1694867] format_text/format-text.c:1161  VG wvg metadata commit seq 7083 on /dev/ram1 mda header at 4096 (used)
17:09:58.979295 lvremove[1694867] format_text/format-text.c:1168  VG wvg metadata commit slot0 offset 226304 size 159774 slot1 offset 0 size 0
17:09:58.979305 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:58.979310 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608

lvremove2:
17:09:59.111749 lvremove[1694867] format_text/format-text.c:693  VG wvg seqno 7084 metadata write to /dev/ram0 mda_start 4096 mda_size 1044480 mda_last 1048575
17:09:59.111757 lvremove[1694867] format_text/format-text.c:435  VG wvg 7084 new metadata start align from 814110 to 814592 (+482)
17:09:59.111762 lvremove[1694867] format_text/format-text.c:809  VG wvg 7084 metadata area location old start 654336 last 814109 size 159774 wrap 0
17:09:59.111769 lvremove[1694867] format_text/format-text.c:816  VG wvg 7084 metadata area location new start 814592 last 974342 size 159751 wrap 0
17:09:59.111773 lvremove[1694867] format_text/format-text.c:901  VG wvg 7084 metadata disk location start 818688 size 159751 last 978438
17:09:59.111779 lvremove[1694867] format_text/format-text.c:939  VG wvg 7084 metadata last align from 978438 to 978943 (+505)
17:09:59.111784 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.111790 lvremove[1694867] format_text/format-text.c:988  VG wvg 7084 metadata write at 818688 size 160256 (wrap 0)
17:09:59.113099 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:59.113119 lvremove[1694867] format_text/format-text.c:693  VG wvg seqno 7084 metadata write to /dev/ram1 mda_start 4096 mda_size 1044480 mda_last 1048575
17:09:59.113128 lvremove[1694867] format_text/format-text.c:435  VG wvg 7084 new metadata start align from 386078 to 386560 (+482)
17:09:59.113133 lvremove[1694867] format_text/format-text.c:809  VG wvg 7084 metadata area location old start 226304 last 386077 size 159774 wrap 0
17:09:59.113138 lvremove[1694867] format_text/format-text.c:816  VG wvg 7084 metadata area location new start 386560 last 546310 size 159751 wrap 0
17:09:59.113145 lvremove[1694867] format_text/format-text.c:901  VG wvg 7084 metadata disk location start 390656 size 159751 last 550406
17:09:59.113150 lvremove[1694867] format_text/format-text.c:939  VG wvg 7084 metadata last align from 550406 to 550911 (+505)
17:09:59.113156 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.113160 lvremove[1694867] format_text/format-text.c:988  VG wvg 7084 metadata write at 390656 size 160256 (wrap 0)
17:09:59.114502 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:59.114517 lvremove[1694867] format_text/format-text.c:1148  VG wvg metadata precommit seq 7084 on /dev/ram0 mda header at 4096 (used)
17:09:59.114522 lvremove[1694867] format_text/format-text.c:1155  VG wvg metadata precommit slot0 offset 654336 size 159774 slot1 offset 814592 size 159751
17:09:59.114532 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.114538 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:59.114553 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:59.114563 lvremove[1694867] format_text/format-text.c:1148  VG wvg metadata precommit seq 7084 on /dev/ram1 mda header at 4096 (used)
17:09:59.114570 lvremove[1694867] format_text/format-text.c:1155  VG wvg metadata precommit slot0 offset 226304 size 159774 slot1 offset 386560 size 159751
17:09:59.114578 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.114585 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:59.114599 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:59.114609 lvremove[1694867] format_text/format-text.c:1161  VG wvg metadata commit seq 7084 on /dev/ram0 mda header at 4096 (used)
17:09:59.114617 lvremove[1694867] format_text/format-text.c:1168  VG wvg metadata commit slot0 offset 814592 size 159751 slot1 offset 0 size 0
17:09:59.114626 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.114633 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:59.114647 lvremove[1694867] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:59.114657 lvremove[1694867] format_text/format-text.c:1161  VG wvg metadata commit seq 7084 on /dev/ram1 mda header at 4096 (used)
17:09:59.114662 lvremove[1694867] format_text/format-text.c:1168  VG wvg metadata commit slot0 offset 386560 size 159751 slot1 offset 0 size 0
17:09:59.114672 lvremove[1694867] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.114677 lvremove[1694867] device/bcache.c:239  Limit write at 0 len 131072 to len 4608

lvremove3:
17:09:59.377090 lvremove[1694829] format_text/format-text.c:693  VG wvg seqno 7085 metadata write to /dev/ram0 mda_start 4096 mda_size 1044480 mda_last 1048575
17:09:59.377103 lvremove[1694829] format_text/format-text.c:435  VG wvg 7085 new metadata start align from 974343 to 974848 (+505)
17:09:59.377113 lvremove[1694829] format_text/format-text.c:790  VG wvg 7085 wrapping metadata new_start 974848 new_size 159454 to size1 69632 size2 89822
17:09:59.377120 lvremove[1694829] format_text/format-text.c:809  VG wvg 7085 metadata area location old start 814592 last 974342 size 159751 wrap 0
17:09:59.377126 lvremove[1694829] format_text/format-text.c:816  VG wvg 7085 metadata area location new start 974848 last 90333 size 159454 wrap 89822
17:09:59.377132 lvremove[1694829] format_text/format-text.c:910  VG wvg 7085 metadata disk location write1 start 978944 size 69632 last 1048575 write2 start 4608 size 89822 last
17:09:59.377138 lvremove[1694829] format_text/format-text.c:959  VG wvg 7085 metadata last align from 94429 to 94719 (+290) (wrapped)
17:09:59.377149 lvremove[1694829] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.377154 lvremove[1694829] format_text/format-text.c:988  VG wvg 7085 metadata write at 978944 size 69632 (wrap 90112)
17:09:59.377250 lvremove[1694829] format_text/format-text.c:999  VG wvg 7085 metadata write at 4608 size 90112 (wrapped)
17:09:59.378452 lvremove[1694829] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:59.378470 lvremove[1694829] format_text/format-text.c:693  VG wvg seqno 7085 metadata write to /dev/ram1 mda_start 4096 mda_size 1044480 mda_last 1048575
17:09:59.378477 lvremove[1694829] format_text/format-text.c:435  VG wvg 7085 new metadata start align from 546311 to 546816 (+505)
17:09:59.378482 lvremove[1694829] format_text/format-text.c:809  VG wvg 7085 metadata area location old start 386560 last 546310 size 159751 wrap 0
17:09:59.378488 lvremove[1694829] format_text/format-text.c:816  VG wvg 7085 metadata area location new start 546816 last 706269 size 159454 wrap 0
17:09:59.378493 lvremove[1694829] format_text/format-text.c:901  VG wvg 7085 metadata disk location start 550912 size 159454 last 710365
17:09:59.378499 lvremove[1694829] format_text/format-text.c:939  VG wvg 7085 metadata last align from 710365 to 710655 (+290)
17:09:59.378505 lvremove[1694829] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.378509 lvremove[1694829] format_text/format-text.c:988  VG wvg 7085 metadata write at 550912 size 159744 (wrap 0)
17:09:59.379775 lvremove[1694829] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:59.379790 lvremove[1694829] format_text/format-text.c:1148  VG wvg metadata precommit seq 7085 on /dev/ram0 mda header at 4096 (used)
17:09:59.379796 lvremove[1694829] format_text/format-text.c:1155  VG wvg metadata precommit slot0 offset 814592 size 159751 slot1 offset 974848 size 159454
17:09:59.379806 lvremove[1694829] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.379813 lvremove[1694829] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:59.379828 lvremove[1694829] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:59.379840 lvremove[1694829] format_text/format-text.c:1148  VG wvg metadata precommit seq 7085 on /dev/ram1 mda header at 4096 (used)
17:09:59.379845 lvremove[1694829] format_text/format-text.c:1155  VG wvg metadata precommit slot0 offset 386560 size 159751 slot1 offset 546816 size 159454
17:09:59.379854 lvremove[1694829] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.379863 lvremove[1694829] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:59.379876 lvremove[1694829] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:59.379885 lvremove[1694829] format_text/format-text.c:1161  VG wvg metadata commit seq 7085 on /dev/ram0 mda header at 4096 (used)
17:09:59.379890 lvremove[1694829] format_text/format-text.c:1168  VG wvg metadata commit slot0 offset 974848 size 159454 slot1 offset 0 size 0
17:09:59.379898 lvremove[1694829] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.379908 lvremove[1694829] device/bcache.c:239  Limit write at 0 len 131072 to len 4608
17:09:59.379923 lvremove[1694829] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:09:59.379933 lvremove[1694829] format_text/format-text.c:1161  VG wvg metadata commit seq 7085 on /dev/ram1 mda header at 4096 (used)
17:09:59.379939 lvremove[1694829] format_text/format-text.c:1168  VG wvg metadata commit slot0 offset 546816 size 159454 slot1 offset 0 size 0
17:09:59.379948 lvremove[1694829] label/label.c:1781  Set last byte mixed block sizes physical 4096 logical 512 using 512
17:09:59.379960 lvremove[1694829] device/bcache.c:239  Limit write at 0 len 131072 to len 4608


lvremove-BAD:
17:09:59.327308 lvremove[1694881] format_text/text_label.c:497  Scanning /dev/ram0 mda1 summary.
17:09:59.327313 lvremove[1694881] format_text/format-text.c:196  Reading mda header sector from /dev/ram0 at 4096
17:09:59.327514 lvremove[1694881] format_text/import.c:59  Reading metadata summary from /dev/ram0 at 818688 size 159751 (+0)
17:09:59.334607 lvremove[1694881] format_text/format-text.c:1677  Found metadata summary on /dev/ram0 at 818688 size 159751 for VG wvg
17:09:59.334632 lvremove[1694881] cache/lvmcache.c:1553  lvmcache adding vginfo for wvg jKQbBM-2rXQ-OC7l-WMcl-xYNh-mb38-g0Ui3I
17:09:59.334639 lvremove[1694881] cache/lvmcache.c:1634  lvmcache /dev/ram0: now in VG wvg jKQbBM-2rXQ-OC7l-WMcl-xYNh-mb38-g0Ui3I
17:09:59.334645 lvremove[1694881] cache/lvmcache.c:1486  lvmcache /dev/ram0: VG wvg: set VGID to jKQbBM2rXQOC7lWMclxYNhmb38g0Ui3I.
17:09:59.334652 lvremove[1694881] cache/lvmcache.c:1832  lvmcache /dev/ram0 mda1 VG wvg set seqno 7084 checksum 8e9aca9d mda_size 159751
17:09:59.334658 lvremove[1694881] cache/lvmcache.c:1669  lvmcache /dev/ram0: VG wvg: set creation host to null-04.lab.msp.redhat.com.
17:09:59.334665 lvremove[1694881] cache/lvmcache.c:1705  lvmcache /dev/ram0: VG wvg: set system_id to null-04.lab.msp.redhat.com.
17:09:59.334670 lvremove[1694881] format_text/text_label.c:525  Scanned /dev/ram0 mda1 seqno 7084
17:09:59.334677 lvremove[1694881] label/label.c:746  Processing data from device /dev/sda 8:0 di 1 block 0x55adbfccde70
17:09:59.334685 lvremove[1694881] device/dev-io.c:94  /dev/sda: using cached size 976773168 sectors
17:10:00.581435 lvremove[1694881] filters/filter-partitioned.c:44  /dev/sda: Skipping: Partition table signature found
17:10:00.581449 lvremove[1694881] filters/filter-persistent.c:140  filter caching bad /dev/sda
17:10:00.581459 lvremove[1694881] label/label.c:394  <backtrace>
17:10:00.581479 lvremove[1694881] label/label.c:746  Processing data from device /dev/ram1 1:1 di 2 block 0x55adbfccdeb0
17:10:00.581489 lvremove[1694881] device/dev-io.c:94  /dev/ram1: using cached size 3145728 sectors
17:10:00.581501 lvremove[1694881] device/dev-io.c:94  /dev/ram1: using cached size 3145728 sectors
17:10:00.581510 lvremove[1694881] filters/filter-persistent.c:140  filter caching good /dev/ram1
17:10:00.581520 lvremove[1694881] label/label.c:324  /dev/ram1: lvm2 label detected at sector 1
17:10:00.581536 lvremove[1694881] cache/lvmcache.c:1634  lvmcache /dev/ram1: now in VG #orphans_lvm2 #orpha-ns_l-vm2
17:10:00.581544 lvremove[1694881] format_text/text_label.c:464  /dev/ram1: PV header extension version 2 found
17:10:00.581557 lvremove[1694881] format_text/text_label.c:497  Scanning /dev/ram1 mda1 summary.
17:10:00.581563 lvremove[1694881] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
17:10:00.581713 lvremove[1694881] format_text/import.c:59  Reading metadata summary from /dev/ram1 at 390656 size 159751 (+0)
17:10:00.583524 lvremove[1694881] config/config.c:560  /dev/ram1: Checksum error at offset 390656
17:10:00.583562 lvremove[1694881] format_text/import.c:66  WARNING: invalid metadata text from /dev/ram1 at 390656.
17:10:00.583578 lvremove[1694881] format_text/format-text.c:1661  WARNING: metadata on /dev/ram1 at 390656 has invalid summary for VG.
17:10:00.583591 lvremove[1694881] format_text/text_label.c:358  WARNING: bad metadata text on /dev/ram1 in mda1
17:10:00.583604 lvremove[1694881] format_text/text_label.c:536  WARNING: scanning /dev/ram1 mda1 failed to read metadata summary.


lvremove2 wrote:
ram0: VG wvg 7084 metadata area location new start 814592 last 974342 size 159751
ram0: VG wvg 7084 metadata disk location start 818688 size 159751 last 978438

ram1: VG wvg 7084 metadata area location new start 386560 last 546310 size 159751
ram1: VG wvg 7084 metadata disk location start 390656 size 159751 last 550406


lvremove-BAD reads:
ram0: Reading metadata summary from /dev/ram0 at 818688 size 159751 -> gets seqno 7084 that lvremove2 wrote.
ram1: Reading metadata summary from /dev/ram1 at 390656 size 159751 -> gets checksum error

Comment 6 David Teigland 2021-06-17 23:17:22 UTC
Adding bcache io logging is showing something that doesn't look right in bcache, but I've not yet confirmed.

_raw_read_mda_header() does:
  
        log_debug_metadata("Reading mda header sector from %s at %llu",
                           dev_name(dev_area->dev), (unsigned long long)dev_area->start);

        if (!dev_read_bytes(dev_area->dev, dev_area->start, MDA_HEADER_SIZE, mdah)) {
        }

The logging is showing:

format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
device/bcache.c:280  io R off 131072 bytes 131072 di 2 fd 7

Which indicates that reading the mda header at offset 4096 causes bcache to do a read at offset 131072K.  bcache works in 128K blocks, so it should be reading the first block at offset 0 (which includes the mda header at 4K), and not the second block at offset 128K.

Logging appears to show this incorrect read offset occuring very frequently, but it doesn't usually lead to errors.

The next step will require digging into the bcache code to see if there's really a bug here, or if this is a misreading (it's hard to believe such a basic flaw would not yet have been detected.)

Comment 7 Wu Guanghao 2021-06-18 07:26:09 UTC
(In reply to David Teigland from comment #6)
> Adding bcache io logging is showing something that doesn't look right in
> bcache, but I've not yet confirmed.
> 
> _raw_read_mda_header() does:
>   
>         log_debug_metadata("Reading mda header sector from %s at %llu",
>                            dev_name(dev_area->dev), (unsigned long
> long)dev_area->start);
> 
>         if (!dev_read_bytes(dev_area->dev, dev_area->start, MDA_HEADER_SIZE,
> mdah)) {
>         }
> 
> The logging is showing:
> 
> format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at
> 4096
> device/bcache.c:280  io R off 131072 bytes 131072 di 2 fd 7
> 
> Which indicates that reading the mda header at offset 4096 causes bcache to
> do a read at offset 131072K.  bcache works in 128K blocks, so it should be
> reading the first block at offset 0 (which includes the mda header at 4K),
> and not the second block at offset 128K.
> 
> Logging appears to show this incorrect read offset occuring very frequently,
> but it doesn't usually lead to errors.
> 
> The next step will require digging into the bcache code to see if there's
> really a bug here, or if this is a misreading (it's hard to believe such a
> basic flaw would not yet have been detected.)

Sorry, I didn't reply to you until now. https://sourceware.org/git/?p=lvm2.git;a=commit;h=31e2c4c1a6e0ddb7ee8a34fe96505b3135d51aa6 I use this patch, and adjusting the metadata percentage to 0, the problem was solved。

Comment 8 David Teigland 2021-06-18 14:10:33 UTC
> Sorry, I didn't reply to you until now.
> https://sourceware.org/git/?p=lvm2.git;a=commit;
> h=31e2c4c1a6e0ddb7ee8a34fe96505b3135d51aa6 I use this patch, and adjusting
> the metadata percentage to 0, the problem was solved。

Thanks, we might not need that patch if we find that the real bug is something else, but not yet certain.

Comment 9 David Teigland 2021-06-21 17:05:32 UTC
> format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
> device/bcache.c:280  io R off 131072 bytes 131072 di 2 fd 7
> 
> Which indicates that reading the mda header at offset 4096 causes bcache to
> do a read at offset 131072K.

This was confusion about the logging output.

Comment 10 David Teigland 2021-06-24 22:07:08 UTC
I've added enough debugging to observe the problem that Zdenek mentioned above.  In label_scan, the command first reads the headers from each PV (label_header, pv_header, mda_header), and then reads the metadata text from the metadata area on each of those PVs.  The mda_header from the first read gives the location of the metadata text.  In the time between reading the mda_header and reading the referenced metadata text, other commands completely rewrite the metadata area with new versions of metadata.  So, the metadata text referenced by the mda_header no longer exists, and the command fails (invalid vg name or checksum error.)

With metadata seqno added to the mda_header for debugging, we see that the mda_header is referencing metadata seqno 50598:

15:48:40.412193 lvremove[2094584] label/label.c:697  Scanning 11 devices for VG info
15:48:40.412212 lvremove[2094584] label/label.c:594  open /dev/ram0 ro di 0 fd 5
15:48:40.412222 lvremove[2094584] device/bcache.c:284  io R off 0 bytes 131072 di 0 fd 5 unset start 0
15:48:40.412502 lvremove[2094584] label/label.c:594  open /dev/ram1 ro di 2 fd 7
15:48:40.412508 lvremove[2094584] device/bcache.c:284  io R off 0 bytes 131072 di 2 fd 7 unset start 0
...
15:48:41.587301 lvremove[2094584] label/label.c:749  Processing data from device /dev/ram1 1:1 di 2 block 0x5562fb227eb0
15:48:41.587337 lvremove[2094584] label/label.c:327  /dev/ram1: lvm2 label detected at sector 1
15:48:41.587359 lvremove[2094584] format_text/text_label.c:497  Scanning /dev/ram1 mda1 summary.
15:48:41.587367 lvremove[2094584] format_text/format-text.c:196  Reading mda header sector from /dev/ram1 at 4096
15:48:41.587378 lvremove[2094584] format_text/format-text.c:247  Read mda_header from /dev/ram1 seqno 50598
15:48:41.588065 lvremove[2094584] format_text/import.c:59  Reading metadata summary from /dev/ram1 at 337408 size 159771 (+0)
15:48:41.588087 lvremove[2094584] device/bcache.c:284  io R off 262144 bytes 131072 di 2 fd 7 metadata_config start 337408
15:48:41.588162 lvremove[2094584] device/bcache.c:284  io R off 393216 bytes 131072 di 2 fd 7 unset start 0
15:48:41.589406 lvremove[2094584] config/config.c:588  /dev/ram1: Checksum error at offset 337408 size 159771 size2 0
15:48:41.589663 lvremove[2094584] format_text/import.c:66  WARNING: invalid metadata text from /dev/ram1 at 337408.
15:48:41.589678 lvremove[2094584] format_text/format-text.c:1745  WARNING: metadata on /dev/ram1 at 337408 has invalid summary for VG.

Note that label_scan reading headers from ram1 occurs at 15:48:40.412502, then mda_header from that read is used to read metadata text at 15:48:41.588065.  This is about a one second gap, which provides quite a bit of time for other commands to make changes.  (This command reads and processes metadata text from another slower device, sda, before attempting to read metadata from ram1, which is largely the cause of the time gap.)


From a history of metadata wrapping we see that seqno 50602 wrapped:

pid 2094368 seqno 50595 old_start 735232 new_start 896000 new_size1 148480 new_size2 11909
pid 2094654 seqno 50602 old_start 813568 new_start 973312 new_size1 71168 new_size2 87961

From dumping the unexpected metadata text that the command read, we see that the referenced metadata location actually contains metadata for seqno's 50604 and 50605 (not 50598).  From the metadata size and wrapping log, we know that the metadata wraps every 6-7 metadata seqno's, and the difference between 50598 and 50694 is 6.

I think our initial recommendation for this problem should be to use a larger metadata area when using large metadata and a large number of concurrent commands changing metadata.  As for future improvements to address this, a couple options would be:  expanding locking to cover label_scan (perhaps optional since most cases don't require it), or adding error handling in which we would reread the mda_header+metadata_area all together if a checksum error is seen during scan.

Comment 11 David Teigland 2021-06-28 23:18:59 UTC
here are two commits that should also fix the problem more simply by retrying (the same errors are printed when the problem occurs, but after retrying the command should succeeed.)
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-scan-retry-1


Note You need to log in before you can comment on or make changes to this bug.