Description of problem: If the execution of lvmcache_label_scan() fails, which will cause the vgid to be unable to obtain normally, then the re-reading will not be triggered in vg_read(), causing the execution of this command to fail . Version-Release number of selected component (if applicable): How reproducible: Execute this script, it can be reproduced within two hours under normal conditions test.sh #!/bin/bash yum install -y libcgroup modprobe brd rd_nr=2 rd_size=1572864 max_part=0 vgname=brd_vg pvcreate /dev/ram0 /dev/ram1 vgcreate $vgname /dev/ram0 /dev/ram1 lvcreate -y --wipesignatures y -n thinpool $vgname -l 90%VG lvcreate -y --wipesignatures y -n thinpoolmeta $vgname -l 1%VG lvconvert -y --zero n -c 512K --thinpool $vgname/thinpool --poolmetadata $vgname/thinpoolmeta basedir=/sys/fs/cgroup/blkio/brd_test mkdir $basedir function device_test() { while true do i=0 while [ $i -le 500 ] do lvcreate -V1M -T $vgname/thinpool -n lv$i & ((i=i+1)) done wait sleep 1 i=0 while [ $i -le 500 ] do test -e /dev/disk/by-id/dm-name-$vgname-lv$i || exit 1 test -e /dev/mapper/$vgname-lv$i || exit 1 test -e /dev/$vgname/lv$i || exit 1 ((i=i+1)) done i=0 while [ $i -le 500 ] do lvremove -f $vgname/lv$i & ((i=i+1)) done wait sleep 1 i=0 while [ $i -le 500 ] do test -e /dev/disk/by-id/dm-name-$vgname-lv$i && exit 1 test -e /dev/mapper/$vgname-lv$i && exit 1 test -e /dev/$vgname/lv$i && exit 1 ((i=i+1)) done done } device_test Steps to Reproduce: 1. 2. 3. Actual results: 2.03.12 Error log: 143682:Jun 10 16:40:38 lvm[708341]: /dev/ram0: Checksum error at offset 203776 143683:Jun 10 16:40:38 lvm[708341]: WARNING: invalid metadata text from /dev/ram0 at 203776. 143684:Jun 10 16:40:38 lvm[708341]: WARNING: metadata on /dev/ram0 at 203776 has invalid summary for VG. 143685:Jun 10 16:40:38 lvm[708341]: WARNING: bad metadata text on /dev/ram0 in mda1 143686:Jun 10 16:40:38 lvm[708341]: WARNING: scanning /dev/ram0 mda1 failed to read metadata summary. 143687:Jun 10 16:40:38 lvm[708341]: WARNING: repair VG metadata on /dev/ram0 with vgck --updatemetadata. 143688:Jun 10 16:40:38 lvm[708341]: WARNING: scan failed to get metadata summary from /dev/ram0 PVID elQQtAsePg5vfZA2DuMV4exqYhvvRdrq 143689:Jun 10 16:40:38 lvm[708341]: /dev/ram1: Checksum error at offset 508416 143690:Jun 10 16:40:38 lvm[708341]: WARNING: invalid metadata text from /dev/ram1 at 508416. 143691:Jun 10 16:40:38 lvm[708341]: WARNING: metadata on /dev/ram1 at 508416 has invalid summary for VG. 143692:Jun 10 16:40:38 lvm[708341]: WARNING: bad metadata text on /dev/ram1 in mda1 143693:Jun 10 16:40:38 lvm[708341]: WARNING: scanning /dev/ram1 mda1 failed to read metadata summary. 143694:Jun 10 16:40:38 lvm[708341]: WARNING: repair VG metadata on /dev/ram1 with vgck --updatemetadata. 143695:Jun 10 16:40:38 lvm[708341]: WARNING: scan failed to get metadata summary from /dev/ram1 PVID mLoVSnWPN1IqsHh3eXMh5QCmWyHsBwDi 143901:Jun 10 16:41:15 lvm[708341]: Volume group "brd_vg" not found 143902:Jun 10 16:41:15 lvm[708341]: Cannot process volume group brd_vg Expected results: The script executes normally without exiting abnormally Additional info:
So your test case seems to be this - you've queue hundreds of lvm commands and occasionally it happens the one 'scanning command' is reading ring-buffer while several other sequentially locked commands manage to overwrite whole ringbuffer which is ~1MiB by default and such metadata for 500LV takes around 200KiB - so there are just 5 commands to reach ring around - so depending on kernel scheduler skill it may happen we will obtain already 'invalid' metadata in scanning phase. Could you try to use much bigger metadata area in your case (i.e. 10MiB) - that should further eliminate this race chance ? We need to think should be the best approach in this case...
There are two commits to address this issue on this devel branch: https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-scan-errors-1 (Further changes may appear on branches dev-dct-scan-errors-<N>) The first commit is a locking optimization that will probably avoid many of the issues described above, but only applies to commands that use a specific VG. This commit will need some further work if we decide that it's useful enough to keep. The second commit applies to all commands and has them begin using the global lock prior to scanning when the metadata becomes large enough a percent of the metadata area. This should reduce the chance of problems even further, but the percentage may need to be adjusted (currently the extra locking begins when metadata uses 25% of the metadata area.) I'd like to know if either or both of these commits helps the test case. Thanks
Studying the logs from reproducing this myself, I'm not seeing metadata wrap when the problem appears, so I'm not sure we've identified a root cause yet (or perhaps there's another.) In the test logs: lvremove1 writes seqno 7083 to ram1 at offset 226304 size 159774 (ondisk 230400 size 160256) lvremove2 writes seqno 7084 to ram1 at offset 386560 size 159751 (ondisk 390656 size 160256) lvremove3 writes seqno 7085 to ram1 at offset 546816 size 159454 (ondisk 550912 size 159744) lvremove-BAD reads seqno 7084 from ram0 as was written by lvremove2, but lvremove-BAD gets a checksum error reading ram1, where we expect it to also see seqno 7084. The debugging from lvremove-BAD shows that it's reading metadata from ram1 at the same location that lvremove2 wrote to, but apparently not getting the same metadata. When writing vg metadata, lvm first writes the new metadata (7084), then it writes the mda_header with the new metadata location (386560) in the header precommit field, and then it writes the mda_header again with the new metadata location (386560) in the header commit field. So, if lvmremove-BAD sees the same new metadata location (386560) in the mda_header commit field, then that metadata should be seen on disk. I will next extend debugging to the bcache layer to see if it's performing disk io as expected. lvremove1: 17:09:58.976457 lvremove[1694867] format_text/format-text.c:693 VG wvg seqno 7083 metadata write to /dev/ram0 mda_start 4096 mda_size 1044480 mda_last 1048575 17:09:58.976464 lvremove[1694867] format_text/format-text.c:435 VG wvg 7083 new metadata start align from 654149 to 654336 (+187) 17:09:58.976470 lvremove[1694867] format_text/format-text.c:809 VG wvg 7083 metadata area location old start 494080 last 654148 size 160069 wrap 0 17:09:58.976475 lvremove[1694867] format_text/format-text.c:816 VG wvg 7083 metadata area location new start 654336 last 814109 size 159774 wrap 0 17:09:58.976481 lvremove[1694867] format_text/format-text.c:901 VG wvg 7083 metadata disk location start 658432 size 159774 last 818205 17:09:58.976486 lvremove[1694867] format_text/format-text.c:939 VG wvg 7083 metadata last align from 818205 to 818687 (+482) 17:09:58.976494 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:58.976500 lvremove[1694867] format_text/format-text.c:988 VG wvg 7083 metadata write at 658432 size 160256 (wrap 0) 17:09:58.977804 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:58.977822 lvremove[1694867] format_text/format-text.c:693 VG wvg seqno 7083 metadata write to /dev/ram1 mda_start 4096 mda_size 1044480 mda_last 1048575 17:09:58.977829 lvremove[1694867] format_text/format-text.c:435 VG wvg 7083 new metadata start align from 226117 to 226304 (+187) 17:09:58.977835 lvremove[1694867] format_text/format-text.c:809 VG wvg 7083 metadata area location old start 66048 last 226116 size 160069 wrap 0 17:09:58.977840 lvremove[1694867] format_text/format-text.c:816 VG wvg 7083 metadata area location new start 226304 last 386077 size 159774 wrap 0 17:09:58.977849 lvremove[1694867] format_text/format-text.c:901 VG wvg 7083 metadata disk location start 230400 size 159774 last 390173 17:09:58.977853 lvremove[1694867] format_text/format-text.c:939 VG wvg 7083 metadata last align from 390173 to 390655 (+482) 17:09:58.977860 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:58.977864 lvremove[1694867] format_text/format-text.c:988 VG wvg 7083 metadata write at 230400 size 160256 (wrap 0) 17:09:58.979136 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:58.979152 lvremove[1694867] format_text/format-text.c:1148 VG wvg metadata precommit seq 7083 on /dev/ram0 mda header at 4096 (used) 17:09:58.979157 lvremove[1694867] format_text/format-text.c:1155 VG wvg metadata precommit slot0 offset 494080 size 160069 slot1 offset 654336 size 159774 17:09:58.979168 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:58.979174 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:58.979187 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:58.979199 lvremove[1694867] format_text/format-text.c:1148 VG wvg metadata precommit seq 7083 on /dev/ram1 mda header at 4096 (used) 17:09:58.979205 lvremove[1694867] format_text/format-text.c:1155 VG wvg metadata precommit slot0 offset 66048 size 160069 slot1 offset 226304 size 159774 17:09:58.979214 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:58.979220 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:58.979235 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:58.979244 lvremove[1694867] format_text/format-text.c:1161 VG wvg metadata commit seq 7083 on /dev/ram0 mda header at 4096 (used) 17:09:58.979249 lvremove[1694867] format_text/format-text.c:1168 VG wvg metadata commit slot0 offset 654336 size 159774 slot1 offset 0 size 0 17:09:58.979257 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:58.979264 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:58.979279 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:58.979290 lvremove[1694867] format_text/format-text.c:1161 VG wvg metadata commit seq 7083 on /dev/ram1 mda header at 4096 (used) 17:09:58.979295 lvremove[1694867] format_text/format-text.c:1168 VG wvg metadata commit slot0 offset 226304 size 159774 slot1 offset 0 size 0 17:09:58.979305 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:58.979310 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 lvremove2: 17:09:59.111749 lvremove[1694867] format_text/format-text.c:693 VG wvg seqno 7084 metadata write to /dev/ram0 mda_start 4096 mda_size 1044480 mda_last 1048575 17:09:59.111757 lvremove[1694867] format_text/format-text.c:435 VG wvg 7084 new metadata start align from 814110 to 814592 (+482) 17:09:59.111762 lvremove[1694867] format_text/format-text.c:809 VG wvg 7084 metadata area location old start 654336 last 814109 size 159774 wrap 0 17:09:59.111769 lvremove[1694867] format_text/format-text.c:816 VG wvg 7084 metadata area location new start 814592 last 974342 size 159751 wrap 0 17:09:59.111773 lvremove[1694867] format_text/format-text.c:901 VG wvg 7084 metadata disk location start 818688 size 159751 last 978438 17:09:59.111779 lvremove[1694867] format_text/format-text.c:939 VG wvg 7084 metadata last align from 978438 to 978943 (+505) 17:09:59.111784 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.111790 lvremove[1694867] format_text/format-text.c:988 VG wvg 7084 metadata write at 818688 size 160256 (wrap 0) 17:09:59.113099 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:59.113119 lvremove[1694867] format_text/format-text.c:693 VG wvg seqno 7084 metadata write to /dev/ram1 mda_start 4096 mda_size 1044480 mda_last 1048575 17:09:59.113128 lvremove[1694867] format_text/format-text.c:435 VG wvg 7084 new metadata start align from 386078 to 386560 (+482) 17:09:59.113133 lvremove[1694867] format_text/format-text.c:809 VG wvg 7084 metadata area location old start 226304 last 386077 size 159774 wrap 0 17:09:59.113138 lvremove[1694867] format_text/format-text.c:816 VG wvg 7084 metadata area location new start 386560 last 546310 size 159751 wrap 0 17:09:59.113145 lvremove[1694867] format_text/format-text.c:901 VG wvg 7084 metadata disk location start 390656 size 159751 last 550406 17:09:59.113150 lvremove[1694867] format_text/format-text.c:939 VG wvg 7084 metadata last align from 550406 to 550911 (+505) 17:09:59.113156 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.113160 lvremove[1694867] format_text/format-text.c:988 VG wvg 7084 metadata write at 390656 size 160256 (wrap 0) 17:09:59.114502 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:59.114517 lvremove[1694867] format_text/format-text.c:1148 VG wvg metadata precommit seq 7084 on /dev/ram0 mda header at 4096 (used) 17:09:59.114522 lvremove[1694867] format_text/format-text.c:1155 VG wvg metadata precommit slot0 offset 654336 size 159774 slot1 offset 814592 size 159751 17:09:59.114532 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.114538 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:59.114553 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:59.114563 lvremove[1694867] format_text/format-text.c:1148 VG wvg metadata precommit seq 7084 on /dev/ram1 mda header at 4096 (used) 17:09:59.114570 lvremove[1694867] format_text/format-text.c:1155 VG wvg metadata precommit slot0 offset 226304 size 159774 slot1 offset 386560 size 159751 17:09:59.114578 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.114585 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:59.114599 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:59.114609 lvremove[1694867] format_text/format-text.c:1161 VG wvg metadata commit seq 7084 on /dev/ram0 mda header at 4096 (used) 17:09:59.114617 lvremove[1694867] format_text/format-text.c:1168 VG wvg metadata commit slot0 offset 814592 size 159751 slot1 offset 0 size 0 17:09:59.114626 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.114633 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:59.114647 lvremove[1694867] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:59.114657 lvremove[1694867] format_text/format-text.c:1161 VG wvg metadata commit seq 7084 on /dev/ram1 mda header at 4096 (used) 17:09:59.114662 lvremove[1694867] format_text/format-text.c:1168 VG wvg metadata commit slot0 offset 386560 size 159751 slot1 offset 0 size 0 17:09:59.114672 lvremove[1694867] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.114677 lvremove[1694867] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 lvremove3: 17:09:59.377090 lvremove[1694829] format_text/format-text.c:693 VG wvg seqno 7085 metadata write to /dev/ram0 mda_start 4096 mda_size 1044480 mda_last 1048575 17:09:59.377103 lvremove[1694829] format_text/format-text.c:435 VG wvg 7085 new metadata start align from 974343 to 974848 (+505) 17:09:59.377113 lvremove[1694829] format_text/format-text.c:790 VG wvg 7085 wrapping metadata new_start 974848 new_size 159454 to size1 69632 size2 89822 17:09:59.377120 lvremove[1694829] format_text/format-text.c:809 VG wvg 7085 metadata area location old start 814592 last 974342 size 159751 wrap 0 17:09:59.377126 lvremove[1694829] format_text/format-text.c:816 VG wvg 7085 metadata area location new start 974848 last 90333 size 159454 wrap 89822 17:09:59.377132 lvremove[1694829] format_text/format-text.c:910 VG wvg 7085 metadata disk location write1 start 978944 size 69632 last 1048575 write2 start 4608 size 89822 last 17:09:59.377138 lvremove[1694829] format_text/format-text.c:959 VG wvg 7085 metadata last align from 94429 to 94719 (+290) (wrapped) 17:09:59.377149 lvremove[1694829] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.377154 lvremove[1694829] format_text/format-text.c:988 VG wvg 7085 metadata write at 978944 size 69632 (wrap 90112) 17:09:59.377250 lvremove[1694829] format_text/format-text.c:999 VG wvg 7085 metadata write at 4608 size 90112 (wrapped) 17:09:59.378452 lvremove[1694829] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:59.378470 lvremove[1694829] format_text/format-text.c:693 VG wvg seqno 7085 metadata write to /dev/ram1 mda_start 4096 mda_size 1044480 mda_last 1048575 17:09:59.378477 lvremove[1694829] format_text/format-text.c:435 VG wvg 7085 new metadata start align from 546311 to 546816 (+505) 17:09:59.378482 lvremove[1694829] format_text/format-text.c:809 VG wvg 7085 metadata area location old start 386560 last 546310 size 159751 wrap 0 17:09:59.378488 lvremove[1694829] format_text/format-text.c:816 VG wvg 7085 metadata area location new start 546816 last 706269 size 159454 wrap 0 17:09:59.378493 lvremove[1694829] format_text/format-text.c:901 VG wvg 7085 metadata disk location start 550912 size 159454 last 710365 17:09:59.378499 lvremove[1694829] format_text/format-text.c:939 VG wvg 7085 metadata last align from 710365 to 710655 (+290) 17:09:59.378505 lvremove[1694829] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.378509 lvremove[1694829] format_text/format-text.c:988 VG wvg 7085 metadata write at 550912 size 159744 (wrap 0) 17:09:59.379775 lvremove[1694829] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:59.379790 lvremove[1694829] format_text/format-text.c:1148 VG wvg metadata precommit seq 7085 on /dev/ram0 mda header at 4096 (used) 17:09:59.379796 lvremove[1694829] format_text/format-text.c:1155 VG wvg metadata precommit slot0 offset 814592 size 159751 slot1 offset 974848 size 159454 17:09:59.379806 lvremove[1694829] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.379813 lvremove[1694829] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:59.379828 lvremove[1694829] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:59.379840 lvremove[1694829] format_text/format-text.c:1148 VG wvg metadata precommit seq 7085 on /dev/ram1 mda header at 4096 (used) 17:09:59.379845 lvremove[1694829] format_text/format-text.c:1155 VG wvg metadata precommit slot0 offset 386560 size 159751 slot1 offset 546816 size 159454 17:09:59.379854 lvremove[1694829] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.379863 lvremove[1694829] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:59.379876 lvremove[1694829] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:59.379885 lvremove[1694829] format_text/format-text.c:1161 VG wvg metadata commit seq 7085 on /dev/ram0 mda header at 4096 (used) 17:09:59.379890 lvremove[1694829] format_text/format-text.c:1168 VG wvg metadata commit slot0 offset 974848 size 159454 slot1 offset 0 size 0 17:09:59.379898 lvremove[1694829] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.379908 lvremove[1694829] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 17:09:59.379923 lvremove[1694829] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:09:59.379933 lvremove[1694829] format_text/format-text.c:1161 VG wvg metadata commit seq 7085 on /dev/ram1 mda header at 4096 (used) 17:09:59.379939 lvremove[1694829] format_text/format-text.c:1168 VG wvg metadata commit slot0 offset 546816 size 159454 slot1 offset 0 size 0 17:09:59.379948 lvremove[1694829] label/label.c:1781 Set last byte mixed block sizes physical 4096 logical 512 using 512 17:09:59.379960 lvremove[1694829] device/bcache.c:239 Limit write at 0 len 131072 to len 4608 lvremove-BAD: 17:09:59.327308 lvremove[1694881] format_text/text_label.c:497 Scanning /dev/ram0 mda1 summary. 17:09:59.327313 lvremove[1694881] format_text/format-text.c:196 Reading mda header sector from /dev/ram0 at 4096 17:09:59.327514 lvremove[1694881] format_text/import.c:59 Reading metadata summary from /dev/ram0 at 818688 size 159751 (+0) 17:09:59.334607 lvremove[1694881] format_text/format-text.c:1677 Found metadata summary on /dev/ram0 at 818688 size 159751 for VG wvg 17:09:59.334632 lvremove[1694881] cache/lvmcache.c:1553 lvmcache adding vginfo for wvg jKQbBM-2rXQ-OC7l-WMcl-xYNh-mb38-g0Ui3I 17:09:59.334639 lvremove[1694881] cache/lvmcache.c:1634 lvmcache /dev/ram0: now in VG wvg jKQbBM-2rXQ-OC7l-WMcl-xYNh-mb38-g0Ui3I 17:09:59.334645 lvremove[1694881] cache/lvmcache.c:1486 lvmcache /dev/ram0: VG wvg: set VGID to jKQbBM2rXQOC7lWMclxYNhmb38g0Ui3I. 17:09:59.334652 lvremove[1694881] cache/lvmcache.c:1832 lvmcache /dev/ram0 mda1 VG wvg set seqno 7084 checksum 8e9aca9d mda_size 159751 17:09:59.334658 lvremove[1694881] cache/lvmcache.c:1669 lvmcache /dev/ram0: VG wvg: set creation host to null-04.lab.msp.redhat.com. 17:09:59.334665 lvremove[1694881] cache/lvmcache.c:1705 lvmcache /dev/ram0: VG wvg: set system_id to null-04.lab.msp.redhat.com. 17:09:59.334670 lvremove[1694881] format_text/text_label.c:525 Scanned /dev/ram0 mda1 seqno 7084 17:09:59.334677 lvremove[1694881] label/label.c:746 Processing data from device /dev/sda 8:0 di 1 block 0x55adbfccde70 17:09:59.334685 lvremove[1694881] device/dev-io.c:94 /dev/sda: using cached size 976773168 sectors 17:10:00.581435 lvremove[1694881] filters/filter-partitioned.c:44 /dev/sda: Skipping: Partition table signature found 17:10:00.581449 lvremove[1694881] filters/filter-persistent.c:140 filter caching bad /dev/sda 17:10:00.581459 lvremove[1694881] label/label.c:394 <backtrace> 17:10:00.581479 lvremove[1694881] label/label.c:746 Processing data from device /dev/ram1 1:1 di 2 block 0x55adbfccdeb0 17:10:00.581489 lvremove[1694881] device/dev-io.c:94 /dev/ram1: using cached size 3145728 sectors 17:10:00.581501 lvremove[1694881] device/dev-io.c:94 /dev/ram1: using cached size 3145728 sectors 17:10:00.581510 lvremove[1694881] filters/filter-persistent.c:140 filter caching good /dev/ram1 17:10:00.581520 lvremove[1694881] label/label.c:324 /dev/ram1: lvm2 label detected at sector 1 17:10:00.581536 lvremove[1694881] cache/lvmcache.c:1634 lvmcache /dev/ram1: now in VG #orphans_lvm2 #orpha-ns_l-vm2 17:10:00.581544 lvremove[1694881] format_text/text_label.c:464 /dev/ram1: PV header extension version 2 found 17:10:00.581557 lvremove[1694881] format_text/text_label.c:497 Scanning /dev/ram1 mda1 summary. 17:10:00.581563 lvremove[1694881] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 17:10:00.581713 lvremove[1694881] format_text/import.c:59 Reading metadata summary from /dev/ram1 at 390656 size 159751 (+0) 17:10:00.583524 lvremove[1694881] config/config.c:560 /dev/ram1: Checksum error at offset 390656 17:10:00.583562 lvremove[1694881] format_text/import.c:66 WARNING: invalid metadata text from /dev/ram1 at 390656. 17:10:00.583578 lvremove[1694881] format_text/format-text.c:1661 WARNING: metadata on /dev/ram1 at 390656 has invalid summary for VG. 17:10:00.583591 lvremove[1694881] format_text/text_label.c:358 WARNING: bad metadata text on /dev/ram1 in mda1 17:10:00.583604 lvremove[1694881] format_text/text_label.c:536 WARNING: scanning /dev/ram1 mda1 failed to read metadata summary. lvremove2 wrote: ram0: VG wvg 7084 metadata area location new start 814592 last 974342 size 159751 ram0: VG wvg 7084 metadata disk location start 818688 size 159751 last 978438 ram1: VG wvg 7084 metadata area location new start 386560 last 546310 size 159751 ram1: VG wvg 7084 metadata disk location start 390656 size 159751 last 550406 lvremove-BAD reads: ram0: Reading metadata summary from /dev/ram0 at 818688 size 159751 -> gets seqno 7084 that lvremove2 wrote. ram1: Reading metadata summary from /dev/ram1 at 390656 size 159751 -> gets checksum error
Adding bcache io logging is showing something that doesn't look right in bcache, but I've not yet confirmed. _raw_read_mda_header() does: log_debug_metadata("Reading mda header sector from %s at %llu", dev_name(dev_area->dev), (unsigned long long)dev_area->start); if (!dev_read_bytes(dev_area->dev, dev_area->start, MDA_HEADER_SIZE, mdah)) { } The logging is showing: format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 device/bcache.c:280 io R off 131072 bytes 131072 di 2 fd 7 Which indicates that reading the mda header at offset 4096 causes bcache to do a read at offset 131072K. bcache works in 128K blocks, so it should be reading the first block at offset 0 (which includes the mda header at 4K), and not the second block at offset 128K. Logging appears to show this incorrect read offset occuring very frequently, but it doesn't usually lead to errors. The next step will require digging into the bcache code to see if there's really a bug here, or if this is a misreading (it's hard to believe such a basic flaw would not yet have been detected.)
(In reply to David Teigland from comment #6) > Adding bcache io logging is showing something that doesn't look right in > bcache, but I've not yet confirmed. > > _raw_read_mda_header() does: > > log_debug_metadata("Reading mda header sector from %s at %llu", > dev_name(dev_area->dev), (unsigned long > long)dev_area->start); > > if (!dev_read_bytes(dev_area->dev, dev_area->start, MDA_HEADER_SIZE, > mdah)) { > } > > The logging is showing: > > format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at > 4096 > device/bcache.c:280 io R off 131072 bytes 131072 di 2 fd 7 > > Which indicates that reading the mda header at offset 4096 causes bcache to > do a read at offset 131072K. bcache works in 128K blocks, so it should be > reading the first block at offset 0 (which includes the mda header at 4K), > and not the second block at offset 128K. > > Logging appears to show this incorrect read offset occuring very frequently, > but it doesn't usually lead to errors. > > The next step will require digging into the bcache code to see if there's > really a bug here, or if this is a misreading (it's hard to believe such a > basic flaw would not yet have been detected.) Sorry, I didn't reply to you until now. https://sourceware.org/git/?p=lvm2.git;a=commit;h=31e2c4c1a6e0ddb7ee8a34fe96505b3135d51aa6 I use this patch, and adjusting the metadata percentage to 0, the problem was solved。
> Sorry, I didn't reply to you until now. > https://sourceware.org/git/?p=lvm2.git;a=commit; > h=31e2c4c1a6e0ddb7ee8a34fe96505b3135d51aa6 I use this patch, and adjusting > the metadata percentage to 0, the problem was solved。 Thanks, we might not need that patch if we find that the real bug is something else, but not yet certain.
> format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 > device/bcache.c:280 io R off 131072 bytes 131072 di 2 fd 7 > > Which indicates that reading the mda header at offset 4096 causes bcache to > do a read at offset 131072K. This was confusion about the logging output.
I've added enough debugging to observe the problem that Zdenek mentioned above. In label_scan, the command first reads the headers from each PV (label_header, pv_header, mda_header), and then reads the metadata text from the metadata area on each of those PVs. The mda_header from the first read gives the location of the metadata text. In the time between reading the mda_header and reading the referenced metadata text, other commands completely rewrite the metadata area with new versions of metadata. So, the metadata text referenced by the mda_header no longer exists, and the command fails (invalid vg name or checksum error.) With metadata seqno added to the mda_header for debugging, we see that the mda_header is referencing metadata seqno 50598: 15:48:40.412193 lvremove[2094584] label/label.c:697 Scanning 11 devices for VG info 15:48:40.412212 lvremove[2094584] label/label.c:594 open /dev/ram0 ro di 0 fd 5 15:48:40.412222 lvremove[2094584] device/bcache.c:284 io R off 0 bytes 131072 di 0 fd 5 unset start 0 15:48:40.412502 lvremove[2094584] label/label.c:594 open /dev/ram1 ro di 2 fd 7 15:48:40.412508 lvremove[2094584] device/bcache.c:284 io R off 0 bytes 131072 di 2 fd 7 unset start 0 ... 15:48:41.587301 lvremove[2094584] label/label.c:749 Processing data from device /dev/ram1 1:1 di 2 block 0x5562fb227eb0 15:48:41.587337 lvremove[2094584] label/label.c:327 /dev/ram1: lvm2 label detected at sector 1 15:48:41.587359 lvremove[2094584] format_text/text_label.c:497 Scanning /dev/ram1 mda1 summary. 15:48:41.587367 lvremove[2094584] format_text/format-text.c:196 Reading mda header sector from /dev/ram1 at 4096 15:48:41.587378 lvremove[2094584] format_text/format-text.c:247 Read mda_header from /dev/ram1 seqno 50598 15:48:41.588065 lvremove[2094584] format_text/import.c:59 Reading metadata summary from /dev/ram1 at 337408 size 159771 (+0) 15:48:41.588087 lvremove[2094584] device/bcache.c:284 io R off 262144 bytes 131072 di 2 fd 7 metadata_config start 337408 15:48:41.588162 lvremove[2094584] device/bcache.c:284 io R off 393216 bytes 131072 di 2 fd 7 unset start 0 15:48:41.589406 lvremove[2094584] config/config.c:588 /dev/ram1: Checksum error at offset 337408 size 159771 size2 0 15:48:41.589663 lvremove[2094584] format_text/import.c:66 WARNING: invalid metadata text from /dev/ram1 at 337408. 15:48:41.589678 lvremove[2094584] format_text/format-text.c:1745 WARNING: metadata on /dev/ram1 at 337408 has invalid summary for VG. Note that label_scan reading headers from ram1 occurs at 15:48:40.412502, then mda_header from that read is used to read metadata text at 15:48:41.588065. This is about a one second gap, which provides quite a bit of time for other commands to make changes. (This command reads and processes metadata text from another slower device, sda, before attempting to read metadata from ram1, which is largely the cause of the time gap.) From a history of metadata wrapping we see that seqno 50602 wrapped: pid 2094368 seqno 50595 old_start 735232 new_start 896000 new_size1 148480 new_size2 11909 pid 2094654 seqno 50602 old_start 813568 new_start 973312 new_size1 71168 new_size2 87961 From dumping the unexpected metadata text that the command read, we see that the referenced metadata location actually contains metadata for seqno's 50604 and 50605 (not 50598). From the metadata size and wrapping log, we know that the metadata wraps every 6-7 metadata seqno's, and the difference between 50598 and 50694 is 6. I think our initial recommendation for this problem should be to use a larger metadata area when using large metadata and a large number of concurrent commands changing metadata. As for future improvements to address this, a couple options would be: expanding locking to cover label_scan (perhaps optional since most cases don't require it), or adding error handling in which we would reread the mda_header+metadata_area all together if a checksum error is seen during scan.
here are two commits that should also fix the problem more simply by retrying (the same errors are printed when the problem occurs, but after retrying the command should succeeed.) https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-scan-retry-1
fixed by scan: don't hold bcache block during scan https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=d89942d157e49168d069a65f77a6ecb2c155b3fb scan: retry reading metadata on error https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=e035e323508383fdcd9df0ef036d276a9c9909ab scan: move metadata vgname check https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=b876dbfc245b8fe0a4c433b39d214652455c3f9e