Bug 1787071
| Field | Value |
|---|---|
| Summary: | pvresize overwrites wrong PV headers after failing to update old extension headers because of scsi reservations |
| Product: | Red Hat Enterprise Linux 7 |
| Component: | lvm2 |
| lvm2 sub component: | Command-line tools |
| Version: | 7.7 |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | urgent |
| Reporter: | Alexandros Panagiotou <apanagio> |
| Assignee: | David Teigland <teigland> |
| QA Contact: | cluster-qe <cluster-qe> |
| CC: | agk, cmarthal, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, rbednar, rhandlin, rmadhuso, teigland, thornber, zkabelac |
| Target Milestone: | rc |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | lvm2-2.02.186-6.el7 |
| Last Closed: | 2020-03-31 20:04:51 UTC |
| Type: | Bug |

Created attachment 1651358
Test results including pvresize verbose output, strace and a funcgraph of the reproducing run of pvresize
Hello,
A few more notes, along with the results of one more test (test5). This time I have excluded sda2 via filtering, to avoid root getting overwritten and to simplify recovery:
[1] The initial good state (before pvresize runs) is:
# pvs -a -v
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g 25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g 25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g 25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_d lvm2 a-- <54,00g 29,00g 56,00g AFNrSH-mwMc-QQz0-jBSQ-MoQI-X2gj-VKq5Xe
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g 25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g 25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g 25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
[2] After the test, the pvs output looks like this (included in 1787071.test5.tar.gz):
WARNING: Not using device /dev/mapper/mpathd for PV XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz.
WARNING: PV XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz prefers device /dev/mapper/mpathf because device size is correct.
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g 25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g 25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g 25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_f lvm2 d-- 0 0 54,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz <-+
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g 25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS |
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g 25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz <-+
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g 25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
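
The duplicate in the listing above can also be spotted mechanically. The following is only a minimal sketch, assuming the pvs output has been captured as text; the helper name `duplicate_pv_uuids` is made up for this illustration and is not part of LVM. The sample text is copied from the output above, with the arrow annotations dropped.

```python
import re
from collections import defaultdict

# LVM prints PV UUIDs as 6-4-4-4-4-4-6 character groups.
UUID_RE = re.compile(r"^[A-Za-z0-9]{6}(?:-[A-Za-z0-9]{4}){5}-[A-Za-z0-9]{6}$")

# "pvs -a -v" output after the failed pvresize (copied from the report).
PVS_AFTER = """\
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g 25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g 25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g 25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_f lvm2 d-- 0 0 54,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g 25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g 25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g 25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
"""


def duplicate_pv_uuids(pvs_text):
    """Return {uuid: [devices]} for every PV UUID seen on more than one device."""
    devices_by_uuid = defaultdict(list)
    for line in pvs_text.splitlines():
        fields = line.split()
        # Skip the column header, WARNING lines and blank lines.
        if not fields or not fields[0].startswith("/dev/"):
            continue
        # The UUID is the first field that has the 6-4-4-4-4-4-6 shape.
        for field in fields[1:]:
            if UUID_RE.match(field):
                devices_by_uuid[field].append(fields[0])
                break
    return {u: devs for u, devs in devices_by_uuid.items() if len(devs) > 1}
```

Run against the text above, the only UUID reported is the one of mpathf, listed for both mpathd and mpathf. Devices without a PV label (sda1 here) are ignored.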
[3] The strace of pvresize shows pvresize trying to write the PV UUID XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz to mpathf, mpathe, mpathc and mpathd, in that order. Because mpathf, mpathe and mpathc are covered by a write reservation and reject writes, the result of each io_submit (as seen in io_getevents) is -5 (EIO). In practice, it appears that once pvresize fails on mpathf (due to the scsi reservation and the resulting IO error), it tries to write the same header to each device it detects until one write succeeds, which happens on mpathd.
$ grep -A1 -B10 -e "x58.x41.x4a.x5a.x45.x4f.x68.x78.x76.x33.x63" pvresize_vvvv_mpathd.fastvm-rhel-7-7-181.2019-12-31-16:26:59.strace | sed -E -e "s/(\\\x00)+/.../g"
21233 16:28:18.639606 write(2<pipe:[40663]>, "Unlock: Memlock counters: prioritized:0 locked:0 critical:0 daemon:0 suspended:0", 80) = 80 <0.000048>
21233 16:28:18.639714 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000050>
21233 16:28:18.640227 write(2<pipe:[40663]>, "#format_text/format-text.c:1474 ", 41) = 41 <0.000055>
21233 16:28:18.640347 write(2<pipe:[40663]>, "Creating metadata area on /dev/mapper/mpathf at sector 8 size 2040 sectors", 74) = 74 <0.000044>
21233 16:28:18.640446 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000055>
21233 16:28:18.640935 ioctl(3</dev/dm-7>, BLKPBSZGET, [512]) = 0 <0.000023>
21233 16:28:18.641034 ioctl(3</dev/dm-7>, BLKSSZGET, [512]) = 0 <0.000019>
21233 16:28:18.641542 write(2<pipe:[40663]>, "#device/bcache.c:242 ", 31) = 31 <0.000060>
21233 16:28:18.641674 write(2<pipe:[40663]>, "Limit write at 0 len 131072 to len 4608", 39) = 39 <0.000059>
21233 16:28:18.641804 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000041>
[annotation for the next line: L A B E L O N E ... uuid of mpathf: X A J ...]
21233 16:28:18.641936 io_submit(140468775841792, 1, [{pwrite, fildes=3, str="...\x4c\x41\x42\x45\x4c\x4f\x4e\x45\x01...\xc0\x7f\xea\x64\x20...\x4c\x56\x4d\x32\x20\x30\x30\x31\x58\x41\x4a\x5a\x45\x4f\x68\x78\x76\x33\x63\x37\x6a\x7a\x4b\x66\x4c\x59\x56\x67\x73\x54\x48\x57\x77\x45\x32\x43\x34\x64\x76\x7a...\x80\x0c...\x10...\x10...\xf0\x0f...\x01..."..., nbytes=4608, offset=0}]) = 1 <0.000070>
21233 16:28:18.642140 io_getevents(140468775841792, 1, 64, [{data=0, obj=0x5574f87ef748, res=-5, res2=0}], NULL) = 1 <0.009916>
--
21233 16:28:18.675008 open("/dev/mapper/mpathe", O_RDWR|O_DIRECT|O_NOATIME) = 3</dev/dm-6> <0.000032>
21233 16:28:18.675271 write(2<pipe:[40663]>, "#label/label.c:668 ", 29) = 29 <0.000051>
21233 16:28:18.675386 write(2<pipe:[40663]>, "Scanning submitted 1 reads", 26) = 26 <0.000042>
21233 16:28:18.675514 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000047>
21233 16:28:18.675867 write(2<pipe:[40663]>, "#label/label.c:677 ", 29) = 29 <0.000043>
21233 16:28:18.676038 write(2<pipe:[40663]>, "Scan failed to read /dev/mapper/mpathe error 0.", 47) = 47 <0.000044>
21233 16:28:18.676145 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000040>
21233 16:28:18.676450 write(2<pipe:[40663]>, "#device/bcache.c:242 ", 31) = 31 <0.000050>
21233 16:28:18.676594 write(2<pipe:[40663]>, "Limit write at 0 len 131072 to len 4608", 39) = 39 <0.000041>
21233 16:28:18.676690 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000039>
21233 16:28:18.676815 io_submit(140468775841792, 1, [{pwrite, fildes=3, str="...\x4c\x41\x42\x45\x4c\x4f\x4e\x45\x01...\xc0\x7f\xea\x64\x20...\x4c\x56\x4d\x32\x20\x30\x30\x31\x58\x41\x4a\x5a\x45\x4f\x68\x78\x76\x33\x63\x37\x6a\x7a\x4b\x66\x4c\x59\x56\x67\x73\x54\x48\x57\x77\x45\x32\x43\x34\x64\x76\x7a...\x80\x0c...\x10...\x10...\xf0\x0f...\x01..."..., nbytes=4608, offset=0}]) = 1 <0.000075>
21233 16:28:18.677355 io_getevents(140468775841792, 1, 64, [{data=0, obj=0x5574f87ef748, res=-5, res2=0}], NULL) = 1 <0.009953>
--
21233 16:28:18.711564 open("/dev/mapper/mpathc", O_RDWR|O_DIRECT|O_NOATIME) = 3</dev/dm-5> <0.000042>
21233 16:28:18.711826 write(2<pipe:[40663]>, "#label/label.c:668 ", 29) = 29 <0.000064>
21233 16:28:18.712004 write(2<pipe:[40663]>, "Scanning submitted 1 reads", 26) = 26 <0.000058>
21233 16:28:18.712166 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000099>
21233 16:28:18.712644 write(2<pipe:[40663]>, "#label/label.c:677 ", 29) = 29 <0.000070>
21233 16:28:18.712847 write(2<pipe:[40663]>, "Scan failed to read /dev/mapper/mpathc error 0.", 47) = 47 <0.000069>
21233 16:28:18.713028 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000058>
21233 16:28:18.713438 write(2<pipe:[40663]>, "#device/bcache.c:242 ", 31) = 31 <0.000083>
21233 16:28:18.713696 write(2<pipe:[40663]>, "Limit write at 0 len 131072 to len 4608", 39) = 39 <0.000063>
21233 16:28:18.713877 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000061>
21233 16:28:18.714088 io_submit(140468775841792, 1, [{pwrite, fildes=3, str="...\x4c\x41\x42\x45\x4c\x4f\x4e\x45\x01...\xc0\x7f\xea\x64\x20...\x4c\x56\x4d\x32\x20\x30\x30\x31\x58\x41\x4a\x5a\x45\x4f\x68\x78\x76\x33\x63\x37\x6a\x7a\x4b\x66\x4c\x59\x56\x67\x73\x54\x48\x57\x77\x45\x32\x43\x34\x64\x76\x7a...\x80\x0c...\x10...\x10...\xf0\x0f...\x01..."..., nbytes=4608, offset=0}]) = 1 <0.000082>
21233 16:28:18.714300 io_getevents(140468775841792, 1, 64, [{data=0, obj=0x5574f87ef748, res=-5, res2=0}], NULL) = 1 <0.009549>
--
21233 16:28:18.746946 open("/dev/mapper/mpathd", O_RDWR|O_DIRECT|O_NOATIME) = 3</dev/dm-4> <0.000034>
21233 16:28:18.747216 write(2<pipe:[40663]>, "#label/label.c:668 ", 29) = 29 <0.000028>
21233 16:28:18.747309 write(2<pipe:[40663]>, "Scanning submitted 1 reads", 26) = 26 <0.000027>
21233 16:28:18.747390 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000026>
21233 16:28:18.747739 write(2<pipe:[40663]>, "#label/label.c:677 ", 29) = 29 <0.000035>
21233 16:28:18.747845 write(2<pipe:[40663]>, "Scan failed to read /dev/mapper/mpathd error 0.", 47) = 47 <0.000027>
21233 16:28:18.747930 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000026>
21233 16:28:18.748219 write(2<pipe:[40663]>, "#device/bcache.c:242 ", 31) = 31 <0.000030>
21233 16:28:18.748317 write(2<pipe:[40663]>, "Limit write at 0 len 131072 to len 4608", 39) = 39 <0.000026>
21233 16:28:18.748398 write(2<pipe:[40663]>, "\n", 1) = 1 <0.000026>
21233 16:28:18.748540 io_submit(140468775841792, 1, [{pwrite, fildes=3, str="...\x4c\x41\x42\x45\x4c\x4f\x4e\x45\x01...\xc0\x7f\xea\x64\x20...\x4c\x56\x4d\x32\x20\x30\x30\x31\x58\x41\x4a\x5a\x45\x4f\x68\x78\x76\x33\x63\x37\x6a\x7a\x4b\x66\x4c\x59\x56\x67\x73\x54\x48\x57\x77\x45\x32\x43\x34\x64\x76\x7a...\x80\x0c...\x10...\x10...\xf0\x0f...\x01..."..., nbytes=4608, offset=0}]) = 1 <0.000079>
21233 16:28:18.748762 io_getevents(140468775841792, 1, 64, [{data=0, obj=0x5574f87ef748, res=4608, res2=0}], NULL) = 1 <0.000516>
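
To check what the io_submit buffers above contain, the escaped bytes can be decoded back to text. This is only a small sketch: the two byte strings are copied from the strace lines, and the helper names `decode_strace_hex` and `format_pv_uuid` are invented for this example. The dash layout (6-4-4-4-4-4-6) is the one pvs uses in the listings above.

```python
import errno

# Escaped bytes copied from the io_submit() buffer in the strace above.
LABEL_BYTES = r"\x4c\x41\x42\x45\x4c\x4f\x4e\x45"
UUID_BYTES = (
    r"\x58\x41\x4a\x5a\x45\x4f\x68\x78\x76\x33\x63\x37\x6a\x7a\x4b\x66"
    r"\x4c\x59\x56\x67\x73\x54\x48\x57\x77\x45\x32\x43\x34\x64\x76\x7a"
)


def decode_strace_hex(escaped):
    # Split on the literal two-character sequence backslash + x and
    # convert each remaining hex pair into one byte.
    return bytes(int(h, 16) for h in escaped.split(r"\x") if h).decode("ascii")


def format_pv_uuid(raw):
    # Insert dashes in the 6-4-4-4-4-4-6 layout that pvs prints.
    parts, pos = [], 0
    for size in (6, 4, 4, 4, 4, 4, 6):
        parts.append(raw[pos:pos + size])
        pos += size
    return "-".join(parts)


label = decode_strace_hex(LABEL_BYTES)
pv_uuid = format_pv_uuid(decode_strace_hex(UUID_BYTES))
# io_getevents reports res=-5 for the rejected writes; errno 5 is EIO.
write_error = errno.errorcode[5]
```

Here `label` is the LABELONE signature, `pv_uuid` is the UUID of mpathf, and `write_error` names the errno behind res=-5. The same buffer is submitted to all four devices, and only the write to mpathd returns res=4608.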
[4] Based on the verbose pvresize output, it all starts as a result of pvresize trying to update the old header extension it has found on mpathf (as a result of mpathf having been created with an old lvm version). It is worth noting that mpathf does not belong to the same VG as mpathd (which was getting resized).
#metadata/metadata.c:2842 PV /dev/mapper/mpathf has old extension header, updating to newest version.
#metadata/pv_manip.c:417 /dev/mapper/mpathf 0: 0 6399: lv_f(0:0)
#metadata/pv_manip.c:417 /dev/mapper/mpathf 1: 6399 6400: NULL(0:0)
#locking/locking.c:367 Dropping cache for vg_f.
#mm/memlock.c:594 Unlock: Memlock counters: prioritized:0 locked:0 critical:0 daemon:0 suspended:0
#format_text/format-text.c:1474 Creating metadata area on /dev/mapper/mpathf at sector 8 size 2040 sectors
#device/bcache.c:242 Limit write at 0 len 131072 to len 4608
#label/label.c:1383 Error writing device /dev/mapper/mpathf at 4096 length 512.
#format_text/format-text.c:407 Failed to write mda header to /dev/mapper/mpathf fd -1
#format_text/format-text.c:1411 <backtrace>
#cache/lvmcache.c:2788 <backtrace>
#format_text/format-text.c:1509 <backtrace>
#metadata/metadata.c:5005 <backtrace>
#metadata/metadata.c:2997 <backtrace>
#metadata/metadata.c:2848 Failed to update old PV extension headers in VG vg_f.
#metadata/vg.c:89 Freeing VG vg_f at 0x5574f8818b20.
[5] After a discussion with zkabelac, we tried updating lvm to lvm2-2.02.186-4.el7 (as found on brew) to make sure that we are not hitting a known, already fixed problem. Unfortunately, it still reproduces with lvm2-2.02.186-4.el7. I'll give it a try with RHEL 8 and Fedora on Monday to see if newer lvm versions still hit this.
Regards,
Alexandros
This looks like the bug that was reported and fixed by Heming Zhao on linux-lvm.

From: Heming Zhao <heming zhao suse com>
To: LVM general discussion and development <linux-lvm redhat com>, Gang He <GHe suse com>
Subject: Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
Date: Fri, 11 Oct 2019 08:11:29 +0000
https://www.redhat.com/archives/linux-lvm/2019-October/msg00004.html

The fixes (from Joe and Heming) on the master branch are:
13c254fc0538 fix dev_unset_last_byte after write error
25e7bf021a4e [bcache] bcache_invalidate_fd, only remove prefixes on success.
7e8296f4788d [bcache] reverse earlier patch.
2b3c39e402b9 [bcache] pass up the error from io_submit rather than using generic -EIO
5fdebf9bbf68 [bcache] add unit test
6b0d969b2a85 [label] Use bcache_abort_fd() to ensure blocks are no longer in the cache.
2938b4dcca0a [bcache] add bcache_abort()

We weren't able to trivially backport these to the stable branch because they touch on the bcache radix tree which does not exist in stable.

I've backported the radix-tree and btree changes. Cherry picking the patch list in comment #9 goes ok apart from the last patch (13c254fc0538) which fails in format-text.c. The tree is here:
http://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/2020-01-16-back-port-bcache-changes

The final list of commits from the stable-2.02 branch for this bz:
d20490f76dd5 [radix-tree] Add missing test case
019fa6f8eec7 [bcache] bcache_invalidate_fd, only remove prefixes on success.
1e2e12f19c58 [bcache] reverse earlier patch.
6370c20d392f [bcache] pass up the error from io_submit rather than using generic -EIO
056eb0a8809a [bcache] add unit test
babde3da5530 [label] Use bcache_abort_fd() to ensure blocks are no longer in the cache.
232f779db4a4 [bcache] add bcache_abort()
b6e6ea2d6578 [bcache] Bring bcache into sync with master branch
e55210302787 [radix-tree] Bring radix-tree up to date with the master branch
245d7fcd5905 fix dev_unset_last_byte after write error

Hello,
I have been running tests on the systems that I have been using for reproducing this.
With lvm2-2.02.185-2.el7_7.2.x86_64/device-mapper-1.02.158-2.el7_7.2.x86_64 the overwriting happens and pvresize fails to resize the PV on mpathd:
pvs -a -v before the test:
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g <25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g <25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g <25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_d lvm2 a-- <50,00g <25,00g 56,00g AFNrSH-mwMc-QQz0-jBSQ-MoQI-X2gj-VKq5Xe
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g <25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g <25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g <25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
pvs -a -v after the test:
WARNING: Not using device /dev/mapper/mpathd for PV XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz.
WARNING: PV XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz prefers device /dev/mapper/mpathf because device size is correct.
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g <25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g <25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g <25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_f lvm2 d-- 0 0 56,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g <25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g <25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g <25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
With lvm2-libs-2.02.186-6.el7.x86_64/device-mapper-1.02.164-6.el7.x86_64 the overwriting does not happen and pvresize succeeds in resizing the PV on mpathd:
pvs -a -v before the test:
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g <25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g <25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g <25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_d lvm2 a-- <50,00g <25,00g 56,00g AFNrSH-mwMc-QQz0-jBSQ-MoQI-X2gj-VKq5Xe
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g <25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g <25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g <25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
pvs -a -v after the test:
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g <25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g <25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc vg_c lvm2 a-- <50,00g <25,00g 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D
/dev/mapper/mpathd vg_d lvm2 a-- <56,00g <31,00g 56,00g AFNrSH-mwMc-QQz0-jBSQ-MoQI-X2gj-VKq5Xe
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g <25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g <25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g <25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB
/dev/sda1 --- 0 0 500,00m
/dev/sda2 r7vg lvm2 a-- <9,51g <2,38g 9,51g xW57i5-rfdb-30UH-fnLG-GbAm-DDPL-eLz8d2
The test command I use is:
pvresize -vvvv --config 'devices{ global_filter = [ "a|/dev/mapper/mpath|", "r|.*|" ] }' /dev/mapper/mpathd
(the global_filter is added to protect sda2, which contains root)
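
As a rough illustration of why this filter hides sda2 but still leaves mpathf visible to pvresize, here is a simplified model of how LVM evaluates a filter list: the first pattern that matches a device path decides accept ("a") or reject ("r"), and a path that matches nothing is accepted. The sketch makes several assumptions: it uses Python's `re` module in place of LVM's own regex engine, it handles only a single path name per device, and the function names `parse_filter` and `device_accepted` are hypothetical.

```python
import re


def parse_filter(patterns):
    """Turn entries such as 'a|/dev/mapper/mpath|' into (accept, regex) rules."""
    rules = []
    for pattern in patterns:
        # First character is the action, second is the delimiter that
        # encloses the regular expression.
        action, delimiter = pattern[0], pattern[1]
        regex = pattern[2:pattern.rindex(delimiter)]
        rules.append((action == "a", re.compile(regex)))
    return rules


def device_accepted(path, rules):
    """First matching pattern wins; an unmatched path is accepted."""
    for accept, regex in rules:
        if regex.search(path):
            return accept
    return True


# Filter used for the test runs in this comment.
test_filter = parse_filter(["a|/dev/mapper/mpath|", "r|.*|"])
# Narrower filter given as a workaround in the bug description.
workaround_filter = parse_filter(["a|/dev/mapper/mpathd|", "r|.*|"])
```

With `test_filter`, `/dev/sda2` is rejected while every `/dev/mapper/mpath*` device, including mpathf, is still scanned, so the problem remains reproducible. With `workaround_filter`, only mpathd is accepted.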
Thanks and Regards,
Alexandros
Thanks Alexandros! Moving to verified based on test results shown in comment #14.

(In reply to Corey Marthaler from comment #15)
> Thanks Alexandros!
>
> Moving to verified based on test results shown in comment #14.

Just to make sure: I have only been testing for the problem mentioned in the BZ description (i.e. running pvresize). If there are any other tests you normally run with new packages, then I have not run these.

Regards,
Alexandros

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:1129

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
Created attachment 1648656
verbose output and strace of a failed pvresize and scripts

Description of problem:
The problem appeared in a multi-node SAP HANA cluster and is triggered by factors related to the cluster architecture (mostly scsi reservations preventing write operations on all shared LUNs). The result is that pvresize overwrites the PV headers of some PVs with headers from other PVs. As an example, "pvresize /dev/mapper/mpathd" results in the PV header of mpathf being written on mpathd and that of mpathc being written on sda2.

The sequence of events we can see in the verbose logs is:
1) pvresize detects that mpathf has an "old extension header" that needs to be updated.
2) There is a scsi reservation on mpathf that is blocking writes from the node that pvresize runs on. As a result, pvresize fails to write the updated header.
3) Later in the pvresize execution, mpathd is updated. An strace of pvresize shows the header of mpathf being written on mpathd.

Version-Release number of selected component (if applicable):
Reproduced with lvm2-2.02.185-2.el7.x86_64 & lvm2-2.02.185-2.el7_7.2.x86_64

How reproducible:
So far 100% reproducible.

Steps to Reproduce:
1. Create 2 systems (node1 and node2) with 7 shared LUNs (mpatha-mpathg). This also needs a 3rd system acting as a "storage array". I have been using targetcli/LIO on RHEL7 as my storage.
2. On each shared PV, create a VG (vg_a - vg_g) and in each VG an LV (lv_a - lv_g). This needs to be done with an old lvm version. (I have used lvm2-2.02.130-5.el7.x86_64 - RHEL 7.2)
3. Add a "Write Exclusive, registrants only" scsi reservation from node2 on LUNs: mpatha mpathb mpathc mpathe mpathf
   Add a "Write Exclusive, registrants only" scsi reservation from node1 on LUNs: mpathd mpathg
   Now both nodes have read access on all LUNs, but each can only write to the LUNs it has reserved.
4. Update LVM.
5. On the shared storage, increase the size of LUN mpathd. Rescan the scsi bus and multipath (e.g. rescan-scsi-bus.sh -m -s) to detect the change.
6. On node1 (that owns mpathd) run pvresize /dev/mapper/mpathd

Actual results:
pvresize fails. After the failure pvs -v -a looks similar to:
# pvs -v -a
WARNING: Not using device /dev/mapper/mpathc for PV Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D.
WARNING: Not using device /dev/mapper/mpathd for PV XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz.
WARNING: PV Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D prefers device /dev/sda2 because device is used by LV.
WARNING: PV XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz prefers device /dev/mapper/mpathf because device size is correct.
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/mapper/mpatha vg_a lvm2 a-- <50,00g 25,00g 50,00g OWPpmm-FiyJ-CvaO-eVnd-Tgej-SkkK-DkroeM
/dev/mapper/mpathb vg_b lvm2 a-- <50,00g 25,00g 50,00g 19qYTb-GNSN-DclK-85eW-zJm3-xdf9-MHcezg
/dev/mapper/mpathc [unknown] lvm2 d-- 0 0 50,00g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D <---+
/dev/mapper/mpathd vg_f lvm2 d-- 0 0 54,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz <-+ |
/dev/mapper/mpathe vg_e lvm2 a-- <50,00g 25,00g 50,00g Z6TOqr-n9mU-Fy8e-Rmt2-0Iat-lHCt-rVQbbS | |
/dev/mapper/mpathf vg_f lvm2 a-- <50,00g 25,00g 50,00g XAJZEO-hxv3-c7jz-KfLY-VgsT-HWwE-2C4dvz <-+ |
/dev/mapper/mpathg vg_g lvm2 a-- <50,00g 25,00g 50,00g Ydv7Lq-1WrC-O9Sc-578S-qNkt-jXpJ-V3tGlB |
/dev/sda1 --- 0 0 500,00m |
/dev/sda2 lvm2 --- 50,00g 50,00g 5,51g Rvhf2V-peNB-WLx9-yrP0-x8mi-5S9l-9AIA6D <---+
The header of mpathf is cloned on mpathd
The header of mpathc is cloned on sda2

Expected results:
pvresize succeeds and does not write wrong PV headers on other devices.

Additional info:
1) Using the global_filter provides a functional workaround, by masking all other devices except mpathd. e.g.:
pvresize --config 'devices{ global_filter = [ "a|/dev/mapper/mpathd|" , "r|.*|" ] }' /dev/mapper/mpathd
both succeeds and does not overwrite any other device.
2) I am attaching an archive containing verbose pvresize output and an strace collected in parallel from a failed run. It also contains the scripts I am using to set up, clear and view the scsi reservations.
3) I have been trying to get a simpler reproducer by limiting write access to VM disks, but have not succeeded so far.