Bug 1915580
| Field | Value |
|---|---|
| Summary | dmeventd segfault during raid integrity failure testing |
| Product | Red Hat Enterprise Linux 8 |
| Component | lvm2 |
| lvm2 sub component | dmeventd |
| Version | 8.4 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | rc |
| Target Release | 8.0 |
| Reporter | Corey Marthaler <cmarthal> |
| Assignee | David Teigland <teigland> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | agk, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, teigland, thornber, zkabelac |
| Fixed In Version | lvm2-2.03.11-2.el8 |
| Last Closed | 2021-05-18 15:02:04 UTC |
| Type | Bug |
Description
Corey Marthaler
2021-01-13 00:27:32 UTC
The segfault is fixed by https://sourceware.org/git/?p=lvm2.git;a=commit;h=0534723a2de62da913dfd88d40ee6f8b8b93ac56

The segfault was caused by a bug on the error exit path taken while reverting the imeta LVs that had been created to replace the failed image. There is no way to naturally force that error exit path to be followed, so to verify the fix I added code to force that code path.

before:

[ 0:07] 13:27:30.600728 lvconvert[985988] metadata/integrity_manip.c:627  forcing error prior to wipe_lv LV1_rimage_2_imeta
[ 0:07] 13:27:30.600738 lvconvert[985988] metadata/integrity_manip.c:628  <backtrace>
[ 0:07] 13:27:30.600760 lvconvert[985988] metadata/integrity_manip.c:779  Failed to add integrity.
[ 0:07] 6,6041,5001999686211,-;lvm[985988]: segfault at f0 ip 000055e75cebee47 sp 00007fff0ca0b630 error 4 in lvm[55e75cdca000+27e000]
[ 0:07] /root/lvm.git/test/shell/integrity-misc.sh: line 203: 985988 Segmentation fault (core dumped) lvconvert -vvvv -y --repair $vg/$lv1

after:

[ 0:07] 13:24:18.505498 lvconvert[982038] metadata/integrity_manip.c:627  forcing error prior to wipe_lv LV1_rimage_2_imeta
[ 0:07] 13:24:18.505506 lvconvert[982038] metadata/integrity_manip.c:628  <backtrace>
[ 0:07] 13:24:18.505519 lvconvert[982038] metadata/integrity_manip.c:779  Failed to add integrity.

There appears to be a second bug in the backtrace, related to a failure to open the new imeta LV in order to zero it. That bug is what exposed the segfault fixed here. A separate bz should probably be filed for the failure to open/wipe the new imeta LV.

Marking Verified:Tested.

kernel-4.18.0-277.el8      BUILT: Wed Jan 20 09:06:28 CST 2021
lvm2-2.03.11-2.el8         BUILT: Thu Jan 28 14:40:36 CST 2021
lvm2-libs-2.03.11-2.el8    BUILT: Thu Jan 28 14:40:36 CST 2021

================================================================================
Iteration 1.2 started at Tue 02 Feb 2021 09:25:30 PM CST
================================================================================
Scenario kill_random_synced_raid1_3legs: Kill random leg of synced 3 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_random_raid1_3legs_1 synced_random_raid1_3legs_2
* sync:               1
* type:               raid1
* -m |-i value:       3
* leg devices:        /dev/sdb1 /dev/sdg1 /dev/sdc1 /dev/sdi1
* spanned legs:       0
* manual repair:      0
* no MDA devices:
* failpv(s):          /dev/sdc1
* failnode(s):        hayes-03
* integrity stack:    1 (Due to integrity stack, be mindful of false failures that are reliant on message checks that could be lost due to rate-limiting of corruption and other messages)
* raid fault policy:  allocate
******************************************************

Creating raids(s) on hayes-03...
hayes-03: lvcreate --yes --type raid1 -m 3 -n synced_random_raid1_3legs_1 -L 500M black_bird /dev/sdb1:0-2400 /dev/sdg1:0-2400 /dev/sdc1:0-2400 /dev/sdi1:0-2400
hayes-03: lvcreate --yes --type raid1 -m 3 -n synced_random_raid1_3legs_2 -L 500M black_bird /dev/sdb1:0-2400 /dev/sdg1:0-2400 /dev/sdc1:0-2400 /dev/sdi1:0-2400

[...]
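For reference, the failing operation (an `lvconvert --repair` of an integrity-enabled raid1 LV after losing a leg) can be approximated outside the test harness with a sequence like the sketch below. This is only a sketch: the VG name, LV name, and device paths are placeholders taken from this report, and dropping the disk through sysfs is just one way to simulate the leg failure; the harness's own failure-injection method is not shown in the log. The forced error in metadata/integrity_manip.c used to exercise the error exit path required temporary debug code and cannot be triggered from the command line.

```sh
#!/bin/bash
# Sketch only: assumed VG/LV names and devices, not the integrity-misc.sh test.
vg=black_bird
lv=synced_random_raid1_3legs_1

vgcreate $vg /dev/sdj1 /dev/sdk1 /dev/sdd1 /dev/sde1

# 4-image raid1 LV, then a dm-integrity layer on each image.
lvcreate --yes --type raid1 -m 3 -n $lv -L 500M $vg
# (wait for the initial resync to complete here; see the polling sketch below)
lvconvert --yes --raidintegrity y --raidintegritymode journal $vg/$lv

# Simulate losing one leg (assumption: sdd is an expendable test disk),
# then run the repair command that segfaulted before the fix.
echo 1 > /sys/block/sdd/device/delete
lvconvert -y --repair $vg/$lv
```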
Fix verified in the latest nightly kernel and lvm.

kernel-4.18.0-287.el8      BUILT: Thu Feb 11 03:15:20 CST 2021
lvm2-2.03.11-4.el8         BUILT: Thu Feb 11 04:35:23 CST 2021
lvm2-libs-2.03.11-4.el8    BUILT: Thu Feb 11 04:35:23 CST 2021

================================================================================
Iteration 1.1 started at Tue 16 Feb 2021 01:38:55 PM CST
================================================================================
Scenario kill_random_synced_raid1_3legs: Kill random leg of synced 3 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_random_raid1_3legs_1 synced_random_raid1_3legs_2
* sync:               1
* type:               raid1
* -m |-i value:       3
* leg devices:        /dev/sdj1 /dev/sdk1 /dev/sdd1 /dev/sde1
* spanned legs:       0
* manual repair:      0
* no MDA devices:
* failpv(s):          /dev/sdd1
* failnode(s):        hayes-03
* integrity stack:    1 (Due to integrity stack, be mindful of false failures that are reliant on message checks that could be lost due to rate-limiting of corruption and other messages)
* raid fault policy:  allocate
******************************************************

Creating raids(s) on hayes-03...
hayes-03: lvcreate --yes --type raid1 -m 3 -n synced_random_raid1_3legs_1 -L 500M black_bird /dev/sdj1:0-2400 /dev/sdk1:0-2400 /dev/sdd1:0-2400 /dev/sde1:0-2400
hayes-03: lvcreate --yes --type raid1 -m 3 -n synced_random_raid1_3legs_2 -L 500M black_bird /dev/sdj1:0-2400 /dev/sdk1:0-2400 /dev/sdd1:0-2400 /dev/sde1:0-2400

Current mirror/raid device structure(s):
  LV Attr LSize Cpy%Sync Devices
  synced_random_raid1_3legs_1 rwi-a-r--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_0(0),synced_random_raid1_3legs_1_rimage_1(0),synced_random_raid1_3legs_1_rimage_2(0),synced_random_raid1_3legs_1_rimage_3(0)
  [synced_random_raid1_3legs_1_rimage_0] iwi-aor--- 500.00m /dev/sdj1(1)
  [synced_random_raid1_3legs_1_rimage_1] iwi-aor--- 500.00m /dev/sdk1(1)
  [synced_random_raid1_3legs_1_rimage_2] iwi-aor--- 500.00m /dev/sdd1(1)
  [synced_random_raid1_3legs_1_rimage_3] iwi-aor--- 500.00m /dev/sde1(1)
  [synced_random_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(0)
  [synced_random_raid1_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdk1(0)
  [synced_random_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdd1(0)
  [synced_random_raid1_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sde1(0)
  synced_random_raid1_3legs_2 rwi-a-r--- 500.00m 37.58 synced_random_raid1_3legs_2_rimage_0(0),synced_random_raid1_3legs_2_rimage_1(0),synced_random_raid1_3legs_2_rimage_2(0),synced_random_raid1_3legs_2_rimage_3(0)
  [synced_random_raid1_3legs_2_rimage_0] Iwi-aor--- 500.00m /dev/sdj1(127)
  [synced_random_raid1_3legs_2_rimage_1] Iwi-aor--- 500.00m /dev/sdk1(127)
  [synced_random_raid1_3legs_2_rimage_2] Iwi-aor--- 500.00m /dev/sdd1(127)
  [synced_random_raid1_3legs_2_rimage_3] Iwi-aor--- 500.00m /dev/sde1(127)
  [synced_random_raid1_3legs_2_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(126)
  [synced_random_raid1_3legs_2_rmeta_1] ewi-aor--- 4.00m /dev/sdk1(126)
  [synced_random_raid1_3legs_2_rmeta_2] ewi-aor--- 4.00m /dev/sdd1(126)
  [synced_random_raid1_3legs_2_rmeta_3] ewi-aor--- 4.00m /dev/sde1(126)

Waiting until all mirror|raid volumes become fully syncd...
  2/2 mirror(s) are fully synced: ( 100.00% 100.00% )

Convert mirror/raid volume(s) to utilize integrity target volume(s) on hayes-03...
lvconvert --yes --raidintegrity y --raidintegritymode journal black_bird/synced_random_raid1_3legs_1
lvconvert --yes --raidintegrity y --raidintegritymode journal black_bird/synced_random_raid1_3legs_2

Creating xfs on top of mirror(s) on hayes-03...
Mounting mirrored xfs filesystems on hayes-03...
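As shown above, the harness waits for the raid1 LVs to reach 100% sync before layering integrity on top of the images. Done by hand, that wait can look roughly like the following sketch; the sync_percent reporting field is standard lvs output, but the VG/LV names and the string comparison against "100.00" are simplifying assumptions.

```sh
# Wait for the initial raid1 resync to finish before enabling integrity.
vg=black_bird
lv=synced_random_raid1_3legs_1
until [ "$(lvs --noheadings -o sync_percent "$vg/$lv" | tr -d ' ')" = "100.00" ]; do
    sleep 5
done
lvconvert --yes --raidintegrity y --raidintegritymode journal "$vg/$lv"
```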
Current mirror/raid device structure(s):
  LV Attr LSize Cpy%Sync Devices
  synced_random_raid1_3legs_1 rwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_0(0),synced_random_raid1_3legs_1_rimage_1(0),synced_random_raid1_3legs_1_rimage_2(0),synced_random_raid1_3legs_1_rimage_3(0)
  [synced_random_raid1_3legs_1_rimage_0] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_0_iorig(0)
  [synced_random_raid1_3legs_1_rimage_0_imeta] ewi-ao---- 12.00m /dev/sdj1(252)
  [synced_random_raid1_3legs_1_rimage_0_iorig] -wi-ao---- 500.00m /dev/sdj1(1)
  [synced_random_raid1_3legs_1_rimage_1] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_1_iorig(0)
  [synced_random_raid1_3legs_1_rimage_1_imeta] ewi-ao---- 12.00m /dev/sdk1(252)
  [synced_random_raid1_3legs_1_rimage_1_iorig] -wi-ao---- 500.00m /dev/sdk1(1)
  [synced_random_raid1_3legs_1_rimage_2] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_2_iorig(0)
  [synced_random_raid1_3legs_1_rimage_2_imeta] ewi-ao---- 12.00m /dev/sdd1(252)
  [synced_random_raid1_3legs_1_rimage_2_iorig] -wi-ao---- 500.00m /dev/sdd1(1)
  [synced_random_raid1_3legs_1_rimage_3] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_3_iorig(0)
  [synced_random_raid1_3legs_1_rimage_3_imeta] ewi-ao---- 12.00m /dev/sde1(252)
  [synced_random_raid1_3legs_1_rimage_3_iorig] -wi-ao---- 500.00m /dev/sde1(1)
  [synced_random_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(0)
  [synced_random_raid1_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdk1(0)
  [synced_random_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdd1(0)
  [synced_random_raid1_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sde1(0)
  synced_random_raid1_3legs_2 rwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_0(0),synced_random_raid1_3legs_2_rimage_1(0),synced_random_raid1_3legs_2_rimage_2(0),synced_random_raid1_3legs_2_rimage_3(0)
  [synced_random_raid1_3legs_2_rimage_0] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_0_iorig(0)
  [synced_random_raid1_3legs_2_rimage_0_imeta] ewi-ao---- 12.00m /dev/sdj1(255)
  [synced_random_raid1_3legs_2_rimage_0_iorig] -wi-ao---- 500.00m /dev/sdj1(127)
  [synced_random_raid1_3legs_2_rimage_1] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_1_iorig(0)
  [synced_random_raid1_3legs_2_rimage_1_imeta] ewi-ao---- 12.00m /dev/sdk1(255)
  [synced_random_raid1_3legs_2_rimage_1_iorig] -wi-ao---- 500.00m /dev/sdk1(127)
  [synced_random_raid1_3legs_2_rimage_2] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_2_iorig(0)
  [synced_random_raid1_3legs_2_rimage_2_imeta] ewi-ao---- 12.00m /dev/sdd1(255)
  [synced_random_raid1_3legs_2_rimage_2_iorig] -wi-ao---- 500.00m /dev/sdd1(127)
  [synced_random_raid1_3legs_2_rimage_3] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_3_iorig(0)
  [synced_random_raid1_3legs_2_rimage_3_imeta] ewi-ao---- 12.00m /dev/sde1(255)
  [synced_random_raid1_3legs_2_rimage_3_iorig] -wi-ao---- 500.00m /dev/sde1(127)
  [synced_random_raid1_3legs_2_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(126)
  [synced_random_raid1_3legs_2_rmeta_1] ewi-aor--- 4.00m /dev/sdk1(126)
  [synced_random_raid1_3legs_2_rmeta_2] ewi-aor--- 4.00m /dev/sdd1(126)
  [synced_random_raid1_3legs_2_rmeta_3] ewi-aor--- 4.00m /dev/sde1(126)

PV=/dev/sdd1
  synced_random_raid1_3legs_1_rimage_2_imeta: 1.0
  synced_random_raid1_3legs_1_rimage_2_iorig: 1.0
  synced_random_raid1_3legs_1_rmeta_2: 1.0
  synced_random_raid1_3legs_2_rimage_2_imeta: 1.0
  synced_random_raid1_3legs_2_rimage_2_iorig: 1.0
  synced_random_raid1_3legs_2_rmeta_2: 1.0
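The layout above, including the hidden _rimage/_rmeta/_imeta/_iorig sub-LVs, can be listed with an lvs report along these lines. This is a sketch; the log does not show the exact command the harness uses, and the VG name is the one from this run.

```sh
# List the raid LVs plus hidden sub-LVs with the columns shown above.
lvs -a -o lv_name,attr,lv_size,sync_percent,devices black_bird
```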
Writing verification files (checkit) to mirror(s) on...
---- hayes-03 ----

Sleeping 15 seconds to get some outstanding I/O locks before the failure

Verifying files (checkit) on mirror(s) on...
---- hayes-03 ----

**** Raid Integrity Corruption info for this verification *****
* Current legs:        /dev/sdj1 /dev/sdk1 /dev/sdd1 /dev/sde1
* Image(s) to corrupt: synced_random_raid1_3legs_1_rimage_0_iorig
* PV to corrupt:       /dev/sdj1
* READ (non span):  lvchange --writemostly /dev/sdj1:n black_bird
* WRITE (non span): lvchange --writemostly /dev/sdk1:y black_bird
* WRITE (non span): lvchange --writemostly /dev/sdd1:y black_bird
* WRITE (non span): lvchange --writemostly /dev/sde1:y black_bird
* (Clearing out OLD dmesg corruption detection notifications)
***************************************************************

Verifying files (checkit) on mirror(s) on...
---- hayes-03 ----

lvchange -an black_bird/synced_random_raid1_3legs_1
lvchange -an black_bird/synced_random_raid1_3legs_2

* Corrupting an integrity image's PV
WRITE: dd if=/dev/urandom of=/dev/sdj1 oflag=direct,sync bs=1M seek=25 count=28
28+0 records in
28+0 records out
29360128 bytes (29 MB, 28 MiB) copied, 0.463076 s, 63.4 MB/s

Verifying files (checkit) on mirror(s) on...
---- hayes-03 ----

lvchange -ay black_bird/synced_random_raid1_3legs_1
Detecting corruption on bad image one of two ways:
FULL READ: dd if=/dev/black_bird/synced_random_raid1_3legs_1 of=/dev/null iflag=direct
1024000+0 records in
1024000+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 514.189 s, 1.0 MB/s
Corruption mismatches reported: 114560
Cycling the activation to reset the mismatch counter (lvchange -a n|y black_bird/synced_random_raid1_3legs_1)
Corruption mismatches reported: 0
re-mounting /mnt/synced_random_raid1_3legs_1
re-verifying checkit files in /mnt/synced_random_raid1_3legs_1

lvchange -ay black_bird/synced_random_raid1_3legs_2
Detecting corruption on bad image one of two ways:
FULL READ: dd if=/dev/black_bird/synced_random_raid1_3legs_2 of=/dev/null iflag=direct
1024000+0 records in
1024000+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 36.816 s, 14.2 MB/s
Corruption mismatches reported: 0
Cycling the activation to reset the mismatch counter (lvchange -a n|y black_bird/synced_random_raid1_3legs_2)
Corruption mismatches reported: 0
re-mounting /mnt/synced_random_raid1_3legs_2
re-verifying checkit files in /mnt/synced_random_raid1_3legs_2

** SIGN of integrity correction found!! **
[11235.855288] md/raid1:mdX: read error corrected (1 sectors at 85823 on dm-3)
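The "Corruption mismatches reported" values above come from the per-image integrity mismatch counters that lvm exposes, and cycling the LV's activation resets them. Checked by hand, that looks roughly like the sketch below; the integritymismatches reporting field is the one documented in lvmraid(7), and the VG/LV names are simply the ones used in this run.

```sh
# Read the integrity mismatch counters, then reset them by cycling activation.
vg=black_bird
lv=synced_random_raid1_3legs_1
lvs -a -o lv_name,integritymismatches "$vg/$lv"

lvchange -an "$vg/$lv"
lvchange -ay "$vg/$lv"
lvs -a -o lv_name,integritymismatches "$vg/$lv"   # counters should read 0 again
```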
Disabling device sdd on hayes-03
rescan device...
  Error reading device /dev/sdd1 at 0 length 4096.

Attempting I/O to cause mirror down conversion(s) on hayes-03
dd if=/dev/zero of=/mnt/synced_random_raid1_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.0197865 s, 2.1 GB/s
dd if=/dev/zero of=/mnt/synced_random_raid1_3legs_2/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.020657 s, 2.0 GB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  WARNING: Couldn't find device with uuid k6nOBz-t9Ht-qtRk-ASjm-F2tc-Q7ea-vngB0F.
  WARNING: VG black_bird is missing PV k6nOBz-t9Ht-qtRk-ASjm-F2tc-Q7ea-vngB0F (last written to [unknown]).
  LV Attr LSize Cpy%Sync Devices
  synced_random_raid1_3legs_1 rwi-aor--- 500.00m 55.69 synced_random_raid1_3legs_1_rimage_0(0),synced_random_raid1_3legs_1_rimage_1(0),synced_random_raid1_3legs_1_rimage_2(0),synced_random_raid1_3legs_1_rimage_3(0)
  [synced_random_raid1_3legs_1_rimage_0] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_0_iorig(0)
  [synced_random_raid1_3legs_1_rimage_0_imeta] ewi-ao---- 12.00m /dev/sdj1(252)
  [synced_random_raid1_3legs_1_rimage_0_iorig] -wi-ao---- 500.00m /dev/sdj1(1)
  [synced_random_raid1_3legs_1_rimage_1] gwi-aor-w- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_1_iorig(0)
  [synced_random_raid1_3legs_1_rimage_1_imeta] ewi-ao---- 12.00m /dev/sdk1(252)
  [synced_random_raid1_3legs_1_rimage_1_iorig] -wi-ao---- 500.00m /dev/sdk1(1)
  [synced_random_raid1_3legs_1_rimage_2] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_2_iorig(0)
  [synced_random_raid1_3legs_1_rimage_2_imeta] ewi-ao---- 12.00m /dev/sdb1(255)
  [synced_random_raid1_3legs_1_rimage_2_iorig] -wi-ao---- 500.00m /dev/sdb1(130)
  [synced_random_raid1_3legs_1_rimage_3] gwi-aor-w- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_3_iorig(0)
  [synced_random_raid1_3legs_1_rimage_3_imeta] ewi-ao---- 12.00m /dev/sde1(252)
  [synced_random_raid1_3legs_1_rimage_3_iorig] -wi-ao---- 500.00m /dev/sde1(1)
  [synced_random_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(0)
  [synced_random_raid1_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdk1(0)
  [synced_random_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdb1(129)
  [synced_random_raid1_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sde1(0)
  synced_random_raid1_3legs_2 rwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_0(0),synced_random_raid1_3legs_2_rimage_1(0),synced_random_raid1_3legs_2_rimage_2(0),synced_random_raid1_3legs_2_rimage_3(0)
  [synced_random_raid1_3legs_2_rimage_0] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_0_iorig(0)
  [synced_random_raid1_3legs_2_rimage_0_imeta] ewi-ao---- 12.00m /dev/sdj1(255)
  [synced_random_raid1_3legs_2_rimage_0_iorig] -wi-ao---- 500.00m /dev/sdj1(127)
  [synced_random_raid1_3legs_2_rimage_1] gwi-aor-w- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_1_iorig(0)
  [synced_random_raid1_3legs_2_rimage_1_imeta] ewi-ao---- 12.00m /dev/sdk1(255)
  [synced_random_raid1_3legs_2_rimage_1_iorig] -wi-ao---- 500.00m /dev/sdk1(127)
  [synced_random_raid1_3legs_2_rimage_2] gwi-aor--- 500.00m 100.00 synced_random_raid1_3legs_2_rimage_2_iorig(0)
  [synced_random_raid1_3legs_2_rimage_2_imeta] ewi-ao---- 12.00m /dev/sdb1(126)
  [synced_random_raid1_3legs_2_rimage_2_iorig] -wi-ao---- 500.00m /dev/sdb1(1)
  [synced_random_raid1_3legs_2_rimage_3_imeta] ewi-ao---- 12.00m /dev/sde1(255)
  [synced_random_raid1_3legs_2_rimage_3_iorig] -wi-ao---- 500.00m /dev/sde1(127)
  [synced_random_raid1_3legs_2_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(126)
  [synced_random_raid1_3legs_2_rmeta_1] ewi-aor--- 4.00m /dev/sdk1(126)
  [synced_random_raid1_3legs_2_rmeta_2] ewi-aor--- 4.00m /dev/sdb1(0)
  [synced_random_raid1_3legs_2_rmeta_3] ewi-aor--- 4.00m /dev/sde1(126)

Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdj1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdk1 *IS* in the volume(s)
Verifying IMAGE device /dev/sde1 *IS* in the volume(s)

Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_random_raid1_3legs_1_rimage_2_imeta on: hayes-03
Checking EXISTENCE and STATE of synced_random_raid1_3legs_1_rimage_2_iorig on: hayes-03
Checking EXISTENCE and STATE of synced_random_raid1_3legs_1_rmeta_2 on: hayes-03
Checking EXISTENCE and STATE of synced_random_raid1_3legs_2_rimage_2_imeta on: hayes-03
Checking EXISTENCE and STATE of synced_random_raid1_3legs_2_rimage_2_iorig on: hayes-03
Checking EXISTENCE and STATE of synced_random_raid1_3legs_2_rmeta_2 on: hayes-03

[...]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1659