Bug 1966712
| Summary: | mdadm erroneously reports incorrect checksum | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Chris Moore <christopherm> |
| Component: | mdadm | Assignee: | XiaoNi <xni> |
| Status: | CLOSED ERRATA | QA Contact: | Fine Fan <ffan> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | CentOS Stream | CC: | alex.iribarren, bstinson, carl, daniel.vanderster, davide, dledford, ffan, jamien, janguyen, jdonohue, jwboyer, mharri, ncroxon, ngompa13, pcahyna, rmeggins, xni, yizhan |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | mdadm-4.2-rc1_3.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-09 20:02:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The patch is at https://marc.info/?l=linux-raid&m=162259662926315&w=2. It needs to wait for an ack from the maintainer.

I think I've found the issue: sb->bitmap_offset can be negative (in my case it's -16), but struct mdp_superblock_1 defines it as __u32, so it has to be cast to an int32_t before doing the math on it. mdadm version 4.1 had the cast, but it disappeared in version 4.2. The following change seems to fix the problem:
$ git diff super1.c
diff --git a/super1.c b/super1.c
index c05e6237..f7981e3d 100644
--- a/super1.c
+++ b/super1.c
@@ -2631,7 +2631,7 @@ static int locate_bitmap1(struct supertype *st, int fd, int node_num)
else
ret = -1;
- offset = __le64_to_cpu(sb->super_offset) + __le32_to_cpu(sb->bitmap_offset);
+ offset = __le64_to_cpu(sb->super_offset) + (int32_t) __le32_to_cpu(sb->bitmap_offset);
if (node_num) {
bms = (bitmap_super_t*)(((char*)sb)+MAX_SB_SIZE);
bm_sectors_per_node = calc_bitmap_size(bms, 4096) >> 9;
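For illustration, here is a minimal standalone C program (not mdadm source; variable names are mine) that reproduces the arithmetic the missing cast breaks, using the Super Offset and internal bitmap offset values that appear in the mdadm -E output later in this report:
```c
/* Standalone sketch, not mdadm code: sb->bitmap_offset is stored as a __u32
 * but holds -16 sectors for a 1.0-metadata array with an internal bitmap.
 * Without the int32_t cast, the addition wraps upward instead of subtracting.
 * Offsets are taken from the mdadm -E output in this report. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t super_offset  = 524272;            /* "Super Offset : 524272 sectors"           */
    uint32_t bitmap_offset = (uint32_t)-16;     /* 0xfffffff0: "-16 sectors from superblock" */

    uint64_t without_cast = super_offset + bitmap_offset;           /* mdadm 4.2-rc1 */
    uint64_t with_cast    = super_offset + (int32_t)bitmap_offset;  /* patched / 4.1 */

    /* without_cast = 4295491552, roughly 4 GiB past the end of a 524288-sector
     * device, so the bitmap read fails; with_cast = 524256, the real bitmap. */
    printf("without cast: sector %llu\n", (unsigned long long)without_cast);
    printf("with cast   : sector %llu\n", (unsigned long long)with_cast);
    return 0;
}
```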
Hi Chris

The patch is right. I sent the link in comment 1 and have now made comment 1 not private; sorry for that.

By the way, are you only testing with super 1.0, or do you use super 1.0 in production? If it is for production, why not use super 1.2? The reason I ask is that I want to hear responses from different people :)

Regards
Xiao

(In reply to XiaoNi from comment #3)

That's an interesting question. This RAID was set up by the RHEL Anaconda installer. In the installer GUI we create a 512 MiB /boot/efi partition, and the remainder of the disk is mounted to /. Both are created as RAID 1 by selecting "RAID" as the type in the GUI. I don't know of any place where we can select the superblock format. But it's odd that we get 1.0, since I think 1.2 is the default for mdadm --create.

(In reply to Chris Moore from comment #4)

The installer automatically creates a RAID1 /boot partition as version 1.0 so that it can be read by a non-RAID-aware boot loader. That used to be a requirement before grub2 was the norm.

(In reply to Doug Ledford from comment #5)

Sorry, I misread your prior statement.
We create /boot/efi as a 1.0 superblock array because the EFI partition must be BIOS-readable, and the BIOS doesn't know how to skip a superblock at the beginning of the device. As a result, it is a hard requirement that an EFI partition use superblock 1.0. This doesn't change regardless of the grub version in use (although we also used to create /boot partitions as superblock 1.0 when grub 1 was in use, too).
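To make the layout point concrete, here is an illustrative C sketch (not mdadm source; the 1.0 placement rule is an assumption that happens to match the "Super Offset : 524272 sectors" value in the reproducer below) of where the v1.x superblocks sit on a member device:
```c
/* Illustrative sketch only: approximate placement of md v1.x superblocks on a
 * 256 MiB member device (524288 sectors of 512 bytes), as in the reproducer
 * below. The 1.0 rounding rule here is an assumption. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t dev_sectors = 524288;

    /* 1.2 puts the superblock 4 KiB after the start of the device (1.1 at the
     * very start), so firmware reading the partition would hit md metadata.  */
    uint64_t super_1_2 = 8;                            /* sectors from start  */

    /* 1.0 puts the superblock near the end, leaving the start of the
     * partition untouched so the ESP looks like a plain FAT filesystem.      */
    uint64_t super_1_0 = (dev_sectors - 16) & ~7ULL;   /* assumed rounding    */

    /* The internal write-intent bitmap for 1.0 sits just *before* the
     * superblock, which is why bitmap_offset is negative (-16 sectors).      */
    int32_t bitmap_offset = -16;

    printf("1.2 superblock : sector %llu\n", (unsigned long long)super_1_2);
    printf("1.0 superblock : sector %llu\n", (unsigned long long)super_1_0);
    printf("1.0 bitmap     : sector %lld\n",
           (long long)((int64_t)super_1_0 + bitmap_offset));
    return 0;
}
```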
Exactly the same issue here, and it causes Stream 8 anaconda installation to fail on our hardware.

Our anaconda RAIDs are defined like this:
```
# partition table
%pre
#!/bin/sh
DISKS=$(lsblk -d -o name,rota,fstype --noheadings | grep ^sd | grep -v -i 'LVM2_member' | awk '{if ($2=='0') print $1}' | head -n 2)
ONE=$(echo ${DISKS} | cut -d ' ' -f 1)
TWO=$(echo ${DISKS} | cut -d ' ' -f 2)
# it is very important to only clearpart on the first two drives.
# there are often other drives sdc, etc.. which can be OSD journals
# or OSD data disks. We must not overwrite those partition tables.
echo "ignoredisk --only-use=${ONE},${TWO}" > /tmp/part-include
echo "clearpart --all --initlabel --drives ${ONE},${TWO}" >> /tmp/part-include
# for /boot
echo "partition raid.01 --size 1024 --ondisk ${ONE}" >> /tmp/part-include
echo "partition raid.02 --size 1024 --ondisk ${TWO}" >> /tmp/part-include
# for /boot/efi
echo "partition raid.11 --size 256 --ondisk ${ONE}" >> /tmp/part-include
echo "partition raid.12 --size 256 --ondisk ${TWO}" >> /tmp/part-include
# for /
echo "partition raid.21 --size 1 --ondisk ${ONE} --grow" >> /tmp/part-include
echo "partition raid.22 --size 1 --ondisk ${TWO} --grow" >> /tmp/part-include
echo "raid /boot --level=1 --device=boot --fstype=xfs raid.01 raid.02" >> /tmp/part-include
echo "raid /boot/efi --level=1 --device=boot_efi --fstype=efi raid.11 raid.12" >> /tmp/part-include
echo "raid / --level=1 --device=root --fstype=xfs raid.21 raid.22" >> /tmp/part-include
%end
# use the partition table defined above and dumped to file
%include /tmp/part-include
```
Installation fails with:
dasbus.error.DBusError: Process reported exit code 1: mdadm: RUN_ARRAY failed: Invalid argument
dmesg shows:
md126: bitmap superblock UUID mismatch
md126: failed to create bitmap (-22)
And mdadm -E shows 1-bit checksum errors on the members of boot_efi, just like Chris posted.
(In reply to Doug Ledford from comment #6)

Hi Doug

Thanks for the explanation.

(In reply to Dan van der Ster from comment #7)
> Exactly the same issue here, and it causes Stream 8 anaconda installation to fail on our hardware.

Hi Dan

Could you try this patch? https://marc.info/?l=linux-raid&m=162259662926315&w=2

> Could you try this patch? https://marc.info/?l=linux-raid&m=162259662926315&w=2
First, a clear reproducer for you with 4.2-rc1_1, maybe to add to some test framework:
```
# rpm -q mdadm
mdadm-4.2-rc1_1.el8.x86_64
# dd if=/dev/zero of=a.dat bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 0.107383 s, 2.5 GB/s
# dd if=/dev/zero of=b.dat bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 0.108689 s, 2.5 GB/s
# losetup /dev/loop0 a.dat
# losetup /dev/loop1 b.dat
# mdadm --create /dev/md0 --level=1 --metadata=1.0 --bitmap=internal --raid-devices=2 /dev/loop0 /dev/loop1
mdadm: RUN_ARRAY failed: Invalid argument
# mdadm -E /dev/loop1
/dev/loop1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x0
Array UUID : c5d9f7f7:53213a6c:90828f3b:c5f8ba22
Name : 0
Creation Time : Wed Jun 9 11:08:45 2021
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 524256 sectors (255.98 MiB 268.42 MB)
Array Size : 262080 KiB (255.94 MiB 268.37 MB)
Used Dev Size : 524160 sectors (255.94 MiB 268.37 MB)
Super Offset : 524272 sectors
Unused Space : before=0 sectors, after=104 sectors
State : active
Device UUID : d9ca4edc:50ee669f:c3b6dd74:f42d1b02
Update Time : Wed Jun 9 11:08:45 2021
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : 1ff86d78 - expected 1ff86d77
Events : 0
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
# dmesg | tail
[161542.795797] md/raid1:md0: not clean -- starting background reconstruction
[161542.795798] md/raid1:md0: active with 2 out of 2 mirrors
[161542.795813] md0: invalid bitmap file superblock: bad magic
[161542.795815] md0: failed to create bitmap (-22)
[161542.795852] md: md0 stopped.
```
Now I built with that fix and it works:
```
# rpm -Uvh /tmp/mdadm/mdadm-4.2-rc1_2.el8.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:mdadm-4.2-rc1_2.el8 ################################# [ 50%]
Cleaning up / removing...
2:mdadm-4.2-rc1_1.el8 ################################# [100%]
# dd if=/dev/zero of=/dev/loop0
dd: writing to '/dev/loop0': No space left on device
524289+0 records in
524288+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.50245 s, 179 MB/s
# dd if=/dev/zero of=/dev/loop1
dd: writing to '/dev/loop1': No space left on device
524289+0 records in
524288+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.65947 s, 162 MB/s
# mdadm --create /dev/md0 --level=1 --metadata=1.0 --bitmap=internal --raid-devices=2 /dev/loop0 /dev/loop1
mdadm: array /dev/md0 started.
# mdadm -E /dev/loop0
/dev/loop0:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : 719d639a:d16c5b0b:188ca971:cbb9e114
Name : 0
Creation Time : Wed Jun 9 11:50:04 2021
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 524256 sectors (255.98 MiB 268.42 MB)
Array Size : 262080 KiB (255.94 MiB 268.37 MB)
Used Dev Size : 524160 sectors (255.94 MiB 268.37 MB)
Super Offset : 524272 sectors
Unused Space : before=0 sectors, after=96 sectors
State : clean
Device UUID : e6aa2d8d:43097efd:40b6e6fc:b1922f80
Internal Bitmap : -16 sectors from superblock
Update Time : Wed Jun 9 11:50:05 2021
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : 9ed9b9d9 - correct
Events : 17
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
# dmesg | tail
[164021.431804] md/raid1:md0: not clean -- starting background reconstruction
[164021.431805] md/raid1:md0: active with 2 out of 2 mirrors
[164021.433174] md0: detected capacity change from 0 to 268369920
[164021.433229] md: resync of RAID array md0
[164022.621793] md: md0: resync done.
```
So I assume anaconda will also work.
(In reply to Dan van der Ster from comment #10)
> First, a clear reproducer for you with 4.2-rc1_1, maybe to add to some test framework:

Hi Fine

Could you add the test case to our regression tests?

> Now I built with that fix and it works:

Thanks
Xiao

Recorded. Adding.

We're unable to install new machines with CS8 due to this bug. What is the ETA for a fix? I see @ncroxon set a target release of 8.6, which I'm not sure if that means a year... I'll ping the upstream maintainer again.

I sent the patch upstream some days ago. We still have some time to fix this in 8.5, so I'm changing the target back to 8.5.

Hello, I have encountered the problem with creating 1.0 metadata on an EFI partition (from inside anaconda) too:

# mdadm --create /dev/md/boot_efi --run --level=raid1 --raid-devices=2 --metadata=1.0 --bitmap=internal --chunk=512 /dev/vdb1 /dev/vdb2
mdadm: RUN_ARRAY failed: Invalid argument

kernel says:

[ 119.232426] md/raid1:md127: not clean -- starting background reconstruction
[ 119.232426] md/raid1:md127: active with 2 out of 2 mirrors
[ 119.233852] md127: invalid bitmap file superblock: bad magic
[ 119.233856] md127: failed to create bitmap (-22)
[ 119.233942] md: md127 stopped.

With the updated mdadm-4.2-rc1_3.el8, I don't get this problem anymore, but I get another one:

# mdadm --create /dev/md/boot_efi --run --level=raid1 --raid-devices=2 --metadata=1.0 --bitmap=internal --chunk=512 /dev/vdb1 /dev/vdb2
mdadm: specifying chunk size is forbidden for this level

This is not specific to 1.0 metadata; with 1.2 the same thing happens. With the previous version, mdadm-4.2-rc1_2.el8, I did not have this problem when using 1.2 metadata.

*** Bug 1917308 has been marked as a duplicate of this bug. ***

No new issue found on mdadm-4.2-rc1_3.el8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (mdadm bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4494

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
Description of problem:

The "Examine" mode of mdadm can report an incorrect checksum when the checksum in the superblock is actually correct.

Version-Release number of selected component (if applicable):

4.2-rc1

How reproducible:

Always

Steps to Reproduce:
1. Create a RAID 1 array with mdadm version 4.1
2. Update mdadm to version 4.2-rc1
3. Use "mdadm -E" to examine one of the devices in the array

Actual results:
- Feature Map is reported as 0x0
- Checksum is reported as being incorrect

Example:

[lab@cl1-fair-02 ~]$ sudo mdadm -E /dev/nvme2n1p1
/dev/nvme2n1p1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x0
Array UUID : 01291ee1:6bb27d23:98f42aba:9a68c40c
Name : cl1-fair-02.nvidia.com:boot-efi
Creation Time : Tue Jun 1 06:33:46 2021
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 1048544 sectors (511.98 MiB 536.85 MB)
Array Size : 524224 KiB (511.94 MiB 536.81 MB)
Used Dev Size : 1048448 sectors (511.94 MiB 536.81 MB)
Super Offset : 1048560 sectors
Unused Space : before=0 sectors, after=104 sectors
State : active
Device UUID : 9549c1c3:af4acc36:685b276d:39f87113
Update Time : Tue Jun 1 07:03:17 2021
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : d6fffdcf - expected d6fffdce
Events : 28
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

Expected results:
- Feature Map should be 0x1 if the bitmap is enabled
- Checksum should not show an error

/dev/nvme2n1p1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : 01291ee1:6bb27d23:98f42aba:9a68c40c
Name : cl1-fair-02.nvidia.com:boot-efi
Creation Time : Tue Jun 1 06:33:46 2021
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 1048544 sectors (511.98 MiB 536.85 MB)
Array Size : 524224 KiB (511.94 MiB 536.81 MB)
Used Dev Size : 1048448 sectors (511.94 MiB 536.81 MB)
Super Offset : 1048560 sectors
Unused Space : before=0 sectors, after=96 sectors
State : active
Device UUID : 9549c1c3:af4acc36:685b276d:39f87113
Internal Bitmap : -16 sectors from superblock
Update Time : Tue Jun 1 06:54:17 2021
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : d6fffbab - correct
Events : 20
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

Additional info:

This bug was introduced in commit 1fe2e1007310778d0551d5c34317e5318507399d. That patch changed the behavior of locate_bitmap1() in super1.c. If the BITMAP_OFFSET bit is set in the feature map of the superblock, mdadm tries to read the bitmap. Because locate_bitmap1() returns an invalid location, the read of the bitmap fails. When the read fails, mdadm clears the BITMAP_OFFSET bit in the in-memory header. It then calculates a checksum based on this modified in-memory copy of the superblock, but that checksum is now off by 1 because the BITMAP_OFFSET bit was cleared. It therefore reports an error because the checksum doesn't match the one read from disk.
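To see why clearing that single bit shifts the checksum by exactly 1, here is a simplified C model (assumptions: a plain additive 32-bit checksum over a mostly-zero buffer, not mdadm's exact calc_sb_1_csum implementation):
```c
/* Simplified sketch of an additive superblock checksum. Clearing the 0x1
 * BITMAP_OFFSET bit in the feature_map word lowers the sum by exactly 1,
 * matching "Checksum : d6fffdcf - expected d6fffdce" above. This is a model,
 * not mdadm's calc_sb_1_csum. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t sum32(const uint32_t *words, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint32_t)(sum + (sum >> 32));   /* fold carries back in */
}

int main(void)
{
    /* Toy superblock: magic, major_version, feature_map, rest zeroed. */
    uint32_t sb[64] = { 0xa92b4efc, 1, 0x1 /* BITMAP_OFFSET set */ };

    uint32_t stored = sum32(sb, 64);        /* what was written to disk */

    sb[2] &= ~0x1u;                         /* mdadm clears the bit after the
                                               bitmap read fails */
    uint32_t recomputed = sum32(sb, 64);    /* what -E then "expects"   */

    printf("stored     : %08" PRIx32 "\n", stored);
    printf("recomputed : %08" PRIx32 " (differs by %" PRIu32 ")\n",
           recomputed, stored - recomputed);
    return 0;
}
```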