Bug 1443144
Summary: | Grub2 does not detect MD raid (level 1) 1.0 superblocks on 4k block devices | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Kyle Walker <kwalker> |
Component: | grub2 | Assignee: | Peter Jones <pjones> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Release Test Team <release-test-team-automation> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.3 | CC: | alexander, bgollahe, bugproxy, ccoates, cww, dlehman, hannsj_uhl, herrold, jkachuck, kcleveng, kwalker, release-test-team-automation, shalygin.k, thomas.jarosch, vanhoof |
Target Milestone: | rc | ||
Target Release: | 7.7 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-06-13 09:12:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1184945 | ||
Bug Blocks: | 1420851, 1477664, 1546815, 1598750, 1689420 |
Description
Kyle Walker
2017-04-18 15:08:07 UTC
------- Comment From ruddk.com 2017-04-19 13:58 EDT-------
(In reply to comment #4)
...
> Additional info:
> Note, the issue was originally noted at installation-time as the default
> superblock format for PPC64 (PReP boot) systems is 1.0 according to the
> python-blivet library, used by the anaconda installer, code snippet below:
>
>     def preCommitFixup(self, *args, **kwargs):
>         """ Determine create parameters for this set """
>         mountpoints = kwargs.pop("mountpoints")
>         log_method_call(self, self.name, mountpoints)
>
>         if "/boot" in mountpoints:
>             bootmountpoint = "/boot"
>         else:
>             bootmountpoint = "/"
>
>         # If we are used to boot from we cannot use 1.1 metadata
>         if getattr(self.format, "mountpoint", None) == bootmountpoint or \
>            getattr(self.format, "mountpoint", None) == "/boot/efi" or \
>            self.format.type == "prepboot":
>             self.metadataVersion = "1.0"

This is probably the key observation. It doesn't really make sense to have this restriction in place for disks with a PReP partition.

The above appears to have already been backed out via the following commit:

commit 8bce84025e0f0af9b2538a2611e5d52257a82881
Author: David Lehman <dlehman>
Date:   Wed May 27 16:07:05 2015 -0500

    Use the default md metadata version for everything except /boot/efi.

    Now that we've moved to grub2 this is no longer necessary for /boot.
    As far as I know we have never actually allowed PReP on md, so that's
    not needed either. Apparently UEFI firmware/bootloader still needs it.

    Related: rhbz#1061711

------- Comment From cjt.com 2017-04-21 12:48 EDT-------
Here's where I have stopped grub2-install in gdb:

(gdb) run -vvv /dev/sda1
...
grub-core/osdep/hostdisk.c:415: opening the device `/dev/sda2' in open_device()
(gdb) bt
#0 grub_util_fd_seek (fd=0x8, off=0x3dcf8000) at grub-core/osdep/unix/hostdisk.c:105
#1 0x000000001013f3ac in grub_util_fd_open_device (disk=0x101e88e0, sector=0x3dcf8, flags=0x101000, max=0x3fffffffe018) at grub-core/osdep/linux/hostdisk.c:450
#2 0x000000001013c56c in grub_util_biosdisk_read (disk=0x101e88e0, sector=0x404f8, size=0x8, buf=0x101ee130 "\370\016\347\267\377?") at grub-core/kern/emu/hostdisk.c:289
#3 0x0000000010133ccc in grub_disk_read_small_real (disk=0x101e88e0, sector=0x2027c0, offset=0x6000, size=0x100, buf=0x3fffffffe308) at grub-core/kern/disk.c:344
#4 0x0000000010133fac in grub_disk_read_small (disk=0x101e88e0, sector=0x2027c0, offset=0x6000, size=0x100, buf=0x3fffffffe308) at grub-core/kern/disk.c:401
#5 0x00000000101341a8 in grub_disk_read (disk=0x101e88e0, sector=0x2027f0, offset=0x0, size=0x100, buf=0x3fffffffe308) at grub-core/kern/disk.c:440
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0) at grub-core/disk/mdraid1x_linux.c:149
#7 0x0000000010155eb0 in scan_disk_partition_iter (disk=0x101e88e0, p=0x3fffffffe548, data=0x101e8860) at grub-core/disk/diskfilter.c:161
#8 0x0000000010147000 in part_iterate (dsk=0x101e88e0, partition=0x3fffffffe660, data=0x3fffffffe900) at grub-core/kern/partition.c:196
#9 0x000000001015a2b8 in grub_partition_msdos_iterate (disk=0x101e88e0, hook=0x10146f24 <part_iterate>, hook_data=0x3fffffffe900) at grub-core/partmap/msdos.c:196
#10 0x000000001014718c in grub_partition_iterate (disk=0x101e88e0, hook=0x10155ccc <scan_disk_partition_iter>, hook_data=0x101e8860) at grub-core/kern/partition.c:233
#11 0x00000000101560c0 in scan_disk (name=0x101e8860 "hd0", accept_diskfilter=0x1) at grub-core/disk/diskfilter.c:204
#12 0x00000000101591ec in grub_diskfilter_get_pv_from_disk (disk=0x101e8810, vg_out=0x3fffffffea30) at grub-core/disk/diskfilter.c:1173
#13
0x0000000010154f9c in grub_util_get_ldm (disk=0x101e8810, start=0x2800) at grub-core/disk/ldm.c:876
#14 0x0000000010135bb0 in grub_util_biosdisk_get_grub_dev (os_dev=0x101e5fd0 "/dev/sda2") at util/getroot.c:437
#15 0x000000001013531c in grub_util_pull_device (os_dev=0x101e5fd0 "/dev/sda2") at util/getroot.c:111
#16 0x000000001013a6a0 in grub_util_pull_device_os (os_dev=0x101e7520 "/dev/md0", ab=GRUB_DEV_ABSTRACTION_RAID) at grub-core/osdep/linux/getroot.c:1064
#17 0x0000000010135300 in grub_util_pull_device (os_dev=0x101e7520 "/dev/md0") at util/getroot.c:108
#18 0x0000000010006688 in main (argc=0x3, argv=0x3ffffffff528) at util/grub-install.c:1233

(gdb) frame 6
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0) at grub-core/disk/mdraid1x_linux.c:149
149 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) print minor_version
$34 = 0x0
(gdb) print *((*disk)->partition)
$35 = {number = 0x1, start = 0x2800, len = 0x200000, offset = 0x0, index = 0x1, parent = 0x0, partmap = 0x101ba250 <grub_msdos_partition_map>, msdostype = 0xfd}
(gdb) print sector
$36 = 0x1ffff0
(gdb) frame 0
#0 grub_util_fd_seek (fd=0x8, off=0x3dcf8000) at grub-core/osdep/unix/hostdisk.c:105
105 if (lseek (fd, offset, SEEK_SET) != offset)
(gdb) print offset
$37 = 0x3dcf8000

There seems to be at least one problem with the sector/offset computations. According to mdadm -E /dev/sda2:

Super Offset : 2097136 sectors

(that's 512b sectors because my partition is only 1GB). Therefore we find our md superblock at 4096b sector 0x3FFFE.

I believe one problem is in grub_util_fd_open_device(). It is passed sector=0x404f8, which refers to 4096b sectors. grub_util_fd_open_device() then subtracts the part_start, which comes from disk->partition->start=0x2800 in grub_partition_get_start(). But that 0x2800 refers to 512b sectors.
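The unit mix-up described above can be reproduced with a few lines of arithmetic (a sketch using the values from the gdb session; the variable names are mine, not grub's):

```python
GRUB_SECTOR = 512     # GRUB's fixed internal sector size
NATIVE_SECTOR = 4096  # the device's logical sector size

sector = 0x404F8      # passed to grub_util_fd_open_device(), in 4096b units
part_start = 0x2800   # disk->partition->start, in 512b units

# What the buggy path effectively does: subtract 512b units from 4096b
# units without converting, then scale by the native sector size.
buggy_offset = (sector - part_start) * NATIVE_SECTOR
assert buggy_offset == 0x3DCF8000   # the bad offset seen in grub_util_fd_seek()

# Converting the partition start to native sectors first gives the offset
# of the aligned read that actually covers the md superblock.
fixed_offset = (sector - part_start * GRUB_SECTOR // NATIVE_SECTOR) * NATIVE_SECTOR
assert fixed_offset == 0x3FFF8000
```

With the 0x6000-byte in-read offset from frame #3 added on top, the fixed value lands on 0x3FFFE000, which is exactly the Super Offset (2097136 × 512) that mdadm reports.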
Here is the partition table:

Model: AIX VDASD (scsi)
Disk /dev/sda: 5242880s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 1279s 1024s primary boot, prep
2 1280s 263423s 262144s primary raid
3 263424s 4451327s 4187904s primary raid

Partition 2 starts at 1280s (based on 4096b sectors), which is 0x500s.

(gdb) frame 6
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0) at grub-core/disk/mdraid1x_linux.c:149
149 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) print ((*disk)->partition)->start
$42 = 0x2800

So grub_util_fd_open() mixed the native 4096b sectors with the grub sector size of 512b. sector=0x404f8 was passed in, but the mixing of sector sizes caused the offset sent to grub_util_fd_seek() to be 0x3dcf8000 instead of 0x3FFF8000.

------- Comment From ruddk.com 2017-04-21 12:57 EDT-------
Just to clarify for others in case the "All" arch specification in this bug is missed: this isn't really PPC64* specific. The problem can be easily reproduced in an x86_64 KVM guest environment by simply presenting one of the virtual disks as a 4096 block device. For example:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/vg_root/spare'/>
  <blockio logical_block_size='4096' physical_block_size='4096'/>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>

------- Comment From willianr.com 2017-06-08 13:55 EDT-------
I just ran a fresh installation enabling raid on a 4k block disk and I could not reproduce the problem stated in the additional notes ("the issue was originally noted at installation-time").
Here is the information right after the first boot:

[root@rhel-grub ~]# uname -a
Linux rhel-grub 3.10.0-514.el7.ppc64le #1 SMP Wed Oct 19 11:27:06 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux

[root@rhel-grub ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

[root@rhel-grub ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md126 9.0G 1.2G 7.9G 13% /
devtmpfs 8.0G 0 8.0G 0% /dev
tmpfs 8.0G 0 8.0G 0% /dev/shm
tmpfs 8.0G 14M 8.0G 1% /run
tmpfs 8.0G 0 8.0G 0% /sys/fs/cgroup
/dev/md127 1018M 145M 874M 15% /boot
tmpfs 1.6G 0 1.6G 0% /run/user/0

[root@rhel-grub ~]# cat /proc/mdstat
md126 : active raid1 sdb1[1] sda2[0]
9423872 blocks super 1.2 [2/2] [UU]
bitmap: 1/1 pages [64KB], 65536KB chunk
md127 : active raid1 sdb2[1] sda3[0]
1048512 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

[root@rhel-grub ~]# grub2-probe --device /dev/md126 --target fs_uuid
5de99add-1cf2-41f0-ba54-c08067e404d4
[root@rhel-grub ~]# grub2-probe --device /dev/md127 --target fs_uuid
d48f8f83-717b-405e-9e7b-02ba37de959a

[root@rhel-grub ~]# parted /dev/sda u s p
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 1279s 1024s primary boot, prep
2 1280s 2359295s 2358016s primary raid
3 2359296s 2621439s 262144s primary raid

[root@rhel-grub ~]# parted /dev/sdb u s p
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sdb: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 2358271s 2358016s primary raid
2 2358272s 2620415s 262144s primary raid

I will do another installation without raid and then migrate it to raid to check if the problem happens. So, for now, can someone confirm this problem happens during install time?
------- Comment From willianr.com 2017-06-12 17:33 EDT-------
As expected, migrating /boot to raid 1 using metadata 1.0 when it is the first partition after prep fails:

[root@rhel-grub2-1 ~]# mdadm -D /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Mon Jun 12 17:22:07 2017
Raid Level : raid1
Array Size : 1048512 (1023.94 MiB 1073.68 MB)
Used Dev Size : 1048512 (1023.94 MiB 1073.68 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon Jun 12 17:26:45 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : rhel-grub2-1:0 (local to host rhel-grub2-1)
UUID : 537bfbf4:0b89fb58:f50f14c3:ba5f2bf3
Events : 33
Number Major Minor RaidDevice State
2 253 2 0 active sync /dev/vda2
1 253 17 1 active sync /dev/vdb1

[root@rhel-grub2-1 ~]# parted /dev/vda u s p
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 1279s 1024s primary boot, prep
2 1280s 263423s 262144s primary raid
3 263424s 2360575s 2097152s primary

[root@rhel-grub2-1 ~]# parted /dev/vdb u s p
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 262399s 262144s primary raid

[root@rhel-grub2-1 ~]# grub2-probe --device /dev/md0 --target fs_uuid
grub2-probe: error: disk `mduuid/537bfbf40b89fb58f50f14c3ba5f2bf3' not found.
[root@rhel-grub2-1 ~]# grub2-install /dev/vdb
Installing for powerpc-ieee1275 platform.
grub2-install: error: disk `mduuid/537bfbf40b89fb58f50f14c3ba5f2bf3' not found.

Now, the interesting thing is that I was not able to migrate /boot to raid 1 using metadata 1.0 when /boot is not the first partition after prep (just like the installer did on comment #15).
When I tried the same as the installer did, grub was not able to find the /boot partition after the root partition.

------- Comment From victora.com 2017-09-26 08:09 EDT-------
Hi, I still didn't have time to work on this. I will try to work on this bz this week. I will let you know when I have updates. Thanks, Victor

Is there any progress on this?

We've just been hit by the same issue on an E850 during install, getting to the point where we can't install a system using software RAID 1. As it stands, we're having to install without RAID to get a system up and running...

------- Comment From ruddk.com 2017-10-27 10:51 EDT-------
(In reply to comment #23)
> Is there any progress on this?
>
> We've just been hit by the same issue on an E850 during install, getting to
> the point where we can't install a system using software RAID 1.
>
> As it stands, we're having to install without RAID to get a system up and
> running...

The install-side issue should already have been addressed for RHEL 7.4 via RH Bug 1184945. The easy workaround is to not use version 1.0 metadata for the RAID config.

(In reply to IBM Bug Proxy from comment #9)
> The install-side issue should already have been addressed for RHEL 7.4 via
> RH Bug 1184945. The easy workaround is to not use version 1.0 metadata for
> the RAID config.

Unfortunately you can't specify a metadata type via kickstart for md devices - so that's still a show-stopper for using RHEL 7.3 on an E850. As a workaround to allow RAID during install, I've had to specify /boot as a btrfs partition, which worked perfectly fine.
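For systems that can be repartitioned after install, the suggested workaround (avoid 1.0 metadata) can be sketched with plain mdadm; the device names below are placeholders, and 0.90 metadata has its own limits (array size, member naming), so check that it fits your layout first:

```shell
# Create the /boot mirror with 0.90 metadata instead of the 1.0 default
# that the installer picks for bootable arrays (device names are examples).
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --metadata=0.90 /dev/vda2 /dev/vdb1

# Confirm the superblock version before pointing grub2-install at it.
mdadm --detail /dev/md0 | grep Version
```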
Still - this isn't exactly an ideal solution for anyone using an E850 with RHEL 7.3... The customer I'm building out for isn't prepared to use RHEL 7.4 yet.

------- Comment From desnesn.com 2017-10-27 13:23 EDT-------
Hello Kevin,
The engineer that was in charge of this bug is leaving IBM. I am working on this bug as we speak (started this week), and I think I am up to something. I will post my results by the end of the day.

------- Comment From desnesn.com 2017-10-27 16:59 EDT-------
For now, I tried to look inside grub-probe to see if I could find any clues. Through gdb, I noticed that when using 4k blocksize with --metadata=1.0 on the MD RAID disks, the `dev` variable from util/grub-probe.c +376 is coming out not allocated, so a grub_util_error() is being thrown.

============================ util/grub-probe.c +376 ============================
376   dev = grub_device_open (drives_names[0]);
377   if (! dev)
378     grub_util_error ("%s", grub_errmsg);
============================

Now, comparing grub_device_open() code on grub-core/kern/device.c, using both --metadata=1.0 and --metadata=0.90:

============================ grub-core/kern/device.c +47 ============================
47   dev = grub_malloc (sizeof (*dev));
48   if (! dev)
49     goto fail;
50
51   dev->net = NULL;
52   /* Try to open a disk.  */
53   dev->disk = grub_disk_open (name);
54   if (dev->disk)
55     return dev;
56   if (grub_net_open && grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
57     {
58       grub_errno = GRUB_ERR_NONE;
59       dev->net = grub_net_open (name);
60     }
61
62   if (dev->net)
63     return dev;
64
65 fail:
66   grub_free (dev);
============================

CURIOSITY: The addresses that came out of the grub_malloc() on line 47 seem a bit odd with 1.0.

===== RAID using --metadata=1.0: FAILS on grub2-probe =====
Breakpoint 4, grub_device_open (name=0x10185290 "mduuid/ceebb143b7f740ba41794f2e88b1e1de") at grub-core/kern/device.c:48
48 if (!
dev) (gdb) print *dev $3 = { disk = 0x0, net = 0x3fffb7ed07b8 <main_arena+104> } (gdb) print *dev->net $4 = { server = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020", name = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020", protocol = 0x3fffb7ed07b8 <main_arena+104>, packs = { first = 0x3fffb7ed07b8 <main_arena+104>, last = 0x3fffb7ed07c8 <main_arena+120>, count = 70367534974920 }, offset = 70367534974936, fs = 0x3fffb7ed07d8 <main_arena+136>, eof = -1209202712, stall = 16383 } ===== ===== RAID using --metadata=0.90: SUCCESS on grub2-probe ===== Breakpoint 2, grub_device_open (name=0x10185830 "mduuid/1940b3311771bbb17b777c24c48ad94b") at grub-core/kern/device.c:48 48 if (! dev) (gdb) print *dev $1 = { disk = 0x0, net = 0x10185120 } (gdb) print *dev->net $3 = { server = 0x61 <Address 0x61 out of bounds>, name = 0x21 <Address 0x21 out of bounds>, protocol = 0x3fffb7ed07b8 <main_arena+104>, packs = { first = 0x3fffb7ed07b8 <main_arena+104>, last = 0x20, count = 32 }, offset = 7742648064551382888, fs = 0x64762f7665642f2f, eof = 98, stall = 0 } ===== Anyhow, this was only an allocation, and on line 51 of grub-core/kern/device.c dev->net receives NULL. Using --metadata=1.0, `dev` is allocated, and the execution moves into grub_disk_open() on line 53. This function is returning a struct with its value set to zero here. Thus, it will jump the ifs on lines 54, 56 and 62; and eventually fails on 65. ===== RAID using --metadata=1.0: FAILS on grub2-probe ===== Breakpoint 2, grub_device_open (name=0x10185290 "mduuid/7266eba408736585cf9c00e3a2342fdc") at grub-core/kern/device.c:54 54 if (dev->disk) (gdb) print *dev $3 = { disk = 0x0, net = 0x0 } ===== Whereas using --metadata=0.90, the struct is not zeroed out, and grub_device_open() returns `dev` on line 54. 
===== RAID using --metadata=0.90: SUCCESS on grub2-probe ===== Breakpoint 1, grub_device_open (name=0x10185830 "mduuid/a29da500c684c0d47b777c24c48ad94b") at grub-core/kern/device.c:54 54 if (dev->disk) (gdb) print *dev $1 = { disk = 0x101834a0, net = 0x0 } ===== Riddle me this: why? More grub to come ... will look into grub_disk_open() and on mdadm next. ------- Comment From desnesn.com 2017-10-31 09:49 EDT------- Going deeper in the rabbit hole from IBM Comment 27 / RH Comment 12: ============================ grub-core/kern/disk.c +187 ============================ 187 grub_disk_t 188 grub_disk_open (const char *name) 189 { ... 224 for (dev = grub_disk_dev_list; dev; dev = dev->next) 225 { 226 if ((dev->open) (raw, disk) == GRUB_ERR_NONE) 227 break; 228 else if (grub_errno == GRUB_ERR_UNKNOWN_DEVICE) 229 grub_errno = GRUB_ERR_NONE; 230 else 231 goto fail; 232 } 233 234 if (! dev) 235 { 236 grub_error (GRUB_ERR_UNKNOWN_DEVICE, N_("disk `%s' not found"), 237 name); 238 goto fail; 239 } ============================ Using --metadata=1.0, `dev` comes zeroed out after the for loop on line 224, whereas on 0.90 it is defined. Moreover, line 236 grub_error() message is the one being printed by grub2-probe. ===== FAILURE grub2-probe - RAID using --metadata=1.0 ===== Breakpoint 1, grub_disk_open (name=0x10185290 "mduuid/0ef5c3920edae097657894d84aef753d") at grub-core/kern/disk.c:234 234 if (! dev) (gdb) print dev $1 = (grub_disk_dev_t) 0x0 (gdb) print *dev Cannot access memory at address 0x0 (gdb) s 236 grub_error (GRUB_ERR_UNKNOWN_DEVICE, N_("disk `%s' not found"), ===== ===== SUCCESS grub2-probe - RAID using --metadata=0.90 ===== Breakpoint 1, grub_disk_open (name=0x10185830 "mduuid/ebae38d5105eed037b777c24c48ad94b") at grub-core/kern/disk.c:234 234 if (! 
dev) (gdb) print dev $1 = (grub_disk_dev_t) 0x10165e80 <grub_diskfilter_dev> (gdb) print *dev $2 = { name = 0x10146d50 "diskfilter", id = GRUB_DISK_DEVICE_DISKFILTER_ID, iterate = 0x101107cc <grub_diskfilter_iterate>, open = 0x10110fd8 <grub_diskfilter_open>, close = 0x10111120 <grub_diskfilter_close>, read = 0x1011220c <grub_diskfilter_read>, write = 0x1011227c <grub_diskfilter_write>, memberlist = 0x10110950 <grub_diskfilter_memberlist>, raidname = 0x10110df4 <grub_diskfilter_getname>, next = 0x10165fb8 <grub_procfs_dev> } (gdb) s 240 if (disk->log_sector_size > GRUB_DISK_CACHE_BITS + GRUB_DISK_SECTOR_BITS ===== Since `dev` is used for a couple of devices on grub and this is a C template struct, each dev had its own functions. In our case, we are dealing with grub_disk_dev_t, and through gdb we can see that dev->open() on line 226 actually is grub_diskfilter_open() on: ============================ grub-core/disk/diskfilter.c +419 ============================ 419 static grub_err_t 420 grub_diskfilter_open (const char *name, grub_disk_t disk) 421 { 422 struct grub_diskfilter_lv *lv; 423 424 if (!is_valid_diskfilter_name (name)) 425 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s", 426 name); 427 428 lv = find_lv (name); 429 430 if (! lv) 431 { 432 scan_devices (name); 433 if (grub_errno) 434 { 435 grub_print_error (); 436 grub_errno = GRUB_ERR_NONE; 437 } 438 lv = find_lv (name); 439 } 440 441 if (!lv) 442 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s", 443 name); 444 445 disk->id = lv->number; 446 disk->data = lv; 447 448 disk->total_sectors = lv->size; 449 disk->max_agglomerate = GRUB_DISK_MAX_MAX_AGGLOMERATE; 450 return 0; ============================ The is_valid_diskfilter_name() check on line 426 passes for both metadatas 0.90 and 1.0. 
However, if we break line 441, a strange thing to note here - using --metadata=1.0 all my disk devices passes through the breakpoint, whereas using 0.90 only the raid device passed through the breakpoint on line 441. ===== FAILURE grub2-probe - RAID using --metadata=1.0 ===== Breakpoint 1, grub_diskfilter_open (name=0x10185b90 "lvm/rhel-root", disk=0x101827d0) at grub-core/disk/diskfilter.c:441 441 if (!lv) (gdb) c Continuing. Breakpoint 1, grub_diskfilter_open (name=0x101859f0 "lvm/rhel-home", disk=0x101827d0) at grub-core/disk/diskfilter.c:441 441 if (!lv) (gdb) c Continuing. Breakpoint 1, grub_diskfilter_open (name=0x101857e0 "lvm/rhel-swap", disk=0x101827d0) at grub-core/disk/diskfilter.c:441 441 if (!lv) (gdb) c Continuing. ... Breakpoint 2, grub_diskfilter_open (name=0x10185290 "mduuid/0ef5c3920edae097657894d84aef753d", disk=0x10182780) at grub-core/disk/diskfilter.c:441 441 if (!lv) (gdb) print lv $2 = (struct grub_diskfilter_lv *) 0x0 (gdb) print *lv Cannot access memory at address 0x0 (gdb) s 442 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s", ===== ===== SUCCESS grub2-probe - RAID using --metadata=0.90 ===== Breakpoint 1, grub_diskfilter_open (name=0x10185830 "mduuid/c5e0adca3d6a76ef7b777c24c48ad94b", disk=0x101834a0) at grub-core/disk/diskfilter.c:441 441 if (!lv) (gdb) print lv $1 = (struct grub_diskfilter_lv *) 0x10183690 (gdb) print *lv $2 = { fullname = 0x10183580 "md/md1", idname = 0x10183700 "mduuid/c5e0adca3d6a76ef7b777c24c48ad94b", name = 0x10183580 "md/md1", number = 0, segment_count = 1, segment_alloc = 0, size = 20969344, became_readable_at = 1, scanned = 0, visible = 1, segments = 0x10183730, vg = 0x10183530, next = 0x0, internal_id = 0x0 } (gdb) s 445 disk->id = lv->number; ===== So, `lv` on 441 is coming out zeroed out. 
Let's back up a bit and break line 430:

===== FAILURE grub2-probe - RAID using --metadata=1.0 =====
Breakpoint 2, grub_diskfilter_open (name=0x10185290 "mduuid/2872dd311d2585e4690defc1d9ba07a7", disk=0x10182780) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) print lv
$2 = (struct grub_diskfilter_lv *) 0x0
(gdb) print *lv
Cannot access memory at address 0x0
(gdb) c
Continuing.
...
Breakpoint 2, grub_diskfilter_open (name=0x10185b90 "lvm/rhel-root", disk=0x101827d0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) c
Continuing.
Breakpoint 2, grub_diskfilter_open (name=0x101859f0 "lvm/rhel-home", disk=0x101827d0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) c
Continuing.
Breakpoint 2, grub_diskfilter_open (name=0x101857e0 "lvm/rhel-swap", disk=0x101827d0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) c
Continuing.
/usr/sbin/grub2-probe: error: disk `mduuid/2872dd311d2585e4690defc1d9ba07a7' not found.
[Inferior 1 (process 19832) exited with code 01]
=====

===== SUCCESS grub2-probe - RAID using --metadata=0.90 =====
Breakpoint 1, grub_diskfilter_open (name=0x10185830 "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b", disk=0x101834a0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) print lv
$1 = (struct grub_diskfilter_lv *) 0x10183690
(gdb) print *lv
$2 = { fullname = 0x10183580 "md/md1", idname = 0x10183700 "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b", name = 0x10183580 "md/md1", number = 0, segment_count = 1, segment_alloc = 0, size = 20969344, became_readable_at = 1, scanned = 0, visible = 1, segments = 0x10183730, vg = 0x10183530, next = 0x0, internal_id = 0x0 }
=====

Thus, apparently the culprit now might be hiding in find_lv().
============================ grub-core/disk/diskfilter.c +401 ============================ 401 static struct grub_diskfilter_lv * 402 find_lv (const char *name) 403 { 404 struct grub_diskfilter_vg *vg; 405 struct grub_diskfilter_lv *lv = NULL; 406 407 for (vg = array_list; vg; vg = vg->next) 408 { 409 if (vg->lvs) 410 for (lv = vg->lvs; lv; lv = lv->next) 411 if (((lv->fullname && grub_strcmp (lv->fullname, name) == 0) 412 || (lv->idname && grub_strcmp (lv->idname, name) == 0)) 413 && is_lv_readable (lv, 0)) 414 return lv; 415 } 416 return NULL; 417 } ============================ ===== FAILURE grub2-probe - RAID using --metadata=1.0 ===== Breakpoint 1, find_lv (name=0x10185290 "mduuid/e5cf979ce818f58cc57b618bd78b4b86") at grub-core/disk/diskfilter.c:407 407 for (vg = array_list; vg; vg = vg->next) (gdb) print array_list $1 = (struct grub_diskfilter_vg *) 0x0 (gdb) print *array_list Cannot access memory at address 0x0 (gdb) s 416 return NULL; ===== ===== SUCCESS grub2-probe - RAID using --metadata=0.90 ===== Breakpoint 5, find_lv (name=0x10185830 "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b") at grub-core/disk/diskfilter.c:407 407 for (vg = array_list; vg; vg = vg->next) (gdb) print array_list $5 = (struct grub_diskfilter_vg *) 0x10183530 (gdb) print *array_list $6 = { uuid = 0x10183300 "\242\360l\240\255l\355\274{w|$?\331Ke/en_US.!", uuid_len = 16, name = 0x10183580 "md/md1", extent_size = 1, pvs = 0x10186090, lvs = 0x10183690, next = 0x0, driver = 0x10160228 <grub_mdraid_dev> } (gdb) s 409 if (vg->lvs) ===== Therefore, at this point we can infer that RAID 1 with --metadata=1.0 on 4k blocksize disks is leading to the creation of an empty array_list, which is leading to everything else. Riddle me this: why? More gdb to come ... apparently array_list is being populated on grub_diskfilter_vg_register() at grub-core/disk/diskfilter.c +838. Will look into that next, and eventually on mdadm. 
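The failing lookup can be modeled in a few lines (a Python sketch of the C logic above; the dict-based volume-group representation is mine, not grub's):

```python
def find_lv(array_list, name):
    # Mirror of grub's find_lv(): walk every registered volume group and
    # return the first logical volume whose fullname or idname matches.
    for vg in array_list:
        for lv in vg.get("lvs", []):
            if name in (lv.get("fullname"), lv.get("idname")):
                return lv
    return None

# With 0.90 metadata the scan registered the array, so the lookup succeeds:
good = [{"lvs": [{"fullname": "md/md1",
                  "idname": "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b"}]}]
assert find_lv(good, "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b") is not None

# With 1.0 metadata on a 4k disk, array_list stays empty, so every lookup
# fails and grub reports "disk ... not found":
assert find_lv([], "mduuid/e5cf979ce818f58cc57b618bd78b4b86") is None
```

This makes the empty array_list observed above sufficient to explain the grub2-probe error on its own; the real question is why scan_devices() never registers the array.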
------- Comment From desnesn.com 2017-12-11 15:28 EDT------- Finally had quality time for this bug again. Carrying on: From breaking grub_diskfilter_vg_register(), we can observe that the disk is being registered differently on 1.0 to 0.90; and that is because the diskfilter that is being registered is for my rhel OS instead of my raid. By doing a backtrace, I also noticed that even the stack is different when calling grub_diskfilter_vg_register(): ===== --- bad 2017-12-07 13:44:39.654222238 -0200 +++ good 2017-12-07 13:43:52.563919187 -0200 @@ -1,36 +1,39 @@ -Breakpoint 1, grub_diskfilter_vg_register (vg=0x10185ea0) at grub-core/disk/diskfilter.c:849 +Breakpoint 1, grub_diskfilter_vg_register (vg=0x10183530) at grub-core/disk/diskfilter.c:849 849 for (lv = vg->lvs; lv; lv = lv->next) (gdb) print *vg $4 = { - uuid = 0x10185ef0 "xZL9PN-dXgE-Vflt-rtI5-Y203-gQ6e-TBS0Mz", - uuid_len = 38, - name = 0x10185e80 "rhel", - extent_size = 8192, - pvs = 0x10185f20, - lvs = 0x10185b00, + uuid = 0x10183300 "?\300\263\362\242\326]\232\334\316r\364\264\370\267e/en_US.!", + uuid_len = 16, + name = 0x10183580 "md/1", + extent_size = 1, + pvs = 0x101850c0, + lvs = 0x10183690, next = 0x0, driver = 0x0 } (gdb) bt -#0 grub_diskfilter_vg_register (vg=0x10185ea0) at grub-core/disk/diskfilter.c:849 -#1 0x0000000010009810 in grub_lvm_detect (disk=0x101827d0, id=0x3ffffffde4e8, start_sector=0x3ffffffde4e0) at grub-core/disk/lvm.c:744 -#2 0x00000000101102b0 in scan_disk_partition_iter (disk=0x101827d0, p=0x3ffffffde568, data=0x101812d0) at grub-core/disk/diskfilter.c:161 -#3 0x0000000010101400 in part_iterate (dsk=0x101827d0, partition=0x3ffffffde680, data=0x3ffffffde920) at grub-core/kern/partition.c:196 -#4 0x00000000101146b8 in grub_partition_msdos_iterate (disk=0x101827d0, hook=0x10101324 <part_iterate>, hook_data=0x3ffffffde920) +#0 grub_diskfilter_vg_register (vg=0x10183530) at grub-core/disk/diskfilter.c:849 ==> +#1 0x0000000010112dd0 in grub_diskfilter_make_raid (uuidlen=16, + 
uuid=0x10183300 "?\300\263\362\242\326]\232\334\316r\364\264\370\267e/en_US.!", nmemb=2, name=0x3ffffffde3e8 "rhel-7.3:1", + disk_size=20969216, stripe_size=0, layout=0, level=1) at grub-core/disk/diskfilter.c:1030 +#2 0x000000001000a414 in grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde588, start_sector=0x3ffffffde580) + at grub-core/disk/mdraid1x_linux.c:202 +#3 0x00000000101102b0 in scan_disk_partition_iter (disk=0x101834a0, p=0x3ffffffde608, data=0x10182800) at grub-core/disk/diskfilter.c:161 +#4 0x0000000010101400 in part_iterate (dsk=0x101834a0, partition=0x3ffffffde720, data=0x3ffffffde9c0) at grub-core/kern/partition.c:196 +#5 0x00000000101146b8 in grub_partition_msdos_iterate (disk=0x101834a0, hook=0x10101324 <part_iterate>, hook_data=0x3ffffffde9c0) at grub-core/partmap/msdos.c:196 -#5 0x000000001010158c in grub_partition_iterate (disk=0x101827d0, hook=0x101100cc <scan_disk_partition_iter>, hook_data=0x101812d0) +#6 0x000000001010158c in grub_partition_iterate (disk=0x101834a0, hook=0x101100cc <scan_disk_partition_iter>, hook_data=0x10182800) at grub-core/kern/partition.c:233 -#6 0x00000000101104c0 in scan_disk (name=0x101812d0 "hd0", accept_diskfilter=0) at grub-core/disk/diskfilter.c:204 -#7 0x000000001011054c in scan_disk_hook (name=0x101812d0 "hd0", data=0x0) at grub-core/disk/diskfilter.c:213 -#8 0x00000000100f62e8 in grub_util_biosdisk_iterate (hook=0x1011051c <scan_disk_hook>, hook_data=0x0, pull=GRUB_DISK_PULL_NONE) - at grub-core/kern/emu/hostdisk.c:119 -#9 0x0000000010110610 in scan_devices (arname=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140") at grub-core/disk/diskfilter.c:231 -#10 0x0000000010111050 in grub_diskfilter_open (name=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140", disk=0x10182780) - at grub-core/disk/diskfilter.c:432 -#11 0x00000000100ee00c in grub_disk_open (name=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140") at grub-core/kern/disk.c:226 -#12 0x00000000100ed0a0 in grub_device_open (name=0x10185290 
"mduuid/76da2b4aea5de03cfc91176e98fc2140") at grub-core/kern/device.c:53
-#13 0x0000000010003ee0 in probe (path=0x0, device_names=0x10181050, delim=10 '\n') at util/grub-probe.c:376
-#14 0x00000000100056b4 in main (argc=5, argv=0x3ffffffff3d8) at util/grub-probe.c:882
+#7 0x00000000101104c0 in scan_disk (name=0x10182800 "hostdisk//dev/vda", accept_diskfilter=1) at grub-core/disk/diskfilter.c:204
+#8 0x00000000101135ec in grub_diskfilter_get_pv_from_disk (disk=0x101827b0, vg_out=0x3ffffffdeaf0) at grub-core/disk/diskfilter.c:1173
+#9 0x000000001010f39c in grub_util_get_ldm (disk=0x101827b0, start=2048) at grub-core/disk/ldm.c:876
+#10 0x00000000100f03f4 in grub_util_biosdisk_get_grub_dev (os_dev=0x1018a800 "/dev/vda1") at util/getroot.c:437
+#11 0x00000000100efb60 in grub_util_pull_device (os_dev=0x1018a800 "/dev/vda1") at util/getroot.c:111
+#12 0x00000000100f4ee4 in grub_util_pull_device_os (os_dev=0x10181350 "/dev/md1", ab=GRUB_DEV_ABSTRACTION_RAID) at grub-core/osdep/linux/getroot.c:1064
+#13 0x00000000100efb44 in grub_util_pull_device (os_dev=0x10181350 "/dev/md1") at util/getroot.c:108
+#14 0x0000000010003b14 in probe (path=0x0, device_names=0x10181050, delim=10 '\n') at util/grub-probe.c:304
+#15 0x00000000100056b4 in main (argc=5, argv=0x3ffffffff3d8) at util/grub-probe.c:882
...
-/usr/sbin/grub2-probe: error: disk `mduuid/76da2b4aea5de03cfc91176e98fc2140' not found.
-[Inferior 1 (process 17894) exited with code 01]
+d0aeef94-f2ba-4831-8bbe-523e587adc46
+[Inferior 1 (process 17933) exited normally]
=====

Since I also noticed that grub_diskfilter_make_raid() (grub-core/disk/mdraid_linux.c:256 for 0.90, and grub-core/disk/mdraid1x_linux.c:202 for 1.0) was never called when using 4k block size, I thought it would now be interesting to change my testing and follow the good stack (512 block size) to see where the path deviates.
Thus, I decided to compare a raid disk with 4k sectors (bad) against one with 512-byte sectors (good), using only metadata 1.0 from this point on, which led me to:

=====
--- bad	2017-11-29 23:27:37.127052665 -0200
+++ good	2017-11-29 23:24:12.039919190 -0200
...
 Breakpoint 1, grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde9c8, start_sector=0x3ffffffde9c0)
     at grub-core/disk/mdraid1x_linux.c:153
 153	  if (sb.magic != grub_cpu_to_le32_compile_time (SB_MAGIC)
@@ -50,18 +50,21 @@
 $18 = 0x8
 (gdb) n
 124	  for (minor_version = 0; minor_version < 3; ++minor_version)
 (gdb) print disk->name
-$19 = 0x101834f0 "hostdisk//dev/vdc"
+$19 = 0x101834f0 "hostdisk//dev/vda"
 (gdb) c
 Continuing.
...
 (gdb) c
 Continuing.
...
 (gdb) c
 Continuing.
...
 Breakpoint 1, grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde588, start_sector=0x3ffffffde580)
     at grub-core/disk/mdraid1x_linux.c:153
 153	  if (sb.magic != grub_cpu_to_le32_compile_time (SB_MAGIC)
 (gdb) p/x sb.magic
-$20 = 0x0
+$20 = 0xa92b4efc
 (gdb) p/x sb.super_offset
-$21 = 0x0
+$21 = 0x13ff7f0
 (gdb) p/x sector
 $22 = 0x13ff7f0
 (gdb) n
-155	    continue;
+154	      || grub_le_to_cpu64 (sb.super_offset) != sector)
+(gdb) n
+157	  if (sb.major_version != grub_cpu_to_le32_compile_time (1))
...

and later, grub_diskfilter_make_raid() was only called for the 512-byte raid. Moreover, the next time this breakpoint is reached, it is for the other disk.
=====

Quick note:
=====
[root@rhel-7 grub-2.02~beta2]# grep -rnI "define SB_MAGIC" .
./grub-core/disk/mdraid1x_linux.c:32:#define SB_MAGIC 0xa92b4efc
./grub-core/disk/mdraid_linux.c:97:#define SB_MAGIC 0xa92b4efc
=====

Thus, the conditionals on lines 153 and 157 always fail since sb is zeroed out, never allowing grub_diskfilter_make_raid() to be called.
The sb variable should have been filled in just before, at:

============================
grub-core/disk/mdraid1x_linux.c
============================
149	  if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
150	                      &sb))
151	    return NULL;
============================

Going deeper down the rabbit hole and skipping a few steps (inside grub_disk_read() at grub-core/kern/disk.c:413, which calls grub_disk_read_small() at grub-core/kern/disk.c:396, which in turn calls grub_disk_read_small_real() at grub-core/kern/disk.c:317):

============================
grub-core/kern/disk.c
============================
380	  if ((disk->dev->read) (disk, transform_sector (disk, aligned_sector),
381	                         num, tmp_buf))
============================

We can observe that:

======
--- bad	2017-12-08 16:40:58.277936277 -0200
+++ good	2017-12-08 15:02:37.620925526 -0200
@@ -1,22 +1,22 @@
 380	  if ((disk->dev->read) (disk, transform_sector (disk, aligned_sector),
 (gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->magic
-$11 = 0xb7ed0ec8
+$11 = 0xb7ed09a8
 (gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->super_offset
 $12 = 0x0
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
 $13 = 0x12
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
 $14 = 0x3ffffffde460
 (gdb) n
-Detaching after fork from child process 19992.
+Detaching after fork from child process 19972.
 389	  grub_memcpy (buf, tmp_buf + offset, size);
 (gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->magic
-$15 = 0x0
+$15 = 0xa92b4efc
 (gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->super_offset
-$16 = 0x0
+$16 = 0x13ff7f0
 (gdb) n
 390	  grub_free (tmp_buf);
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$17 = 0x0
+$17 = 0xa92b4efc
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
-$18 = 0x0
+$18 = 0x13ff7f0
======

Further, disk->dev->read() first calls transform_sector():

============================
grub-core/kern/disk_common.c
============================
42	static inline grub_disk_addr_t
43	transform_sector (grub_disk_t disk, grub_disk_addr_t sector)
44	{
45	  return sector >> (disk->log_sector_size - GRUB_DISK_SECTOR_BITS);
46	}
============================

And afterwards disk->dev->read() (a pointer to grub_util_biosdisk_read() at grub-core/kern/emu/hostdisk.c:282) calls grub_util_fd_read():

============================
grub-core/kern/emu/hostdisk.c
============================
281	static grub_err_t
282	grub_util_biosdisk_read (grub_disk_t disk, grub_disk_addr_t sector,
283	                         grub_size_t size, char *buf)
284	{
...
305	  if (grub_util_fd_read (fd, buf, max << disk->log_sector_size)
306	      != (ssize_t) (max << disk->log_sector_size))
307	    return grub_error (GRUB_ERR_READ_ERROR, N_("cannot read `%s': %s"),
308	                       map[disk->id].device, grub_util_fd_strerror ());
============================

Which results in:

======
--- bad	2017-12-08 17:21:19.963486041 -0200
+++ good	2017-12-08 17:21:18.714475999 -0200
@@ -1,17 +1,17 @@
 305	  if (grub_util_fd_read (fd, buf, max << disk->log_sector_size)
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$18 = 0xb7ed0ec8
+$18 = 0xb7ed09a8
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
 $19 = 0x0
 (gdb) p max << disk->log_sector_size
-$20 = 4096
+$20 = 512
 (gdb) p/x sector
-$23 = 0x27fffe
+$23 = 0x13ffff0
 (gdb) n
 306	      != (ssize_t) (max << disk->log_sector_size))
 (gdb)
 305	  if (grub_util_fd_read (fd, buf, max << disk->log_sector_size)
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$25 = 0x0
+$25 = 0xa92b4efc
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
-$26 = 0x0
+$26 = 0x13ff7f0
======

Actually, the whole *buf is zeroed out after this call. Going inside grub_util_fd_read() now:

============================
grub-core/osdep/unix/hostdisk.c
============================
113	/* Read LEN bytes from FD in BUF. Return less than or equal to zero if an
114	   error occurs, otherwise return LEN. */
115	ssize_t
116	grub_util_fd_read (grub_util_fd_t fd, char *buf, size_t len)
117	{
118	  ssize_t size = 0;
119	
120	  while (len)
121	    {
122	      ssize_t ret = read (fd, buf, len);
...
138	    }
139	
140	  return size;
============================

Which led to:

======
--- bad	2017-12-08 18:42:33.937517190 -0200
+++ good	2017-12-08 18:39:08.588844344 -0200
@@ -1,16 +1,17 @@
-grub_util_fd_read (fd=8, buf=0x10184fd0 "\310\016\355\267\377?", len=4096) at grub-core/osdep/unix/hostdisk.c:118
+grub_util_fd_read (fd=8, buf=0x10183530 "\250\t\355\267\377?", len=512) at grub-core/osdep/unix/hostdisk.c:118
 118	  ssize_t size = 0;
 (gdb) n
 120	  while (len)
 (gdb)
 122	      ssize_t ret = read (fd, buf, len);
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$31 = 0xb7ed0ec8
+$31 = 0xb7ed09a8
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
 $32 = 0x0
 (gdb) n
 124	      if (ret == 0)
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$35 = 0x0
+$35 = 0xa92b4efc
 (gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
-$36 = 0x0
+$36 = 0x13ff7f0
 (gdb) p ret
-$37 = 4096
+$37 = 512
======

Note that 4096 zeroes were read, which is why no error is thrown at grub-core/kern/emu/hostdisk.c:305.

Now the question is: are we reading from the wrong sector, or is this data really zeroed out (which would imply the bug is in mdadm)? I believe we have enough data to start an upstream discussion, which I plan to do soon.

More to come ...

------- Comment From desnesn.com 2018-01-16 14:09 EDT-------
Just for the record, I have also reproduced this bug on x86_64 with Ubuntu 17.10.
This works for me using grub2-2.02-0.65.el7_4.2 on an EFI machine with a 4k disk:

[root@pjones3 tmp]# blockdev --getbsz /dev/sdb
4096
[root@pjones3 tmp]# blockdev --getbsz /dev/sdb2
4096
[root@pjones3 tmp]# blockdev --getbsz /dev/md0
4096
[root@pjones3 tmp]# ./usr/sbin/grub2-probe --target fs_uuid -d /dev/sdb2
c1b85a71-972d-4b69-84cc-e6a05326a4c8
[root@pjones3 tmp]# ./usr/sbin/grub2-probe --target fs_uuid -d /dev/md0
c1b85a71-972d-4b69-84cc-e6a05326a4c8

Note that the detection in mdraid1x_linux.c still isn't right, because it looks for the raid superblock at a location computed from the size of /dev/sdb rather than /dev/sdb2, but grub2-probe and booting the machine with /boot on this raid are both successful.

I see this was reported with grub2-2.02-0.44.el7; does the newer package work for you?

------- Comment From diegodo.com 2018-06-13 15:24 EDT-------
Hi,

I'm still getting the result:
./grub-probe: error: disk `mduuid/b184ce73be4a91ec1b586dcce8ee7f9b' not found.

One thing that I noticed is that some sector lengths are hardcoded for 512 bytes. It also seems that grub runs into problems when trying to find the magic number for 1.0 metadata. I dumped the variables returned by the disk read when mdraid1x_linux.c tries to find the magic number, and it reads from the wrong position. When I changed the hardcoded sector lengths to 4k instead, mdraid1x_linux.c was able to find the magic number, although it still wasn't able to find the disk successfully. I'm still investigating this problem and hope to find something in a couple of days.

Thank you

ok ... with no news on this bugzilla for exactly one year, I am closing it now; please reopen if required, then using the current RHEL 7.7 ...
... thanks for your support ...
This issue actually still exists on CentOS 8 Stream, but we found a workaround: the EFI partition should not be first.

The workaround is:
```
ignoredisk --only-use=nvme0n1,nvme1n1
clearpart --none --initlabel
part raid.149 --fstype="mdmember" --ondisk=nvme1n1 --size=55895
part raid.156 --fstype="mdmember" --ondisk=nvme1n1 --size=250
part raid.142 --fstype="mdmember" --ondisk=nvme0n1 --size=250
part raid.135 --fstype="mdmember" --ondisk=nvme0n1 --size=55895
raid / --device=root --fstype="xfs" --level=RAID1 raid.135 raid.149
raid /boot/efi --device=boot_efi --fstype="efi" --level=RAID1 --fsoptions="umask=0077,shortname=winnt" raid.142 raid.156
```

The system boots fine:
```
[root@host]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
nvme2n1     259:0    0  3.7T  0 disk
nvme3n1     259:1    0  3.7T  0 disk
nvme0n1     259:2    0 54.9G  0 disk
|-nvme0n1p1 259:3    0 54.6G  0 part
| `-md127     9:127  0 54.6G  0 raid1 /
`-nvme0n1p2 259:4    0  250M  0 part
  `-md126     9:126  0  250M  0 raid1 /boot/efi
nvme1n1     259:5    0 54.9G  0 disk
|-nvme1n1p1 259:6    0 54.6G  0 part
| `-md127     9:127  0 54.6G  0 raid1 /
`-nvme1n1p2 259:7    0  250M  0 part
  `-md126     9:126  0  250M  0 raid1 /boot/efi
```

grub2-probe:
```
[root@host]# grub2-probe --target fs_uuid -d /dev/md126
0D55-B32D
[root@host]# grub2-probe --target fs_uuid -d /dev/md127
595b5582-06b2-444a-966f-f4e3bd52d74c
```

fdisk:
```
[root@host]# fdisk -l | grep "Disk /dev" -A 2
Disk /dev/nvme2n1: 3.7 TiB, 4000787030016 bytes, 976754646 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
--
Disk /dev/nvme3n1: 3.7 TiB, 4000787030016 bytes, 976754646 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
--
Disk /dev/nvme0n1: 54.9 GiB, 58977157120 bytes, 14398720 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
--
Disk /dev/nvme1n1: 54.9 GiB, 58977157120 bytes, 14398720 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
--
Disk /dev/md127: 54.6 GiB, 58575552512 bytes, 14300672 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
--
Disk /dev/md126: 250 MiB, 262078464 bytes, 63984 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
```

Package versions:
grub2-common-2.02-129.el8.noarch
kernel-4.18.0-499.el8.x86_64

PR for anaconda docs "Common bugs and issues" [1]

[1] https://github.com/rhinstaller/anaconda/pull/4880