1. Please describe the problem:

Three systems, all using Intel RST RAID, were updated to kernel 5.8.8-200.fc32.x86_64 concurrently, along with a fourth system that does not use Intel RST and whose root volume is on a stand-alone NVMe drive. The three Intel RST systems all failed to boot, dropping to an emergency shell after a dracut-initqueue timeout. The fourth rebooted without issue.

The symptoms were all the same. In the emergency shell, journalctl shows the message "cannot activate LVs in VG Fedora_carrot while PVs appear on duplicate devices", and "lvm vgscan" reports similar messages ("not using device /dev/sdb2 for PV ...", "PV ... prefers device /dev/sda2").

What also concerns me is that this update also upgraded the grub2 tools. I don't know whether that is related, but dracut does appear to handle the md device as it should. I was able to reboot all three systems on the previous kernel, 5.8.7-200.fc32.x86_64.

Per "dnf history list", this update installed:

Install    kernel-5.8.8-200.fc32.x86_64                 @updates
Install    kernel-core-5.8.8-200.fc32.x86_64            @updates
Install    kernel-modules-5.8.8-200.fc32.x86_64         @updates
Install    kernel-modules-extra-5.8.8-200.fc32.x86_64   @updates
Upgrade    grub2-common-1:2.04-22.fc32.noarch           @updates
Upgraded   grub2-common-1:2.04-21.fc32.noarch           @@System
Upgrade    grub2-efi-x64-1:2.04-22.fc32.x86_64          @updates
Upgraded   grub2-efi-x64-1:2.04-21.fc32.x86_64          @@System
Upgrade    grub2-tools-1:2.04-22.fc32.x86_64            @updates
Upgraded   grub2-tools-1:2.04-21.fc32.x86_64            @@System
Upgrade    grub2-tools-efi-1:2.04-22.fc32.x86_64        @updates
Upgraded   grub2-tools-efi-1:2.04-21.fc32.x86_64        @@System
Upgrade    grub2-tools-extra-1:2.04-22.fc32.x86_64      @updates
Upgraded   grub2-tools-extra-1:2.04-21.fc32.x86_64      @@System
Upgrade    grub2-tools-minimal-1:2.04-22.fc32.x86_64    @updates
Upgraded   grub2-tools-minimal-1:2.04-21.fc32.x86_64    @@System
Upgrade    kernel-headers-5.8.8-200.fc32.x86_64         @updates
Upgraded   kernel-headers-5.8.6-200.fc32.x86_64         @@System
Removed    kernel-5.6.6-300.fc32.x86_64                 @@System
Removed    kernel-core-5.6.6-300.fc32.x86_64            @@System
Removed    kernel-modules-5.6.6-300.fc32.x86_64         @@System

Nothing noteworthy shows in the grubby info for the bad vs. the good kernel. Example from one system:

index=0
kernel="/boot/vmlinuz-5.8.8-200.fc32.x86_64"
args="ro resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
root="/dev/mapper/fedora_popcorn-root"
initrd="/boot/initramfs-5.8.8-200.fc32.x86_64.img"
title="Fedora (5.8.8-200.fc32.x86_64) 32 (Thirty Two)"
id="cd37b505ac9e4ac4bf9dbad3bdd26142-5.8.8-200.fc32.x86_64"
index=1
kernel="/boot/vmlinuz-5.8.7-200.fc32.x86_64"
args="ro resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
root="/dev/mapper/fedora_popcorn-root"
initrd="/boot/initramfs-5.8.7-200.fc32.x86_64.img"
title="Fedora (5.8.7-200.fc32.x86_64) 32 (Thirty Two)"
id="cd37b505ac9e4ac4bf9dbad3bdd26142-5.8.7-200.fc32.x86_64"

/etc/default/grub from the same system:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

"ls /etc/grub.d" does not show any recent changes across the last three kernel updates, so that should not be the problem.
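For anyone else who lands in the same dracut emergency shell, a few commands that should show the same picture (just a sketch; it assumes mdadm and lvm are present in the initramfs, <vgname> stands for the volume group named in the error message, and the member device names will differ on your hardware):

    cat /proc/mdstat               # is the IMSM container/volume assembled at all?
    mdadm --examine --scan         # does mdadm still recognize the IMSM metadata?
    lvm pvs                        # which devices does LVM think the PVs are on?
    lvm vgchange -ay <vgname>      # expected to refuse with the same "duplicate devices" complaint seen in the journal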
I did an lsinitrd on each initramfs followed by a vimdiff (exact commands at the end of this comment). The differences are all in kernel modules, 131 in total; the most relevant are:

-rw-r--r--  1 root root  23060 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/md/raid1.ko.xz
-rw-r--r--  1 root root   6552 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-broxton.ko.xz
-rw-r--r--  1 root root   6032 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-cannonlake.ko.xz
-rw-r--r--  1 root root   4572 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-geminilake.ko.xz
-rw-r--r--  1 root root   3768 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-jasperlake.ko.xz
-rw-r--r--  1 root root   3688 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-lewisburg.ko.xz
-rw-r--r--  1 root root   7556 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-lynxpoint.ko.xz
-rw-r--r--  1 root root   4796 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-sunrisepoint.ko.xz
-rw-r--r--  1 root root   4112 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-tigerlake.ko.xz
-rw-r--r--  1 root root   7648 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/pinctrl-amd.ko.xz

2. What is the Version-Release number of the kernel:

5.8.8-200.fc32.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at

Appeared in:  5.8.8-200.fc32.x86_64
Not present:  5.8.7-200.fc32.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Yes: install the updates, allow grubby to regenerate the initramfs, and try to boot.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Have not tried due to lack of disk space.

6. Are you running any modules not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

The logs were lost after reboot; I posted the relevant messages from the console above.
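P.S. In case it helps anyone reproduce the comparison, the lsinitrd/vimdiff step mentioned above amounts to something like this (the /tmp file names are only examples):

    lsinitrd /boot/initramfs-5.8.7-200.fc32.x86_64.img > /tmp/initrd-5.8.7.lst
    lsinitrd /boot/initramfs-5.8.8-200.fc32.x86_64.img > /tmp/initrd-5.8.8.lst
    vimdiff /tmp/initrd-5.8.7.lst /tmp/initrd-5.8.8.lst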
I just saw something similar. In my case, only my /boot filesystem is on the IMSM RAID, so it was easy enough to boot by commenting out that line in /etc/fstab.

For whatever reason, it appears that the IMSM (Intel RAID) signature isn't recognized by the new kernel.

Running 5.8.7-200.fc32.x86_64 (which works), I get the following:

[pilcher@ian ~]$ uname -r
5.8.7-200.fc32.x86_64
[pilcher@ian ~]$ sudo mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.2.0.1527
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port3 : /dev/sdd (MSK5235H2PJ7TG)
          Port1 : /dev/sdb (43P2YEVGS)
          Port2 : /dev/sdc (MSK5235H29X18G)
          Port5 : - non-disk device (HL-DT-ST BD-RE WH16NS60) -
          Port0 : /dev/sda (S21CNSAG402179X)
          Port4 : - no device attached -

[pilcher@ian ~]$ sudo mdadm --examine --verbose /dev/sdc
/dev/sdc:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : d7e8a7e3
         Family : d7e8a7e3
     Generation : 005a1992
     Attributes : All supported
           UUID : 1ebd7712:2a74af1f:34298316:cb855b50
       Checksum : d22bf36a correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : MSK5235H29X18G
          State : active
             Id : 00000002
    Usable Size : 1953519616 (931.51 GiB 1000.20 GB)

[Volume0]:
           UUID : 3d7bd72f:82a8cbcc:2d217397:12f3ff95
     RAID Level : 1
        Members : 2
          Slots : [UU]
    Failed disk : none
      This Slot : 0
    Sector Size : 512
     Array Size : 1953519616 (931.51 GiB 1000.20 GB)
   Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630936
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : normal
    Dirty State : clean
     RWH Policy : off

  Disk01 Serial : MSK5235H2PJ7TG
          State : active
             Id : 00000003
    Usable Size : 1953519616 (931.51 GiB 1000.20 GB)

But running 5.8.8-200.fc32.x86_64 I see:

[pilcher@ian system]$ uname -r
5.8.8-200.fc32.x86_64
[pilcher@ian system]$ sudo mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.2.0.1527
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port3 : /dev/sdd (MSK5235H2PJ7TG)
          Port1 : /dev/sdb (43P2YEVGS)
          Port2 : /dev/sdc (MSK5235H29X18G)
          Port5 : - non-disk device (HL-DT-ST BD-RE WH16NS60) -
          Port0 : /dev/sda (S21CNSAG402179X)
          Port4 : - no device attached -

[pilcher@ian system]$ sudo mdadm --examine --verbose /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :       204800 sectors at         2048 (type 07)
Partition[1] :      2048000 sectors at       206848 (type 83)
Partition[2] :     61440000 sectors at      2254848 (type 07)
Partition[3] :   1889824768 sectors at     63694848 (type 05)
Created attachment 1714949 [details] Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.7 (works)
Created attachment 1714950 [details] Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.8 (doesn't work)
I've just attached the strace output of 'mdadm --examine --verbose /dev/sdc' on both kernel 5.8.7 (works) and 5.8.8 (doesn't work). The first significant difference I see is on line 185, where the error code returned by the BLKPG_DEL_PARTITION ioctl has changed from ENXIO to ENOMEM.
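For anyone who wants to generate the same traces, plain strace around the mdadm call is enough; something like the following on each kernel (the output file names are only examples), then compare the BLKPG lines:

    sudo strace -o mdadm-5.8.7.trace mdadm --examine --verbose /dev/sdc   # on the working kernel
    sudo strace -o mdadm-5.8.8.trace mdadm --examine --verbose /dev/sdc   # after rebooting into 5.8.8
    grep BLKPG mdadm-5.8.*.trace                                          # the BLKPG_DEL_PARTITION result is the interesting part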
git bisect says:

692d0626557451c4b557397f20b7394b612d0289 is the first bad commit
commit 692d0626557451c4b557397f20b7394b612d0289
Author: Christoph Hellwig <hch>
Date:   Tue Sep 1 11:59:41 2020 +0200

    block: fix locking in bdev_del_partition

    [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]

    We need to hold the whole device bd_mutex to protect against other
    thread concurrently deleting out partition before we get to it, and
    thus causing a use after free.

    Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
    Reported-by: syzbot+6448f3c229bc52b82f69.com
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>
    Signed-off-by: Sasha Levin <sashal>

 block/partitions/core.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)
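(In case it's useful to anyone: the usual way to run this kind of bisect against the stable tree looks roughly like the sketch below; I'm not claiming these were the exact steps used here. Each round means building, installing, and booting the candidate kernel, then re-checking whether 'mdadm --examine --verbose /dev/sdc' still shows the IMSM metadata.)

    git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    cd linux
    git bisect start v5.8.8 v5.8.7     # bad version first, then the last known-good one
    # build/install/boot the checked-out kernel, test mdadm --examine, then mark the result:
    git bisect bad                     # or: git bisect good, repeating until the first bad commit is named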
Check out: https://bugzilla.redhat.com/show_bug.cgi?id=1879287
And fixed by:

commit 88ce2a530cc9865a894454b2e40eba5957a60e1a
Author: Christoph Hellwig <hch>
Date:   Tue Sep 8 16:15:06 2020 +0200

    block: restore a specific error code in bdev_del_partition

    mdadm relies on the fact that deleting an invalid partition returns
    -ENXIO or -ENOTTY to detect if a block device is a partition or a
    whole device.

    Fixes: 08fc1ab6d748 ("block: fix locking in bdev_del_partition")
    Reported-by: kernel test robot <rong.a.chen>
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5b4869c08fb3..722406b841df 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -537,7 +537,7 @@ int bdev_del_partition(struct block_device *bdev, int partno)
 
 	bdevp = bdget_disk(bdev->bd_disk, partno);
 	if (!bdevp)
-		return -ENOMEM;
+		return -ENXIO;
 
 	mutex_lock(&bdevp->bd_mutex);
 	mutex_lock_nested(&bdev->bd_mutex, 1);
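If I'm reading the patch right, you don't even need mdadm to observe the regression: asking the kernel to delete a partition number that doesn't exist goes through the same bdev_del_partition() path, and util-linux's delpart issues exactly that BLKPG_DEL_PARTITION ioctl. A rough illustration (this assumes partition 99 does not exist on the disk, so nothing is actually deleted):

    sudo delpart /dev/sdc 99
    # on 5.8.7 or a kernel with the fix: fails with ENXIO ("No such device or address")
    # on 5.8.8:                          fails with ENOMEM ("Cannot allocate memory"),
    #                                    which is the value mdadm's whole-device check trips over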
Fixed in kernel-5.8.9-200.fc32.x86_64. Update with 'dnf --enablerepo=updates-testing update kernel'.
Thank you Ian. That was amazingly fast.
I had a chance to install the kernel this evening and can confirm it is fixed.