Bug 1878970 - kernel-5.8.8-200.fc32.x86_64 - IMSM RAID not recognized
Summary: kernel-5.8.8-200.fc32.x86_64 - IMSM RAID not recognized
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-15 05:19 UTC by Clay Jordan
Modified: 2020-09-16 01:00 UTC
CC: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 22:08:55 UTC
Type: Bug
Embargoed:


Attachments
Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.7 (works) (35.98 KB, text/plain), Ian Pilcher, 2020-09-15 14:53 UTC
Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.8 (doesn't work) (15.37 KB, text/plain), Ian Pilcher, 2020-09-15 14:54 UTC

Description Clay Jordan 2020-09-15 05:19:48 UTC
1. Please describe the problem:

Three systems, all using Intel RST RAID, were updated to kernel 5.8.8-200.fc32.x86_64 concurrently, along with a fourth system that does not use Intel RST and whose root volume is on a stand-alone NVMe drive. The three Intel RST systems all failed to boot, dropping to an emergency shell after a dracut-initqueue timeout. The fourth rebooted without issue.

The symptoms are all the same. In the emergency shell, journalctl shows the message "cannot activate LVs in VG Fedora_carrot while PVs appear on duplicate devices", and "lvm vgscan" reports similar messages ("not using device /dev/sdb2 for PV ...", "PV ... prefers device /dev/sda2").
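
For context, checks from the emergency shell along these lines would confirm that the IMSM container is not being assembled (device names are examples and will vary per system):

    cat /proc/mdstat                       # no IMSM container/volume (typically md126/md127) is listed
    mdadm --examine --verbose /dev/sda     # member disk shows its partition table instead of IMSM metadata
    lvm pvs                                # duplicate-PV warnings for the partitions on both member disks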

What also concerns me is that this update also updated the grub2 tools. I don't know whether that is related, but dracut appears to handle the md device as it should. I was able to reboot all three systems on the previous kernel, 5.8.7-200.fc32.x86_64.

Per dnf history list, this update installed:

    Install  kernel-5.8.8-200.fc32.x86_64                 @updates
    Install  kernel-core-5.8.8-200.fc32.x86_64            @updates
    Install  kernel-modules-5.8.8-200.fc32.x86_64         @updates
    Install  kernel-modules-extra-5.8.8-200.fc32.x86_64   @updates
    Upgrade  grub2-common-1:2.04-22.fc32.noarch           @updates
    Upgraded grub2-common-1:2.04-21.fc32.noarch           @@System
    Upgrade  grub2-efi-x64-1:2.04-22.fc32.x86_64          @updates
    Upgraded grub2-efi-x64-1:2.04-21.fc32.x86_64          @@System
    Upgrade  grub2-tools-1:2.04-22.fc32.x86_64            @updates
    Upgraded grub2-tools-1:2.04-21.fc32.x86_64            @@System
    Upgrade  grub2-tools-efi-1:2.04-22.fc32.x86_64        @updates
    Upgraded grub2-tools-efi-1:2.04-21.fc32.x86_64        @@System
    Upgrade  grub2-tools-extra-1:2.04-22.fc32.x86_64      @updates
    Upgraded grub2-tools-extra-1:2.04-21.fc32.x86_64      @@System
    Upgrade  grub2-tools-minimal-1:2.04-22.fc32.x86_64    @updates
    Upgraded grub2-tools-minimal-1:2.04-21.fc32.x86_64    @@System
    Upgrade  kernel-headers-5.8.8-200.fc32.x86_64         @updates
    Upgraded kernel-headers-5.8.6-200.fc32.x86_64         @@System
    Removed  kernel-5.6.6-300.fc32.x86_64                 @@System
    Removed  kernel-core-5.6.6-300.fc32.x86_64            @@System
    Removed  kernel-modules-5.6.6-300.fc32.x86_64         @@System

Nothing differs in the grubby info output between the bad and good kernels; example from one system:

index=0
kernel="/boot/vmlinuz-5.8.8-200.fc32.x86_64"
args="ro resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
root="/dev/mapper/fedora_popcorn-root"
initrd="/boot/initramfs-5.8.8-200.fc32.x86_64.img"
title="Fedora (5.8.8-200.fc32.x86_64) 32 (Thirty Two)"
id="cd37b505ac9e4ac4bf9dbad3bdd26142-5.8.8-200.fc32.x86_64"
index=1
kernel="/boot/vmlinuz-5.8.7-200.fc32.x86_64"
args="ro resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
root="/dev/mapper/fedora_popcorn-root"
initrd="/boot/initramfs-5.8.7-200.fc32.x86_64.img"
title="Fedora (5.8.7-200.fc32.x86_64) 32 (Thirty Two)"
id="cd37b505ac9e4ac4bf9dbad3bdd26142-5.8.7-200.fc32.x86_64"

From the same system as above, /etc/default/grub:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

ls /etc/grub.d does not show any recent changes after the last three kernel updates, so that should not be the problem.

I ran lsinitrd on each initramfs and compared the output with vimdiff; the changes are all in kernel modules, 131 in total. The most relevant are listed below (a sketch of the comparison procedure follows the listing):

-rw-r--r--   1 root     root        23060 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/md/raid1.ko.xz
-rw-r--r--   1 root     root         6552 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-broxton.ko.xz
-rw-r--r--   1 root     root         6032 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-cannonlake.ko.xz
-rw-r--r--   1 root     root         4572 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-geminilake.ko.xz
-rw-r--r--   1 root     root         3768 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-jasperlake.ko.xz
-rw-r--r--   1 root     root         3688 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-lewisburg.ko.xz
-rw-r--r--   1 root     root         7556 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-lynxpoint.ko.xz
-rw-r--r--   1 root     root         4796 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-sunrisepoint.ko.xz
-rw-r--r--   1 root     root         4112 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-tigerlake.ko.xz
-rw-r--r--   1 root     root         7648 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/pinctrl-amd.ko.xz
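
A minimal sketch of that comparison, assuming both initramfs images are still present in /boot (kernel versions are the ones from this report):

    # dump the contents of each initramfs and diff the listings
    lsinitrd /boot/initramfs-5.8.7-200.fc32.x86_64.img > initrd-5.8.7.txt
    lsinitrd /boot/initramfs-5.8.8-200.fc32.x86_64.img > initrd-5.8.8.txt
    vimdiff initrd-5.8.7.txt initrd-5.8.8.txt    # or: diff -u initrd-5.8.7.txt initrd-5.8.8.txt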


2. What is the Version-Release number of the kernel:

5.8.8-200.fc32.x86_64

3. Did it work previously in Fedora? If so, in what kernel version did the issue *first* appear? Old kernels are available for download at

Appeared in: 5.8.8-200.fc32.x86_64
Not present: 5.8.7-200.fc32.x86_64


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Yes. Install the updates, allow grubby to regenerate the initramfs, and try to boot.
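
Roughly (the exact dnf invocation is an assumption; any method that installs the 5.8.8 kernel and rebuilds the initramfs should do):

    sudo dnf upgrade --refresh 'kernel*'   # pulls kernel-5.8.8-200.fc32 from updates
    # the kernel install scripts regenerate the initramfs automatically
    sudo reboot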

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Have not tried due to space.

6. Are you running any modules that are not shipped directly with Fedora's kernel?: no


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Lost after reboot; I posted the relevant messages from the console above.

Comment 1 Ian Pilcher 2020-09-15 14:11:31 UTC
I just saw something similar.  In my case, only my /boot filesystem is on the IMSM RAID, so it was easy enough to boot by commenting out that line in /etc/fstab.
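
A sketch of that workaround, purely for illustration (back up /etc/fstab first; the pattern assumes /boot is its own fstab entry):

    # comment out the /boot line so boot can proceed without the IMSM RAID
    sudo sed -i.bak '/[[:space:]]\/boot[[:space:]]/s/^/#/' /etc/fstab
    # once a fixed kernel is running, restore the line and remount:
    #   sudo mv /etc/fstab.bak /etc/fstab && sudo mount /boot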

For whatever reason, it appears that the IMSM (Intel RAID) signature isn't recognized by the new kernel.  Running 5.8.7-200.fc32.x86_64 (which works), I get the following:

[pilcher@ian ~]$ uname -r
5.8.7-200.fc32.x86_64

[pilcher@ian ~]$ sudo mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.2.0.1527
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port3 : /dev/sdd (MSK5235H2PJ7TG)
          Port1 : /dev/sdb (43P2YEVGS)
          Port2 : /dev/sdc (MSK5235H29X18G)
          Port5 : - non-disk device (HL-DT-ST BD-RE  WH16NS60) -
          Port0 : /dev/sda (S21CNSAG402179X)
          Port4 : - no device attached -

[pilcher@ian ~]$ sudo mdadm --examine --verbose /dev/sdc
/dev/sdc:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : d7e8a7e3
         Family : d7e8a7e3
     Generation : 005a1992
     Attributes : All supported
           UUID : 1ebd7712:2a74af1f:34298316:cb855b50
       Checksum : d22bf36a correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : MSK5235H29X18G
          State : active
             Id : 00000002
    Usable Size : 1953519616 (931.51 GiB 1000.20 GB)

[Volume0]:
           UUID : 3d7bd72f:82a8cbcc:2d217397:12f3ff95
     RAID Level : 1
        Members : 2
          Slots : [UU]
    Failed disk : none
      This Slot : 0
    Sector Size : 512
     Array Size : 1953519616 (931.51 GiB 1000.20 GB)
   Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630936
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : normal
    Dirty State : clean
     RWH Policy : off

  Disk01 Serial : MSK5235H2PJ7TG
          State : active
             Id : 00000003
    Usable Size : 1953519616 (931.51 GiB 1000.20 GB)


But running 5.8.8-200.fc32.x86_64 I see:

[pilcher@ian system]$ uname -r
5.8.8-200.fc32.x86_64

[pilcher@ian system]$ sudo mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.2.0.1527
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port3 : /dev/sdd (MSK5235H2PJ7TG)
          Port1 : /dev/sdb (43P2YEVGS)
          Port2 : /dev/sdc (MSK5235H29X18G)
          Port5 : - non-disk device (HL-DT-ST BD-RE  WH16NS60) -
          Port0 : /dev/sda (S21CNSAG402179X)
          Port4 : - no device attached -

[pilcher@ian system]$ sudo mdadm --examine --verbose /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :       204800 sectors at         2048 (type 07)
Partition[1] :      2048000 sectors at       206848 (type 83)
Partition[2] :     61440000 sectors at      2254848 (type 07)
Partition[3] :   1889824768 sectors at     63694848 (type 05)

Comment 2 Ian Pilcher 2020-09-15 14:53:35 UTC
Created attachment 1714949 [details]
Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.7 (works)

Comment 3 Ian Pilcher 2020-09-15 14:54:36 UTC
Created attachment 1714950 [details]
Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.8 (doesn't work)

Comment 4 Ian Pilcher 2020-09-15 15:02:52 UTC
I've just attached the strace output of 'mdadm --examine --verbose /dev/sdc' on both kernel 5.8.7 (works) and 5.8.8 (doesn't work).

The first significant difference I see is on line 185, where the error code returned by the BLKPG_DEL_PARTITION ioctl has changed from ENXIO to ENOMEM.
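
To reproduce the comparison, something like this on each kernel should show the difference (assuming the installed strace decodes the BLKPG ioctl; /dev/sdc is the device from this system):

    # trace mdadm's ioctls on the current kernel, keeping one output file per kernel version
    sudo strace -f -e trace=ioctl -o mdadm-$(uname -r).strace \
        mdadm --examine --verbose /dev/sdc
    grep BLKPG mdadm-*.strace     # compare the errno: ENXIO on 5.8.7, ENOMEM on 5.8.8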

Comment 5 Ian Pilcher 2020-09-15 20:48:25 UTC
git bisect says:

692d0626557451c4b557397f20b7394b612d0289 is the first bad commit
commit 692d0626557451c4b557397f20b7394b612d0289
Author: Christoph Hellwig <hch>
Date:   Tue Sep 1 11:59:41 2020 +0200

    block: fix locking in bdev_del_partition
    
    [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
    
    We need to hold the whole device bd_mutex to protect against
    other thread concurrently deleting out partition before we get
    to it, and thus causing a use after free.
    
    Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
    Reported-by: syzbot+6448f3c229bc52b82f69.com
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>
    Signed-off-by: Sasha Levin <sashal>

 block/partitions/core.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)
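
For reference, a rough sketch of the bisect procedure over the stable tree between the two point releases (the build and install steps are simplified assumptions):

    # in a linux stable checkout: v5.8.8 is known bad, v5.8.7 is known good
    git bisect start v5.8.8 v5.8.7
    # at each step: build, install, and boot the candidate kernel, then report the result
    make olddefconfig && make -j"$(nproc)" && sudo make modules_install install
    git bisect good      # or 'git bisect bad' if the IMSM RAID is not recognized
    git bisect reset     # when finished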

Comment 6 Michael Riss 2020-09-15 21:25:01 UTC
Check out: https://bugzilla.redhat.com/show_bug.cgi?id=1879287

Comment 7 Ian Pilcher 2020-09-15 21:44:59 UTC
And fixed by:

commit 88ce2a530cc9865a894454b2e40eba5957a60e1a
Author: Christoph Hellwig <hch>
Date:   Tue Sep 8 16:15:06 2020 +0200

    block: restore a specific error code in bdev_del_partition
    
    mdadm relies on the fact that deleting an invalid partition returns
    -ENXIO or -ENOTTY to detect if a block device is a partition or a
    whole device.
    
    Fixes: 08fc1ab6d748 ("block: fix locking in bdev_del_partition")
    Reported-by: kernel test robot <rong.a.chen>
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5b4869c08fb3..722406b841df 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -537,7 +537,7 @@ int bdev_del_partition(struct block_device *bdev, int partno)
 
        bdevp = bdget_disk(bdev->bd_disk, partno);
        if (!bdevp)
-               return -ENOMEM;
+               return -ENXIO;
 
        mutex_lock(&bdevp->bd_mutex);
        mutex_lock_nested(&bdev->bd_mutex, 1);

Comment 8 Ian Pilcher 2020-09-15 22:08:55 UTC
Fixed in kernel-5.8.9-200.fc32.x86_64.  Update with 'dnf --enablerepo=updates-testing update kernel'.
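
Once booted into the new kernel, a quick way to confirm (device name as in the earlier comments):

    uname -r                                         # should report 5.8.9-200.fc32.x86_64
    cat /proc/mdstat                                 # the IMSM container and volume should be listed again
    sudo mdadm --examine --verbose /dev/sdc | head   # should show 'Intel Raid ISM Cfg Sig.' once more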

Comment 9 Clay Jordan 2020-09-15 22:36:36 UTC
Thank you Ian. That was amazingly fast.

Comment 10 Clay Jordan 2020-09-16 01:00:03 UTC
I had a chance to install the kernel this evening and can confirm it is fixed.

