1. Please describe the problem:

Three systems, all using Intel RST RAID, were updated to kernel 5.8.8-200.fc32.x86_64 concurrently, along with a fourth system that does not use Intel RST and whose root volume is on a stand-alone NVMe drive. The three Intel RST systems all failed to boot, dropping to an emergency shell after a dracut-initqueue timeout. The fourth rebooted without issue.

The symptoms were all the same. In the emergency shell, journalctl shows the message "cannot activate LVs in VG Fedora_carrot while PVs appear on duplicate devices", and "lvm vgscan" reports similar messages ("not using device /dev/sdb2 for PV ...", "PV ... prefers device /dev/sda2").

What also concerns me is that this update also upgraded the grub2 tools. I don't know whether that is related, but dracut does appear to handle the md device as it should. I was able to reboot all three systems on the previous kernel, 5.8.7-200.fc32.x86_64.

Per "dnf history list", this update installed:

Install    kernel-5.8.8-200.fc32.x86_64                 @updates
Install    kernel-core-5.8.8-200.fc32.x86_64            @updates
Install    kernel-modules-5.8.8-200.fc32.x86_64         @updates
Install    kernel-modules-extra-5.8.8-200.fc32.x86_64   @updates
Upgrade    grub2-common-1:2.04-22.fc32.noarch           @updates
Upgraded   grub2-common-1:2.04-21.fc32.noarch           @@System
Upgrade    grub2-efi-x64-1:2.04-22.fc32.x86_64          @updates
Upgraded   grub2-efi-x64-1:2.04-21.fc32.x86_64          @@System
Upgrade    grub2-tools-1:2.04-22.fc32.x86_64            @updates
Upgraded   grub2-tools-1:2.04-21.fc32.x86_64            @@System
Upgrade    grub2-tools-efi-1:2.04-22.fc32.x86_64        @updates
Upgraded   grub2-tools-efi-1:2.04-21.fc32.x86_64        @@System
Upgrade    grub2-tools-extra-1:2.04-22.fc32.x86_64      @updates
Upgraded   grub2-tools-extra-1:2.04-21.fc32.x86_64      @@System
Upgrade    grub2-tools-minimal-1:2.04-22.fc32.x86_64    @updates
Upgraded   grub2-tools-minimal-1:2.04-21.fc32.x86_64    @@System
Upgrade    kernel-headers-5.8.8-200.fc32.x86_64         @updates
Upgraded   kernel-headers-5.8.6-200.fc32.x86_64         @@System
Removed    kernel-5.6.6-300.fc32.x86_64                 @@System
Removed    kernel-core-5.6.6-300.fc32.x86_64            @@System
Removed    kernel-modules-5.6.6-300.fc32.x86_64         @@System

Nothing noteworthy shows in the grubby info for the bad vs. the good kernel. Example from one system:

index=0
kernel="/boot/vmlinuz-5.8.8-200.fc32.x86_64"
args="ro resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
root="/dev/mapper/fedora_popcorn-root"
initrd="/boot/initramfs-5.8.8-200.fc32.x86_64.img"
title="Fedora (5.8.8-200.fc32.x86_64) 32 (Thirty Two)"
id="cd37b505ac9e4ac4bf9dbad3bdd26142-5.8.8-200.fc32.x86_64"
index=1
kernel="/boot/vmlinuz-5.8.7-200.fc32.x86_64"
args="ro resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
root="/dev/mapper/fedora_popcorn-root"
initrd="/boot/initramfs-5.8.7-200.fc32.x86_64.img"
title="Fedora (5.8.7-200.fc32.x86_64) 32 (Thirty Two)"
id="cd37b505ac9e4ac4bf9dbad3bdd26142-5.8.7-200.fc32.x86_64"

/etc/default/grub from the same system:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_popcorn-swap rd.lvm.lv=fedora_popcorn/root rd.lvm.lv=fedora_popcorn/swap nomodeset rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

"ls /etc/grub.d" does not show any recent changes across the last three kernel updates, so that should not be the problem.
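For anyone else who lands in the same dracut emergency shell, a few commands that should show the same picture (just a sketch; it assumes mdadm and lvm are present in the initramfs, <vgname> stands for the volume group named in the error message, and the member device names will differ on your hardware):

    cat /proc/mdstat               # is the IMSM container/volume assembled at all?
    mdadm --examine --scan         # does mdadm still recognize the IMSM metadata?
    lvm pvs                        # which devices does LVM think the PVs are on?
    lvm vgchange -ay <vgname>      # expected to refuse with the same "duplicate devices" complaint seen in the journal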
I did an lsinitrd on each initramfs followed by a vimdiff (exact commands at the end of this comment). The differences are all in kernel modules, 131 in total; the most relevant are:

-rw-r--r--  1 root root  23060 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/md/raid1.ko.xz
-rw-r--r--  1 root root   6552 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-broxton.ko.xz
-rw-r--r--  1 root root   6032 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-cannonlake.ko.xz
-rw-r--r--  1 root root   4572 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-geminilake.ko.xz
-rw-r--r--  1 root root   3768 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-jasperlake.ko.xz
-rw-r--r--  1 root root   3688 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-lewisburg.ko.xz
-rw-r--r--  1 root root   7556 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-lynxpoint.ko.xz
-rw-r--r--  1 root root   4796 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-sunrisepoint.ko.xz
-rw-r--r--  1 root root   4112 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/intel/pinctrl-tigerlake.ko.xz
-rw-r--r--  1 root root   7648 May 29 13:35 usr/lib/modules/5.8.8-200.fc32.x86_64/kernel/drivers/pinctrl/pinctrl-amd.ko.xz

2. What is the Version-Release number of the kernel:

5.8.8-200.fc32.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at

Appeared in:  5.8.8-200.fc32.x86_64
Not present:  5.8.7-200.fc32.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Yes: install the updates, allow grubby to regenerate the initramfs, and try to boot.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Have not tried due to lack of disk space.

6. Are you running any modules not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

The logs were lost after reboot; I posted the relevant messages from the console above.
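P.S. In case it helps anyone reproduce the comparison, the lsinitrd/vimdiff step mentioned above amounts to something like this (the /tmp file names are only examples):

    lsinitrd /boot/initramfs-5.8.7-200.fc32.x86_64.img > /tmp/initrd-5.8.7.lst
    lsinitrd /boot/initramfs-5.8.8-200.fc32.x86_64.img > /tmp/initrd-5.8.8.lst
    vimdiff /tmp/initrd-5.8.7.lst /tmp/initrd-5.8.8.lst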
I just saw something similar. In my case, only my /boot filesystem is on the IMSM RAID, so it was easy enough to boot by commenting out that line in /etc/fstab.

For whatever reason, it appears that the IMSM (Intel RAID) signature isn't recognized by the new kernel.

Running 5.8.7-200.fc32.x86_64 (which works), I get the following:

[pilcher@ian ~]$ uname -r
5.8.7-200.fc32.x86_64
[pilcher@ian ~]$ sudo mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.2.0.1527
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port3 : /dev/sdd (MSK5235H2PJ7TG)
          Port1 : /dev/sdb (43P2YEVGS)
          Port2 : /dev/sdc (MSK5235H29X18G)
          Port5 : - non-disk device (HL-DT-ST BD-RE WH16NS60) -
          Port0 : /dev/sda (S21CNSAG402179X)
          Port4 : - no device attached -

[pilcher@ian ~]$ sudo mdadm --examine --verbose /dev/sdc
/dev/sdc:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : d7e8a7e3
         Family : d7e8a7e3
     Generation : 005a1992
     Attributes : All supported
           UUID : 1ebd7712:2a74af1f:34298316:cb855b50
       Checksum : d22bf36a correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : MSK5235H29X18G
          State : active
             Id : 00000002
    Usable Size : 1953519616 (931.51 GiB 1000.20 GB)

[Volume0]:
           UUID : 3d7bd72f:82a8cbcc:2d217397:12f3ff95
     RAID Level : 1
        Members : 2
          Slots : [UU]
    Failed disk : none
      This Slot : 0
    Sector Size : 512
     Array Size : 1953519616 (931.51 GiB 1000.20 GB)
   Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630936
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : normal
    Dirty State : clean
     RWH Policy : off

  Disk01 Serial : MSK5235H2PJ7TG
          State : active
             Id : 00000003
    Usable Size : 1953519616 (931.51 GiB 1000.20 GB)

But running 5.8.8-200.fc32.x86_64 I see:

[pilcher@ian system]$ uname -r
5.8.8-200.fc32.x86_64
[pilcher@ian system]$ sudo mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.2.0.1527
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port3 : /dev/sdd (MSK5235H2PJ7TG)
          Port1 : /dev/sdb (43P2YEVGS)
          Port2 : /dev/sdc (MSK5235H29X18G)
          Port5 : - non-disk device (HL-DT-ST BD-RE WH16NS60) -
          Port0 : /dev/sda (S21CNSAG402179X)
          Port4 : - no device attached -

[pilcher@ian system]$ sudo mdadm --examine --verbose /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :       204800 sectors at         2048 (type 07)
Partition[1] :      2048000 sectors at       206848 (type 83)
Partition[2] :     61440000 sectors at      2254848 (type 07)
Partition[3] :   1889824768 sectors at     63694848 (type 05)
Created attachment 1714949 [details] Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.7 (works)
Created attachment 1714950 [details] Output of 'strace mdadm --examine --verbose /dev/sdc' on kernel 5.8.8 (doesn't work)
I've just attached the strace output of 'mdadm --examine --verbose /dev/sdc' on both kernel 5.8.7 (works) and 5.8.8 (doesn't work). The first significant difference I see is on line 185, where the error code returned by the BLKPG_DEL_PARTITION ioctl has changed from ENXIO to ENOMEM.
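For anyone who wants to generate the same traces, plain strace around the mdadm call is enough; something like the following on each kernel (the output file names are only examples), then compare the BLKPG lines:

    sudo strace -o mdadm-5.8.7.trace mdadm --examine --verbose /dev/sdc   # on the working kernel
    sudo strace -o mdadm-5.8.8.trace mdadm --examine --verbose /dev/sdc   # after rebooting into 5.8.8
    grep BLKPG mdadm-5.8.*.trace                                          # the BLKPG_DEL_PARTITION result is the interesting part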
git bisect says:

692d0626557451c4b557397f20b7394b612d0289 is the first bad commit
commit 692d0626557451c4b557397f20b7394b612d0289
Author: Christoph Hellwig <hch>
Date:   Tue Sep 1 11:59:41 2020 +0200

    block: fix locking in bdev_del_partition

    [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]

    We need to hold the whole device bd_mutex to protect against other
    thread concurrently deleting out partition before we get to it, and
    thus causing a use after free.

    Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
    Reported-by: syzbot+6448f3c229bc52b82f69.com
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>
    Signed-off-by: Sasha Levin <sashal>

 block/partitions/core.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)
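(In case it's useful to anyone: the usual way to run this kind of bisect against the stable tree looks roughly like the sketch below; I'm not claiming these were the exact steps used here. Each round means building, installing, and booting the candidate kernel, then re-checking whether 'mdadm --examine --verbose /dev/sdc' still shows the IMSM metadata.)

    git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    cd linux
    git bisect start v5.8.8 v5.8.7     # bad version first, then the last known-good one
    # build/install/boot the checked-out kernel, test mdadm --examine, then mark the result:
    git bisect bad                     # or: git bisect good, repeating until the first bad commit is named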
Check out: https://bugzilla.redhat.com/show_bug.cgi?id=1879287
And fixed by:

commit 88ce2a530cc9865a894454b2e40eba5957a60e1a
Author: Christoph Hellwig <hch>
Date:   Tue Sep 8 16:15:06 2020 +0200

    block: restore a specific error code in bdev_del_partition

    mdadm relies on the fact that deleting an invalid partition returns
    -ENXIO or -ENOTTY to detect if a block device is a partition or a
    whole device.

    Fixes: 08fc1ab6d748 ("block: fix locking in bdev_del_partition")
    Reported-by: kernel test robot <rong.a.chen>
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5b4869c08fb3..722406b841df 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -537,7 +537,7 @@ int bdev_del_partition(struct block_device *bdev, int partno)
 
 	bdevp = bdget_disk(bdev->bd_disk, partno);
 	if (!bdevp)
-		return -ENOMEM;
+		return -ENXIO;
 
 	mutex_lock(&bdevp->bd_mutex);
 	mutex_lock_nested(&bdev->bd_mutex, 1);
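If I'm reading the patch right, you don't even need mdadm to observe the regression: asking the kernel to delete a partition number that doesn't exist goes through the same bdev_del_partition() path, and util-linux's delpart issues exactly that BLKPG_DEL_PARTITION ioctl. A rough illustration (this assumes partition 99 does not exist on the disk, so nothing is actually deleted):

    sudo delpart /dev/sdc 99
    # on 5.8.7 or a kernel with the fix: fails with ENXIO ("No such device or address")
    # on 5.8.8:                          fails with ENOMEM ("Cannot allocate memory"),
    #                                    which is the value mdadm's whole-device check trips over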
Fixed in kernel-5.8.9-200.fc32.x86_64. Update with 'dnf --enablerepo=updates-testing update kernel'.
Thank you Ian. That was amazingly fast.
I had a chance to install the kernel this evening and can confirm it is fixed.