Bug 2362355 - Since kernel-6.15.0-0.rc3.20250421git9d7a0577c9db.28.fc43 system fails to shut down after software RAID install
Summary: Since kernel-6.15.0-0.rc3.20250421git9d7a0577c9db.28.fc43 system fails to shut down after software RAID install
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks: BetaBlocker, F43BetaBlocker
 
Reported: 2025-04-25 19:02 UTC by Adam Williamson
Modified: 2025-05-01 15:23 UTC
CC List: 16 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-05-01 15:23:23 UTC
Type: Bug
Embargoed:


Attachments
journal from 'good' case (booted installer from 20250421.n.0 nightly, storage evaluation completed) (799.31 KB, text/plain)
2025-04-25 19:10 UTC, Adam Williamson
no flags
journal from 'bad' case (booted installer from 20250425.n.0 nightly, storage evaluation hung) (727.96 KB, text/plain)
2025-04-25 19:10 UTC, Adam Williamson
no flags

Description Adam Williamson 2025-04-25 19:02:10 UTC
1. Please describe the problem:
In openQA testing, since Fedora-Rawhide-20250422.n.0, all software RAID install tests fail: after the install is complete and we click the Reboot System button, shutdown never completes and the system never reboots.

In manual testing, I can reproduce this. Additionally, if I force a reboot and boot the installer again now that the RAID devices have been created, the installer gets stuck with Installation Destination showing "Probing storage...", and if I run `ps` at a console, it hangs forever.

2. What is the Version-Release number of the kernel:
6.15.0-0.rc3.20250421git9d7a0577c9db.28.fc43

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes. kernel-6.15.0-0.rc3.20250421git9d7a0577c9db.28.fc43 is the first affected kernel.
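
(For anyone bisecting kernel builds: a specific older build can be pulled from the Koji link above and installed alongside the current kernel. A minimal sketch; the NVR below is a placeholder, not a build named in this report, and the exact subpackage set may differ:)

  # download all x86_64 RPMs of a given kernel build from Koji
  koji download-build --arch=x86_64 kernel-<last-good-NVR>
  # install it next to the running kernel
  sudo dnf install ./kernel-core-*.rpm ./kernel-modules*.rpm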

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes. Get a recent Rawhide installer image (Everything netinst or Server DVD), attach it to a VM with two VirtIO disks (a setup sketch follows below), and run the installer:
- Go to Installation Destination, select both disks, and choose "custom" partitioning.
- Delete all existing partitions on both disks.
- Select "standard partitioning" and then "create partitions for me".
- Select the root partition and change it to the RAID type.
- Complete the install. After it finishes, click Reboot System, and the system will hang.
- Boot the installer again and it will not complete storage probing.
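
(A minimal sketch of such a VM using virt-install; the name, memory, disk sizes and ISO path are placeholders rather than values from this report, and the --osinfo setting is just a convenience:)

  virt-install \
    --name raid-repro \
    --memory 4096 \
    --vcpus 2 \
    --osinfo detect=on,require=off \
    --cdrom /path/to/Rawhide-netinst.iso \
    --disk size=20,bus=virtio \
    --disk size=20,bus=virtio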

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Happens at least up to kernel-6.15.0-0.rc3.20250424gita79be02bba5c.31.fc43 in today's Rawhide.
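
(To confirm which kernel build a given boot is actually running when comparing results, nothing more than this is needed; it is not specific to this bug:)

  # kernel of the currently running boot
  uname -r
  # kernel builds installed on the system
  rpm -q kernel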

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
It's hard to get logs from the hang on shutdown, but I have logs from the 'anaconda gets stuck during device evaluation' case, which I'll attach. This is where anaconda/blivet/libblockdev appear to get stuck. Here are the relevant lines from the 'good' case:

Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: INFO:program:Running... umount /mnt/sysimage
Apr 25 18:38:00 fedora kernel: EXT4-fs (vda2): orphan cleanup on readonly fs
Apr 25 18:38:00 fedora kernel: EXT4-fs (vda2): mounted filesystem f7fd2290-706e-401f-85c4-d8ca9e210e55 ro with ordered data mode. Quota mode: none.
Apr 25 18:38:00 fedora systemd[1]: mnt-sysimage.mount: Deactivated successfully.
Apr 25 18:38:00 fedora kernel: EXT4-fs (vda2): unmounting filesystem f7fd2290-706e-401f-85c4-d8ca9e210e55.
Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: DEBUG:program:Return code: 0
Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: DEBUG:blivet:              PartitionDevice.teardown: vda2 ; status: True ; controllable: True ;
Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: DEBUG:blivet:                  Ext4FS.teardown: device: /dev/vda2 ; type: ext4 ; status: False ;
Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: DEBUG:blivet:                  Ext4FS.teardown: device: /dev/vda2 ; type: ext4 ; status: False ;
Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: DEBUG:blivet:              MDRaidArrayDevice.setup: root ; orig: False ; status: False ; controllable: True ;
Apr 25 18:38:00 fedora org.fedoraproject.Anaconda.Modules.Storage[2384]: DEBUG:blivet:                  MDRaidArrayDevice.setup_parents: name: root ; orig: False ;
...

Here's the same point in the 'bad' case:

Apr 25 18:49:18 fedora org.fedoraproject.Anaconda.Modules.Storage[2398]: INFO:program:Running... umount /mnt/sysimage
Apr 25 18:49:18 fedora systemd[1]: mnt-sysimage.mount: Deactivated successfully.
Apr 25 18:49:18 fedora kernel: EXT4-fs (vda2): unmounting filesystem f7fd2290-706e-401f-85c4-d8ca9e210e55.
Apr 25 18:49:18 fedora org.fedoraproject.Anaconda.Modules.Storage[2398]: DEBUG:program:Return code: 0
Apr 25 18:49:18 fedora org.fedoraproject.Anaconda.Modules.Storage[2398]: DEBUG:blivet:              PartitionDevice.teardown: vda2 ; status: True ; controllable: True ;
Apr 25 18:49:18 fedora org.fedoraproject.Anaconda.Modules.Storage[2398]: DEBUG:blivet:                  Ext4FS.teardown: device: /dev/vda2 ; type: ext4 ; status: False ;
Apr 25 18:49:18 fedora org.fedoraproject.Anaconda.Modules.Storage[2398]: DEBUG:blivet:                  Ext4FS.teardown: device: /dev/vda2 ; type: ext4 ; status: False ;

and that's the last message with a 'DEBUG:blivet' string in it; we're stuck there. In the 'good' log the next message is "MDRaidArrayDevice.setup: root ; orig: False ; status: False ; controllable: True", but we never get that one in the bad case.

Neither anaconda nor blivet nor libblockdev nor mdadm changed in the affected compose, though.
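
(Two generic ways to get more data out of a hang like this, not something done in this report: with a persistent journal, the previous boot's kernel log can be retrieved after a forced reboot, and the SysRq 'w' trigger dumps blocked, D-state tasks into the kernel log while the hang is in progress.)

  # kernel messages from the previous (hung) boot; needs a persistent journal,
  # which a live installer environment usually lacks
  journalctl --no-hostname -k -b -1 > dmesg-prev.txt

  # while the hang is happening: enable SysRq and dump uninterruptible tasks
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  journalctl -k | tail -n 200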

Comment 1 Adam Williamson 2025-04-25 19:07:27 UTC
Proposing as a Beta blocker as a violation of "When using both the installer-native and the blivet-gui-based custom partitioning flow on the GTK-based installer, and the Cockpit-based "storage editor" flow on the webui-based installer, the installer must be able to: ... Correctly interpret, and modify as described below, any disk with a valid ms-dos or gpt disk label and partition table containing ext4 partitions, LVM and/or btrfs volumes, and/or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions", because it hangs trying to interpret a disk with software RAID arrays.

Comment 2 Adam Williamson 2025-04-25 19:10:06 UTC
Created attachment 2087274 [details]
journal from 'good' case (booted installer from 20250421.n.0 nightly, storage evaluation completed)

Comment 3 Adam Williamson 2025-04-25 19:10:39 UTC
Created attachment 2087275 [details]
journal from 'bad' case (booted installer from 20250425.n.0 nightly, storage evaluation hung)

Comment 4 Adam Williamson 2025-05-01 15:23:23 UTC
It looks like the rc4 kernel build in today's Rawhide fixed this; the openQA RAID tests all passed.

