Bug 2281423 - After upgrade to dracut-101, boot sequence stalls after entering LUKS passphrase, gdm chooser never appears
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 40
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-05-18 18:35 UTC by Brian Morrison
Modified: 2025-05-20 08:48 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-05-20 08:48:49 UTC
Type: ---
Embargoed:



Description Brian Morrison 2024-05-18 18:35:06 UTC
After upgrading to kernel-6.8.10, the next boot failed to reach the gdm chooser after the LUKS passphrase was accepted. Booting the previously installed kernel-6.8.9, which was installed with dracut-060, continued to work normally.

Reverting from dracut-101 to dracut-060 and reinstalling kernel-6.8.10 makes this kernel boot normally, and the gdm chooser appears after the LUKS passphrase has been entered.

Reproducible: Always

Steps to Reproduce:
1. Upgrade the kernel on a system with a LUKS-encrypted partition.
2. Reboot into the new kernel and enter the LUKS passphrase.
3. The boot stalls before the gdm chooser appears.

Actual Results:
Boot stalls before reaching the gdm chooser screen if the kernel upgrade is done with dracut-101 installed.

Expected Results:
Newly installed kernels should boot through to the gdm chooser screen, as they do with dracut-060. This does not happen with the dracut-101 packages installed.

dracut-101 should continue to create a functional initramfs on systems with LUKS-encrypted partitions, as dracut-060 does.

Another non-LUKS system with dracut-101 boots to the gdm chooser screen with newly installed kernels.

Comment 1 Brian Morrison 2024-05-21 13:50:54 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2024-92664ae6fe#comment-3530856

This Bodhi comment may be relevant to the problem. Has the partition UUID handling changed in dracut-101?

Comment 2 Christian Stadelmann 2024-05-23 20:32:14 UTC
(In reply to Brian Morrison from comment #1)
> https://bodhi.fedoraproject.org/updates/FEDORA-2024-92664ae6fe#comment-3530856
> 
> This bodhi comment may be relevant to the problem, has the partition UUID
> handling changed in dracut-101?

The author of that comment later retracted it (further down the same page).

When you booted back into an older kernel, can you access syslog (`journalctl -b -1`) of the failed boot attempt? Does it show any relevant info?
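
To see how the previous boot ended, it is not necessary to page through the whole journal. The following is a minimal sketch, assuming bash, a persistent journal and journalctl's default short output format; the helper name journal_last_timestamp is made up for illustration. It prints the timestamp of the last journal line, which can be compared with the time at which the failed boot was started. If the last entry is older than that, the failed boot was never written to the journal.

```bash
# Print the "Mon DD HH:MM:SS" timestamp of the last line read on stdin.
# Assumes journalctl's default short output format (hypothetical helper).
journal_last_timestamp() {
    awk 'NF >= 3 { ts = $1 " " $2 " " $3 } END { if (ts != "") print ts }'
}

# Usage on a real system (not run here):
#   journalctl --list-boots                 # which boots were recorded at all
#   journalctl -b -1 -n 50 --no-pager       # last 50 lines of the previous boot
#   journalctl -b -1 --no-pager | journal_last_timestamp
```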

Comment 3 Brian Morrison 2024-05-23 22:15:24 UTC
Yes, I saw that.

I have just had a look; the journal is well over 0.5 million lines, so it took a while.

Unfortunately when I get to the end of the file using journalctl -b -1 all I find is this:

huge <snip>
....
May 23 22:26:40 deangelis.fenrir.org.uk systemd-shutdown[1]: Using hardware watchdog 'iTCO_wdt', version 6, device /dev/watchdog0
May 23 22:26:40 deangelis.fenrir.org.uk systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
May 23 22:26:40 deangelis.fenrir.org.uk systemd-shutdown[1]: Syncing filesystems and block devices.
May 23 22:26:40 deangelis.fenrir.org.uk systemd-shutdown[1]: Sending SIGTERM to remaining processes...
May 23 22:26:40 deangelis.fenrir.org.uk systemd-journald[2697]: Received SIGTERM from PID 1 (systemd-shutdow).
May 23 22:26:40 deangelis.fenrir.org.uk dnsmasq[5429]: exiting on receipt of SIGTERM
May 23 22:26:40 deangelis.fenrir.org.uk systemd-journald[2697]: Journal stopped

Whatever is happening, the journal does not seem to show anything useful. The boot of the new kernel started at around 22:30 today, so nothing in it seems relevant.

Something a bit odd happened. I updated to dracut-101 again and then installed kernel-6.9.1-200; again this wouldn't reach the gdm chooser. I tried again, deliberately entering the wrong LUKS passphrase, and immediately got the passphrase entry box back, so that part seems to work.

What is odd is that I then tried to boot kernel-6.8.10-300, which I had reinstalled when I downgraded to dracut-060 and which had booted fine for several days. After the kernel-6.9.1-200 update, however, it would no longer boot to the gdm chooser and showed the same behaviour: the passphrase box reappears after an incorrect passphrase, but the boot stalls after a correct one. I don't think I did anything to this kernel version, except installing the kernel-uki-virt package. Maybe that caused a rebuild of the initrd using the latest dracut?

Sorry that I am not able to help more. I am no expert in this whole area and know very little about the systemd/dracut/kernel/LUKS interactions.
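
One way to check whether an existing initramfs was rebuilt by a newer dracut is to read the "Version:" line that lsinitrd prints for the image. A minimal sketch, assuming bash; the helper name initrd_dracut_version is made up for illustration, and the image path in the usage comment follows the usual Fedora naming and is an assumption.

```bash
# Extract the dracut version string from `lsinitrd` output read on stdin
# (hypothetical helper; relies on the "Version: dracut-..." header line).
initrd_dracut_version() {
    awk '/^Version: / { print $2; exit }'
}

# Usage on a real system (not run here; the image path is an assumption):
#   sudo lsinitrd /boot/initramfs-6.8.10-300.fc40.x86_64.img | initrd_dracut_version
#   ls -l /boot/initramfs-*.img     # modification times show when each image was last rebuilt
```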

Comment 4 Brian Morrison 2024-05-23 23:00:48 UTC
Further update.

Downgraded system to dracut-060, removed kernel-6.8.10 and kernel-6.9.1.

Reinstalled kernel-6.8.10-300, rebooted, entered LUKS passphrase, gdm chooser screen appeared.

Reinstalled kernel-6.9.1-200, rebooted, entered LUKS passphrase, gdm chooser screen appeared.

System now operating normally. Other F40 system without LUKS partition(s) works with both of these kernel versions which were installed after dracut-101 packages were updated.

Please let me know how I can help to identify what is happening.

Comment 5 Brian Morrison 2024-05-26 22:10:39 UTC
I now have kernel-6.9.2-200 and it boots to the gdm chooser with dracut-060 packages installed.

I can investigate with dracut-101 but without some guidance I really don't know how to identify where the problem lies.

Comment 6 Brian Morrison 2024-05-30 21:39:24 UTC
Now on kernel-6.9.3-200, also boots to the gdm chooser. I have left dracut packages alone, they're excluded in dnf.conf at present.

Comment 7 js314592 2024-06-01 09:00:10 UTC
On my PC, it seems systemd is trying to mount the partition whose UUID is given in resume=UUID=

Comment 8 js314592 2024-06-01 09:20:28 UTC
It seems to be an old swap partition. The system now only uses swap on zram.

Comment 9 Brian Morrison 2024-06-01 10:42:02 UTC
I don't really follow how this all works. If this is what is happening, how do I identify what is wrong and fix it?

I can't find any debug output that helps me at present.

Comment 10 MicMor 2024-06-01 12:28:57 UTC
Boot fails with kernel-6.8.12, dracut-101 and a LUKS partition. It fails before reaching the LUKS password prompt.

Same with kernel-6.8.11, but it’s OK with kernel-6.8.10.

Identical configuration.

Is this a problem with dracut or something else?

Comment 11 Brian Morrison 2024-06-01 13:00:53 UTC
OK, so I found how to show the boot progress (ESC key).

The last entry reads:

[  *** ] Job dev-mapper-fedora_localhost\x2d\x2dlive\x2dswap.device/start running (28min 18s / no limit)

and this continues until I ctrl-alt-del to reboot again. This is only after the initrd is created using dracut-101, it works normally with dracut-060 initrds.

If this is something to do with a swap partition, how do I find and correct/delete the incorrect UUID? As far as I can tell the swap is using zram:

bdm@deangelis:~$ swapon --show
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition   8G   0B  100

This system originally had Fedora 34, it has been updated via each new Fedora release to Fedora 40, so there could be a fair amount of cruft left around as Fedora has evolved over the last 4 years.
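
The job name in the boot progress output is an escaped systemd unit name, so it helps to turn it back into a device path before looking for where that device is referenced. A minimal sketch, assuming bash (it relies on the builtin printf decoding \xHH with %b); the helper name unit_to_path is made up for illustration. On a system with systemd installed, `systemd-escape --unescape --path` applied to the name without its .device suffix should do the same job.

```bash
# Convert an escaped systemd device unit name back to a device path.
# Pure-bash approximation for illustration (hypothetical helper).
unit_to_path() {
    local name="${1%.device}"      # drop the unit suffix
    name="/${name//-//}"           # an unescaped "-" separates path components
    printf '%b\n' "$name"          # decode \xHH escapes such as \x2d ("-")
}

# Usage:
#   unit_to_path 'dev-mapper-fedora_localhost\x2d\x2dlive\x2dswap.device'
# Places where a stale swap device might still be referenced (not run here):
#   cat /proc/cmdline
#   grep -w swap /etc/fstab
```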

Comment 12 Brian Morrison 2024-06-01 14:21:49 UTC
I used lsinitrd on the two initramfs files and compared the results.

The thing that stands out is this:

Version: dracut-060-1.fc40

dracut modules:
bash
systemd
systemd-initrd
systemd-sysusers
nss-softokn
dbus-broker
rngd
dbus
i18n
network-manager
network
ifcfg
drm
plymouth
clevis
clevis-pin-null
clevis-pin-sss
clevis-pin-tang
clevis-pin-tpm2
btrfs
crypt
dm
kernel-modules
kernel-modules-extra
kernel-network-modules
rootfs-block
terminfo
udev-rules
dracut-systemd
usrmount
base
fs-lib
memstrack
shutdown


Version: dracut-101-1.fc40

dracut modules:
bash
systemd
systemd-initrd
systemd-sysusers
nss-softokn
dbus-broker
rngd
dbus
i18n
network-manager
network
net-lib
drm
plymouth
clevis
clevis-pin-null
clevis-pin-sss
clevis-pin-tang
clevis-pin-tpm2
btrfs
crypt
dm
kernel-modules
kernel-modules-extra
kernel-network-modules
resume
rootfs-block
terminfo
udev-rules
dracut-systemd
usrmount
base
fs-lib
memstrack
shutdown


There is one more module with dracut-101, which is the 'resume' module, which appears to be the one causing the problem.
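
The comparison can also be done mechanically. A minimal sketch, assuming bash and one module name per line in each file; the helper name diff_module_lists and the file names in the usage comment are made up for illustration. Applied to the two lists above, it should report ifcfg as present only in the dracut-060 image, and net-lib and resume as present only in the dracut-101 image.

```bash
# Compare two dracut module lists (one module name per line).
# Prints "- name" for modules only in the first list and
# "+ name" for modules only in the second (hypothetical helper).
diff_module_lists() {
    comm -23 <(sort "$1") <(sort "$2") | sed 's/^/- /'
    comm -13 <(sort "$1") <(sort "$2") | sed 's/^/+ /'
}

# Usage on a real system (not run here; image and file names are placeholders):
#   lsinitrd -m old.img > old.txt
#   lsinitrd -m new.img > new.txt
#   diff_module_lists old.txt new.txt   # header lines that differ will show up too
```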

I then created a noresume.conf file in /etc/dracut.conf.d with the contents:

omit_dracutmodules+=" resume "

and then rebuilt the initramfs file for kernel-6.8.12-200 which now boots to the gdm chooser.

It looks like something is broken with the inclusion of the resume module in dracut-101, but it could also be something to do with partition UUIDs changing.

Perhaps someone with greater knowledge can determine whether it's a user error or a buggy package update.
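
For anyone wanting to repeat the workaround, it can be sketched as follows, assuming bash and root access. The helper name write_noresume_conf is made up for illustration; the drop-in file name and contents are the ones given above. The rebuild commands in the comments use dracut's --force, --kver and --regenerate-all options and are not run here.

```bash
# Write a dracut drop-in that omits the "resume" module (hypothetical helper).
# $1 is the drop-in directory; on a real system that is /etc/dracut.conf.d.
write_noresume_conf() {
    printf 'omit_dracutmodules+=" resume "\n' > "$1/noresume.conf"
}

# Usage on a real system, as root (not run here):
#   write_noresume_conf /etc/dracut.conf.d
#   dracut --force --kver "$(uname -r)"     # rebuild the running kernel's initramfs
#   dracut --force --regenerate-all         # or rebuild for every installed kernel
```

Removing the drop-in and rebuilding again restores the default module selection.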

Comment 13 Brian Morrison 2024-06-01 14:39:04 UTC
The same workaround for kernel-6.9.3-200 is also successful.

The files removed from the initramfs are:

-rwxr-xr-x   1 root     root        28688 May 16 01:00 usr/lib/systemd/systemd-hibernate-resume
-rwxr-xr-x   1 root     root        28768 May 16 01:00 usr/lib/systemd/system-generators/systemd-hibernate-resume-generator
-rw-r--r--   1 root     root          666 May 16 01:00 usr/lib/systemd/system/systemd-hibernate-resume.service

so this could also be related to changes in the systemd packages that were mentioned in https://bodhi.fedoraproject.org/updates/FEDORA-2024-92664ae6fe#comment-3530856 as:

"Specifically, systemd-hibernate-resume.service is started by systems 255.6-1.fc40 but not by 255.4-1.fc40."

This is way beyond my knowledge now, but I hope this steers a resolution to the correct people.
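
A quick way to tell whether a given initramfs contains the hibernate-resume pieces listed above is to search its lsinitrd listing. A minimal sketch, assuming bash; the helper name has_hibernate_resume is made up for illustration.

```bash
# Exit status 0 if an `lsinitrd` listing read on stdin mentions any
# systemd-hibernate-resume file, non-zero otherwise (hypothetical helper).
has_hibernate_resume() {
    grep -q 'systemd-hibernate-resume'
}

# Usage on a real system (not run here; the image path is an assumption):
#   lsinitrd "/boot/initramfs-$(uname -r).img" | has_hibernate_resume \
#       && echo "resume support is in this initramfs"
```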

Comment 14 Brian Morrison 2024-06-02 16:37:45 UTC
The kernel command line resume argument reads:

resume=/dev/mapper/fedora_localhost--live-swap

Then:

$ ll /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Jun  1 15:41 control
lrwxrwxrwx. 1 root root       7 Jun  1 15:41 fedora_localhost--live-home -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jun  1 15:41 fedora_localhost--live-root -> ../dm-3
lrwxrwxrwx. 1 root root       7 Jun  1 15:41 fedora_localhost--live-swap -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jun  1 15:41 luks-8f7e929f-c3f1-4a33-8e54-dc87f811fe2a -> ../dm-0

and:

# blkid /dev/dm-1
/dev/dm-1: UUID="21f07df4-b45c-4403-b7ff-dd14e1f467b0" TYPE="swap"

finally:

# blkid -U 21f07df4-b45c-4403-b7ff-dd14e1f467b0
/dev/mapper/fedora_localhost--live-swap

That all seems consistent to me.

Maybe someone can make sense out of why the resume job never completes.
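
The same check can be scripted so that the resume device is read from the kernel command line rather than copied by hand. A minimal sketch, assuming bash; the helper name resume_arg is made up for illustration.

```bash
# Print the value of the last resume= argument in a kernel command line
# string, or nothing if there is none (hypothetical helper).
resume_arg() {
    local tok val=""
    for tok in $1; do              # intentionally unquoted: split on whitespace
        case "$tok" in
            resume=*) val="${tok#resume=}" ;;
        esac
    done
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    fi
}

# Usage on a real system (not run here):
#   dev=$(resume_arg "$(cat /proc/cmdline)")
#   blkid "$dev"        # for a plain device path: does it exist, and is it TYPE="swap"?
#   swapon --show       # is it actually in use as swap?
```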

Comment 15 germ.van.eck 2024-06-11 08:51:06 UTC
I have the same issue. Let me know if I can help in testing a solution. The workaround works for me.

Comment 16 js314592 2024-06-23 09:39:03 UTC
I removed the resume argument from /etc/kernel/cmdline as a fix.
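
That edit amounts to dropping the resume=... token from the command line string. A minimal sketch, assuming bash; the helper name strip_resume_arg is made up for illustration. Note that /etc/kernel/cmdline is, as far as I know, only read when boot entries are generated, so entries for already installed kernels may need updating separately, for instance with grubby's --remove-args option.

```bash
# Remove every resume=... token from a kernel command line string
# and print the result (hypothetical helper).
strip_resume_arg() {
    local tok out=""
    for tok in $1; do              # intentionally unquoted: split on whitespace
        case "$tok" in
            resume=*) ;;                           # drop this token
            *) out="${out:+$out }$tok" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage on a real system, as root (not run here):
#   cp /etc/kernel/cmdline /etc/kernel/cmdline.bak
#   strip_resume_arg "$(cat /etc/kernel/cmdline.bak)" > /etc/kernel/cmdline
```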

Comment 17 Brun Gionni 2024-07-21 22:25:28 UTC
Same issue here.

Comment 18 Brian Morrison 2024-07-22 11:44:19 UTC
dracut-102 is now available. I have not yet been able to test it with my workaround from comment 12 (https://bugzilla.redhat.com/show_bug.cgi?id=2281423#c12) disabled.

Will report back when I have.

Comment 19 Brian Morrison 2024-07-23 20:41:10 UTC
With dracut-102 installed everything is the same after a kernel-6.9.10-200 update with the workaround active.

I have removed /etc/dracut.conf.d/noresume.conf containing

omit_dracutmodules+=" resume "

so that on the next kernel update the initrd should have the dracut resume module in it.

Expect a further report after this has been done.

Comment 20 Brian Morrison 2024-07-25 20:53:51 UTC
And the answer, with kernel-6.9.11-200, is that if I allow dracut to build initrd with the resume module included, then the gdm chooser never appears.

Is this meant to work? I realise that I am not actually resuming, but if I wanted to hibernate and resume then it should work shouldn't it?

Comment 21 Lukáš Nykrýn 2024-08-08 11:41:06 UTC
Can anyone here attach the logs from the broken boot? Ideally with debug and rd.debug added to the kernel cmdline. To get a shell, you should be able to use the debug shell [1]: also add rd.systemd.debug_shell to the kernel cmdline, press ctrl+alt+f9, and save the output of journalctl -b somewhere (probably the easiest way is to attach a USB drive and mount it manually).

[1] https://www.freedesktop.org/software/systemd/man/latest/systemd-debug-generator.html
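
The requested steps can be sketched as follows, assuming bash. The helper name add_cmdline_args is made up for illustration; it only builds the command line string, which would then be entered by editing the boot entry (for example from the GRUB menu). The device name and mount point in the comments are placeholders.

```bash
# Append each given argument to a kernel command line string unless it is
# already present, and print the result (hypothetical helper).
add_cmdline_args() {
    local cmdline="$1" arg
    shift
    for arg in "$@"; do
        case " $cmdline " in
            *" $arg "*) ;;                         # already there
            *) cmdline="$cmdline $arg" ;;
        esac
    done
    printf '%s\n' "$cmdline"
}

# Usage:
#   add_cmdline_args "$(cat /proc/cmdline)" debug rd.debug rd.systemd.debug_shell
# Then, in the debug shell on ctrl+alt+f9 during the stalled boot (not run here;
# /dev/sdX1 stands for the USB partition):
#   mkdir -p /run/usb && mount /dev/sdX1 /run/usb
#   journalctl -b > /run/usb/broken-boot.log
#   umount /run/usb
```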

Comment 22 Aoife Moloney 2025-04-25 10:45:52 UTC
This message is a reminder that Fedora Linux 40 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '40'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 40 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 23 Aoife Moloney 2025-05-20 08:48:49 UTC
Fedora Linux 40 entered end-of-life (EOL) status on 2025-05-13.

Fedora Linux 40 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

