2164594 – Systems that update from systemd-252.4-4.fc38 to systemd-253~rc1-1.fc38 fail to boot


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	openqa
Depends On:
Blocks:	F38BetaBlocker

Reported:	2023-01-25 18:53 UTC by Adam Williamson
Modified:	2023-01-26 11:02 UTC
CC List:	11 users
Fixed In Version:	systemd-253~rc1-3.fc38
Clone Of:
Environment:
Last Closed:	2023-01-26 10:51:09 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
debug boot log (1.32 MB, text/plain) 2023-01-25 19:21 UTC, Adam Williamson	no flags
unexpected message from udev-builtin-keyboard (18.40 KB, image/png) 2023-01-25 22:47 UTC, Zbigniew Jędrzejewski-Szmek	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	systemd systemd issues 26216	0	None	open	PID1 enters infinite loop when trying to stop socket with incoming traffic	2023-01-26 11:02:13 UTC

Description Adam Williamson 2023-01-25 18:53:04 UTC

In openQA testing, and also in manual testing on a local VM, systems installed with systemd-252.4-4.fc38 (e.g. from https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20230123.n.0 ) and then updated to systemd-253~rc1-1.fc38 fail to boot. Systems installed directly with systemd-253~rc1-1.fc38 seem to boot OK (in openQA, at least; I haven't verified that locally yet).

With default boot options, the boot just gets stuck at the bootsplash. Hitting Esc shows it stuck at "Stopped initrd-switch-root.service - Switch Root.". With systemd.log_level=debug systemd.log_target=console, I see an endless loop of messages from systemd-journald-audit.socket: "Incoming traffic" and then "Suppressing connection request since unit stop is scheduled." I'm not sure whether there was a more useful message before that.

This was easy to reproduce for me: run a minimal netinst using https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20230123.n.0/compose/Everything/x86_64/os/ as the install source, then update only systemd to systemd-253~rc1-1.fc38 and reboot. I expect this bug will hit anyone updating an existing Rawhide install to the new systemd.

Proposing as an F38 Beta blocker as it probably affects upgrades from F37 too (will try and confirm that in a bit).

Comment 1 Adam Williamson 2023-01-25 19:16:12 UTC

With logs redirected somewhere I can see 'em but not at debug level, I see 'systemd-journald.service: Scheduled restart job, restart counter is at 1.', then repeating 'Looping too fast. Throttling execution a little.' messages.

Comment 2 darrell pfeifer 2023-01-25 19:21:07 UTC

In rescue mode it gets stuck on fsck.

Adding a parameter to skip fsck gets as far as:

Mounting sysroot mount - /sysroot ...

Then it hangs.

Comment 3 Adam Williamson 2023-01-25 19:21:40 UTC

Created attachment 1940497
debug boot log

OK, here's a full debug-level boot log up to the point where the 'systemd-journald-audit.socket' messages start looping.
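
For anyone triaging a similar log, here is a minimal sketch of how the looping messages could be counted. It assumes each message quoted in this report appears verbatim somewhere in a log line; the exact line layout of the attached log is not shown here, so the matching is a plain substring check. The function name and the command-line usage are hypothetical.

```python
import sys
from collections import Counter

# Message fragments quoted in this report. How they are prefixed in the
# real log (timestamps, unit names) is an assumption, so we match substrings.
PATTERNS = (
    "Incoming traffic",
    "Suppressing connection request since unit stop is scheduled.",
    "Looping too fast. Throttling execution a little.",
)


def count_loop_messages(log_text: str) -> Counter:
    """Count how many log lines contain each known looping message."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python count_loop.py boot.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for pattern, n in count_loop_messages(fh.read()).most_common():
            print(f"{n:6d}  {pattern}")
```

A very high count for the first two patterns relative to the rest of the log would be consistent with the loop described above, though it does not by itself identify the cause.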

Comment 4 David Tardon 2023-01-25 20:20:28 UTC

I think initrd needs to be regenerated, otherwise the socket is still started from there.

Comment 5 David Tardon 2023-01-25 20:28:27 UTC

(In reply to David Tardon from comment #4)
> I think initrd needs to be regenerated, otherwise the socket is still
> started from there.

No, scratch that. That should work around the problem, but it's not a fix. The socket should be enabled via presets after the update, but it looks like it isn't?

Comment 6 Zbigniew Jędrzejewski-Szmek 2023-01-25 22:46:15 UTC

Hmm, so I'm testing with a VM here, and it fails reliably with kernel-6.2.0-0.rc4.20230120gitd368967cb103.35.fc38.x86_64. But with kernel-6.1.0-0.rc2.21.fc38.x86_64, things work fine.
With the 6.2 kernel, I'm getting an OOPS with a warning, triggered by udevd, about some mapping being done wrong (I'll try to capture it properly later). And very strange messages from udev, that stink of memory corruption.
/dev/vda is not detected at all by the kernel.

I'll try to figure out what is going on tomorrow. It's close to midnight and I need to catch a nap.

Comment 7 Zbigniew Jędrzejewski-Szmek 2023-01-25 22:47:37 UTC

Created attachment 1940523
unexpected message from udev-builtin-keyboard

Comment 8 darrell pfeifer 2023-01-26 06:40:44 UTC

Booting a 6.1 kernel didn't work for me. I tried downgrading:

1) boot a live usb
2) download previous systemd
3) mount the old root via gnome disks and chroot to it
4) verify dnf said it saw the new version
5) rpm -Uvh --oldpackage the older systemd version

When I reboot the bad RC version is still there. I'm sure I've done this in the past. What step did I miss?

Comment 9 Adam Williamson 2023-01-26 07:14:12 UTC

As long as you mounted the right partition and chroot'ed properly, that should be right... maybe you missed a step, just try again? I tend to use dnf downgrade rather than rpm, but it shouldn't matter. Oh, and you'll want to download all subpackages of systemd and downgrade them all together; I usually use `koji download-build --arch=x86_64 --arch=noarch systemd-252.4-4.fc38` (or whatever package and arch), then `dnf downgrade *.rpm`.
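
The two commands above could be assembled programmatically along these lines. This is only a sketch: the helper names are hypothetical, nothing is executed, and the flags are exactly the ones quoted in this comment. The `*.rpm` glob is replaced by an explicit file list because no shell is involved.

```python
import shlex


def koji_download_argv(build: str, arches: list[str]) -> list[str]:
    """Build the argv for `koji download-build` with one --arch per architecture."""
    return ["koji", "download-build", *[f"--arch={arch}" for arch in arches], build]


def dnf_downgrade_argv(rpm_files: list[str]) -> list[str]:
    """Build the argv for downgrading all downloaded packages in one transaction."""
    if not rpm_files:
        raise ValueError("no RPM files given; run the download step first")
    return ["dnf", "downgrade", *rpm_files]


if __name__ == "__main__":
    # Print the download command as it would be typed in a shell.
    print(shlex.join(koji_download_argv("systemd-252.4-4.fc38", ["x86_64", "noarch"])))
```

Passing every downloaded RPM to a single `dnf downgrade` call keeps the systemd subpackages at matching versions, which is the point made above.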

Comment 10 Fedora Update System 2023-01-26 10:47:41 UTC

FEDORA-2023-326cfb9cf8 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-326cfb9cf8

Comment 11 Fedora Update System 2023-01-26 10:51:09 UTC

FEDORA-2023-326cfb9cf8 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 12 Zbigniew Jędrzejewski-Szmek 2023-01-26 11:02:13 UTC

https://github.com/systemd/systemd/issues/26216 is an upstream bug about PID1 handling this badly.
