Bug 2282195 - systemd-homed stalls the Cloud image boot
Summary: systemd-homed stalls the Cloud image boot
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks: F41BetaBlocker
Reported: 2024-05-21 12:23 UTC by František Zatloukal
Modified: 2024-05-23 15:02 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-05-23 15:02:01 UTC
Type: Bug
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Fedora Pagure fedora-kiwi-descriptions pull-request 56 0 None None None 2024-05-22 15:30:42 UTC

Description František Zatloukal 2024-05-21 12:23:21 UTC
Description of problem:
systemd-homed appears to stall the boot process of Fedora Cloud images (and possibly other variants).

Version-Release number of selected component (if applicable):
systemd-256~rc1^20240509git1781de1-2.fc41
systemd-255.5-1.fc41 (custom built, still hangs)

How reproducible:
Always

Steps to Reproduce:
1. Boot the current Rawhide nightly (image from 14th of May 2024 or newer)

Actual results:
Starting systemd-homed-firstboot.s…ce - First Boot Home Area Wizard...
‣ Please enter user name to create (empty to skip): [  OK  ] Started systemd-logind.service - User Login Management.

...

<boot gets stuck until EOL/enter key is sent>
No data entered, skipping.
[  OK  ] Finished systemd-homed-firstboot.s…vice - First Boot Home Area Wizard.

Expected results:
systemd-homed shouldn't stall the boot, at least on images where some other service (e.g. cloud-init on Cloud images) handles user creation.

Additional info:

Comment 1 Adam Williamson 2024-05-22 15:11:14 UTC
So the change here is https://github.com/systemd/systemd/commit/3ccadbce3358ba1db7ce5fa3f8dd17c627ffd93b . From the docs there it looks like we can probably pass `systemd.firstboot=off` to disable it.
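As a rough illustration of the `systemd.firstboot=off` idea, the following Python sketch checks whether a kernel command line string carries that switch. The function name `firstboot_prompts_disabled` is hypothetical and not part of systemd or cloud-init; the set of "false" spellings and the "last occurrence wins" rule are assumptions made for this sketch, not a statement of how systemd parses the option.

```python
# Hypothetical helper for illustration only; not part of systemd or cloud-init.

# Assumption: spellings treated as "false" for a boolean option.
_FALSE_VALUES = {"0", "no", "n", "false", "f", "off"}


def firstboot_prompts_disabled(cmdline: str) -> bool:
    """Return True if systemd.firstboot is set to a false value in cmdline."""
    disabled = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "systemd.firstboot" and sep:
            # Assumption: a later occurrence overrides an earlier one.
            disabled = value.lower() in _FALSE_VALUES
    return disabled
```

On a running system the input could be the contents of `/proc/cmdline`; for an image build, it would be whatever kernel arguments the image description sets.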

Moving to cloud-init just to get it somewhere approximately correct, I'll try and file a PR for this.

Comment 2 Adam Williamson 2024-05-22 15:24:10 UTC
This might cause trouble for other things too, though.

* ARM disk images - these expect to run initial-setup on first boot to create a user account (it also has other capabilities)
* Generic install - on everything but the Workstation live, it is possible (by design) to install Fedora without creating a user account, and again, the intent in this case is that our own initial-setup should run on boot, which provides a user creation interface and some other capabilities

Given that this is a case where we have clear pre-existing behaviour which this feature unexpectedly changes, I think perhaps we ought to just turn it off in the systemd build? I think adopting it in Fedora might require a proper Change with full consideration of all existing paths that already cope with a system being booted with no user accounts.

Comment 3 Adam Williamson 2024-05-22 15:26:31 UTC
Hmm, I just checked the openQA tests and it looks like the ARM image deploy and install_no_user tests are passing; it seems that initial-setup somehow 'wins out' over this interface and still runs as expected.

Comment 4 Adam Williamson 2024-05-22 16:05:40 UTC
PR is merged, we'll see if it works with the next compose.

Comment 5 Zbigniew Jędrzejewski-Szmek 2024-05-22 16:14:39 UTC
Hmm, so this issue appears because we hit a corner case:
those firstboot services are conditionalized on "first boot", i.e. that /etc is mostly empty.
We have other things conditionalized on systemd-firstboot, in particular systemd-firstboot.service.
But systemd-firstboot.service only prompts for three things (locale, timezone, root password),
and I assume that all of those are already populated somehow in those test images, so the boot
doesn't block. OTOH, systemd-homed-firstboot.service prompts if there is no regular user defined.
I assume that in the test images, only the root user is created, so we hit the prompt.
When I reviewed the patch adding the service, I wondered whether it could cause unexpected blockage,
but since it's all conditionalized on "first boot", I thought we'd be fine.
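The corner case described above can be modelled with a small Python sketch. This is a simplified model of the reasoning in this comment, not systemd's actual implementation: `BootState`, `has_regular_user` and `homed_firstboot_would_prompt` are hypothetical names, and the UID range 1000 to 60000 for "regular" users is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumption: UID range treated as "regular" (non-system) users.
UID_MIN, UID_MAX = 1000, 60000


@dataclass
class BootState:
    first_boot: bool              # the "first boot" condition holds
    uids: List[int]               # UIDs of all accounts present on the image
    prompts_enabled: bool = True  # False models booting with systemd.firstboot=off


def has_regular_user(uids: List[int]) -> bool:
    """True if at least one account falls in the regular-user UID range."""
    return any(UID_MIN <= uid <= UID_MAX for uid in uids)


def homed_firstboot_would_prompt(state: BootState) -> bool:
    """Model: prompt only on first boot, with prompts enabled, and no regular user."""
    return (
        state.first_boot
        and state.prompts_enabled
        and not has_regular_user(state.uids)
    )
```

Under this model, an image with only root (`BootState(True, [0])`) hits the prompt, while the same image with `prompts_enabled=False` or with a regular user already present does not.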

I think that Adam's patch is a good way to resolve this. We certainly don't want to block
on any interactive stuff in CI.

Comment 6 Adam Williamson 2024-05-23 15:02:01 UTC
It looks like the kiwi descriptions change did the trick for Cloud images; the Cloud tests all passed on today's Rawhide. It's possible there are still issues caused by this on other paths, though.

