Bug 2121197

Summary: Failing services in new Fedora IoT installations with RO /sysroot
Product: [Fedora] Fedora Reporter: Paul Whalen <pwhalen>
Component: IoTAssignee: Peter Robinson <pbrobinson>
Status: CLOSED RAWHIDE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 37CC: akoutsou, awilliam, bcotton, dustymabe, jmarrero, jonathan, lucab, miabbott, pbrobinson, philip.wyett, robatino, robertthomasfairley, travier, walters
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: openqa
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-29 11:01:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1269538, 2009537, 2060976, 2153434    

Description Paul Whalen 2022-08-24 19:13:15 UTC
Description of problem:

Recent installations of Fedora 37 IoT have a long list of failing services:

systemctl --all --failed
  UNIT                                LOAD   ACTIVE SUB    DESCRIPTION         >
● dbus-broker.service                 loaded failed failed D-Bus System Message>
● greenboot-grub2-set-counter.service loaded failed failed Set grub2 boot count>
● greenboot-healthcheck.service       loaded failed failed greenboot Health Che>
● NetworkManager.service              loaded failed failed Network Manager
● polkit.service                      loaded failed failed Authorization Manager
● redboot-auto-reboot.service         loaded failed failed Reboot on red boot s>
● rpm-ostreed.service                 loaded failed failed rpm-ostree System Ma>
● systemd-oomd.service                loaded failed failed Userspace Out-Of-Mem>
● systemd-resolved.service            loaded failed failed Network Name Resolut>
● systemd-userdbd.service             loaded failed failed User Database Manage>
● systemd-oomd.socket                 loaded failed failed Userspace Out-Of-Mem>
● systemd-userdbd.socket              loaded failed failed User Database Manage>

mount | grep /sysroot
/dev/mapper/fedora--iot_fedora-root on /sysroot type ext4 (ro,relatime,seclabel)

Remounting /sysroot as RW I can start all services again, this also does not affect upgrades from F36 where sysroot remains RW. 

Version-Release number of selected component (if applicable):
rpm-ostree-2022.12-4.fc37.aarch64
ostree-2022.5-2.fc37.aarch64
anaconda-37.12.1-1.fc37

How reproducible:
Everytime

Comment 1 Fedora Blocker Bugs Application 2022-08-25 17:48:18 UTC
Proposed as a Blocker for 37-final by Fedora user coremodule using the blocker tracking app because:

 Proposing as an F37 blocker as it appears to violate the following criterion:

All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.
 
https://fedoraproject.org/wiki/Fedora_37_Final_Release_Criteria#System_services

Comment 2 Adam Williamson 2022-08-25 17:53:40 UTC
yeah, openQA has been hitting this for days. I mentioned it in IRC but didn't see any followup at the time.

This prevents all the openQA tests from working, so it might be a Beta blocker, really. Let's throw it on that list for now and I'll see if it's a case of "beta functionality really doesn't work" or just "the noise throws openQA off".

Comment 3 Ben Cotton 2022-08-25 20:54:36 UTC
This sounds like it could be a side effect of https://fedoraproject.org/wiki/Changes/Silverblue_Kinoite_readonly_sysroot

Setting to block 2060976, the tracker for that Change.

Comment 4 Adam Williamson 2022-08-25 21:08:38 UTC
Well, the fact that the change affects IoT isn't a side effect, it's in the description:

"This change applies to new and existing installations of Fedora Silverblue and Kinoite and only to new installations of Fedora IoT."

But the fact that it breaks everything is a problem, yeah. :D

Comment 5 Colin Walters 2022-08-25 21:18:19 UTC
Do you have `rw` on the kernel command line?

Comment 6 Adam Williamson 2022-08-25 22:01:15 UTC
Yeah, it does have that there. I didn't put it there, though. It's like that out of the "box" (the IoT dvd-ostree install image, in my case). Just do a fresh install of https://kojipkgs.fedoraproject.org/compose/iot/Fedora-IoT-37-20220825.0/compose/IoT/x86_64/iso/Fedora-IoT-ostree-x86_64-37-20220825.0.iso without doing anything unusual, boot it, and you have 'rw' in cmdline and hit this bug.

If I take that out of the cmdline or change it to 'ro', boot loops with ostree-prepare-root.service failing.

openQA tests fail because they try to switch to a different VT, and there is no console running on any VT besides 1 (probably the services that should run one fail to start). This is a clear violation of Beta criterion "A system installed without a graphical package set must boot to a working login prompt without any unintended user intervention, and all virtual consoles intended to provide a working login prompt must do so.", so that supports the Beta blocker nomination.

Comment 7 Adam Williamson 2022-08-25 22:02:32 UTC
Yeah, if I look at logs after switching to VT2 and back to VT1, I see "Failed to start autovt: Transport endpoint is not connected".

Comment 8 Colin Walters 2022-08-26 18:08:17 UTC
Try setting `tmp-is-dir: true` in the manifest in https://pagure.io/fedora-iot/ostree/blob/main/f/fedora-iot-base.json

Comment 9 Adam Williamson 2022-08-26 18:18:34 UTC
Peter, can you try Colin's suggestion?

Comment 10 Peter Robinson 2022-08-27 08:34:04 UTC
Pushed for rawhide and kicked off a compose

Comment 11 Adam Williamson 2022-08-27 16:19:43 UTC
That looks good for Rawhide, only two failed tests now:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=38&build=Fedora-IoT-38-20220827.0&groupid=1

I'll look into those. We'll need the change for F37 too.

Comment 12 Adam Williamson 2022-08-27 17:05:05 UTC
Remaining failed tests both look to be caused by https://bugzilla.redhat.com/show_bug.cgi?id=2121944 .

Comment 13 Peter Robinson 2022-08-29 11:01:45 UTC
Applied to F-37 too, thanks Adam and Colin.

Comment 14 Adam Williamson 2022-08-29 15:26:17 UTC
confirmed, we got an F37 compose that is in the same state as Rawhide (most things work, https://bugzilla.redhat.com/show_bug.cgi?id=2121944 is an issue).