
Bug 2068495

Summary: System enters Emergency mode due to ostree-remount.service failing (hitting start limit)
Product: Red Hat Enterprise Linux 8
Reporter: Renaud Métrich <rmetrich>
Component: systemd
Assignee: systemd-maint
Status: CLOSED DUPLICATE
QA Contact: Frantisek Sumsal <fsumsal>
Severity: high
Docs Contact:
Priority: high
Version: 8.5
CC: dtardon, systemd-maint-list
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-25 15:23:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Journal showing failing boot with systemd-239-51.el8_5.5 (flags: none)
Journal showing successful boot with systemd-239-51.el8_5.3 (flags: none)

Description Renaud Métrich 2022-03-25 14:05:54 UTC
Description of problem:

On a RHEL 8.5 system with the ostree package installed, ostree-remount.service makes the system enter the Emergency target even though the service is a no-op (because "ostree" is not on the kernel command line).

This happens when GUI-related targets, e.g. bluetooth.target or sound.target (in case of multiple sound cards), are re-enqueued multiple times due to failures.

Each time, the targets pull in ostree-remount.service again, and after 6 or 7 retries ostree-remount.service fires OnFailure, which leads to entering Emergency mode.

Example using a QEMU/KVM reproducer where I configured 6 sound cards (2 of each kind), booted with debugging enabled:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 148
systemd[1]: ostree-remount.service: ConditionKernelCommandLine=ostree failed.
systemd[1]: ostree-remount.service: Starting requested but condition failed. Not starting unit.
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=done
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 347
systemd[1]: ostree-remount.service: ConditionKernelCommandLine=ostree failed.
systemd[1]: ostree-remount.service: Starting requested but condition failed. Not starting unit.
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=done
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 422
systemd[1]: ostree-remount.service: ConditionKernelCommandLine=ostree failed.
systemd[1]: ostree-remount.service: Starting requested but condition failed. Not starting unit.
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=done
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 514
systemd[1]: ostree-remount.service: Merged into installed job ostree-remount.service/start as 514
systemd[1]: ostree-remount.service: Merged into installed job ostree-remount.service/start as 514
systemd[1]: ostree-remount.service: Merged into installed job ostree-remount.service/start as 514
systemd[1]: ostree-remount.service: ConditionKernelCommandLine=ostree failed.
systemd[1]: ostree-remount.service: Starting requested but condition failed. Not starting unit.
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=done
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 888
systemd[1]: ostree-remount.service: ConditionKernelCommandLine=ostree failed.
systemd[1]: ostree-remount.service: Starting requested but condition failed. Not starting unit.
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=done
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 981
systemd[1]: ostree-remount.service: Start request repeated too quickly.
systemd[1]: ostree-remount.service: Failed with result 'start-limit-hit'.
systemd[1]: ostree-remount.service: Changed dead -> failed
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=failed
systemd[1]: ostree-remount.service: Unit entered failed state.
systemd[1]: ostree-remount.service: Triggering OnFailure= dependencies.

---> HERE OnFailure triggering

systemd[1]: ostree-remount.service: Changed dead -> failed
systemd[1]: ostree-remount.service: Installed new job ostree-remount.service/start as 1228
systemd[1]: ostree-remount.service: ConditionKernelCommandLine=ostree failed.
systemd[1]: ostree-remount.service: Starting requested but condition failed. Not starting unit.
systemd[1]: ostree-remount.service: Job ostree-remount.service/start finished, result=done
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
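
To illustrate the sequence in the log above (five start jobs finish with result=done, the sixth fails with 'start-limit-hit'), here is a minimal Python sketch of a fixed-window start rate limit. It is a simplified model, not systemd's code: the class and function names are hypothetical, the values 10 seconds and 5 starts are assumed to be the StartLimitIntervalSec/StartLimitBurst defaults in effect, and the request times are invented because the excerpt carries no timestamps. The check_before_condition switch only models the hypothesis that the newer systemd counts a start request before evaluating the unit's conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StartRateLimit:
    """Simplified fixed-window model of a unit's start rate limit (hypothetical)."""
    interval: float = 10.0   # assumed default StartLimitIntervalSec, in seconds
    burst: int = 5           # assumed default StartLimitBurst
    window_start: Optional[float] = None
    count: int = 0

    def allow(self, now: float) -> bool:
        # Open a new window on the first request or once the interval has elapsed.
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now
            self.count = 0
        # Accept the request only while the window still has budget left.
        if self.count < self.burst:
            self.count += 1
            return True
        return False


def simulate_boot(request_times: List[float], limit: StartRateLimit,
                  check_before_condition: bool) -> List[str]:
    """Return the outcome of each start request for a unit whose condition always fails."""
    outcomes = []
    for now in request_times:
        # Hypothesized newer behaviour: the request is counted before conditions run.
        if check_before_condition and not limit.allow(now):
            outcomes.append("start-limit-hit")
            continue
        # ConditionKernelCommandLine=ostree fails, so the unit is never really started.
        outcomes.append("condition-failed")
    return outcomes


# Six start requests in quick succession (invented times, all within one window).
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
print(simulate_boot(times, StartRateLimit(), check_before_condition=True))
print(simulate_boot(times, StartRateLimit(), check_before_condition=False))
```

Under these assumptions the first call reports five "condition-failed" entries followed by one "start-limit-hit", while the second call never hits the limit because a failed condition is not counted as a start.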

Attaching the full journal. Below is the excerpt showing the targets being re-enqueued due to sound cards initialization:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Mar 25 14:29:37.718829 vm-workstation84 systemd[1]: initrd.target: Trying to enqueue job initrd.target/start/isolate
Mar 25 14:29:37.836965 vm-workstation84 systemd[1]: sys-fs-fuse-connections.mount: Trying to enqueue job sys-fs-fuse-connections.mount/start/fail
Mar 25 14:29:37.861212 vm-workstation84 systemd[1]: sys-kernel-config.mount: Trying to enqueue job sys-kernel-config.mount/start/fail
Mar 25 14:29:38.084474 vm-workstation84 systemd[1]: lvm2-pvscan@252:3.service: Trying to enqueue job lvm2-pvscan@252:3.service/start/fail
Mar 25 14:29:39.070500 vm-workstation84 systemd[1]: initrd-fs.target: Trying to enqueue job initrd-fs.target/start/replace
Mar 25 14:29:39.075523 vm-workstation84 systemd[1]: initrd-cleanup.service: Trying to enqueue job initrd-cleanup.service/start/replace
Mar 25 14:29:39.129489 vm-workstation84 systemd[1]: initrd-switch-root.target: Trying to enqueue job initrd-switch-root.target/start/isolate
Mar 25 14:29:39.684256 vm-workstation84 systemd[1]: graphical.target: Trying to enqueue job graphical.target/start/isolate
Mar 25 14:29:39.692033 vm-workstation84 systemd[1]: systemd-journald.service: Trying to enqueue job systemd-journald.service/restart/replace
Mar 25 14:29:40.016500 vm-workstation84 systemd[1]: sys-kernel-config.mount: Trying to enqueue job sys-kernel-config.mount/start/fail
Mar 25 14:29:40.020613 vm-workstation84 systemd[1]: sys-fs-fuse-connections.mount: Trying to enqueue job sys-fs-fuse-connections.mount/start/fail
Mar 25 14:29:40.075700 vm-workstation84 systemd[1]: qemu-guest-agent.service: Trying to enqueue job qemu-guest-agent.service/start/fail
Mar 25 14:29:40.085866 vm-workstation84 systemd[1]: spice-vdagentd.socket: Trying to enqueue job spice-vdagentd.socket/start/fail
Mar 25 14:29:40.143828 vm-workstation84 systemd[1]: lvm2-pvscan@252:3.service: Trying to enqueue job lvm2-pvscan@252:3.service/start/fail
Mar 25 14:29:40.440335 vm-workstation84 systemd[1]: sound.target: Trying to enqueue job sound.target/start/fail
Mar 25 14:29:40.441277 vm-workstation84 systemd[1]: sound.target: Trying to enqueue job sound.target/start/fail
Mar 25 14:29:40.442296 vm-workstation84 systemd[1]: sound.target: Trying to enqueue job sound.target/start/fail
Mar 25 14:29:40.444293 vm-workstation84 systemd[1]: sound.target: Trying to enqueue job sound.target/start/fail
Mar 25 14:29:42.370219 vm-workstation84 systemd[1]: sound.target: Trying to enqueue job sound.target/start/fail
Mar 25 14:29:42.383957 vm-workstation84 systemd[1]: sound.target: Trying to enqueue job sound.target/start/fail
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
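
To see which units are re-enqueued most often in a journal like the one above, the "Trying to enqueue job" messages can be tallied per unit. The following Python sketch assumes the message format shown in the excerpt; the function name count_enqueues and the SAMPLE constant are hypothetical, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches "systemd[1]: <unit>: Trying to enqueue job <job>" as seen in the excerpt.
ENQUEUE_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Trying to enqueue job (?P<job>\S+)"
)


def count_enqueues(lines):
    """Count enqueue attempts per unit in an iterable of journal lines."""
    counts = Counter()
    for line in lines:
        match = ENQUEUE_RE.search(line)
        if match:
            counts[match.group("unit")] += 1
    return counts


# Three lines taken from the excerpt above.
SAMPLE = [
    "Mar 25 14:29:39.684256 vm-workstation84 systemd[1]: graphical.target: "
    "Trying to enqueue job graphical.target/start/isolate",
    "Mar 25 14:29:40.440335 vm-workstation84 systemd[1]: sound.target: "
    "Trying to enqueue job sound.target/start/fail",
    "Mar 25 14:29:40.441277 vm-workstation84 systemd[1]: sound.target: "
    "Trying to enqueue job sound.target/start/fail",
]

for unit, attempts in count_enqueues(SAMPLE).most_common():
    print(f"{unit}: {attempts}")
```

Feeding the full attached journal to count_enqueues, one line at a time, would give the same tally for the whole boot.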

For some reason the issue doesn't happen with systemd-239-51.el8_5.3, which doesn't have the fix for BZ #2037395.
Probably with this older systemd, the rate limit was not checked at this point in the boot (0677-core-Check-unit-start-rate-limiting-earlier.patch seems to have introduced the regression).


Version-Release number of selected component (if applicable):

systemd-239-51.el8_5.5


How reproducible:

Always on my QEMU/KVM setup, but only in full debugging mode ("systemd.log_level=debug systemd.log_target=kmsg log_buf_len=15M printk.devkmsg=on").
The reason is likely a race condition that only shows up because debug logging slows down the boot.

Steps to Reproduce:
1. Install a QEMU/KVM system with Graphical User Interface
2. Add 6 sound cards (2 "ac97", 2 "ich6", 2 "ich9")
3. Configure 2 CPUs
4. Boot with "systemd.log_level=debug systemd.log_target=kmsg log_buf_len=15M printk.devkmsg=on"


Actual results:

The system enters Emergency mode.

Expected results:

No failure: ostree-remount.service is a no-op, so it should not trigger the start limit at all.

Additional info:

This happens on a real customer system with 2 sound cards, 1 Bluetooth device, and 1 software RAID.

Comment 1 Renaud Métrich 2022-03-25 14:09:46 UTC
Created attachment 1868338
Journal showing failing boot with systemd-239-51.el8_5.5

Comment 2 Renaud Métrich 2022-03-25 14:10:17 UTC
Created attachment 1868339
Journal showing successful boot with systemd-239-51.el8_5.3

Comment 3 David Tardon 2022-03-25 15:23:07 UTC

*** This bug has been marked as a duplicate of bug 2065322 ***