Bug 2174645

Summary:	Failed to start Flush Journal to Persistent Storage (systemd-journal-flush.service)
Product:	Red Hat Enterprise Linux 8	Reporter:	libhe
Component:	systemd	Assignee:	David Tardon <dtardon>
Status:	CLOSED ERRATA	QA Contact:	Frantisek Sumsal <fsumsal>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	8.8	CC:	andavis, atodorov, bnerickson87, dreua, dtardon, hidenori.i, hongzliu, hshuai, jabia, jhughes, johannes.schischke, jpazdziora, juzhou, kdreyer, libhe, lijin, linl, litian, lizhu, lucasbaile14, matthew.lesieur, mdeng, meili, mhayden, minl, mosvald, mpitt, nmunoz, obudai, pvlasin, qzhang, roman.aleksic, scott, smitterl, systemd-maint-list, systemd-maint, tdawson, troels, tyan, tzheng, vkuznets, vogt, xiliang, xuli, xxiong, yacao, ymao, yoyang, yuxisun
Target Milestone:	rc	Keywords:	CustomerScenariosInitiative, Regression, Triaged
Target Release:	8.8
Hardware:	All
OS:	Linux
Whiteboard:	CockpitTest
Fixed In Version:	systemd-239-74.el8_8	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	2176892		Environment:
Last Closed:	2023-05-16 09:07:47 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2129764, 2176892

Description libhe 2023-03-02 05:46:46 UTC

Description of problem:
Failed to start Flush Journal to Persistent Storage. 

Below is the log from journalctl:

Mar 02 03:19:44 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: Starting Flush Journal to Persistent Storage...
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: systemd-journal-flush.service: start operation timed out>
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: systemd-journal-flush.service: Main process exited, code>
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: systemd-journal-flush.service: Failed with result 'timeo>
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: Failed to start Flush Journal to Persistent Storage.
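
The timestamps above show how long the start job waited before it was killed. A small sketch of the arithmetic, assuming GNU `date` is available and taking the year from the report date (the log lines omit it); `elapsed_seconds` is a made-up helper name, not part of any tool:

```shell
# Hypothetical helper: seconds between two UTC timestamps (requires GNU date).
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# The log lines omit the year; 2023 is assumed from the report date.
elapsed_seconds "2023-03-02 03:19:44" "2023-03-02 03:21:15"   # prints 91
```

The 91 seconds between "Starting" and "timed out" are consistent with a start timeout of roughly 90 seconds.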


RHEL Version:
RHEL8.8(4.18.0-477.el8)
RHEL-8.8.0-20230301.1

How reproducible:
100%

Steps to Reproduce:

1. Launch an RHEL guest with the latest RHEL-8.8 build.
2. Run 'systemctl status systemd-journal-flush.service'.

Actual results:
[ec2-user@ip-10-0-28-169 ~]$ systemctl status systemd-journal-flush.service
● systemd-journal-flush.service - Flush Journal to Persistent Storage
   Loaded: loaded (/usr/lib/systemd/system/systemd-journal-flush.service; static; vendor preset: disabled)
   Active: failed (Result: timeout) since Thu 2023-03-02 03:21:15 UTC; 14s ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
  Process: 7229 ExecStart=/usr/bin/journalctl --flush (code=killed, signal=TERM)
 Main PID: 7229 (code=killed, signal=TERM)

Mar 02 03:19:44 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: Starting Flush Journal to Persistent Storage...
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: systemd-journal-flush.service: start operation timed out>
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: systemd-journal-flush.service: Main process exited, code>
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: systemd-journal-flush.service: Failed with result 'timeo>
Mar 02 03:21:15 ip-10-0-28-169.us-west-2.compute.internal systemd[1]: Failed to start Flush Journal to Persistent Storage.

Expected results:
Flush Journal service should be started successfully.

Additional info:
- This is observed on both x86_64 and aarch64
- This appears to be a regression introduced in build RHEL-8.8.0-20230301.1

Comment 3 Jan Pazdziora (Red Hat) 2023-03-03 10:41:01 UTC

This is a regression in systemd-239-73.el8, compared to systemd-239-68.el8_7.4.
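
To see which side of the regression a host is on, the installed package can be compared against the builds named in this report. A minimal sketch: `classify_systemd_build` is a hypothetical helper, and it only knows the builds mentioned in this bug (239-68.el8_7.4 as good, 239-73.el8 as broken, 239-74.el8_8 and 239-75.el8 as fixed). Anything else is reported as unknown rather than guessed.

```shell
# Hypothetical helper: classify a systemd package name using only the
# builds named in this bug report. Other builds are deliberately "unknown".
classify_systemd_build() {
    case "$1" in
        systemd-239-68.el8_7.4|systemd-239-68.el8_7.4.*) echo "not affected" ;;
        systemd-239-73.el8|systemd-239-73.el8.*)         echo "affected" ;;
        systemd-239-74.el8_8|systemd-239-74.el8_8.*)     echo "fixed" ;;
        systemd-239-75.el8|systemd-239-75.el8.*)         echo "fixed" ;;
        *)                                               echo "unknown" ;;
    esac
}

# Usage on an RPM-based host:
#   classify_systemd_build "$(rpm -q systemd)"
```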

Comment 8 David Tardon 2023-03-07 09:28:24 UTC

Indeed, `journalctl --flush` gets stuck if /var/log/journal doesn't exist. (Also, a repeated call of `journalctl --relinquish-var` gets stuck, but that's a smaller issue.)
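
The trigger condition can be checked before rebooting. A minimal sketch, assuming the default journal location; `journal_dir_state` is a hypothetical helper name, and the optional argument is there only so the check can be pointed at a different path:

```shell
# Hypothetical helper: report whether the persistent journal directory exists.
# On an affected build, "missing" is the case in which `journalctl --flush` hangs.
journal_dir_state() {
    dir="${1:-/var/log/journal}"
    if [ -d "$dir" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Usage:
#   journal_dir_state
```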

Comment 14 David Tardon 2023-03-15 07:28:59 UTC

*** Bug 2178393 has been marked as a duplicate of this bug. ***

Comment 18 David Tardon 2023-03-16 16:18:48 UTC

*** Bug 2178897 has been marked as a duplicate of this bug. ***

Comment 19 Scott Brown 2023-03-18 17:38:34 UTC

This is effective as an interim workaround:

sudo -s
cd /var/log
mkdir journal
chown root:systemd-journal journal
chmod 2755 journal

Comment 20 David Tardon 2023-03-20 11:36:31 UTC

*** Bug 2179327 has been marked as a duplicate of this bug. ***

Comment 22 David Tardon 2023-03-28 20:15:21 UTC

*** Bug 2182446 has been marked as a duplicate of this bug. ***

Comment 23 Gerald Vogt 2023-04-03 05:58:23 UTC

(In reply to Scott Brown from comment #19)
> This is effective as an interim workaround:
> 
> sudo -s
> cd /var/log
> mkdir journal
> chown root:systemd-journal journal
> chmod 2755 journal

I don't know if I would call this a workaround. This switches to persistent logging, which is also described in https://access.redhat.com/solutions/696893

This changes behaviour: the journal is now written to disk, with all that implies.

This isn't necessary, however: apart from the failed systemd service unit and the startup delay caused by this bug, the system still works as intended, and there is no need to switch to persistent logging. So you can safely ignore this until it has been fixed, or you could run

 # systemctl reset-failed systemd-journal-flush.service

to reset the failed unit (until the next reboot).

To me, that is more of a workaround than avoiding the error message by writing the journal to disk...
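
The reset can be made conditional so it is safe to run from a login script or cron job. A minimal sketch, to be run as root; `reset_flush_unit` is a hypothetical wrapper name, while `systemctl is-failed` and `systemctl reset-failed` are the standard systemctl verbs:

```shell
# Hypothetical wrapper: clear the failed state only if the unit is actually failed.
reset_flush_unit() {
    unit="systemd-journal-flush.service"
    if systemctl is-failed --quiet "$unit"; then
        systemctl reset-failed "$unit"
    fi
}

# Usage (as root):
#   reset_flush_unit
```

This only clears the failed state; it does not shorten the wait at boot.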

Comment 24 Scott Brown 2023-04-03 14:39:09 UTC

(In reply to Gerald Vogt from comment #23)

Alright, thanks for the clarification. I would rather have this arrangement in the interim than the disruptive 90-second wait on boot. Once the underlying defect is fixed, I assume anyone who did this can return to the original non-persistent ring buffer config by running `sudo rm -rf /var/log/journal` (if that is what they want)?

Comment 26 Ken Dreyer (Red Hat) 2023-04-21 18:01:11 UTC

I hit this on CentOS Stream today (CentOS-Stream-GenericCloud-8-20230404.0 with systemd-239-73.el8). Would you please fix future bugs in CentOS Stream first before RHEL? This will save engineering resources when we all know where to look for the latest code.

Here's how I customized CentOS 8's image to have the latest systemd package (systemd-239-75.el8):

  virt-customize -v -x -a CentOS-Stream-GenericCloud-8-20230404.0.x86_64.qcow2 --run update-systemd.sh --selinux-relabel

And my update-systemd.sh script:

  #!/bin/bash
  # For https://bugzilla.redhat.com/show_bug.cgi?id=2174645
  #
  # Inject this with:
  # virt-customize -v -x -a CentOS-Stream-GenericCloud-8-20230404.0.x86_64.qcow2 --run update-systemd.sh --selinux-relabel
  set -eux
  # Temporarily enable the CentOS Stream 8 development BaseOS repo.
  cat >/etc/yum.repos.d/baseos-dev.repo <<EOL
  [baseos-development]
  name=BaseOS Development
  baseurl=https://composes.stream.centos.org/stream-8/development/latest-CentOS-Stream/compose/BaseOS/x86_64/os/
  gpgcheck=0
  enabled=1
  EOL

  # Pull in the newer systemd build, then remove the temporary repo again.
  dnf -y update systemd
  rm /etc/yum.repos.d/baseos-dev.repo

Comment 27 Martin Osvald 🛹 2023-04-25 09:15:03 UTC

*** Bug 2189428 has been marked as a duplicate of this bug. ***

Comment 28 Troy Dawson 2023-04-25 15:10:33 UTC

Just so people know, systemd-239-75.el8 has been built for CentOS Stream 8. It includes this fix.
I don't know why it took so long in gating (testing), but it got tagged into c8s-pending an hour or two after the weekly Stream 8 compose started, so it didn't make it into this week's CentOS Stream 8 release.
It will be in next week's Stream 8 release.

Comment 32 errata-xmlrpc 2023-05-16 09:07:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2985