1247768 – No systemd journal with readonly-root TEMPORARY_STATE

Bug 1247768 - No systemd journal with readonly-root TEMPORARY_STATE

Summary: No systemd journal with readonly-root TEMPORARY_STATE

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	initscripts
Sub Component:
Version:	21
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Lukáš Nykrýn
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:	874631
Blocks:
TreeView+	depends on / blocked

Reported:	2015-07-28 20:21 UTC by John Florian
Modified:	2015-12-02 18:14 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-12-02 14:41:25 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Patch to run fedora-readonly.service before systemd-journald.service (644 bytes, patch) 2015-07-30 15:04 UTC, John Florian	no flags	Details \| Diff
attempt #2 (649 bytes, patch) 2015-07-30 19:55 UTC, John Florian	no flags	Details \| Diff
View All

Description John Florian 2015-07-28 20:21:06 UTC

Description of problem:

In /etc/sysconfig/readonly-root I set TEMPORARY_STATE=yes so that various paths listed in /etc/rwtab get mounted as tmpfs, avoiding writes to the underlying disk in those paths.  Something has changed since F20 that causes the systemd-journald.service to start before those mounts are effected.  Once the tmpfs mounts occur the journal files are obscured and journalctl reports something like "No journal files found."

This support is critical for our stateless, livecd based, embedded deployments.  Without this support all such writes go to the livecd overlay which would be quickly exhausted by logging and other write activity, consequently causing the OS to crash.


Version-Release number of selected component (if applicable):
initscripts-9.56.1-7.fc21
systemd-216-25.fc21


How reproducible:
always

Comment 1 John Florian 2015-07-29 15:52:46 UTC

I have confirmed that this same problem exists with F22.  I have not yet uncovered what changed exactly to bring this problem about.  Still looking...

Comment 2 Zbigniew Jędrzejewski-Szmek 2015-07-29 17:39:58 UTC

You should be able to work around this by setting Storage=volative in journald.conf, or by removing /var/log/journal, which should have the same effect.

Read-only as implemented in initscripts is obsolete, and buggy. I wouldn't hold my breath for a proper fix.

Comment 3 John Florian 2015-07-29 19:37:49 UTC

Thanks Zbigniew, that work around does get me a journal.  I'm worried that more than the journal is affected.  Do you know what changed between F20 and F21 to bring this on?  It would seem that either systemd is starting the journal earlier now or that the tmpfs mounts are occurring later.

I know that the stateless systemd stuff will eventually make this obsolete and I really do look forward to that day.  However until that's ready I'd sure hate to see this existing feature die.  Is even it possible to use the systemd stateless features in Fedora yet?

Comment 4 Zbigniew Jędrzejewski-Szmek 2015-07-29 19:57:10 UTC

The way that journal is flashed to /var was reimplemented, and there were various new dependencies (e.g. tmpfiles now runs after module-load), etc. What you are observing could be a timing issue. From looking at fedora-readonly.service I don't see a direct After/Before relationship with the journal units.

Systemd generally works with read-only root, but there's some noise from things which cannot write to the filesystem as expected. It'd need some polishing.

Comment 5 John Florian 2015-07-29 20:05:25 UTC

Thank you immensely for that feedback.  Let me run with some poking at After/Before on fedora-readonly.service and see if I can resolve my issue.  I'll report back here my findings and you can either send me back to the drawing board or consider applying a patch.

Comment 6 John Florian 2015-07-30 15:04:29 UTC

Created attachment 1057715 [details]
Patch to run fedora-readonly.service before systemd-journald.service

This patch resolves this bug for me.

Comment 7 Zbigniew Jędrzejewski-Szmek 2015-07-30 15:49:46 UTC

This patch does not look good. If anything, the flush to /var should be delayed, not the start of journald, because this can cause boot to deadlock. (A good test would be running with systemd.log_level=debug on the kernel commandline. This produces more logs.)

Comment 8 John Florian 2015-07-30 19:55:22 UTC

Created attachment 1057830 [details]
attempt #2

I'm having trouble imagining the deadlock, but I agree that delaying only the flush to /var would make more sense if for no reason other than the tighter scope.  I've revised the patch (attached) based on what I think you're suggesting and was surprised it didn't work for me.  From what I can see of the debug messages, it would appear that fedora-readonly does finish before systemd-journal-flush starts, but if that's true I don't see why this wouldn't work.  Here's what I believe are the relevant bits of the log (which I could only get by first unmounting the tmpfs obscuring the journal in /var/log):

# journalctl -o short-monotonic | egrep '(fedora-readonly|systemd-journal-flush).service'
[   13.484116] localhost systemd[1]: Installed new job fedora-readonly.service/start as 108
[   13.524070] localhost systemd[1]: Installed new job systemd-journal-flush.service/start as 116
[   14.400919] localhost systemd[1]: Merged into installed job systemd-journal-flush.service/restart as 116
[   14.477356] localhost systemd[1]: Job systemd-journal-flush.service/restart finished, result=done
[   14.477412] localhost systemd[1]: Converting job systemd-journal-flush.service/restart -> systemd-journal-flush.service/start
[   25.730829] localhost systemd[1]: fedora-readonly.service changed dead -> start
[   30.492189] localhost systemd[1]: Merged into installed job fedora-readonly.service/start as 108
[   30.492313] localhost systemd[1]: Merged into installed job systemd-journal-flush.service/start as 116
[   31.625587] localhost systemd[1]: Child 463 belongs to fedora-readonly.service
[   31.655836] localhost systemd[1]: fedora-readonly.service: main process exited, code=exited, status=0/SUCCESS
[   31.656054] localhost systemd[1]: fedora-readonly.service changed start -> exited
[   31.656415] localhost systemd[1]: Job fedora-readonly.service/start finished, result=done
[   31.704257] localhost systemd[1]: fedora-readonly.service: cgroup is empty
[   32.565674] localhost systemd[1]: fedora-readonly.service: cgroup is empty
[   31.650531] localhost systemd[1]: systemd-journal-flush.service changed dead -> start
[   32.888310] localhost systemd[1]: Failed to send unit change signal for systemd-journal-flush.service: Transport endpoint is not connected
[   32.969283] localhost systemd[1]: Child 533 belongs to systemd-journal-flush.service
[   32.969412] localhost systemd[1]: systemd-journal-flush.service: main process exited, code=exited, status=0/SUCCESS
[   32.969550] localhost systemd[1]: systemd-journal-flush.service changed start -> exited
[   32.969692] localhost systemd[1]: Job systemd-journal-flush.service/start finished, result=done
[   32.970274] localhost systemd[1]: systemd-journal-flush.service: cgroup is empty

Comment 9 Zbigniew Jędrzejewski-Szmek 2015-07-31 02:21:41 UTC

My bad. journald will try run flush_to_var() when starting. So if /var/log/ is writable when it starts, it will write there.

But still, I think that the solution from comment #c2 is better. Mounting a tmpfs on /var/log and then copying the journals between one directory and the other is wasteful and unnecessary.

Comment 10 John Florian 2015-07-31 12:54:13 UTC

Unfortunately storage=volatile isn't a good solution for me.  Many of our embedded deployments where this is used have little RAM and I previously found that with a volatile journal early boot messages can be lost before I can log in to determine the cause of a problem.  With a storage=persistent, even if backed by tmpfs, things have worked out much better.  I realize RAM is RAM and that this is merely a proportion of allocations thing, but the last time I tried the journald config wouldn't allow me to express allocations as percentages[0] and that would be the only way I can think of to find a *single* config solution that fits a wide variety of hardware platforms having anywhere from 1G to 32G+ of RAM.  On the other hand tmpfs defaults to using half the RAM and journald adjusts itself to that amount, whatever it may be, quite nicely.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=874631

I'm certainly open to suggestions.  My priorities are, in order:

 1. rock solid reliability
 2. retention of as much log content as possible while dynamically fitting the available resources
 3. speed and efficiency

Comment 11 Zbigniew Jędrzejewski-Szmek 2015-07-31 13:19:31 UTC

OK, reopened 874631.

Comment 12 Fedora End Of Life 2015-11-04 10:59:15 UTC

This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2015-12-02 14:41:34 UTC

Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.