Red Hat Bugzilla – Bug 1247768
No systemd journal with readonly-root TEMPORARY_STATE
Last modified: 2015-12-02 13:14:46 EST
Description of problem:
In /etc/sysconfig/readonly-root I set TEMPORARY_STATE=yes so that the various paths listed in /etc/rwtab get mounted as tmpfs, avoiding writes to the underlying disk in those paths. Something changed after F20 that causes systemd-journald.service to start before those mounts are made. Once the tmpfs mounts occur, the journal files are obscured and journalctl reports something like "No journal files found."
This support is critical for our stateless, livecd-based, embedded deployments. Without it, all such writes go to the livecd overlay, which would quickly be exhausted by logging and other write activity, causing the OS to crash.
Version-Release number of selected component (if applicable):
I have confirmed that this same problem exists with F22. I have not yet found exactly what changed to bring this problem about. Still looking...
You should be able to work around this by setting Storage=volatile in journald.conf, or by removing /var/log/journal, which should have the same effect.
Read-only root as implemented in initscripts is obsolete and buggy. I wouldn't hold my breath for a proper fix.
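A minimal sketch of the first workaround, assuming the stock /etc/systemd/journald.conf is edited in place and all other settings stay at their defaults. The second workaround has the same effect because the default, Storage=auto, only keeps a persistent journal when /var/log/journal exists.

```ini
# /etc/systemd/journald.conf (excerpt; other settings omitted)
[Journal]
# Keep the journal only under /run/log/journal and never flush it
# to /var/log/journal, so a tmpfs mounted over /var/log later in
# boot can no longer hide the journal files.
Storage=volatile
```

The setting is picked up after restarting systemd-journald.service or rebooting.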
Thanks Zbigniew, that workaround does get me a journal. I'm worried, though, that more than the journal is affected. Do you know what changed between F20 and F21 to bring this on? It would seem that either systemd now starts the journal earlier or the tmpfs mounts now occur later.
I know that the stateless systemd work will eventually make this obsolete, and I really do look forward to that day. Until it's ready, however, I'd hate to see this existing feature die. Is it even possible to use the systemd stateless features in Fedora yet?
The way the journal is flushed to /var was reimplemented, and there were various new dependencies (e.g. tmpfiles now runs after module-load). What you are observing could be a timing issue. Looking at fedora-readonly.service, I don't see a direct After/Before relationship with the journal units.
Systemd generally works with a read-only root, but there's some noise from things that cannot write to the filesystem as expected. It would need some polishing.
Thank you immensely for that feedback. Let me experiment with After/Before on fedora-readonly.service and see if I can resolve my issue. I'll report my findings here, and you can either send me back to the drawing board or consider applying a patch.
Created attachment 1057715
Patch to run fedora-readonly.service before systemd-journald.service
This patch resolves this bug for me.
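The attachment itself is not reproduced in this thread. As a rough illustration only, the same ordering could be expressed as a systemd drop-in. The directory is systemd's standard drop-in location, but the file name is hypothetical and this is not necessarily what the patch contains.

```ini
# /etc/systemd/system/fedora-readonly.service.d/before-journald.conf
# (hypothetical file name; sketch only, not the attached patch)
[Unit]
# Hold back journald's start until fedora-readonly.service has
# finished setting up the rwtab tmpfs mounts, including /var/log.
Before=systemd-journald.service
```

A drop-in is only picked up after `systemctl daemon-reload` or a reboot.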
This patch does not look good. If anything, the flush to /var should be delayed, not the start of journald, because delaying journald can cause the boot to deadlock. (A good test would be running with systemd.log_level=debug on the kernel command line, which produces more logs.)
Created attachment 1057830
I'm having trouble imagining the deadlock, but I agree that delaying only the flush to /var makes more sense, if only for its tighter scope. I've revised the patch (attached) based on what I think you're suggesting, and was surprised that it didn't work for me. From what I can see in the debug messages, fedora-readonly does appear to finish before systemd-journal-flush starts, but if that's true I don't see why this wouldn't work. Here are what I believe are the relevant bits of the log (which I could only get by first unmounting the tmpfs obscuring the journal in /var/log):
# journalctl -o short-monotonic | egrep '(fedora-readonly|systemd-journal-flush).service'
[ 13.484116] localhost systemd: Installed new job fedora-readonly.service/start as 108
[ 13.524070] localhost systemd: Installed new job systemd-journal-flush.service/start as 116
[ 14.400919] localhost systemd: Merged into installed job systemd-journal-flush.service/restart as 116
[ 14.477356] localhost systemd: Job systemd-journal-flush.service/restart finished, result=done
[ 14.477412] localhost systemd: Converting job systemd-journal-flush.service/restart -> systemd-journal-flush.service/start
[ 25.730829] localhost systemd: fedora-readonly.service changed dead -> start
[ 30.492189] localhost systemd: Merged into installed job fedora-readonly.service/start as 108
[ 30.492313] localhost systemd: Merged into installed job systemd-journal-flush.service/start as 116
[ 31.625587] localhost systemd: Child 463 belongs to fedora-readonly.service
[ 31.655836] localhost systemd: fedora-readonly.service: main process exited, code=exited, status=0/SUCCESS
[ 31.656054] localhost systemd: fedora-readonly.service changed start -> exited
[ 31.656415] localhost systemd: Job fedora-readonly.service/start finished, result=done
[ 31.704257] localhost systemd: fedora-readonly.service: cgroup is empty
[ 32.565674] localhost systemd: fedora-readonly.service: cgroup is empty
[ 31.650531] localhost systemd: systemd-journal-flush.service changed dead -> start
[ 32.888310] localhost systemd: Failed to send unit change signal for systemd-journal-flush.service: Transport endpoint is not connected
[ 32.969283] localhost systemd: Child 533 belongs to systemd-journal-flush.service
[ 32.969412] localhost systemd: systemd-journal-flush.service: main process exited, code=exited, status=0/SUCCESS
[ 32.969550] localhost systemd: systemd-journal-flush.service changed start -> exited
[ 32.969692] localhost systemd: Job systemd-journal-flush.service/start finished, result=done
[ 32.970274] localhost systemd: systemd-journal-flush.service: cgroup is empty
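Likewise, a sketch of the revised approach, which delays only the flush rather than journald itself, written as a drop-in for systemd-journal-flush.service. The file name is hypothetical, and the revised attachment is not reproduced in this thread.

```ini
# /etc/systemd/system/systemd-journal-flush.service.d/after-readonly.conf
# (hypothetical file name; sketch only, not the attached patch)
[Unit]
# Run the /run -> /var journal flush only after fedora-readonly.service
# has mounted its tmpfs over /var/log; journald's own start is not delayed.
After=fedora-readonly.service
```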
My bad. journald will try to run flush_to_var() when it starts, so if /var/log/ is writable at that point, it will write there.
But still, I think the solution from comment #2 is better. Mounting a tmpfs on /var/log and then copying the journals from one directory to the other is wasteful and unnecessary.
Unfortunately Storage=volatile isn't a good solution for me. Many of our embedded deployments that use this have little RAM, and I previously found that with a volatile journal, early boot messages can be lost before I can log in to determine the cause of a problem. With Storage=persistent, even when backed by tmpfs, things have worked out much better. I realize RAM is RAM either way and this is merely a question of how allocations are proportioned, but the last time I tried, the journald configuration wouldn't let me express allocations as percentages, and that is the only way I can think of to find a *single* configuration that fits a wide variety of hardware platforms with anywhere from 1G to 32G+ of RAM. tmpfs, on the other hand, defaults to using half the RAM, and journald adjusts itself quite nicely to that amount, whatever it may be.
I'm certainly open to suggestions. My priorities are, in order:
1. rock solid reliability
2. retention of as much log content as possible while dynamically fitting the available resources
3. speed and efficiency
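For comparison, the setup described above, a persistent journal on a /var/log backed by the rwtab tmpfs, amounts to roughly the following journald.conf excerpt. This is a sketch, not the reporter's actual file; it relies on journald's default of sizing SystemMaxUse= as a fraction of the file system that holds /var/log/journal.

```ini
# /etc/systemd/journald.conf (excerpt; sketch of the setup described above)
[Journal]
# Always write the journal to /var/log/journal, which in this setup
# is a tmpfs mounted via /etc/rwtab, so it never reaches the disk.
Storage=persistent
# SystemMaxUse= is deliberately left unset: by default journald limits
# itself to a fraction of the backing file system's size, so the limit
# scales with the tmpfs and therefore with installed RAM.
```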
OK, reopened bug 874631.
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open against Fedora 21.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 21 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this bug.
Thank you for reporting this bug and we are sorry it could not be fixed.