Bug 1265295 - f23 beta atomic - journal from first boot is lost
Summary: f23 beta atomic - journal from first boot is lost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: ostree
Version: 24
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Colin Walters
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1289902 1353688 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-09-22 15:10 UTC by Dusty Mabe
Modified: 2023-09-14 03:05 UTC (History)
11 users (show)

Fixed In Version: ostree-2016.10-4.fc24 ostree-2016.10-8.fc24
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-23 00:23:24 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Kickstart patch (588 bytes, patch)
2016-01-10 22:44 UTC, Matthew Barnes
no flags Details | Diff

Description Dusty Mabe 2015-09-22 15:10:19 UTC
Description of problem:

The journal on first boot of the Atomic cloud image is lost because it is logged to /run vs /var. Filing against rpm-ostree as I don't really know a better component to use. Please move to correct component when that is determined.

##### FIRST BOOT ######

-bash-4.3# ls -l /proc/$(pgrep journal)/fd/ | grep journal
lrwx------. 1 root root 64 Sep 22 13:21 13 -> /run/log/journal/bd336ade3c5a47d9b59d7cd2fe01fa0a/system.journal

###### 1st REBOOT #########

-bash-4.3# journalctl --list-boots
 0 e5976df3fb6645edb10b9ca5a5269cb0 Tue 2015-09-22 13:57:31 UTC—Tue 2015-09-22 13:58:20 UTC
-bash-4.3# 
-bash-4.3# ls -l /proc/$(pgrep journal)/fd/ | grep journal
lrwx------. 1 root root 64 Sep 22 13:57 18 -> /var/log/journal/bd336ade3c5a47d9b59d7cd2fe01fa0a/system.journal
lrwx------. 1 root root 64 Sep 22 13:57 28 -> /var/log/journal/bd336ade3c5a47d9b59d7cd2fe01fa0a/user-1000.journal

###### 2nd REBOOT #########

-bash-4.3# journalctl --list-boots
-1 e5976df3fb6645edb10b9ca5a5269cb0 Tue 2015-09-22 13:57:31 UTC—Tue 2015-09-22 14:00:02 UTC
 0 ba533750e5534b9e860b457c8677dead Tue 2015-09-22 14:00:08 UTC—Tue 2015-09-22 14:03:13 UTC
-bash-4.3# 
-bash-4.3# ls -l /proc/$(pgrep journal)/fd/ | grep journal
lrwx------. 1 root root 64 Sep 22 14:00 14 -> /var/log/journal/bd336ade3c5a47d9b59d7cd2fe01fa0a/system.journal
lrwx------. 1 root root 64 Sep 22 14:03 26 -> /var/log/journal/bd336ade3c5a47d9b59d7cd2fe01fa0a/user-1000.journal

###########################

Version-Release number of selected component (if applicable):
-bash-4.3# rpm -q rpm-ostree ostree kernel systemd
rpm-ostree-2015.9-2.fc23.x86_64
ostree-2015.8-1.fc23.x86_64
kernel-4.2.0-300.fc23.x86_64
systemd-222-5.fc23.x86_64


How reproducible:
Always


Steps to Reproduce:
See description

Comment 1 Colin Walters 2015-09-22 19:03:44 UTC
Well, what's in /etc/systemd/journald.conf ?  storage=auto?

This might be due to a change with respect to journald in the initramfs.

Comment 2 Dusty Mabe 2015-09-22 19:10:44 UTC
Everything in journald.conf is commented out which is the same as it is in the cloud base image. The cloud base image does not have this problem.

Comment 3 Dusty Mabe 2015-10-13 13:44:11 UTC
This is still a problem as of 'Fedora-Cloud-Atomic-23-20151005.x86_64.qcow2'. I haven't tried a newer image but I assume it hasn't been fixed.

Comment 4 Stef Walter 2015-12-10 13:11:25 UTC
*** Bug 1289902 has been marked as a duplicate of this bug. ***

Comment 5 Colin Walters 2015-12-16 14:34:17 UTC
I'm thinking this is likely because we're pruning /var in rpm-ostree composes, likely need to do what we do in RHELAH to set storage=persistent.

Comment 6 Colin Walters 2016-01-10 22:06:48 UTC
Another alternative is to patch systemd to default to persistent.  I lean a bit towards this - the "auto" value is a bit weird - configuration-by-directory-presence.

Comment 7 Matthew Barnes 2016-01-10 22:44:06 UTC
Created attachment 1113405 [details]
Kickstart patch

I think this should do for now.  Based on RHELAH's treecompose-post.sh.

Still trying to (learn how to) get an image built so I can test this.

Comment 8 Matthew Barnes 2016-01-10 23:28:22 UTC
FTR, I'm doing:

# imagefactory --debug base_image --file-parameter install_script fedora-cloud-atomic.ks ../f23/fedora-atomic-f23.tdl

But that eventually aborts with:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgfac/Builder.py", line 132, in _build_image_from_template
    self.os_plugin.create_base_image(self, template, parameters)
  File "/usr/lib/python2.7/site-packages/imagefactory_plugins/TinMan/TinMan.py", line 365, in create_base_image
    builder.base_image.icicle = self.guest.customize_and_generate_icicle(libvirt_xml)
  File "/usr/lib/python2.7/site-packages/oz/Linux.py", line 353, in customize_and_generate_icicle
    return self._internal_customize(libvirt_xml, "gen_and_mod")
  File "/usr/lib/python2.7/site-packages/oz/Linux.py", line 303, in _internal_customize
    self._collect_setup(modified_xml)
  File "/usr/lib/python2.7/site-packages/oz/RedHat.py", line 421, in _collect_setup
    g_handle = self._guestfs_handle_setup(libvirt_xml)
  File "/usr/lib/python2.7/site-packages/oz/Guest.py", line 1061, in _guestfs_handle_setup
    raise oz.OzException.OzException("No operating systems found on the disk")
OzException: No operating systems found on the disk

Image build FAILED with error: No operating systems found on the disk


Not sure if the ksflatten step is necessary.  I get the same error either way.  Hints appreciated.

Comment 9 Colin Walters 2016-01-11 15:12:32 UTC
The problem with doing this in that kickstart is it doesn't apply to bare metal.  There's a different kickstart for that that we *could* do things in, but it's probably simplest to do it in the treecompose post (where the locale bits are) so it applies everywhere.

I lean a bit towards at least pinging the systemd guys and flipping the default.

Comment 11 Colin Walters 2016-09-14 16:57:45 UTC
https://github.com/ostreedev/ostree/pull/511

Comment 12 Colin Walters 2016-09-14 17:06:07 UTC
*** Bug 1353688 has been marked as a duplicate of this bug. ***

Comment 13 Fedora Update System 2016-09-16 11:39:04 UTC
ostree-2016.10-4.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-702fb80598

Comment 14 Fedora Update System 2016-09-17 00:53:22 UTC
ostree-2016.10-4.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-702fb80598

Comment 15 Jason Brooks 2016-09-19 18:25:14 UTC
I left feedback in bodhi, but I'm getting the same behavior w/ ostree-2016.10-4.fc24.

Comment 16 Jonathan Lebon 2016-09-20 15:50:08 UTC
> I left feedback in bodhi, but I'm getting the same behavior w/ ostree-2016.10-4.fc24.

I can confirm this as well. This is due to an SELinux violation:

Sep 20 03:08:47 localhost.localdomain audit[625]: AVC avc:  denied  { write } for  pid=625 comm="systemd-journal" name="log" dev="dm-0" ino=4756389 scontext=system_u:system_r:syslogd_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=dir permissive=0

What happens is that the journal tries to write to /var/log, but it's labeled as var_t instead of the expected var_log_t, so it gets blocked by SELinux. Now, /var/log *does* get relabeled later on by systemd-tmpfiles. And in fact, a following `systemctl restart systemd-journald.service` would do the trick.

The issue is that during installation, the dir $ANA_INSTALL_PATH/var is labeled as var_t by ostree when anaconda creates the deployment, but then anaconda copies log files into $ANA_INSTALL_PATH/var/log/anaconda, creating log in the process with an inherited var_t label (since they don't match the final paths SELinux expects). So a workaround here is to simply do:

https://github.com/ostreedev/ostree/pull/513

Or we can transfer this to anaconda so that they ensure that /var/log is properly labeled during installation.

Or we can pass inst.nosave=logs to anaconda so that it doesn't even create those log files.

I'm gonna let Colin chime in here.

Comment 17 Colin Walters 2016-09-20 18:02:45 UTC
Ah, so there were two bugs here.  I did test the ordering fix, but I tested it by nuking /var/log/journal, not /var/log.

Hmm.  I don't quite understand how your ostree PR helps - oh wait, I see, it's that ostree does a relabeling on the first deployment for an osname.

Eh.  Something is bugging me here - why can't we just have systemd-tmpfiles handle this, basically ensure that systemd-journal-flush is ordered after it?

A quick annotate leads to:

https://github.com/systemd/systemd/commit/74055aa76278232ff05574fc47c4e6b3560554a7

Which I've read three times and am still confused by.  Don't we want to order *before*?

Comment 18 Jonathan Lebon 2016-09-20 18:27:07 UTC
(In reply to Colin Walters from comment #17)
> Eh.  Something is bugging me here - why can't we just have systemd-tmpfiles
> handle this, basically ensure that systemd-journal-flush is ordered after it?
> 
> A quick annotate leads to:
> 
> https://github.com/systemd/systemd/commit/
> 74055aa76278232ff05574fc47c4e6b3560554a7
> 
> Which I've read three times and am still confused by.  Don't we want to
> order *before*?

Yeah, that's confusing indeed. I had tested this before by adding systemd-journal-flush.service as an After= in systemd-tmpfiles-setup.service and didn't investigate further when I saw the circular dependency error without realizing it came from right there in the systemd-journal-flush.service definition.

But I tried just now to simply move systemd-tmpfiles-setup.service from a Before= to an After= in systemd-journal-flush.service and everything went smooth, so I'm wondering if that isn't what was intended upstream.

Lennart, could you have a quick look? Is this ordered the other way around?

Comment 20 Jason Brooks 2016-09-21 01:44:56 UTC
Seems the same? https://paste.fedoraproject.org/431727/47442219/

Comment 21 Jonathan Lebon 2016-09-21 13:33:19 UTC
(In reply to Jason Brooks from comment #20)
> Seems the same? https://paste.fedoraproject.org/431727/47442219/

Works for me here. Are you using an installer that has this ostree build?

Comment 22 Fedora Update System 2016-09-22 00:22:17 UTC
ostree-2016.10-4.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 23 Fedora Update System 2016-09-22 01:24:51 UTC
ostree-2016.10-8.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-eea2281eff

Comment 24 Fedora Update System 2016-09-23 00:23:24 UTC
ostree-2016.10-8.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Jonathan Lebon 2016-09-23 13:55:50 UTC
We'll need the Koji createImage tasks to use newer boot media than what is currently in https://kojipkgs.fedoraproject.org/pub/alt/releases/24/Cloud/x86_64/os/ for this fix to take effect in the F24 cloud images.

Comment 26 Dusty Mabe 2016-09-27 19:01:29 UTC
Any update on this happening?

Comment 27 Colin Walters 2016-09-27 21:05:35 UTC
I moved that part to https://pagure.io/releng/issue/114

Comment 28 Dusty Mabe 2016-10-24 16:21:25 UTC
For the record it seems journal is getting written to /var/log/ just fine on F25 atomic host:

-bash-4.3# rpm-ostree status
State: idle
Deployments:
● fedora-atomic:fedora-atomic/25/x86_64/docker-host
       Version: 25.33 (2016-10-23 10:17:49)
        Commit: 049beb1175ad5763df5ba345c9f062fc56f5fd70c8f1d6a4f4c3cd2882192ae8
        OSName: fedora-atomic
-bash-4.3# ls -l /proc/659/fd | grep journal
lrwx------. 1 root root 64 Oct 24 16:15 23 -> /var/log/journal/9c797dc6c9a24adcbc1d88cf0cf932a9/system.journal
lrwx------. 1 root root 64 Oct 24 16:15 29 -> /var/log/journal/9c797dc6c9a24adcbc1d88cf0cf932a9/user-1000.journal

Comment 29 Red Hat Bugzilla 2023-09-14 03:05:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days


Note You need to log in before you can comment on or make changes to this bug.