Bug 1121301 - Extensive mislabelling of /usr and/or /var on some Fedora 21 / Rawhide live images prevents them booting unless enforcing=0 is passed
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: selinux-policy-targeted
Version: 21
Hardware: x86_64
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Miroslav Grepl
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks: F21AlphaBlocker
 
Reported: 2014-07-19 00:46 UTC by Adam Williamson
Modified: 2016-02-01 10:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-07-30 16:42:51 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1121806 None None None Never
Red Hat Bugzilla 1303565 None None None Never


Description Adam Williamson 2014-07-19 00:46:42 UTC
We noticed today that Fedora 21 x86_64 Workstation live images (probably other live builds too, but I haven't checked) have extensive SELinux mislabelling which prevents them from booting at all unless enforcing=0 is passed. Many files are mislabelled, leading to an avalanche of AVCs on boot.

I went 'back in time' thanks to Koji and tried to bisect when this mislabelling started. The results are quite interesting: it seems to have shown up gradually, and differently in Rawhide nightly, F21 TC1, and F21 nightly images.

It gets a bit complex, so the summary is this:

== SUMMARY ==

Up to 2014-07-05 there was only one mislabel:

restorecon reset /var/log/cron context system_u:object_r:var_log_t:s0->system_u:object_r:cron_log_t:s0

that one seems to have been around for a long time, back to images from June. I don't think it's relevant to what's going on in this bug, probably some other issue. I should file it, though.

The next successful Workstation compose was 2014-07-08, when three mislabeled files appeared in /etc. All builds on and after this date have those three mislabeled files in /etc.

Up till 2014-07-11, things stayed the same. In an F21 Alpha TC1 compose attempt done that day, major mislabeling of /usr appeared alongside the /etc mislabeling. From that date on, different F21 and Rawhide Workstation images show different levels of mislabeling of /var and /usr , alongside the ongoing mislabeling of three files in /etc. Details follow!

== DETAILS ==

=== RAWHIDE IMAGES ===

In the 2014-07-05 image 'Fedora-Live-Workstation-x86_64-rawhide-20140705.iso', there is no mislabelling (beyond /var/log/cron which is mislabelled in all tests due to a separate bug, doesn't appear to be relevant here).

In the 2014-07-08 image ‘Fedora-Live-Workstation-x86_64-rawhide-20140708.iso’ (the next that composed successfully after 2014-07-05), the mislabelling of 3 files in /etc appears, but /usr and /var are correctly labelled. Things stayed the same through 2014-07-09 and 2014-07-10. In fact, *Rawhide* builds stayed the same up through 2014-07-16: every successful compose shows only the 3 /etc mislabels and the 1 unrelated /var mislabel. However, the 2014-07-17 Rawhide image ‘Fedora-Live-Workstation-x86_64-rawhide-20140717.iso’ shows 4 mislabels in /var (so three new ones) and 806 mislabels in /usr.

=== F21 IMAGES ===

In the 2014-07-11 image ‘Fedora-Live-Workstation-x86_64-21-Alpha-TC1-20140711.iso’ ( http://koji.fedoraproject.org/koji/taskinfo?taskID=7130789 ; the Rawhide nightly for Workstation failed to compose that day), /etc has 3 mislabels, /usr has 2275 mislabels, but /var still has only the 1 unrelated one.

In the 2014-07-14 image ‘Fedora-Live-Workstation-x86_64-21-Alpha-TC1--20140714.iso’ (a TC1 compose attempt, like the 2014-07-11 image), /etc has 3 mislabels, /usr has 1531 mislabels. /var has 4 mislabels.

In the 2014-07-17 image 'Fedora-Live-Workstation-x86_64-21-20140717.iso' (an F21 nightly build), /etc has 3 mislabels, /var has 1096, but /usr has none.

== THE MISLABELS ==

The three mislabels in /etc are always:

restorecon reset /etc/.updated context system_u:object_r:etc_runtime_t:s0->system_u:object_r:etc_t:s0
restorecon reset /etc/passwd- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
restorecon reset /etc/group- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0

/etc/passwd- and /etc/group- are created/touched when users are created, I believe. I have no idea what /etc/.updated is for.

The three new /var mislabels in the 2014-07-14 21-Alpha-TC1 image are:

restorecon reset /var/tmp/abrt context system_u:object_r:abrt_tmp_t:s0->system_u:object_r:abrt_var_cache_t:s0
restorecon reset /var/log/firewalld context system_u:object_r:var_log_t:s0->system_u:object_r:firewalld_var_log_t:s0
restorecon reset /var/log/wpa_supplicant.log context system_u:object_r:NetworkManager_var_lib_t:s0->system_u:object_r:NetworkManager_log_t:s0

I'll attach the other lists of mislabels, as they're hundreds or thousands of items long.
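
For anyone wanting to reproduce the per-directory counts above from a saved restorecon log, here is a minimal sketch. It assumes the log uses exactly the "restorecon reset PATH context OLD->NEW" line format quoted in this report and that paths contain no whitespace; the function names are hypothetical, not part of any tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   restorecon reset /etc/.updated context <old context>-><new context>
# Assumption: paths contain no whitespace.
RESET_RE = re.compile(
    r"^restorecon reset (?P<path>\S+) context (?P<old>\S+?)->(?P<new>\S+)$"
)


def parse_resets(lines):
    """Yield (path, old_context, new_context) for each restorecon reset line."""
    for line in lines:
        match = RESET_RE.match(line.strip())
        if match:
            yield match.group("path"), match.group("old"), match.group("new")


def count_by_top_dir(lines):
    """Count mislabelled paths per top-level directory (/etc, /usr, /var, ...)."""
    counts = Counter()
    for path, _old, _new in parse_resets(lines):
        top = "/" + path.lstrip("/").split("/", 1)[0]
        counts[top] += 1
    return counts


# Sample input: the six reset lines quoted in this report.
SAMPLE = [
    "restorecon reset /etc/.updated context system_u:object_r:etc_runtime_t:s0->system_u:object_r:etc_t:s0",
    "restorecon reset /etc/passwd- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0",
    "restorecon reset /etc/group- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0",
    "restorecon reset /var/tmp/abrt context system_u:object_r:abrt_tmp_t:s0->system_u:object_r:abrt_var_cache_t:s0",
    "restorecon reset /var/log/firewalld context system_u:object_r:var_log_t:s0->system_u:object_r:firewalld_var_log_t:s0",
    "restorecon reset /var/log/wpa_supplicant.log context system_u:object_r:NetworkManager_var_lib_t:s0->system_u:object_r:NetworkManager_log_t:s0",
]

if __name__ == "__main__":
    for top_dir, count in sorted(count_by_top_dir(SAMPLE).items()):
        print(top_dir, count)
```

Grouping by the first path component is just the simplest way to get the /etc, /usr and /var totals used in this report; it says nothing about why a given file is mislabelled.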

== NOTES ==

The dates when the behaviour seems to change don't line up with selinux-policy changes. The 2014-07-05 image with no labelling issues has selinux-policy-3.13.1-63.fc21 which landed on 2014-07-04, and another selinux-policy build didn't occur until 2014-07-14 (when -64 arrived); so we had the same selinux-policy version the whole time things were starting to break. libselinux hasn't changed since June, so it's probably not involved.

I don't see an obvious smoking gun in the spin-kickstarts log, but there is an interesting note:

commit 398ba1441bfb2be494934c02a5196e76fd93fc0f
Author: Matthew Miller <mattdm@mattdm.org>
Date:   Tue Jul 8 14:50:29 2014 -0400

    use hd0,0 in the grub.conf since we're switching to full-disk images instead of single partition

what's that about "full-disk images"?

livecd-tools hasn't had a build since early June, so we can rule it out.

This one seems pretty weird and slippery, I'm not entirely sure what's going on. It does seem notable that a lot of the affected files are ones that are not installed as part of a package but generated otherwise, but that doesn't apply to the /usr mislabels from the 2014-07-11 and 2014-07-14 F21 images, and the 2014-07-17 Rawhide image.

Proposing as an Alpha release blocker, https://fedoraproject.org/wiki/Fedora_21_Alpha_Release_Criteria#Release-blocking_images_must_boot , "All release-blocking images must boot in their supported configurations."

Comment 1 Kalev Lember 2014-07-20 14:22:19 UTC
My theory is that the selinux mislabelling is a fallout from filesystem corruption.

From the ‘Fedora-Live-Workstation-x86_64-21-Alpha-TC1-20140711.iso’ compose log linked above:

DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-2Vz_Eu/install_root failed, using lazy umount
DEBUG util.py:281:  lazy umount succeeded on /var/tmp/imgcreate-2Vz_Eu/install_root

Something is preventing clean unmounting of the newly produced file system, which leads to livecd-creator falling back to 'umount -l' -- lazy unmounting. This likely means all data is not written out to the disk image at that point, but livecd-creator still goes on to use the not-cleanly-unmounted disk image to produce an iso. In particular, restorecon runs last and I would guess we'd need clean unmounting to make sure all its changes are actually written out to the image.

Comment 2 Adam Williamson 2014-07-20 23:22:29 UTC
Interesting. The 20140705 compose log does not have that error, indeed:

https://kojipkgs.fedoraproject.org//work/tasks/8744/7108744/root.log

DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-3LvjD0/install_root
DEBUG util.py:281:  Losetup remove /dev/loop0

but neither does the 20140708 compose log - remember 20140708 already had the /etc mislabels:

https://kojipkgs.fedoraproject.org//work/tasks/6506/7116506/root.log

So I thought maybe that filesystem problem causes the /usr and /var mislabels, but not the /etc ones...but then, the 20140716 Rawhide compose has the filesystem problem:

https://kojipkgs.fedoraproject.org//work/tasks/455/7150455/root.log

DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root
DEBUG util.py:281:  umount: /var/tmp/imgcreate-6JtQtu/install_root: target is busy
DEBUG util.py:281:          (In some cases useful info about processes that
DEBUG util.py:281:           use the device is found by lsof(8) or fuser(1).)
DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root failed, using lazy umount
DEBUG util.py:281:  lazy umount succeeded on /var/tmp/imgcreate-6JtQtu/install_root
DEBUG util.py:281:  Losetup remove /dev/loop0

but only has the /etc mislabels, no mislabelled /usr or /var. So I'm not sure the symptoms match up with this as a potential cause...
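
Checking each compose's root.log for the lazy-umount fallback could be scripted roughly as follows. This is only a sketch: it assumes the log text has already been downloaded and that the message wording matches the excerpts quoted in this comment; the function names are hypothetical.

```python
# Substring that appears in the log excerpts when the clean unmount fails.
LAZY_MARKER = "failed, using lazy umount"


def lazy_umount_lines(log_lines):
    """Return the log lines that report a fallback to lazy unmounting."""
    return [line.rstrip("\n") for line in log_lines if LAZY_MARKER in line]


def had_lazy_umount(log_lines):
    """True if the compose fell back to lazy unmounting at least once."""
    return bool(lazy_umount_lines(log_lines))


# Excerpt from the 20140705 compose log (clean unmount).
CLEAN_LOG = [
    "DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-3LvjD0/install_root",
    "DEBUG util.py:281:  Losetup remove /dev/loop0",
]

# Excerpt from the 20140716 Rawhide compose log (lazy unmount).
BUSY_LOG = [
    "DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root",
    "DEBUG util.py:281:  umount: /var/tmp/imgcreate-6JtQtu/install_root: target is busy",
    "DEBUG util.py:281:  Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root failed, using lazy umount",
    "DEBUG util.py:281:  lazy umount succeeded on /var/tmp/imgcreate-6JtQtu/install_root",
    "DEBUG util.py:281:  Losetup remove /dev/loop0",
]

if __name__ == "__main__":
    print("20140705:", had_lazy_umount(CLEAN_LOG))
    print("20140716:", had_lazy_umount(BUSY_LOG))
```

Running this over every compose in the date range would show directly whether the lazy-umount fallback and the /usr and /var mislabels occur in the same builds.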

Comment 3 Miroslav Grepl 2014-07-21 08:22:18 UTC
restorecon reset /etc/passwd- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
restorecon reset /etc/group- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0

This is a problem which could prevent booting. But AFAIK it should be fixed in systemd.

Comment 4 Adam Williamson 2014-07-21 14:48:01 UTC
well, many of the mislabels cause various forms of chaos on boot. The point is we haven't actually figured out what's causing them yet. Why do you point to systemd? I haven't seen anything so far to indicate that it is the culprit.

Comment 5 Miroslav Grepl 2014-07-21 15:22:16 UTC

(In reply to Adam Williamson (Red Hat) from comment #4)
> well, many of the mislabels cause various forms of chaos on boot. The point
> is we haven't actually figured out what's causing them yet. Why do you point
> to systemd? I haven't seen anything so far to indicate that it is the
> culprit.

https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg20929.html

We need to find out why other labels are bad.


restorecon reset /var/tmp/abrt context system_u:object_r:abrt_tmp_t:s0->system_u:object_r:abrt_var_cache_t:s0

.. strange, there is a filename transition to have it labeled as abrt_var_cache_t

restorecon reset /var/log/firewalld context system_u:object_r:var_log_t:s0->system_u:object_r:firewalld_var_log_t:s0

.. the log file is not created by firewalld but by a tool that is not running as firewalld_t.

restorecon reset /var/log/wpa_supplicant.log context system_u:object_r:NetworkManager_var_lib_t:s0->system_u:object_r:NetworkManager_log_t:s0

.. also strange. The file could have been moved here.

Comment 6 Adam Williamson 2014-07-21 15:52:41 UTC
mgrepl: aha. so that could cause the problem with two of the files in /etc , indeed: perhaps we should consider the /etc mislabelling separate from the quasi-random mislabelling of large chunks of files in /var and /usr . However, note one other file in /etc is consistently mislabeled whenever /etc/passwd- and /etc/group- are mislabeled: /etc/.updated . Do you know if the same systemd problem applies to that file?

Comment 7 Adam Williamson 2014-07-21 15:53:57 UTC
It looks like the /etc fix should be in systemd-215-4.{fc21,fc22}:

- Various sysusers fixes, most importantly correct selinux labels

so I'll check recent builds and see about that.

Comment 8 Miroslav Grepl 2014-07-21 15:56:50 UTC
/etc/passwd- and /etc/group- should be OK because the labels are derived from /etc/passwd and /etc/group by shadow-utils AFAIK.

And yes the problem is with /etc/.updated. We need to find out how it is created. We could add a filename rule for it.

I am going to build my own live image to see if I can find out.

Comment 9 Adam Williamson 2014-07-21 16:15:01 UTC
2014-07-21 Rawhide nightly - http://koji.fedoraproject.org/koji/taskinfo?taskID=7171314 - still has mislabels of /etc/passwd- and /etc/group- (and /etc/.updated), even though it has systemd-215-4.fc22 . So it looks like it's not just that systemd issue.

Comment 10 Zbigniew Jędrzejewski-Szmek 2014-07-21 21:21:41 UTC
/etc/.updated is systemd's fault. I'll fix it.

Comment 11 Zbigniew Jędrzejewski-Szmek 2014-07-21 21:23:35 UTC
passwd- and group- too.

Comment 12 Adam Williamson 2014-07-21 23:36:40 UTC
I've spun off https://bugzilla.redhat.com/show_bug.cgi?id=1121806 for the /etc mislabels, as Zbigniew seems to know what's going on there. Zbigniew, can you please use that bug for tracking the fix for the three /etc file mislabels?

This bug now covers *only* the quasi-random mislabelling of /usr and/or /var in images built since 2014-07-11.

Comment 13 Zbigniew Jędrzejewski-Szmek 2014-07-22 13:25:00 UTC
(In reply to Miroslav Grepl from comment #3)
> restorecon reset /etc/passwd- context
> system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
> restorecon reset /etc/group- context
> system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
> 
> This is a problem which could prevent booting. But AFAIK it should be fixed
> in systemd.

I don't think that the backup files could cause a boot failure... They should not be read or written by anything in the normal case. Anyway, systemd-215-5 should label them correctly.

Comment 14 Adam Williamson 2014-07-22 20:51:38 UTC
The live session user is created on boot by an initscript (livesys or livesys-late, I forget which). It gets denied.

Comment 15 Adam Williamson 2014-07-23 16:27:23 UTC
just realized bcl isn't CCed on the bug, though i know he's aware of it. bcl, note kalev's theory in #c1 that this is caused by the filesystem issues in livecd-creator; I'm not sure if that's the case, but it certainly would bear investigation.

Comment 16 Brian Lane 2014-07-23 19:08:29 UTC
In the build I just did here locally I am seeing this in the livecd-creator output:

/etc/selinux/targeted/contexts/files/file_contexts: line 112 has invalid context system_u:object_r:openshift_script_exec_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 475 has invalid context system_u:object_r:condor_conf_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 486 has invalid context system_u:object_r:kmscon_conf_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 608 has invalid context system_u:object_r:git_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 799 has invalid context system_u:object_r:mediawiki_rw_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1000 has invalid context system_u:object_r:dspam_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1067 has invalid context system_u:object_r:webalizer_rw_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1075 has invalid context system_u:object_r:mediawiki_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1129 has invalid context system_u:object_r:preupgrade_exec_t:s0
Exiting after 10 errors.


I'm pretty sure this happens when we call this:

self.call(["/sbin/setfiles", "-p", "-e", "/proc", "-e", "/sys", "-e", "/dev", selinux.selinux_file_context_path(), "/"])

So something is going wrong with selinux. This build used selinux v3.13.1-66
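
To see which types the host rejects, the "invalid context" messages can be pulled out of the captured output. A minimal sketch, assuming the message format is exactly as pasted above and that the output has been captured as a string; the helper name is hypothetical and is not part of livecd-creator or setfiles.

```python
import re

# Matches lines such as:
#   /etc/selinux/targeted/contexts/files/file_contexts: line 112 has invalid context system_u:object_r:openshift_script_exec_t:s0
INVALID_RE = re.compile(
    r"^(?P<file>\S+): line (?P<line>\d+) has invalid context (?P<context>\S+)$"
)


def invalid_contexts(output):
    """Return (line_number, selinux_type) pairs for each reported invalid context."""
    found = []
    for raw in output.splitlines():
        match = INVALID_RE.match(raw.strip())
        if match:
            # A context is user:role:type:level; the type is the third field.
            selinux_type = match.group("context").split(":")[2]
            found.append((int(match.group("line")), selinux_type))
    return found


# First two error lines from the output above, plus the trailing message.
SAMPLE_OUTPUT = """\
/etc/selinux/targeted/contexts/files/file_contexts: line 112 has invalid context system_u:object_r:openshift_script_exec_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 475 has invalid context system_u:object_r:condor_conf_t:s0
Exiting after 10 errors.
"""

if __name__ == "__main__":
    for line_number, selinux_type in invalid_contexts(SAMPLE_OUTPUT):
        print(line_number, selinux_type)
```

The resulting type names could then be compared against the types known to the policy loaded on the build host, which is where the validation appears to fail.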

Comment 17 Miroslav Grepl 2014-07-25 10:19:38 UTC
This is strange. I don't see it on my system. Was this run on an F21 system? It looks like an F20 file_contexts file is being used.

Comment 18 Adam Williamson 2014-07-25 20:05:36 UTC
bcl wrote "This build used selinux v3.13.1-66"

i think by that he means selinux-policy-3.13.1-66 , and that build does not exist for f20, indeed it only exists for f21. therefore it seems pretty certain he ran it on f21.

Comment 19 Brian Lane 2014-07-25 21:59:08 UTC
Sorry, the build host is F20. selinux-policy-3.13.1-66 is what the livecd-creator run installed.

So have we reached a point where livecd-creator can't be used to generate the next release's images? That would be yet another reason to move to using livemedia-creator.

Comment 20 Adam Williamson 2014-07-25 22:09:50 UTC
bcl: for me it's been like that for years, i always use the same version for the build host and the target image. selinux does usually seem to be the issue, it seems like the policy needs to match between the build host and the image.
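
A rough pre-flight check for that mismatch could compare the build host's release with the dist tag of the selinux-policy build that ends up in the image. This is only a heuristic sketch: it assumes the build name ends in a ".fcNN" dist tag, as in selinux-policy-3.13.1-63.fc21 mentioned earlier, and the function names are hypothetical.

```python
import re

# Fedora dist tag at the end of a build name, e.g. ".fc21".
DIST_RE = re.compile(r"\.fc(\d+)$")


def fedora_release_of(nvr):
    """Return the Fedora release number from a build's dist tag, or None."""
    match = DIST_RE.search(nvr)
    return int(match.group(1)) if match else None


def same_release(host_release, policy_nvr):
    """True if the policy build's dist tag matches the build host's release."""
    return fedora_release_of(policy_nvr) == host_release


if __name__ == "__main__":
    # An F20 build host composing an image that carries an F21 policy build.
    print(same_release(20, "selinux-policy-3.13.1-63.fc21"))
```

A dist tag match is not proof that the two policies are compatible; it only flags the obvious case of building the next release's image on an older host.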

Comment 21 Miroslav Grepl 2014-07-28 12:59:29 UTC
So do we still have any issues?

Comment 22 Adam Williamson 2014-07-28 15:49:29 UTC
I don't believe anyone's explicitly fixed anything yet. The last nightlies I looked at had no labeling issues, but that could just be a coincidence, the bug doesn't seem to be entirely deterministic. I'll take a look at the last few days' worth of nightlies today.

Comment 23 Adam Williamson 2014-07-28 23:15:08 UTC
2014-07-27 Rawhide nightly (Workstation x86_64 - http://koji.fedoraproject.org/koji/taskinfo?taskID=7200357 ) has no mislabeling. Neither does 2014-07-26 nightly (http://koji.fedoraproject.org/koji/taskinfo?taskID=7198871 ). Still, I'd like to see a couple more F21 builds before being certain it's OK, 21 seems to be worse than Rawhide for some reason. We'll see when dgilmore gets back to doing F21 builds.

Comment 24 Miroslav Grepl 2014-07-29 06:08:58 UTC
The point is that this bug was more about the systemd issue. Sure, we can still get

restorecon reset /var/tmp/abrt context system_u:object_r:abrt_tmp_t:s0->system_u:object_r:abrt_var_cache_t:s0

restorecon reset /var/log/firewalld context system_u:object_r:var_log_t:s0->system_u:object_r:firewalld_var_log_t:s0

restorecon reset /var/log/wpa_supplicant.log context system_u:object_r:NetworkManager_var_lib_t:s0->system_u:object_r:NetworkManager_log_t:s0

but I would like to see it again. Basically this is more a setup issue than an SELinux issue.

Comment 25 Adam Williamson 2014-07-30 16:42:51 UTC
Tested with 07-30 F21 nightly and the bug didn't show up again. For now we're just going to close this, as it doesn't seem reproducible right now; we'll assume it's either gotten magically fixed along the way, or it's an unpredictable consequence of the filesystem umount failure in image creation which we really need to file separately anyway (so I'll do that).

If this somehow comes back and turns out to be not the same thing as the filesystem umount issue, we can re-open it.

Comment 26 Adam Williamson 2014-07-30 16:43:07 UTC
Forgot to note, the above was: Discussed at 2014-07-30 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2014-07-30/f21-blocker-review.2014-07-30-15.59.log.txt .

