Bug 1121301
Summary: | Extensive mislabelling of /usr and/or /var on some Fedora 21 / Rawhide live images prevents them booting unless enforcing=0 is passed | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | selinux-policy-targeted | Assignee: | Miroslav Grepl <mgrepl> |
Status: | CLOSED WORKSFORME | QA Contact: | Ben Levenson <benl> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 21 | CC: | bcl, dwalsh, jreznik, kalevlember, robatino, zbyszek |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-07-30 16:42:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1043119 |
Description
Adam Williamson
2014-07-19 00:46:42 UTC
My theory is that the selinux mislabelling is a fallout from filesystem corruption. From the ‘Fedora-Live-Workstation-x86_64-21-Alpha-TC1-20140711.iso’ compose log linked above:

DEBUG util.py:281: Unmounting directory /var/tmp/imgcreate-2Vz_Eu/install_root failed, using lazy umount
DEBUG util.py:281: lazy umount succeeded on /var/tmp/imgcreate-2Vz_Eu/install_root

Something is preventing clean unmounting of the newly produced file system, which leads to livecd-creator falling back to 'umount -l' -- lazy unmounting. This likely means all data is not written out to the disk image at that point, but livecd-creator still goes on to use the not-cleanly-unmounted disk image to produce an iso.

In particular, restorecon runs last and I would guess we'd need clean unmounting to make sure all its changes are actually written out to the image.

Interesting. The 20140705 compose log does not have that error, indeed:

https://kojipkgs.fedoraproject.org//work/tasks/8744/7108744/root.log

DEBUG util.py:281: Unmounting directory /var/tmp/imgcreate-3LvjD0/install_root
DEBUG util.py:281: Losetup remove /dev/loop0

but neither does the 20140708 compose log - remember 20140708 already had the /etc mislabels:

https://kojipkgs.fedoraproject.org//work/tasks/6506/7116506/root.log

So I thought maybe that filesystem problem causes the /usr and /var mislabels, but not the /etc ones...but then, the 20140716 Rawhide compose has the filesystem problem:

https://kojipkgs.fedoraproject.org//work/tasks/455/7150455/root.log

DEBUG util.py:281: Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root
DEBUG util.py:281: umount: /var/tmp/imgcreate-6JtQtu/install_root: target is busy
DEBUG util.py:281: (In some cases useful info about processes that
DEBUG util.py:281: use the device is found by lsof(8) or fuser(1).)
DEBUG util.py:281: Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root failed, using lazy umount
DEBUG util.py:281: lazy umount succeeded on /var/tmp/imgcreate-6JtQtu/install_root
DEBUG util.py:281: Losetup remove /dev/loop0

but only has the /etc mislabels, no mislabelled /usr or /var. So I'm not sure the symptoms match up with this as a potential cause...

restorecon reset /etc/passwd- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
restorecon reset /etc/group- context system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0

This is a problem which could prevent booting. But AFAIK it should be fixed in systemd.

well, many of the mislabels cause various forms of chaos on boot. The point is we haven't actually figured out what's causing them yet. Why do you point to systemd? I haven't seen anything so far to indicate that it is the culprit.

(In reply to Adam Williamson (Red Hat) from comment #4)
> well, many of the mislabels cause various forms of chaos on boot. The point
> is we haven't actually figured out what's causing them yet. Why do you point
> to systemd? I haven't seen anything so far to indicate that it is the
> culprit.

https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg20929.html

We need to find out why other labels are bad.

restorecon reset /var/tmp/abrt context system_u:object_r:abrt_tmp_t:s0->system_u:object_r:abrt_var_cache_t:s0

.. strange, there is filename transition to have it labeled as abrt_var_cache_t

restorecon reset /var/log/firewalld context system_u:object_r:var_log_t:s0->system_u:object_r:firewalld_var_log_t:s0

..the log file is not created by firewalld but a tool running without firewalld_t.

restorecon reset /var/log/wpa_supplicant.log context system_u:object_r:NetworkManager_var_lib_t:s0->system_u:object_r:NetworkManager_log_t:s0

.. also strange. It could be a move here.

mgrepl: aha.
so that could cause the problem with two of the files in /etc, indeed: perhaps we should consider the /etc mislabelling separate from the quasi-random mislabelling of large chunks of files in /var and /usr. However, note one other file in /etc is consistently mislabeled whenever /etc/passwd- and /etc/group- are mislabeled: /etc/.updated. Do you know if the same systemd problem applies to that file?

It looks like the /etc fix should be in systemd-215-4.{fc21,fc22}:

- Various sysusers fixes, most importantly correct selinux labels

so I'll check recent builds and see about that.

/etc/passwd- and /etc/group- should be OK because the labels are derived from /etc/passwd and /etc/group by shadow-utils AFAIK. And yes, the problem is with /etc/.updated. We need to find out how it is created. We could add a filename rule for it. I am going to build my own live image to see if I can find out.

2014-07-21 Rawhide nightly - http://koji.fedoraproject.org/koji/taskinfo?taskID=7171314 - still has mislabels of /etc/passwd- and /etc/group- (and /etc/.updated), even though it has systemd-215-4.fc22. So it looks like it's not just that systemd issue.

/etc/.updated is systemd's fault. I'll fix it. passwd- and group- too.

I've spun off https://bugzilla.redhat.com/show_bug.cgi?id=1121806 for the /etc mislabels, as Zbigniew seems to know what's going on there. Zbigniew, can you please use that bug for tracking the fix for the three /etc file mislabels? This bug now covers *only* the quasi-random mislabelling of /usr and/or /var in images built since 2014-07-11.

(In reply to Miroslav Grepl from comment #3)
> restorecon reset /etc/passwd- context
> system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
> restorecon reset /etc/group- context
> system_u:object_r:tmpfs_t:s0->system_u:object_r:passwd_file_t:s0
>
> This is a problem which could prevent booting. But AFAIK it should be fixed
> in systemd.

I don't think that the backup files could cause a boot failure...
They should not be read or written by anything in the normal case. Anyway, systemd-215-5 should label them correctly.

The live session user is created on boot by an initscript (livesys or livesys-late, I forget which). It gets denied.

just realized bcl isn't CCed on the bug, though i know he's aware of it. bcl, note kalev's theory in #c1 that this is caused by the filesystem issues in livecd-creator; I'm not sure if that's the case, but it certainly would bear investigation.

In the build I just did here locally I am seeing this in the livecd-creator output:

/etc/selinux/targeted/contexts/files/file_contexts: line 112 has invalid context system_u:object_r:openshift_script_exec_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 475 has invalid context system_u:object_r:condor_conf_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 486 has invalid context system_u:object_r:kmscon_conf_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 608 has invalid context system_u:object_r:git_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 799 has invalid context system_u:object_r:mediawiki_rw_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1000 has invalid context system_u:object_r:dspam_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1067 has invalid context system_u:object_r:webalizer_rw_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1075 has invalid context system_u:object_r:mediawiki_content_t:s0
/etc/selinux/targeted/contexts/files/file_contexts: line 1129 has invalid context system_u:object_r:preupgrade_exec_t:s0
Exiting after 10 errors.

I'm pretty sure this happens when we call this:

self.call(["/sbin/setfiles", "-p", "-e", "/proc", "-e", "/sys", "-e", "/dev", selinux.selinux_file_context_path(), "/"])

So something is going wrong with selinux. This build used selinux v3.13.1-66

This is strange. I don't see it on my system. Is this being run on an f21 system?
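The setfiles messages quoted above can be summarised mechanically when comparing builds. The following is a minimal sketch, not part of livecd-creator: the regular expression is inferred only from the message wording quoted in this bug, and the function name `invalid_types` and the sample string are hypothetical. It extracts the SELinux type from each rejected context, which makes it easier to see which types the policy in use does not know about.

```python
import re

# Message shape inferred from the output quoted in this bug (assumption):
#   <file>: line <N> has invalid context <user>:<role>:<type>:<level>
INVALID_CONTEXT_RE = re.compile(
    r"(?P<file>\S+):\s+line (?P<line>\d+) has invalid context (?P<context>\S+)"
)


def invalid_types(output):
    """Return the sorted, de-duplicated SELinux types reported as invalid."""
    types = set()
    for match in INVALID_CONTEXT_RE.finditer(output):
        fields = match.group("context").split(":")
        # A context is user:role:type[:level]; the type is the third field.
        if len(fields) >= 3:
            types.add(fields[2])
    return sorted(types)


# Hypothetical sample built from three of the messages quoted above.
SAMPLE_OUTPUT = (
    "/etc/selinux/targeted/contexts/files/file_contexts: line 112 has invalid "
    "context system_u:object_r:openshift_script_exec_t:s0\n"
    "/etc/selinux/targeted/contexts/files/file_contexts: line 475 has invalid "
    "context system_u:object_r:condor_conf_t:s0\n"
    "/etc/selinux/targeted/contexts/files/file_contexts: line 486 has invalid "
    "context system_u:object_r:kmscon_conf_t:s0\n"
)

if __name__ == "__main__":
    for selinux_type in invalid_types(SAMPLE_OUTPUT):
        print(selinux_type)
```

A list of types like this could then be checked against the policy on the build host and in the image, which is the comparison the thread turns to next.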
It looks like an f20 file_contexts file is being used.

bcl wrote "This build used selinux v3.13.1-66"

i think by that he means selinux-policy-3.13.1-66, and that build does not exist for f20, indeed it only exists for f21. therefore it seems pretty certain he ran it on f21.

Sorry, the build host is F20. selinux-policy-3.13.1-66 is what the livecd-creator run installed. So have we reached a point where livecd-creator can't be used to generate the next release's images? That would be yet another reason to move to using livemedia-creator.

bcl: for me it's been like that for years, i always use the same version for the build host and the target image. selinux does usually seem to be the issue, it seems like the policy needs to match between the build host and the image.

So do we still have any issues?

I don't believe anyone's explicitly fixed anything yet. The last nightlies I looked at had no labeling issues, but that could just be a coincidence, the bug doesn't seem to be entirely deterministic. I'll take a look at the last few days' worth of nightlies today.

2014-07-27 Rawhide nightly (Workstation x86_64 - http://koji.fedoraproject.org/koji/taskinfo?taskID=7200357 ) has no mislabeling. Neither does 2014-07-26 nightly (http://koji.fedoraproject.org/koji/taskinfo?taskID=7198871 ). Still, I'd like to see a couple more F21 builds before being certain it's OK, 21 seems to be worse than Rawhide for some reason. We'll see when dgilmore gets back to doing F21 builds.

The point is this bug was more about the systemd issue. Sure, we can still get

restorecon reset /var/tmp/abrt context system_u:object_r:abrt_tmp_t:s0->system_u:object_r:abrt_var_cache_t:s0
restorecon reset /var/log/firewalld context system_u:object_r:var_log_t:s0->system_u:object_r:firewalld_var_log_t:s0
restorecon reset /var/log/wpa_supplicant.log context system_u:object_r:NetworkManager_var_lib_t:s0->system_u:object_r:NetworkManager_log_t:s0

but I would like to see it again.
Basically this is more a setup issue than an SELinux issue.

Tested with 07-30 F21 nightly and the bug didn't show up again.

For now we're just going to close this, as it doesn't seem reproducible right now; we'll assume it's either gotten magically fixed along the way, or it's an unpredictable consequence of the filesystem umount failure in image creation which we really need to file separately anyway (so I'll do that). If this somehow comes back and turns out to be not the same thing as the filesystem umount issue, we can re-open it.

Forgot to note, the above was:

Discussed at 2014-07-30 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2014-07-30/f21-blocker-review.2014-07-30-15.59.log.txt .
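
Since the umount failure is left as the open suspect, it may help to check each compose's root.log for the lazy-umount fallback before examining labels. Below is a minimal sketch under stated assumptions: the pattern is copied from the log lines quoted in this bug and may not match other livecd-creator versions, and the function name `find_lazy_umounts` and the sample log are hypothetical, not part of any Fedora tool.

```python
import re

# Wording taken from the root.log excerpts quoted in this bug (assumption:
# other livecd-creator versions may phrase the message differently).
LAZY_UMOUNT_RE = re.compile(
    r"Unmounting directory (?P<path>\S+) failed, using lazy umount"
)


def find_lazy_umounts(log_text):
    """Return the mount points for which the log reports a lazy-umount fallback."""
    return [match.group("path") for match in LAZY_UMOUNT_RE.finditer(log_text)]


# Hypothetical sample assembled from the 20140716 Rawhide excerpt above.
SAMPLE_LOG = """\
DEBUG util.py:281: Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root
DEBUG util.py:281: umount: /var/tmp/imgcreate-6JtQtu/install_root: target is busy
DEBUG util.py:281: Unmounting directory /var/tmp/imgcreate-6JtQtu/install_root failed, using lazy umount
DEBUG util.py:281: lazy umount succeeded on /var/tmp/imgcreate-6JtQtu/install_root
DEBUG util.py:281: Losetup remove /dev/loop0
"""

if __name__ == "__main__":
    for path in find_lazy_umounts(SAMPLE_LOG):
        print("lazy umount fallback:", path)
```

A compose whose log yields an empty list unmounted cleanly as far as this message is concerned. As the thread notes, the fallback did not line up with the /usr and /var mislabels in every compose, so a hit here is a lead to investigate rather than a diagnosis.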