Created attachment 1933144 [details]
console log from test run.

Description of problem:
After upgrading to the selinux-policy-38.2-1.fc38 rpm we have run into a failing test on our rawhide branch. We have also created an upstream ticket for this: https://github.com/coreos/fedora-coreos-tracker/issues/1364

The failing test is ostree.hotfix/persist (context) => https://github.com/coreos/coreos-assembler/blob/221a0d87a9ed10dcaa9bfec4e0963c23a0f080f2/mantle/kola/tests/ostree/unlock.go#L235

This test installs the rpm and then reboots into a hotfix deployment. It seems that systemd cannot execute the system-generators, so we suspect it is something to do with overlay + SELinux. (link) => https://github.com/coreos/fedora-coreos-tracker/issues/1364#issuecomment-1355466387

Version-Release number of selected component (if applicable):
selinux-policy-38.2-1.fc38

How reproducible:
Simply build rawhide with selinux-policy-38.2-1.fc38 in the overrides folder. (We have pinned the earlier version, selinux-policy-38.1-1.fc38, to get past this blocker.) After building, run the ostree.hotfix test.

Steps to Reproduce:
1. cosa init https://github.com/coreos/fedora-coreos-config --branch rawhide
2. cosa fetch
3. cosa init
4. pushd overrides/rpm
5. koji download-build selinux-policy-38.2-1.fc38
6. popd
7. cosa build
8. cosa kola run ostree.hotfix

Actual results:
=== RUN ostree.hotfix
08:11:45 === RUN ostree.hotfix/unlock
08:11:48 === RUN ostree.hotfix/install
08:11:48 === RUN ostree.hotfix/uninstall
08:11:49 === RUN ostree.hotfix/persist
08:23:25 --- FAIL: ostree.hotfix (720.46s)
08:23:25 --- PASS: ostree.hotfix/unlock (5.01s)
08:23:25 --- PASS: ostree.hotfix/install (0.52s)
08:23:25 --- PASS: ostree.hotfix/uninstall (0.33s)
08:23:25 --- FAIL: ostree.hotfix/persist (612.68s)
08:23:25 unlock.go:240: Failed to reboot machine: machine "95f91dad-1372-4075-99ae-e824f3a39bc2" failed to start: ssh journalctl failed: time limit exceeded
08:23:25 cluster.go:184: "logger --tag kola '=== DONE: ostree.hotfix/persist ==='" failed: output , status ssh: handshake failed: read tcp 127.0.0.1:56220->127.0.0.1:43907: read: connection reset by peer
08:23:25 harness.go:1501: Found emergency shell on machine 95f91dad-1372-4075-99ae-e824f3a39bc2 console
08:23:25 harness.go:1501: Found systemd generator failure (/usr/lib/systemd/system-generators/coreos-installer-generator) on machine 95f91dad-1372-4075-99ae-e824f3a39bc2 console
08:23:25 FAIL, output in /home/jenkins/agent/workspace/build/tmp/kola-lW8u1/kola/rerun
08:23:25 Error: harness: test suite failed
08:23:25 2022-12-08T13:23:19Z cli: harness: test suite failed
08:23:25 error: failed to execute cmd-kola: exit status 1

Expected results:
PASS

Additional info:
Created attachment 1933145 [details]
Journal

Additionally, looking through the journal you can see AVC denial messages:

Dec 16 17:45:45.495000 audit[1270]: AVC avc: denied { sys_admin } for pid=1270 comm="mv" capability=21 scontext=system_u:system_r:NetworkManager_dispatcher_console_t:s0 tcontext=system_u:system_r:NetworkManager_dispatcher_console_t:s0 tclass=capability permissive=0
Dec 16 17:45:45.495000 audit[1270]: SYSCALL arch=c000003e syscall=196 success=yes exit=17 a0=3 a1=0 a2=0 a3=7ffe052549ff items=0 ppid=1266 pid=1270 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv" exe="/usr/bin/mv" subj=system_u:system_r:NetworkManager_dispatcher_console_t:s0 key=(null)
Dec 16 17:45:45.495000 audit: PROCTITLE proctitle=6D76002F72756E2F636F6E736F6C652D6C6F67696E2D68656C7065722D6D657373616765732F636F6E736F6C652D6C6F67696E2D68656C7065722D6D657373616765732E58586A567935535143542E746D70002F6574632F69737375652E642F32325F636C686D5F656E73342E6973737565
Dec 16 17:45:45.495000 audit[1270]: AVC avc: denied { sys_admin } for pid=1270 comm="mv" capability=21 scontext=system_u:system_r:NetworkManager_dispatcher_console_t:s0 tcontext=system_u:system_r:NetworkManager_dispatcher_console_t:s0 tclass=capability permissive=0
Dec 16 17:45:45.495000 audit[1270]: SYSCALL arch=c000003e syscall=196 success=yes exit=17 a0=3 a1=7ffe05253990 a2=11 a3=7ffe052549ff items=0 ppid=1266 pid=1270 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv" exe="/usr/bin/mv" subj=system_u:system_r:NetworkManager_dispatcher_console_t:s0 key=(null)
I think those NetworkManager denials are probably unrelated (though there might be a separate bug we should open for that).

The real problem we are seeing and why the test is failing is because all of the systemd generators are failing. From the console attachment:

```
[ 13.642326] systemd[1542]: Failed to execute /usr/lib/systemd/system-generators/coreos-platform-chrony: Permission denied
[ 13.643313] systemd[1541]: Failed to execute /usr/lib/systemd/system-generators/coreos-liveiso-autologin-generator: Permission denied
[ 13.644286] systemd[1543]: Failed to execute /usr/lib/systemd/system-generators/kdump-dep-generator.sh: Permission denied
[ 13.646173] systemd[1540]: Failed to execute /usr/lib/systemd/system-generators/coreos-installer-generator: Permission denied
[ 13.646944] systemd[1546]: Failed to execute /usr/lib/systemd/system-generators/selinux-autorelabel-generator.sh: Permission denied
[ 13.654462] systemd[1539]: Failed to execute /usr/lib/systemd/system-generators/coreos-boot-mount-generator: Permission denied
[ 13.678167] zram_generator::config[1559]: No configuration found.
[ 13.692710] systemd[1538]: /usr/lib/systemd/system-generators/coreos-installer-generator failed with exit status 1.
[ 13.697411] systemd[1538]: /usr/lib/systemd/system-generators/coreos-liveiso-autologin-generator failed with exit status 1.
[ 13.698354] systemd[1538]: /usr/lib/systemd/system-generators/selinux-autorelabel-generator.sh failed with exit status 1.
[ 13.702253] systemd[1538]: /usr/lib/systemd/system-generators/coreos-platform-chrony failed with exit status 1.
[ 13.702994] systemd[1538]: /usr/lib/systemd/system-generators/kdump-dep-generator.sh failed with exit status 1.
[ 13.704161] systemd[1538]: /usr/lib/systemd/system-generators/coreos-boot-mount-generator failed with exit status 1.
```

What's interesting is I don't see any denials around this point.
Timing here is interesting because systemd generators run really early and I'm not sure of the interaction with the SELinux policy here. Though clearly something with SELinux is going on because we have pinned to the older version and the test is now behaving again.
For more information, what "ostree hotfix mode" means is basically that an overlay is added on top of /usr. This is done from the initramfs and so is already in place when we switchroot and systemd in the real root starts. It doesn't seem like the journal logs include anything after the reboot into the hotfix deployment.
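Since hotfix mode amounts to an overlay on top of /usr, one quick sanity check is whether /usr currently shows up as an overlayfs mount. The following is only a minimal sketch: the helper name `is_overlay_mount` is hypothetical, and it assumes the mount table is available in the usual /proc/mounts format (device, mount point, filesystem type, options).

```shell
#!/bin/sh
# is_overlay_mount MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: exits 0 if the last mount table entry for MOUNTPOINT
# has filesystem type "overlay", and 1 otherwise.
# MOUNTS_FILE defaults to /proc/mounts; a different file can be passed for testing.
is_overlay_mount() {
    awk -v mp="$1" '
        $2 == mp { fstype = $3 }   # a later entry shadows an earlier one
        END { exit (fstype == "overlay" ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

For example, `is_overlay_mount /usr && echo "/usr is an overlay"` would be expected to print the message after rebooting into the hotfix deployment, and nothing on a regular deployment.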
So, probably the only change from that build that could have caused this is https://github.com/fedora-selinux/selinux-policy/pull/1475, which aims to restrict what the kernel_t domain is able to execute. The most likely explanation is that the systemd process that starts these generators (which notably has pid != 1 according to the console log) was forked from the pid 1 process before the policy was loaded (i.e. before Switch root) and didn't explicitly change its context to init_t (which systemd pid 1 does right after loading the policy, AFAIK), remaining with the kernel_t domain. I'm not sure how to verify this theory, though. Perhaps Michal could help us shed some light on this?
> After upgrading to the selinux-policy-38.2-1.fc38 rpm we have ran into a failing test on our rawhide branch. We have created an upstream ticket for this as well; here => https://github.com/coreos/fedora-coreos-tracker/issues/1364 .

Does this mean the issue was not present with selinux-policy-38.1-1.fc38? Can you try the latest rawhide build, which is selinux-policy-38.4-1?
> Can you try the latest rawhide build which is selinux-policy-38.4-1?

I've tested with selinux-policy-38.4-1 with the same result.

> The most likely explanation is that the systemd process that starts these generators (which notably has pid != 1 according to the console log) was forked from the pid 1 process before the policy was loaded (i.e. before Switch root) and didn't explicitly change its context to init_t (which systemd pid 1 does right after loading the policy, AFAIK), remaining with the kernel_t domain.

Looking at the systemd source, I think the pid in the logs is that of systemd after fork() but before exec(), so it is expected that the process logging the error is not pid 1.

Re. the context change, again looking at the systemd source, I would've expected to see some error messages if it had failed to switch context: https://github.com/systemd/systemd/blob/f1c36537a935ef67e13fd84e04d94478bd99f2f1/src/core/selinux-setup.c#L70-L81
(In reply to Dusty Mabe from comment #2)
> The real problem we are seeing and why the test is failing is because all of
> the systemd generators are failing. From the console attachment:
>
> ```
> [ 13.642326] systemd[1542]: Failed to execute
> /usr/lib/systemd/system-generators/coreos-platform-chrony: Permission denied

Do I understand correctly that only generators with the init_exec_t SELinux type fail, while those with a private type succeed?

> What's interesting is I don't see any denials around this point. Timing here
> is interesting because systemd generators run really early and I'm not sure
> of the interaction with the SELinux policy here.

Are there not even dmesg entries with AVCs? Try to collect denials after a daemon reload:

systemctl daemon-reload
ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts recent

> Though clearly something with SELinux is going on because we have pinned to
> the older version and the test is now behaving again.

I read this as: selinux-policy-38.1-1.fc38 works, but selinux-policy-38.2-1.fc38 and later don't. Correct?

Unfortunately I am unable to reproduce the issue. If I create my own debugging generator, nothing suspicious appears.

Do you happen to know of any SELinux-related customizations? For instance, disabling the unconfined module?
I keep getting this particular denial for systemd-fstab-generator containing nfs entries: type=PROCTITLE msg=audit(01/04/2023 06:23:36.885:1592) : proctitle=/usr/lib/systemd/system-generators/systemd-fstab-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/gene type=PATH msg=audit(01/04/2023 06:23:36.885:1592) : item=0 name= inode=98 dev=00:35 mode=dir,755 ouid=unknown(-2) ogid=unknown(-2) rdev=00:00 obj=system_u:object_r:nfs_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 type=CWD msg=audit(01/04/2023 06:23:36.885:1592) : cwd=/ type=SYSCALL msg=audit(01/04/2023 06:23:36.885:1592) : arch=x86_64 syscall=newfstatat success=yes exit=0 a0=0x5 a1=0x7efe1b5b9dce a2=0x7fff9808a530 a3=0x1000 items=1 ppid=8950 pid=8964 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=systemd-fstab-g exe=/usr/lib/systemd/system-generators/systemd-fstab-generator subj=system_u:system_r:init_t:s0 key=(null) type=AVC msg=audit(01/04/2023 06:23:36.885:1592) : avc: denied { write } for pid=8964 comm=systemd-fstab-g scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=key permissive=0 type=AVC msg=audit(01/04/2023 06:23:36.885:1592) : avc: denied { write } for pid=8964 comm=systemd-fstab-g scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=key permissive=0 type=AVC msg=audit(01/04/2023 06:23:36.885:1592) : avc: denied { write } for pid=8964 comm=systemd-fstab-g scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=key permissive=0 type=AVC msg=audit(01/04/2023 06:23:36.885:1592) : avc: denied { write } for pid=8964 comm=systemd-fstab-g scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=key permissive=0 with this trace: systemd-fstab-g 8820 [000] 3194.621630: avc:selinux_audited: requested=0x4 denied=0x4 audited=0x4 result=-13 
scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=key ffffffff9273ff27 avc_audit_post_callback+0x207 ([kernel.kallsyms]) ffffffff9273ff27 avc_audit_post_callback+0x207 ([kernel.kallsyms]) ffffffff9276b37a common_lsm_audit+0x15a ([kernel.kallsyms]) ffffffff92740e0e slow_avc_audit+0x9e ([kernel.kallsyms]) ffffffff927417ec avc_has_perm+0xac ([kernel.kallsyms]) ffffffff9273e597 security_key_permission+0x37 ([kernel.kallsyms]) ffffffff9272d5dc request_key_and_link+0x53c ([kernel.kallsyms]) ffffffff9272da83 request_key_tag+0x43 ([kernel.kallsyms]) ffffffffc0991280 nfs_idmap_get_key+0x210 ([nfsv4]) ffffffffc0991e2d nfs_map_name_to_uid+0x15d ([nfsv4]) ffffffffc0984325 decode_getfattr_attrs+0xd45 ([nfsv4]) ffffffffc09847b9 decode_getfattr_generic.constprop.0+0x119 ([nfsv4]) ffffffffc04592c6 call_decode+0x246 ([sunrpc]) ffffffffc0476176 __rpc_execute+0xc6 ([sunrpc]) ffffffffc04768bc rpc_execute+0xdc ([sunrpc]) ffffffffc0459f5f rpc_run_task+0x15f ([sunrpc]) ffffffffc0961eaf nfs4_do_call_sync+0x5f ([nfsv4]) ffffffffc096200b _nfs4_proc_getattr+0x12b ([nfsv4]) ffffffffc096b6e9 nfs4_proc_getattr+0x79 ([nfsv4]) ffffffffc08cd6f5 __nfs_revalidate_inode+0xc5 ([nfs]) ffffffffc08cdede nfs_getattr+0x31e ([nfs]) ffffffff924a4242 vfs_statx+0xb2 ([kernel.kallsyms]) ffffffff924a4581 vfs_fstatat+0x51 ([kernel.kallsyms]) ffffffff924a484e __do_sys_newfstatat+0x2e ([kernel.kallsyms]) ffffffff9308d198 do_syscall_64+0x58 ([kernel.kallsyms]) ffffffff932000aa entry_SYSCALL_64_after_hwframe+0x72 ([kernel.kallsyms]) fd05e __fstatat+0xe (inlined) 18e905 chase_symlinks+0x5e5 (/usr/lib64/systemd/libsystemd-shared-252.4-598.fc38.so) 6ea6 parse_fstab+0x1a6 (/usr/lib/systemd/system-generators/systemd-fstab-generator) 4455 run_generator+0x2f5 (inlined) 4455 run+0x2f5 (inlined) 4455 main+0x2f5 (/usr/lib/systemd/system-generators/systemd-fstab-generator) 27a8f __libc_start_call_main+0x7f (/usr/lib64/libc.so.6) 27b48 __libc_start_main_alias_2+0x88 (inlined) 45f4 _start+0x24 
(/usr/lib/systemd/system-generators/systemd-fstab-generator)
(In reply to Zdenek Pytela from comment #7) > Try to collect denials after daemon reload: > > systemctl daemon-reload > ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts recent Unfortunately, I can't even get to the prompt when this happens because too many things fail. That said, booting with enforcing=0 reveals a slew of AVC denials: ``` Jan 06 15:41:06 localhost audit[746]: AVC avc: denied { execute } for pid=746 comm="(direxec)" name="bash" dev="vda4" ino=1621433 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:shell_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[789]: AVC avc: denied { execute } for pid=789 comm="(mount)" name="mount" dev="vda4" ino=2896386 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:mount_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[798]: AVC avc: denied { execute } for pid=798 comm="(lvm)" name="lvm" dev="vda4" ino=2909812 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:lvm_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[802]: AVC avc: denied { execute } for pid=802 comm="(journald)" name="systemd-journald" dev="vda4" ino=1601655 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:syslogd_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[803]: AVC avc: denied { execute } for pid=803 comm="(les-load)" name="systemd-modules-load" dev="vda4" ino=2659388 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:systemd_modules_load_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[804]: AVC avc: denied { execute } for pid=804 comm="(enerator)" name="systemd-network-generator" dev="vda4" ino=1604044 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:systemd_network_generator_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[809]: AVC avc: denied { execute } for pid=809 comm="(emd-hwdb)" name="systemd-hwdb" dev="vda4" ino=2659328 
scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:systemd_hwdb_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[810]: AVC avc: denied { execute } for pid=810 comm="(d-sysctl)" name="systemd-sysctl" dev="vda4" ino=2658231 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:systemd_sysctl_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[813]: AVC avc: denied { execute } for pid=813 comm="(-userdbd)" name="systemd-userdbd" dev="vda4" ino=1601631 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:systemd_userdbd_exec_t:s0 tclass=file permissive=1 Jan 06 15:41:06 localhost audit[817]: AVC avc: denied { execute } for pid=817 comm="(tmpfiles)" name="systemd-tmpfiles" dev="vda4" ino=1596306 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:systemd_tmpfiles_exec_t:s0 tclass=file permissive=1 ``` That `kernel_t` scontext definitely looks off. And it seems to also be present in your systemd-fstab-generator example. > Unfortunately I am unable to reproduce the issue. If I create my own debigging generator, nothing suspicious appears. When trying this, are you turning `/usr` into an overlayfs? I think somehow that's the trigger to this. > Do you happen to know of some SELinux-related customizations? For instance, disabling the unconfined module? We don't do any major customizations. A few SELinux booleans but that's it (`container_use_cephfs` and `virt_use_samba`).
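When there is a slew of denials like the one above, it can help to collapse them into source-type/target-type pairs so that an unexpected source domain such as `kernel_t` stands out. This is just a rough sketch: the function name `summarize_avc` is hypothetical, and it assumes the lines look like the journal excerpts in this bug (space-separated `scontext=` and `tcontext=` fields in user:role:type:level form).

```shell
#!/bin/sh
# summarize_avc: hypothetical helper that reads audit/journal lines on stdin
# and prints "<count> <source type> -> <target type>" for every AVC denial,
# most frequent pair first.
summarize_avc() {
    awk '/avc: +denied/ {
        s = t = ""
        for (i = 1; i <= NF; i++) {
            # The SELinux type is the third colon-separated part of the context.
            if ($i ~ /^scontext=/) { split($i, a, ":"); s = a[3] }
            if ($i ~ /^tcontext=/) { split($i, a, ":"); t = a[3] }
        }
        if (s != "" && t != "") n[s " -> " t]++
    }
    END { for (k in n) print n[k], k }' | sort -rn
}
```

It could be fed from the journal of the affected boot, for example `journalctl -b 0 _TRANSPORT=audit | summarize_avc` (the same journalctl match used elsewhere in this bug).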
Testing with this systemd patch: ``` diff --git a/src/core/selinux-setup.c b/src/core/selinux-setup.c index 3f873baa91..68adbf3dc3 100644 --- a/src/core/selinux-setup.c +++ b/src/core/selinux-setup.c @@ -78,6 +78,7 @@ int mac_selinux_setup(bool *loaded_policy) { log_open(); if (r < 0) log_error("Failed to transition into init label '%s', ignoring.", label); + log_info("Set current SELinux context to %s.", label); } after_load = now(CLOCK_MONOTONIC); ``` I can see that systemd did successfully transition to init_t on switchroot: ``` Jan 06 15:41:06 localhost systemd[1]: Switching root. Jan 06 15:41:06 localhost systemd-journald[373]: Received SIGTERM from PID 1 (systemd). Jan 06 15:41:06 localhost kernel: SELinux: policy capability network_peer_controls=1 Jan 06 15:41:06 localhost kernel: SELinux: policy capability open_perms=1 Jan 06 15:41:06 localhost kernel: SELinux: policy capability extended_socket_class=1 Jan 06 15:41:06 localhost kernel: SELinux: policy capability always_check_network=0 Jan 06 15:41:06 localhost kernel: SELinux: policy capability cgroup_seclabel=1 Jan 06 15:41:06 localhost kernel: SELinux: policy capability nnp_nosuid_transition=1 Jan 06 15:41:06 localhost kernel: SELinux: policy capability genfs_seclabel_symlinks=1 Jan 06 15:41:06 localhost kernel: SELinux: policy capability ioctl_skip_cloexec=0 Jan 06 15:41:06 localhost systemd[1]: Set current SELinux context to system_u:system_r:init_t:s0. ``` All generators have the init_exec_t label (this is also run when in hotfix mode, i.e. with overlay on /usr): ``` [root@cosa-devsh ~]# ls -lZ /usr/lib/systemd/system-generators/* -r-xr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 4101 Jan 1 1970 /usr/lib/systemd/system-generators/coreos-boot-mount-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 1029 Jan 1 1970 /usr/lib/systemd/system-generators/coreos-installer-generator -r-xr-xr-x. 
3 root root system_u:object_r:init_exec_t:s0 3612 Jan 1 1970 /usr/lib/systemd/system-generators/coreos-liveiso-autologin-generator -r-xr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 3905 Jan 1 1970 /usr/lib/systemd/system-generators/coreos-platform-chrony -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 541 Jan 1 1970 /usr/lib/systemd/system-generators/kdump-dep-generator.sh -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 16304 Jan 1 1970 /usr/lib/systemd/system-generators/ostree-system-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 28728 Jan 1 1970 /usr/lib/systemd/system-generators/rpc-pipefs-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 747 Jan 1 1970 /usr/lib/systemd/system-generators/selinux-autorelabel-generator.sh -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 16200 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-bless-boot-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 41896 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-cryptsetup-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 20752 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-debug-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 46136 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-fstab-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 25128 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-getty-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 16656 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-hibernate-resume-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 25232 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-integritysetup-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 16128 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-rc-local-generator -rwxr-xr-x. 
3 root root system_u:object_r:init_exec_t:s0 21176 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-run-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 16256 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-system-update-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 37488 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-sysv-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 33672 Jan 1 1970 /usr/lib/systemd/system-generators/systemd-veritysetup-generator -rwxr-xr-x. 3 root root system_u:object_r:init_exec_t:s0 962368 Jan 1 1970 /usr/lib/systemd/system-generators/zram-generator ```
Thank you Jonathan for the additional data. (In reply to Jonathan Lebon from comment #9) > (In reply to Zdenek Pytela from comment #7) > > Try to collect denials after daemon reload: > > > > systemctl daemon-reload > > ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts recent > > Unfortunately, I can't even get to the prompt when this happens because too > many things fail. That said, booting with enforcing=0 reveals a slew of AVC > denials: So far I understood there is a problem with generators only. So the system is unusable in selinux enforcing? In selinux permissive, are the generators executed correctly by daemon-reload? > That `kernel_t` scontext definitely looks off. And it seems to also be > present in your systemd-fstab-generator example. The fstab generator has been confirmed a different and unrelated issue, we managed to troubleshoot it. > > > Unfortunately I am unable to reproduce the issue. If I create my own debigging generator, nothing suspicious appears. > > When trying this, are you turning `/usr` into an overlayfs? I think somehow > that's the trigger to this. Not really, I was trying to reproduce any similar issue on a general rawhide system, not coreos. Can you give me a hint how to get to the setup you have? > > > Do you happen to know of some SELinux-related customizations? For instance, disabling the unconfined module? > > We don't do any major customizations. A few SELinux booleans but that's it > (`container_use_cephfs` and `virt_use_samba`). Should not play a role. 
(In reply to Jonathan Lebon from comment #10) > Testing with this systemd patch: > > ``` > diff --git a/src/core/selinux-setup.c b/src/core/selinux-setup.c > index 3f873baa91..68adbf3dc3 100644 > --- a/src/core/selinux-setup.c > +++ b/src/core/selinux-setup.c > @@ -78,6 +78,7 @@ int mac_selinux_setup(bool *loaded_policy) { > log_open(); > if (r < 0) > log_error("Failed to transition into init > label '%s', ignoring.", label); > + log_info("Set current SELinux context to %s.", > label); > } > > after_load = now(CLOCK_MONOTONIC); > ``` > > I can see that systemd did successfully transition to init_t on switchroot: That's good to have it confirmed. We now need to find out at which point generators are executed after boot. > ``` > > All generators have the init_exec_t label (this is also run when in hotfix > mode, i.e. with overlay on /usr): There are some with a different label, but it does not seem to be important now. You can anyway try to install cloud-init nfs-utils systemd-udev
I've uploaded an FCOS rawhide build with selinux-policy-38.4. You should be able to boot it using the following command:

```
curl -LO https://jlebon.fedorapeople.org/fedora-coreos-38.20221222.dev.2-qemu.x86_64.qcow2.xz
unxz fedora-coreos-38.20221222.dev.2-qemu.x86_64.qcow2.xz
curl -LO https://jlebon.fedorapeople.org/config-autologin.ign
qemu-system-x86_64 -machine accel=kvm -cpu host -m 1G -smp 1 \
    -drive if=virtio,file=fedora-coreos-38.20221222.dev.2-qemu.x86_64.qcow2,format=qcow2 -snapshot \
    -nographic -fw_cfg name=opt/com.coreos/config,file=config-autologin.ign
```

This will drop you to a prompt on the serial console. Then to reproduce the issue:

```
ostree admin unlock --hotfix
reboot
```

You can then e.g. interrupt the boot menu to add `enforcing=0`.

If you'd like to build your own image with a custom selinux-policy build, you can follow the steps in https://coreos.github.io/coreos-assembler/building-fcos/ and https://coreos.github.io/coreos-assembler/working/#using-overrides. Normally you'd be able to just `rpm-ostree override replace`, but specifically for selinux-policy, there's a bug preventing that from working.

(Feel free to drop by on the #fedora-coreos channel in Libera.Chat if you need any help.)

Thanks for looking at this!
Thank you Jonathan for the scenario, it worked. I was not able, however, to reproduce any problem: no generators failing, no suspicious processes running in the kernel_t, init_t, or unconfined_* context. I was also unable to install audit because of a mismatch between the available versions.
Zdenek and I had a shared session today in which we were able to reproduce the issue and tried to debug this further. One theory was that somehow the label of the process after systemd does fork() reverted to kernel_t before it did the exec(). I've tested this theory with the following systemd patch: diff --git a/src/shared/exec-util.c b/src/shared/exec-util.c index f5283f9df4..bbcafbe0d3 100644 --- a/src/shared/exec-util.c +++ b/src/shared/exec-util.c @@ -6,6 +6,9 @@ #include <sys/types.h> #include <unistd.h> #include <stdio.h> +#if HAVE_SELINUX +#include <selinux/selinux.h> +#endif #include "alloc-util.h" #include "conf-files.h" @@ -70,6 +73,18 @@ static int do_spawn(const char *path, char *argv[], int stdout_fd, pid_t *pid, b } else argv[0] = (char*) path; +#if HAVE_SELINUX + { + char *con; + r = getcon_raw(&con); + if (r == 0 && con) { + log_info("Executing %s with current SELinux context %s.", path, con); + freecon(con); + } else + log_info("Failed to retrieve current SELinux context: %m"); + } +#endif + execv(path, argv); log_error_errno(errno, "Failed to execute %s: %m", path); _exit(EXIT_FAILURE); And this was the result: [ 3.824428] systemd[667]: Executing /usr/lib/systemd/system-generators/coreos-boot-mount-generator with current SELinux context system_u:system_r:init_t:s0. [ 3.824852] systemd[669]: Executing /usr/lib/systemd/system-generators/coreos-liveiso-autologin-generator with current SELinux context system_u:system_r:init_t:s0. [ 3.825094] systemd[670]: Executing /usr/lib/systemd/system-generators/coreos-platform-chrony with current SELinux context system_u:system_r:init_t:s0. 
[ 3.825589] systemd[669]: Failed to execute /usr/lib/systemd/system-generators/coreos-liveiso-autologin-generator: Permission denied [ 3.825625] systemd[670]: Failed to execute /usr/lib/systemd/system-generators/coreos-platform-chrony: Permission denied [ 3.826243] systemd[667]: Failed to execute /usr/lib/systemd/system-generators/coreos-boot-mount-generator: Permission denied ... Yet, looking at the AVC denials for e.g. pid 670: [root@cosa-devsh ~]# journalctl -b 0 _TRANSPORT=audit _PID=670 Jan 12 19:20:46 localhost audit[670]: AVC avc: denied { execute } for pid=670 comm="(direxec)" name="bash" dev="vda4" ino=2657021 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:shell_exec_t:s0 tclass=file permissive=1 Jan 12 19:20:46 localhost audit[670]: SYSCALL arch=c000003e syscall=59 success=yes exit=0 a0=a15730 a1=7ffdd752a440 a2=a14b30 a3=7ffdd7529385 items=1 ppid=665 pid=670 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0) It says the source context is kernel_t for some reason.
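For comparison with the scontext reported in the denials, the label a process sees for itself (what getcon_raw() returns in the patch above) can also be read from procfs without patching anything. The snippet below is only an illustrative sketch: the function name `current_selinux_context` is hypothetical, and it assumes /proc/self/attr/current is readable, which may not be the case when SELinux is disabled.

```shell
#!/bin/sh
# current_selinux_context [ATTR_FILE]
# Hypothetical helper: prints the SELinux context of the calling process.
# ATTR_FILE defaults to /proc/self/attr/current; a different file can be
# passed for testing. Returns 1 if the file is not readable.
current_selinux_context() {
    f=${1:-/proc/self/attr/current}
    [ -r "$f" ] || return 1
    # The kernel may terminate the label with a NUL byte; strip it.
    tr -d '\000' < "$f" 2>/dev/null
}
```

Run from a shell on the affected system, `current_selinux_context` would print that shell's own label, which can then be set against the scontext values in the AVC records.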
Example denial captured with full auditing enabled: type=PROCTITLE msg=audit(01/12/23 14:31:50.397:196) : proctitle=(chronyd) type=PATH msg=audit(01/12/23 14:31:50.397:196) : item=0 name=/proc/self/fd/3 inode=2847198 dev=00:1d mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:chronyd_exec_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 type=CWD msg=audit(01/12/23 14:31:50.397:196) : cwd=/ type=SYSCALL msg=audit(01/12/23 14:31:50.397:196) : arch=x86_64 syscall=access success=yes exit=0 a0=0x7ffeedf37230 a1=X_OK a2=0x0 a3=0x7ffeedf36f77 items=1 ppid=1 pid=767 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=(chronyd) exe=/usr/lib/systemd/systemd subj=system_u:system_r:init_t:s0 key=(null) type=AVC msg=audit(01/12/23 14:31:50.397:196) : avc: denied { execute } for pid=767 comm=(chronyd) name=chronyd dev="vda4" ino=2847198 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:chronyd_exec_t:s0 tclass=file permissive=1
I think I finally know what's going on. The culprit here appears to be the additional permission check against the mounting task done by overlayfs [1]. Since the overlay is mounted before the SELinux policy is loaded, overlayfs remembers the mounter as system_u:system_r:kernel_t:s0 and checks against this context when doing the extra permission checks.

I'm not sure if there is any viable way to work around it :/

[1] https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#permission-model
(In reply to Ondrej Mosnacek from comment #16)
> I think I finally know what's going on. The culprit here appears to be the
> additional permission check against the mounting task done by overlayfs [1].
> Since the overlay is mounted before the SELinux policy is loaded, overlayfs
> remembers the mounter as system_u:system_r:kernel_t:s0 and checks against
> this context when doing the extra permission checks.
>
> I'm not sure if there is any viable way to work around it :/
>
> [1] https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#permission-model

Ahh nice find. Is it exposed in userspace which label is being "remembered" for the mount?

Also, do I understand correctly that this check always happened, but now with https://github.com/fedora-selinux/selinux-policy/pull/1475 it's being denied?
(In reply to Jonathan Lebon from comment #17)
> Ahh nice find. Is it exposed in userspace which label is being "remembered"
> for the mount?

Not sure I understand the question - are you asking if userspace can change the label that ends up being used to represent the mounter? I'm afraid it can't when the mounting happens before the first policy load. Overlayfs just copies the creds of the mounting task at the moment the mount happens and then keeps it. And before the policy is loaded all processes can have only the kernel label (which becomes system_u:system_r:kernel_t:s0 once SELinux policy is loaded).

The only way to fix the mount that I can see would be to delay mounting it to after the SELinux policy is loaded, but I'm not sure how viable that would be.

> Also, do I understand correctly that this check always happened, but now
> with https://github.com/fedora-selinux/selinux-policy/pull/1475 it's being
> denied?

Yes, I believe so. The point of the PR is to disallow the kernel to execute random executables, so it is in direct conflict with this situation :/

I recently had an idea about a small kernel feature that would happen to address this (effectively the mounter would be init_t instead of kernel_t), but it's probably not something I could get upstream quickly (and I can't guarantee upstream would even accept it). I have an unofficial plan to look into it by the end of this quarter, but it depends on how my other work will go.

Until that materializes, we're left with either finding a workaround, leaving it broken, or reverting the kernel exec hardening (and deferring it for after we have the kernel feature).
(In reply to Ondrej Mosnacek from comment #18)
> (In reply to Jonathan Lebon from comment #17)
> > Ahh nice find. Is it exposed in userspace which label is being "remembered"
> > for the mount?
>
> Not sure I understand the question - are you asking if userspace can change
> the label that ends up being used to represent the mounter? I'm afraid it
> can't when the mounting happens before the first policy load. Overlayfs just
> copies the creds of the mounting task at the moment the mount happens and
> then keeps it. And before the policy is loaded all processes can have only
> the kernel label (which becomes system_u:system_r:kernel_t:s0 once SELinux
> policy is loaded).

Sorry, let me clarify. Is there a way to verify from userspace what are the copied creds of the overlayfs mount to sanity-check that it's indeed kernel_t (and we're not somehow hitting a different issue)?

> The only way to fix the mount that I can see would be to delay mounting it
> to after the SELinux policy is loaded, but I'm not sure how viable that
> would be.

I think that's possible, but it would make the hotfix mechanism significantly weaker since systemd (and any binary called/library loaded until we mount) would no longer come from the overlay.

> > Also, do I understand correctly that this check always happened, but now
> > with https://github.com/fedora-selinux/selinux-policy/pull/1475 it's being
> > denied?
>
> Yes, I believe so. The point of the PR is to disallow the kernel to execute
> random executables, so it is in direct conflict with this situation :/

> Until that materializes, we're left with either finding a workaround,
> leaving it broken, or reverting the kernel exec hardening (and deferring it
> for after we have the kernel feature).

Hmm, it seems like we should distinguish between kernel_t before policy load (i.e. defaulted) and kernel_t after policy load (i.e. actual kernel workload).
I wonder if the overlay code should be adapted so that it also remembers whether the policy was loaded at all at mount time, and if not, then SELinux should treat it like unconfined_t. After all, the main goal of the initramfs is to prepare the rootfs and in that capacity it needs sufficient privileges.
(In reply to Jonathan Lebon from comment #19)
> (In reply to Ondrej Mosnacek from comment #18)
> > (In reply to Jonathan Lebon from comment #17)
> > > Ahh nice find. Is it exposed in userspace which label is being "remembered"
> > > for the mount?
> >
> > Not sure I understand the question - are you asking if userspace can change
> > the label that ends up being used to represent the mounter? I'm afraid it
> > can't when the mounting happens before the first policy load. Overlayfs just
> > copies the creds of the mounting task at the moment the mount happens and
> > then keeps it. And before the policy is loaded all processes can have only
> > the kernel label (which becomes system_u:system_r:kernel_t:s0 once SELinux
> > policy is loaded).
>
> Sorry, let me clarify. Is there a way to verify from userspace what are the
> copied creds of the overlayfs mount to sanity-check that it's indeed
> kernel_t (and we're not somehow hitting a different issue)?

Probably not without peeking into kernel internals through some tracing/debugging tool...
I managed to get a backtrace of the denial using perf and the avc_selinux_audited tracepoint, but it doesn't have enough distinguishing information to tell if it happened in the mounter check or the regular check:

(chronyd)  1318 [001]   112.523389: avc:selinux_audited: requested=0x4000 denied=0x4000 audited=0x4000 result=0 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:chronyd_exec_t:s0 tclass=file
        ffffffff8260dded avc_audit_post_callback+0x1ed ([kernel.kallsyms])
        ffffffff8260dded avc_audit_post_callback+0x1ed ([kernel.kallsyms])
        ffffffff82631d35 common_lsm_audit+0x155 ([kernel.kallsyms])
        ffffffff8260eece slow_avc_audit+0x9e ([kernel.kallsyms])
        ffffffff826121b2 audit_inode_permission+0x82 ([kernel.kallsyms])
        ffffffff82616ee9 selinux_inode_permission+0x159 ([kernel.kallsyms])
        ffffffff8260a6b7 security_inode_permission+0x37 ([kernel.kallsyms])
        ffffffffc0672fbe ovl_permission+0x9e ([kernel.kallsyms])
        ffffffff823d4e8c inode_permission+0x13c ([kernel.kallsyms])
        ffffffff823c197d do_faccessat+0xbd ([kernel.kallsyms])
        ffffffff82dd4158 do_syscall_64+0x58 ([kernel.kallsyms])
        ffffffff82e0009b entry_SYSCALL_64_after_hwframe+0x63 ([kernel.kallsyms])
                   fdc2b __access+0xb (inlined)
                  18686c access_fd+0x5c (/usr/lib/systemd/libsystemd-shared-251.10-589.fc37.jl2.so)
                  1a4d86 [unknown] (/usr/lib/systemd/libsystemd-shared-251.10-589.fc37.jl2.so)
                  1a5079 find_executable_full+0x1b9 (/usr/lib/systemd/libsystemd-shared-251.10-589.fc37.jl2.so)
                  135449 [unknown] (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   ac1e9 exec_spawn+0xcf9 (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   f5f9f [unknown] (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   f82f5 [unknown] (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   f8a3c [unknown] (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   b66c1 [unknown] (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   b6a29 job_run_and_invalidate+0x2b9 (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                   d7d53 [unknown] (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                  2519dc [unknown] (/usr/lib/systemd/libsystemd-shared-251.10-589.fc37.jl2.so)
                  251f1c sd_event_dispatch+0x10c (/usr/lib/systemd/libsystemd-shared-251.10-589.fc37.jl2.so)
                  253577 sd_event_run+0x117 (/usr/lib/systemd/libsystemd-shared-251.10-589.fc37.jl2.so)
                   dc6ed manager_loop+0x4ad (/usr/lib/systemd/libsystemd-core-251.10-589.fc37.jl2.so)
                    9455 [unknown] (/usr/lib/systemd/systemd)
                   27a8f __libc_start_call_main+0x7f (/usr/lib64/libc.so.6)
                   27b48 __libc_start_main_alias_2+0x88 (inlined)
                    b194 [unknown] (/usr/lib/systemd/systemd)

Either way, I'm pretty convinced that the pinned mounter cred is the cause. It's very unlikely to be something else - it would have to be some crazy bug, while this explanation is basically the expected behavior.

Anyway, I gave this some more thought and I realized that we actually could allow kernel_t the 'execute' permission on basically all files without breaking the goal of PR 1475, as long as we don't allow execute_no_trans and curate the set of permitted process transitions. So we can add something like 'allow kernel_t file_type:file mmap_execute_file_perms;' to cover any case of mounting overlayfs before the policy is loaded. Let me do a PR for this...
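The reasoning about 'execute' versus execute_no_trans can be sketched with a small toy model in Python. Everything here is an assumption made for illustration: POLICY, TRANSITIONS, FILE_TYPES, access_x_ok and can_exec are hypothetical names, the rule set is invented, the proposed macro is modelled as bare "execute" (its real expansion is not reproduced), and the exec logic is a simplified reading of how SELinux decides an exec, not the kernel's implementation.

```python
# Hypothetical toy policy: (source_type, target_type, permission) rules.
FILE_TYPES = ("chronyd_exec_t", "bin_t", "init_exec_t")

POLICY = {
    ("init_t", "chronyd_exec_t", "execute"),
    # A curated transition: kernel_t may still start init.
    ("kernel_t", "init_t", "transition"),
    ("init_t", "init_exec_t", "entrypoint"),
}
# Stand-in for the proposed rule: kernel_t gets "execute" on every file type,
# but deliberately no "execute_no_trans".
POLICY |= {("kernel_t", file_type, "execute") for file_type in FILE_TYPES}

# Toy type_transition rules: (domain, file_type) -> new domain.
TRANSITIONS = {("kernel_t", "init_exec_t"): "init_t"}


def allowed(source: str, target: str, perm: str) -> bool:
    return (source, target, perm) in POLICY


def access_x_ok(domain: str, file_type: str) -> bool:
    # An access(path, X_OK)-style check only asks for "execute".
    return allowed(domain, file_type, "execute")


def can_exec(domain: str, file_type: str) -> bool:
    # Actually running a binary needs more than "execute".
    if not allowed(domain, file_type, "execute"):
        return False
    new_domain = TRANSITIONS.get((domain, file_type), domain)
    if new_domain == domain:
        # Staying in the same domain requires execute_no_trans.
        return allowed(domain, file_type, "execute_no_trans")
    # Changing domain requires an allowed transition and an entrypoint.
    return (allowed(domain, new_domain, "transition")
            and allowed(new_domain, file_type, "entrypoint"))
```

With these toy rules, access_x_ok("kernel_t", "chronyd_exec_t") passes, so a pinned kernel_t mounter would no longer fail the extra check, while can_exec("kernel_t", "chronyd_exec_t") is still refused because there is neither execute_no_trans nor a permitted transition.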
Opened a PR that should fix this:
https://github.com/fedora-selinux/selinux-policy/pull/1574

Also requires a tweak to the tests:
https://src.fedoraproject.org/tests/selinux/pull-request/358