Description of problem:
Update systemd from 254.5-2.fc40 to 255~rc2-1.fc40 version

SELinux is preventing systemd-userdbd from map_read, map_write access on the bpf labeled init_t.

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that systemd-userdbd should be allowed map_read map_write access on bpf labeled init_t by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'systemd-userdbd' --raw | audit2allow -M my-systemduserdbd
# semodule -X 300 -i my-systemduserdbd.pp

Additional Information:
Source Context                system_u:system_r:systemd_userdbd_t:s0
Target Context                system_u:system_r:init_t:s0
Target Objects                Unknown [ bpf ]
Source                        systemd-userdbd
Source Path                   systemd-userdbd
Port                          <Unknown>
Host                          (removed)
Source RPM Packages
Target RPM Packages
SELinux Policy RPM            selinux-policy-targeted-40.5-1.fc40.noarch
Local Policy RPM              selinux-policy-targeted-40.5-1.fc40.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Permissive
Host Name                     (removed)
Platform                      Linux (removed) 6.7.0-0.rc2.22.fc40.x86_64+debug #1 SMP PREEMPT_DYNAMIC Mon Nov 20 14:05:16 UTC 2023 x86_64
Alert Count                   1
First Seen                    2023-11-22 01:55:20 +05
Last Seen                     2023-11-22 01:55:20 +05
Local ID                      29583649-9c34-4a40-baf9-6e29e99bfee3

Raw Audit Messages
type=AVC msg=audit(1700600120.484:1623): avc: denied { map_read map_write } for pid=540775 comm="systemd-userdbd" scontext=system_u:system_r:systemd_userdbd_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=bpf permissive=1

Hash: systemd-userdbd,systemd_userdbd_t,init_t,bpf,map_read,map_write

Version-Release number of selected component:
selinux-policy-targeted-40.5-1.fc40.noarch

Additional info:
reporter:       libreport-2.17.11
reason:         SELinux is preventing systemd-userdbd from map_read, map_write access on the bpf labeled init_t.
package:        selinux-policy-targeted-40.5-1.fc40.noarch
component:      selinux-policy
hashmarkername: setroubleshoot
type:           libreport
kernel:         6.7.0-0.rc2.22.fc40.x86_64+debug
comment:        Update systemd from 254.5-2.fc40 to 255~rc2-1.fc40 version
component:      selinux-policy
Created attachment 2000749 [details] File: description
Created attachment 2000750 [details] File: os_info
Looks like every service now requires bpf:

----
type=PROCTITLE msg=audit(11/22/2023 02:38:53.547:100) : proctitle=/usr/sbin/sshd -D
type=PATH msg=audit(11/22/2023 02:38:53.547:100) : item=1 name=/lib64/ld-linux-x86-64.so.2 inode=139475 dev=fc:02 mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:ld_so_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(11/22/2023 02:38:53.547:100) : item=0 name=/usr/sbin/sshd inode=162518 dev=fc:02 mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:sshd_exec_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(11/22/2023 02:38:53.547:100) : cwd=/
type=EXECVE msg=audit(11/22/2023 02:38:53.547:100) : argc=2 a0=/usr/sbin/sshd a1=-D
type=SYSCALL msg=audit(11/22/2023 02:38:53.547:100) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x55e1896b9a90 a1=0x55e1896b9b30 a2=0x55e1896b98b0 a3=0x55e1896b9bc0 items=2 ppid=1 pid=726 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=sshd exe=/usr/sbin/sshd subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(11/22/2023 02:38:53.547:100) : avc: denied { map_read map_write } for pid=726 comm=sshd scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:system_r:init_t:s0 tclass=bpf permissive=0
*** Bug 2250947 has been marked as a duplicate of this bug. ***
*** Bug 2250933 has been marked as a duplicate of this bug. ***
*** Bug 2250932 has been marked as a duplicate of this bug. ***
*** Bug 2251042 has been marked as a duplicate of this bug. ***
This looks like systemd is failing to close some BPF map/prog file descriptor(s) before executing services (O_CLOEXEC?). See also: https://pagure.io/fedora-ci/general/issue/447
----
type=PROCTITLE msg=audit(11/23/2023 05:28:41.166:95) : proctitle=/usr/sbin/sshd -D
type=PATH msg=audit(11/23/2023 05:28:41.166:95) : item=1 name=/lib64/ld-linux-x86-64.so.2 inode=139475 dev=fc:02 mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:ld_so_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(11/23/2023 05:28:41.166:95) : item=0 name=/usr/sbin/sshd inode=162518 dev=fc:02 mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:sshd_exec_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(11/23/2023 05:28:41.166:95) : cwd=/
type=EXECVE msg=audit(11/23/2023 05:28:41.166:95) : argc=2 a0=/usr/sbin/sshd a1=-D
type=SYSCALL msg=audit(11/23/2023 05:28:41.166:95) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x55ec16ae5b10 a1=0x55ec16ae5bb0 a2=0x55ec16ae58b0 a3=0x55ec16ae5c40 items=2 ppid=1 pid=742 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=sshd exe=/usr/sbin/sshd subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(11/23/2023 05:28:41.166:95) : avc: denied { map_read map_write } for pid=742 comm=sshd scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:system_r:init_t:s0 tclass=bpf permissive=0
----

----
type=PROCTITLE msg=audit(11/23/2023 05:27:02.228:381) : proctitle=/usr/bin/mandb -q
type=PATH msg=audit(11/23/2023 05:27:02.228:381) : item=1 name=/lib64/ld-linux-x86-64.so.2 inode=139475 dev=fc:02 mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:ld_so_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(11/23/2023 05:27:02.228:381) : item=0 name=/usr/bin/mandb inode=162560 dev=fc:02 mode=file,755 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:mandb_exec_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(11/23/2023 05:27:02.228:381) : cwd=/
type=EXECVE msg=audit(11/23/2023 05:27:02.228:381) : argc=2 a0=/usr/bin/mandb a1=-q
type=SYSCALL msg=audit(11/23/2023 05:27:02.228:381) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x55f5f879b990 a1=0x55f5f8791900 a2=0x55f5f87872c0 a3=0x55f5f8791900 items=2 ppid=2036 pid=2039 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=mandb exe=/usr/bin/mandb subj=system_u:system_r:mandb_t:s0 key=(null)
type=AVC msg=audit(11/23/2023 05:27:02.228:381) : avc: denied { map_read map_write } for pid=2039 comm=mandb scontext=system_u:system_r:mandb_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=bpf permissive=0

mandb 2039 [000] 5054.237000: avc:selinux_audited: requested=0x6 denied=0x6 audited=0x6 resul>
        ffffffff8e70ad35 avc_audit_post_callback+0x205 ([kernel.kallsyms])
        ffffffff8e70ad35 avc_audit_post_callback+0x205 ([kernel.kallsyms])
        ffffffff8e733d6f common_lsm_audit+0x2af ([kernel.kallsyms])
        ffffffff8e70bf4c slow_avc_audit+0xbc ([kernel.kallsyms])
        ffffffff8e70c7c1 avc_has_perm+0xc1 ([kernel.kallsyms])
        ffffffff8e70e1b8 file_has_perm+0xa8 ([kernel.kallsyms])
        ffffffff8e7120b4 match_file+0x34 ([kernel.kallsyms])
        ffffffff8e49b621 iterate_fd+0x61 ([kernel.kallsyms])
        ffffffff8e7101b9 selinux_bprm_committing_creds+0xf9 ([kernel.kallsyms])
        ffffffff8e705833 security_bprm_committing_creds+0x23 ([kernel.kallsyms])
        ffffffff8e47cda5 begin_new_exec+0x6b5 ([kernel.kallsyms])
        ffffffff8e4fef3d load_elf_binary+0x2bd ([kernel.kallsyms])
        ffffffff8e47ac34 bprm_execve+0x294 ([kernel.kallsyms])
        ffffffff8e47c34d do_execveat_common.isra.0+0x1ad ([kernel.kallsyms])
        ffffffff8e47d236 __x64_sys_execve+0x36 ([kernel.kallsyms])
        ffffffff8eff0461 do_syscall_64+0x61 ([kernel.kallsyms])
        ffffffff8f2000ea entry_SYSCALL_64_after_hwframe+0x6e ([kernel.kallsyms])
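The trace shows the denial being raised from selinux_bprm_committing_creds(), which walks the process's open files (iterate_fd, match_file, file_has_perm) during execve. A rough userspace analogue, as a sketch only: count the descriptors that are open without FD_CLOEXEC, since those are the ones a newly executed program would inherit. The function name and the scan limit are made up for illustration and are not part of systemd or the kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustrative helper (hypothetical name): count descriptors in [0, max_fd)
 * that are open and do NOT have FD_CLOEXEC set, i.e. descriptors that a
 * program started via execve() would inherit. */
int count_fds_surviving_exec(int max_fd) {
        assert(max_fd >= 0);

        int n = 0;
        for (int fd = 0; fd < max_fd; fd++) {
                int flags = fcntl(fd, F_GETFD);
                if (flags < 0)
                        continue; /* EBADF: slot is not an open descriptor */
                if (!(flags & FD_CLOEXEC))
                        n++;
        }
        return n;
}
```

Calling count_fds_surviving_exec(1024) just before an exec gives a quick idea of how many descriptors are about to be passed on; a leaked BPF map FD would show up in that count.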
The BPF FD 'bpf_outer_map_fd' is 'special' in some inscrutable way, and cannot be closed without raising asserts. It needs to be looked at by somebody with an advanced understanding of the kernel's BPF internals. Until that happens, yes, the access will need to be allowed.
@asavkov do you have ideas if this is an issue on the BPF side? Thank you in advance.
A possible option to reduce the impact could be to serialize the bpf_outer_map_fd over to sd-executor only when some other option is enabled. However, I don't really use any of those BPF filtering options, so I would feel more comfortable if someone who understands them tested that they still work after such a change.
(In reply to Andrei Stepanov from comment #11)
> @asavkov do you have ideas if this is an issue on the BPF side?

it is not

> Thank you in advance.

you are welcome
(In reply to Luca Boccassi from comment #10)
> The BPF FD 'bpf_outer_map_fd' is 'special' in some inscrutable way, and
> cannot be closed without raising asserts.

What asserts? Can't systemd do fcntl(bpf_outer_map_fd, F_SETFD, FD_CLOEXEC) before executing the service binary? (I presume it has to temporarily unset the flag in order to pass the fd from the daemon to the executor.)

Sorry if I'm asking dumb questions - I don't know much about systemd internals, just trying to understand the problem...
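A minimal sketch of what that suggestion amounts to, assuming plain POSIX fcntl() semantics: read the descriptor flags with F_GETFD, then set or clear FD_CLOEXEC with F_SETFD. The helper names set_cloexec and has_cloexec are invented for this example and are not systemd's implementation.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: set or clear the close-on-exec flag on fd.
 * Returns 0 on success, -1 on failure (errno set by fcntl). */
int set_cloexec(int fd, bool enable) {
        assert(fd >= 0);

        int flags = fcntl(fd, F_GETFD);
        if (flags < 0)
                return -1;

        int new_flags = enable ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
        if (new_flags == flags)
                return 0; /* already in the requested state */

        return fcntl(fd, F_SETFD, new_flags);
}

/* Hypothetical helper: 1 if FD_CLOEXEC is set, 0 if not, -1 on error. */
int has_cloexec(int fd) {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0)
                return -1;
        return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Under this scheme the manager would call set_cloexec(fd, false) right before handing the descriptor to the executor, and the executor would call set_cloexec(fd, true) once it has picked the descriptor up, so that the service binary never inherits it.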
The BPF FD is somehow 'special': closing it in the child broke the parent's copy too, so things started to fail left and right. It's especially difficult as this feature has no tests. This is a purely speculative change that passes the FD over only if the feature is actually enabled for the service, which should help reduce the impact: https://github.com/systemd/systemd/pull/30170
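The idea of passing the FD only when needed can be sketched roughly as follows. This is not the code from the pull request: the struct, its field names and the function are hypothetical, and the sketch assumes the feature in question is RestrictFileSystems=, the option exercised elsewhere in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-service settings; the real
 * systemd structures and field names differ. */
struct service_settings {
        bool restrict_filesystems_enabled; /* assumed: RestrictFileSystems= is set */
        int bpf_outer_map_fd;              /* -1 if the manager holds no map FD */
};

/* Returns the FD to hand over to the executor, or -1 if none should be
 * passed because the service does not use the feature. */
int bpf_fd_to_pass(const struct service_settings *s) {
        assert(s != NULL);

        if (!s->restrict_filesystems_enabled)
                return -1; /* feature unused: keep the FD out of the child */

        return s->bpf_outer_map_fd;
}
```

With a gate like this, services that do not use the feature never receive the descriptor, so the denial would only remain possible for services that opt in.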
*** Bug 2251302 has been marked as a duplicate of this bug. ***
(In reply to Luca Boccassi from comment #15)
> The BPF FD is somehow 'special', and closing it in the child broke the
> parent's copy too, so things start to fail left and right. It's especially
> difficult as this feature has no tests.

That smells of a kernel bug that you are just papering over... (Maybe it's more tricky and you really need to hold that fd, but so far I don't understand why.)

I just tried building systemd with the below patch and it made the SELinux denials go away, while not producing any visible errors. I even tried adding RestrictFileSystems= to one of the services (and restarting it a couple of times) and still no problems seen (and the filesystem blocking worked when I specified a too narrow set). Is there something else needed to reproduce the problem? Maybe it has been fixed on the kernel side in the meantime?

diff --git a/src/core/execute-serialize.c b/src/core/execute-serialize.c
index 342883994a..5bc903082a 100644
--- a/src/core/execute-serialize.c
+++ b/src/core/execute-serialize.c
@@ -1637,6 +1637,11 @@ static int exec_parameters_deserialize(ExecParameters *p, FILE *f, FDSet *fds) {
                         if (fd < 0)
                                 continue;
 
+                        /* DEBUG */
+                        r = fd_cloexec(fd, true);
+                        if (r < 0)
+                                return r;
+
                         p->bpf_outer_map_fd = fd;
                 } else if ((val = startswith(l, "exec-parameters-notify-socket="))) {
                         r = free_and_strdup(&p->notify_socket, val);
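For comparison, here is how an ordinary descriptor behaves when a forked child closes its copy. This is only a sketch with a made-up function name; it says nothing about how a BPF map FD behaves, which is the open question in this bug.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if fd is still open in the parent after a
 * forked child closed its own copy, 0 if it is not, -1 if fork or waitpid
 * failed. */
int parent_fd_survives_child_close(int fd) {
        assert(fd >= 0);

        pid_t pid = fork();
        if (pid < 0)
                return -1;

        if (pid == 0) {
                /* Child: close the inherited copy and exit immediately. */
                close(fd);
                _exit(0);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
                return -1;

        /* F_GETFD fails with EBADF if the descriptor is no longer open. */
        return fcntl(fd, F_GETFD) >= 0 ? 1 : 0;
}
```

For a pipe or a regular file this returns 1, because each process has its own descriptor table entry. If the BPF map FD really behaved differently, that would support the suspicion of a kernel-side problem.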
I have only vague memories as it was a few months ago, and as mentioned I don't really use BPF anywhere, so I cannot say for sure. If you are confident that the BPF features are still functioning after that change, then please do send a PR on GitHub. Please bear in mind there are no tests for those BPF features, so it has to be validated manually.
*** Bug 2252117 has been marked as a duplicate of this bug. ***
ping, any progress on this? the failing 'installability' CI jobs are inconveniencing quite a few folks. Since the kernel was mentioned, let's CC jforbes...
Solved in main, will be in RC4 as soon as we tag it
well hey, that sounds like backport time to me! thanks.
Should be fixed by https://bodhi.fedoraproject.org/updates/FEDORA-2023-32e53ae9b1 , please test.
systemd-255~rc3-4.fc40 fixes the issue for me on one bare-metal and 4 virtual systems.
Great, let's call it fixed. If anyone still has issues, yell and we can reopen. Build will be in the next Rawhide compose.