Bug 232371
| Field | Value |
|---|---|
| Summary | SELinux: problem debugging bind. |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED UPSTREAM |
| Severity | medium |
| Priority | medium |
| Reporter | Dave Jones <davej> |
| Assignee | Eric Paris <eparis> |
| QA Contact | Ben Levenson <benl> |
| CC | cagney, eparis, jan.kratochvil, ma, notting, pfrields, roland, sdsmall |
| Doc Type | Bug Fix |
| Last Closed | 2007-10-25 20:18:14 UTC |
| Bug Blocks | 256361 |
Description
Dave Jones
2007-03-15 01:15:44 UTC
I easily reproduced the same gdb lossage. When I quit gdb so it detached, the named process was stopped and responded normally to kill -9 or -CONT. Can you give more details about this "weird state" (ps, /proc/PID/status, details of behavior when you try kills of various sorts, etc.)? That was on my development kernel build (which is PREEMPT, among other things); on 2.6.20-1.2925.fc6 it didn't even leave it stopped; it was just working completely fine (it answered a query from dig). I'll let Jan figure out what confused gdb. That might be a ptrace misbehavior, but I'll need to know what results it delivered to userland to say.

Created attachment 150130
Testcase NOT reproducing anything.
On kernel-2.6.20-1.2925.fc6.x86_64 (and kernel-2.6.18-1.2798.fc6.x86_64) I get
from GDB (FC6 or RawHide):
ptrace(PTRACE_ATTACH, 6816, 0, 0) = 0
wait4(6816, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 6816
ptrace(PTRACE_ATTACH, 6822, 0, 0) = 0
wait4(6822, 0x7fffbb94163c, 0, NULL) = -1 ECHILD (No child processes)
wait4(6822, 0x7fffbb94163c, __WCLONE, NULL) = -1 ECHILD (No child processes)
The last wait4() should be successful. Unfortunately, the attached testcase works:
ptrace(PTRACE_ATTACH, 8582, 0, 0) = 0
wait4(8582, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 8582
ptrace(PTRACE_ATTACH, 8583, 0, 0) = 0
wait4(8583, 0x7fff8a10cfac, 0, NULL) = -1 ECHILD (No child processes)
wait4(8583, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WCLONE, NULL) = 8583
It is IMO a kernel bug, although I am not aware of any obvious reproducer.
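For readers unfamiliar with the two wait4() calls in these traces: after PTRACE_ATTACH, GDB first waits for the LWP as an ordinary child and, on ECHILD, retries with __WCLONE, because non-leader NPTL threads have an exit_signal of -1 and the kernel only reports such "clone" children to waits that pass __WCLONE. The sketch below is a Python model of that sequence, not GDB's code: `wait_for_attached_lwp` and the injected `waiter` callable are hypothetical names, and `WCLONE` spells out the Linux `__WCLONE` value because Python's `os` module does not export it. The final check mirrors the lin_lwp_attach_lwp assertion that GDB reports later in this bug.

```python
import errno
import os

# Linux's __WCLONE wait option (0x80000000). Python's os module does not
# export it, so the value is spelled out here for illustration only.
WCLONE = 0x80000000


def wait_for_attached_lwp(lwp, waiter):
    """Model GDB's wait after PTRACE_ATTACH on a thread (LWP).

    `waiter(pid, options)` is a hypothetical stand-in for wait4()/waitpid():
    it returns (pid, status) or raises OSError. Returns (status, cloned),
    where `cloned` records whether the __WCLONE retry was needed.
    """
    cloned = False
    try:
        pid, status = waiter(lwp, 0)
    except OSError as exc:
        if exc.errno != errno.ECHILD:
            raise
        # Threads whose exit_signal is not SIGCHLD are "clone" children and
        # are only visible to a wait that passes __WCLONE.
        cloned = True
        pid, status = waiter(lwp, WCLONE)  # an ECHILD here propagates

    # Same condition as GDB's lin_lwp_attach_lwp assertion.
    if pid != lwp or not os.WIFSTOPPED(status) or not os.WSTOPSIG(status):
        raise RuntimeError(f"unexpected wait result for LWP {lwp}: {status:#x}")
    return status, cloned
```

In the failing traces both calls return ECHILD, so this model propagates the second error; the GDB of the time instead went on to its assertion with pid == -1.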
GREPped strace(1) dump of GDB:
ptrace(PTRACE_ATTACH, 2994, 0, 0) = 0
wait4(2994, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 2994
ptrace(PTRACE_GETREGS, 2994, 0, 0x7fff9149a130) = 0
ptrace(0x4200 /* PTRACE_??? */, 2994, 0, 0x2) = 0
wait4(3009, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 3009
ptrace(0x4200 /* PTRACE_??? */, 3009, 0, 0x2) = 0
ptrace(0x4200 /* PTRACE_??? */, 3009, 0, 0x22) = 0
ptrace(PTRACE_CONT, 3009, 0, SIG_0) = 0
wait4(3009, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP} | 0x10000], 0, NULL) = 3009
ptrace(0x4201 /* PTRACE_??? */, 3009, 0, 0x7fff9149a7b8) = 0
wait4(3010, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 3010
ptrace(PTRACE_KILL, 3010, 0, 0) = 0
wait4(3010, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 3010
ptrace(PTRACE_KILL, 3009, 0, 0) = 0
wait4(3009, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3009
ptrace(0x4200 /* PTRACE_??? */, 2994, 0, 0x3e) = 0
ptrace(PTRACE_ATTACH, 3000, 0, 0) = 0
wait4(3000, 0x7fff91499adc, 0, NULL) = -1 ECHILD (No child processes)
wait4(3000, 0x7fff91499adc, __WCLONE, NULL) = -1 ECHILD (No child processes)
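The `0x4200 /* PTRACE_??? */` entries are requests this strace did not know by name. Going by the constants in `<linux/ptrace.h>`, 0x4200 is PTRACE_SETOPTIONS and 0x4201 is PTRACE_GETEVENTMSG, and the `| 0x10000` in a wait status carries a PTRACE_EVENT code in bits 16 and up. The small decoder below (the helper names are hypothetical) turns the raw values back into names. Read this way, the pid 3009/3010 sequence appears to be GDB probing for fork-event support (options 0x22 on a helper child, a fork event, the grandchild pid fetched with PTRACE_GETEVENTMSG, then both killed) before it sets options 0x3e on pid 2994 and attaches to the thread 3000.

```python
import os
import signal

# Values from <linux/ptrace.h>.
PTRACE_REQUESTS = {0x4200: "PTRACE_SETOPTIONS", 0x4201: "PTRACE_GETEVENTMSG"}

PTRACE_OPTIONS = {
    0x01: "PTRACE_O_TRACESYSGOOD",
    0x02: "PTRACE_O_TRACEFORK",
    0x04: "PTRACE_O_TRACEVFORK",
    0x08: "PTRACE_O_TRACECLONE",
    0x10: "PTRACE_O_TRACEEXEC",
    0x20: "PTRACE_O_TRACEVFORKDONE",
    0x40: "PTRACE_O_TRACEEXIT",
}
# The option bits are disjoint, so summing the keys gives their bitwise OR.
KNOWN_OPTION_BITS = sum(PTRACE_OPTIONS)

PTRACE_EVENTS = {
    1: "PTRACE_EVENT_FORK",
    2: "PTRACE_EVENT_VFORK",
    3: "PTRACE_EVENT_CLONE",
    4: "PTRACE_EVENT_EXEC",
    5: "PTRACE_EVENT_VFORK_DONE",
    6: "PTRACE_EVENT_EXIT",
}


def decode_request(request):
    """Name a raw ptrace request number, or fall back to hex."""
    return PTRACE_REQUESTS.get(request, hex(request))


def decode_options(mask):
    """List the PTRACE_O_* flags set in `mask`, plus any unknown bits in hex."""
    names = [name for bit, name in sorted(PTRACE_OPTIONS.items()) if mask & bit]
    unknown = mask & ~KNOWN_OPTION_BITS
    if unknown:
        names.append(hex(unknown))
    return names


def describe_stop(status):
    """Describe a stopped wait status: the stop signal plus any ptrace event."""
    if not os.WIFSTOPPED(status):
        return None
    text = signal.Signals(os.WSTOPSIG(status)).name
    event = PTRACE_EVENTS.get(status >> 16)
    return f"{text} ({event})" if event else text
```

For example, `decode_options(0x22)` lists PTRACE_O_TRACEFORK and PTRACE_O_TRACEVFORKDONE, and `describe_stop(0x1057f)`, the raw form of the `SIGTRAP | 0x10000` status above, yields "SIGTRAP (PTRACE_EVENT_FORK)".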
Roland, I am sorry, but with http://porkchop.devel.redhat.com/brewroot/packages/kernel/2.6.20/1.2928.rm1.fc6/x86_64/kernel-2.6.20-1.2928.rm1.fc6.x86_64.rpm this bug is still present in the same way. FYI, the bug is not reproducible on kernel-xen-2.6.19-1.2898.2.3.fc7.i686, although I did not test whether it is x86_64 (vs. i686) related at all.

Created attachment 150183
gdb-6.6-5 debugging patch I did not intend to ever publish.
According to Roland's analysis, it is caused by an SELinux AVC denial:
avc: denied { signal } for pid=2803 comm="gdb" scontext=root:system_r:named_t:s0 tcontext=root:system_r:unconfined_t:s0-s0:c0.c1023 tclass=process
It is right that GDB fails; according to Roland, it should just give the user a more meaningful error message. Going to patch GDB accordingly.

Committed to Rawhide; still open while waiting to be pushed to FC6 later:
* Thu Mar 15 2007 Jan Kratochvil <jan.kratochvil> - 6.6-7
- Suggest SELinux permissions problem; no assertion failure anymore (BZ 232371).

There is at least an SELinux policy bug here, and perhaps it should be considered a kernel bug (in the SELinux code). SELinux's security_task_wait chooses which AV to check based on child->exit_signal, which is -1 in NPTL threads being waited for by ptrace. Since such threads also send SIGCHLD via ptrace rather than the bogus signal -1, perhaps it should check against SIGCHLD if the child is ptraced. But it makes more sense to me for the wait check always to be PROCESS__SIGCHLD rather than PROCESS__SIGNAL, even for a process using a strange exit_signal.

Verified it behaves the same way using a vanilla upstream kernel.

Sorry, Roland, but I am really not going to try to judge SELinux myself. SELinux folks: as far as I understand comment 7, the waitpid(2) call should not be restricted as much as sending a signal is.

I am not sure this is an SELinux problem. Being able to send a signal from a confined app to an unconfined app is bad. So either this should be moved to gdb or I am going to close this. In order to debug this you would need to either run named outside of a locked-down mode, put the machine in permissive mode, or add a local policy customization.

SELinux core bug: the PROCESS__SIGNAL vector is used instead of PROCESS__SIGCHLD, even though SIGCHLD is the signal actually sent. Policy bug: the policy permits ptrace on named, but does not permit the waits on named that are necessary for ptrace to be usable. This is internally inconsistent.

So you are saying this is a kernel bug?

I've said two things seem wrong. Perhaps there is no policy preventing the wait once the kernel is changed to use PROCESS__SIGCHLD for that permission check. However, a debugger also needs to use kill sometimes. It makes no sense to have a policy that permits ptrace but denies kill.

gdb is running as unconfined_t, so it can send the kill signal to bind_t. The problem reported above was named_t sending a signal to unconfined_t.

I see. In that case, it looks like the policy is OK and there is only the kernel bug. Indeed, when I try a kernel changed to test the PROCESS__SIGCHLD AV instead, the test scenario works fine. This should be reassigned to whoever handles SELinux issues in the kernel. I think we also want a RHEL5 clone of this bug.

Created attachment 150546
kernel patch, maybe not exactly right
This is the patch I experimented with. Note this code will look different in
an upstream kernel without utrace (i.e. vs FC-6 and RHEL-5). This gets it
closer to testing the signal that actually gets sent, but it still does not
match in all cases. For example, all stops send SIGCHLD and not exit_signal,
so it could check p->exit_state or something. To me, it really does not make
sense to restrict the wait call based on the exit_signal setting. I understand
the rationale that the non-SIGCHLD signal will not have been sent if policy
denied it, but denying wait doesn't make the child not exist or no longer need
to be waited for. This should be resolved upstream with the SELinux folks; please
CC me on that email.
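To make the exit_signal argument concrete, here is a small Python model of the permission selection being discussed. `signal_to_av` is an illustrative helper that mirrors the SELinux signal-to-permission mapping (SIGCHLD, SIGKILL and SIGSTOP get their own process permissions; anything else, including the -1 exit_signal of an NPTL thread, maps to `signal`), and `always_sigchld=True` models the alternative of always checking PROCESS__SIGCHLD. The check runs from the child's domain to the waiter's domain, matching scontext=named_t and tcontext=unconfined_t in the AVC denial above. All function names are hypothetical, and the policy used below is a toy, not Fedora's policy.

```python
import signal

# SELinux "process" class permissions relevant to wait(), named as in policy.
PROCESS__SIGCHLD = "sigchld"
PROCESS__SIGKILL = "sigkill"
PROCESS__SIGSTOP = "sigstop"
PROCESS__SIGNAL = "signal"


def signal_to_av(sig):
    """Map a signal number to the process permission that governs it."""
    if sig == signal.SIGCHLD:
        return PROCESS__SIGCHLD
    if sig == signal.SIGKILL:
        return PROCESS__SIGKILL
    if sig == signal.SIGSTOP:
        return PROCESS__SIGSTOP
    return PROCESS__SIGNAL  # every other value, including exit_signal == -1


def task_wait_permission(exit_signal, always_sigchld):
    """Permission checked when a parent or ptracer waits on a child."""
    if always_sigchld:
        return PROCESS__SIGCHLD  # the behavior argued for in this bug
    return signal_to_av(exit_signal)  # the behavior that produced the denial


def wait_allowed(policy, child_domain, waiter_domain, exit_signal, always_sigchld):
    """Check (child domain -> waiter domain, permission) against a toy policy."""
    perm = task_wait_permission(exit_signal, always_sigchld)
    return (child_domain, waiter_domain, perm) in policy
```

With a toy policy `{("named_t", "unconfined_t", "sigchld")}`, waiting on an NPTL thread (exit_signal -1) is denied under the exit_signal-based selection, because it asks for `signal`, and allowed when `sigchld` is always checked, which matches what Roland saw with his changed kernel.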
Possibly we should just always check SIGCHLD on task_wait and leave the precise signal checking to the task_kill hook only. Logically, policy should only allow the parent to wait on the child if the child is also allowed to deliver its exit signal to the parent for notification, so there was no reason to introduce a separate wait permission from parent to child; we could just apply the signal-based check from child to parent. This likely requires reparenting the child to init in the denied case too.

Should be fixed by
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=73243284463a761e04d69d22c7516b2be7de096c
Please confirm.

For James Morris:
(In reply to comment #18)
> Should be fixed by
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=73243284463a761e04d69d22c7516b2be7de096c

Tried to build a patched kernel-2.6.20-1.2954.fc6.x86_64 but saw no change; the kernel, including its .src.rpm, is available at:
/mnt/brew/scratch/jkratoch/task_800249

The process / first task:
ptrace(PTRACE_ATTACH, 2532, 0, 0) = 0
wait4(2532, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 2532
The first (second) thread:
write(1, "[New Thread 1115699520 (LWP 2536)]\n", 35) = 35
ptrace(PTRACE_ATTACH, 2536, 0, 0) = 0
wait4(2536, 0x7fff0bfc53e4, 0, NULL) = -1 ECHILD (No child processes)
wait4(2536, 0x7fff0bfc53e4, __WCLONE, NULL) = -1 ECHILD (No child processes)
...
write(1, "../../gdb/linux-nat.c:1002: internal-error: lin_lwp_attach_lwp: Assertion `pid == GET_LWP (ptid) && WIFSTOPPED (status) && WSTOPSIG (status)\' failed.\n", 150) = 150
kernel: audit(1180545694.371:12): avc: denied { signal } for pid=2544 comm="gdb" scontext=root:system_r:named_t:s0 tcontext=root:system_r:unconfined_t:s0-s0:c0.c1023 tclass=process

Not sure if I am missing something, but it still gives ECHILD.

I just tried this on a vanilla upstream kernel du jour. The second wait4 (with __WCLONE) returned EACCES, not ECHILD.
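Roland's EACCES result is what lets a debugger tell a security denial apart from a genuinely missing child, which is the kind of "more meaningful error message" the GDB change in this bug was after. A minimal sketch of that reporting step follows; the function name and all message wording are hypothetical (this is not the text GDB 6.6-7 prints). Because on the patched FC6 kernel above a denied wait still surfaced as ECHILD, that case is worded cautiously too.

```python
import errno
import os


def explain_lwp_wait_failure(lwp, err):
    """Build a user-facing message for a failed post-attach wait on a thread.

    `err` is the errno from the __WCLONE retry. The wording is illustrative.
    """
    if err == errno.EACCES:
        # Denied by a security hook rather than "no such child".
        return (f"Cannot wait for LWP {lwp}: permission denied; "
                "a security module such as SELinux may be blocking it "
                "(check the audit log for avc denials)")
    if err == errno.ECHILD:
        # Older kernels reported a denied wait this way as well.
        return (f"Cannot wait for LWP {lwp}: no such child; "
                "it may have exited, or the wait may have been denied by SELinux")
    return f"Cannot wait for LWP {lwp}: {os.strerror(err)}"
```

Reporting instead of asserting keeps GDB usable: the user sees why the attach failed and where to look, rather than an internal-error.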
So in the last couple of days this has bitten me and a couple of other people. I'll take another look.

*** Bug 326801 has been marked as a duplicate of this bug. ***

A patch based on comment #16 solved this problem with a 2.6.23 kernel. Does anyone have thoughts on whether we want to always check SIGCHLD, or follow this patch and check SIGCHLD only when we know that's what will be sent?

I'd favor keeping it simple and consistent: always check PROCESS__SIGCHLD in task_wait.

A fix has been committed in F-8 kernels, and it has been accepted into Linus's kernel.