Bug 480323
Summary: | RHEL 4.8 PTRACE_ATTACH failure after auditd start | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Vivek Goyal <vgoyal> | ||||
Component: | kernel | Assignee: | Oleg Nesterov <onestero> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Martin Jenner <mjenner> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 4.8 | CC: | jan.kratochvil, jburke, roland | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-06-20 16:01:31 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Description
Vivek Goyal
2009-01-16 14:20:19 UTC
I spent a lot of time trying to reproduce this but failed. Because I know nothing about gdb, I am not sure I understand what really happens when I look at these logs. I guess I have to dig into gdb's sources, but perhaps Jan or Roland already have the answer.

What seems to happen is this. We have small.c:

    #include <assert.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    static int wait;

    static void handle_alrm (int signo)
    {
        if (wait)
            for (;;)
                pause ();
        kill (getpid(), SIGSEGV);
        abort ();
    }

    int main (int argc, char **argv)
    {
        struct itimerval itimerval;
        int i;

        wait = (argc > 1);
        signal (SIGALRM, handle_alrm);
        memset (&itimerval, 0, sizeof (itimerval));
        itimerval.it_value.tv_usec = 1000000 / 10;
        i = setitimer (ITIMER_REAL, &itimerval, NULL);
        assert (i == 0);
        pause ();
        abort ();
        return 0;
    }

we have gdbinit-maps1:

    set width 0
    set height 0
    gcore core.gcore
    quit

and runtest.sh does:

    ./small wait &
    PID=$!
    sleep 5
    gdb -silent --command=./gdbinit-maps1 ./small $PID

and, according to http://rhts.redhat.com/testlogs/42002/145415/1207053/current.log, the last command "hangs" and outputs:

    Using host libthread_db library "/lib64/tls/libthread_db.so.1".
    Attaching to program: /mnt/tests/kernel/syscalls/vsyscall/small, process 14579
    Redelivering pending Trace/breakpoint trap.
    Redelivering pending Trace/breakpoint trap.
    Program process 0 exited: Unknown signal 0 (terminated)
    /mnt/tests/kernel/syscalls/vsyscall/14579: No such file or directory.
    ./gdbinit-maps1:3: Error in sourced command file:
    You can't do that without a process to debug.
    (gdb)

This is why the test did not finish: gdb can't proceed and waits for input. Note that the tracee has really exited; there is no "small" process in the sysrq-t output.

What does this "Redelivering" mean? Google finds this patch: http://sourceware.org/ml/gdb-patches/2007-06/msg00059.html

Trace/breakpoint trap? Running strings `which gdb` shows that this means SIGTRAP.

So it looks like gdb does PTRACE_ATTACH, ptrace(PTRACE_CONT, SIGSTOP), and gets a WIFSTOPPED() status with WSTOPSIG() == SIGTRAP? Currently, I don't see how this is possible.
Perhaps gdb does something strange. I will continue tomorrow, unless somebody knowledgeable can save me from studying gdb's sources ;)

As for small.c, I think it could be just

    int main() { pause(); }

but again, I can't reproduce the problem, so I am not sure.

The message `Redelivering pending ...' was present in RHEL-5.2, and it was a bug in:
gdb-6.5-bz292971-attach-signalled-fix.patch

The defect-by-design of this message was found by Roland in:
http://sourceware.org/ml/archer/2008-q3/msg00003.html
(I originally meant it for SIGSTOP, but it is wrong for other signals.)

This GDB defect is no longer present in RHEL-5.3. I am still going to find out how it can encounter the SIGTRAP ("Trace/breakpoint trap") signal at all, as small.c does not generate any SIGTRAP (which looks suspicious).

Created attachment 330548
PTRACE_ATTACH reproducer.
You can check which tests preceded the vsyscall RHTS test on that host.
It is an interaction between auditd and kernel ptrace.
IIRC, according to Roland, the kernel switches the syscall entry/exit code to a slower path after auditd is started, and this can be undone only by a reset.
After auditd has been started, a simple PTRACE_ATTACH produces SIGTRAP instead of SIGSTOP.
Curiously, it may be reproducible only on this RHTS host.
Anyway, it is unrelated to GDB, so I am giving this bug away.
HOSTNAME=dell-per905-01.rhts.bos.redhat.com
JOBID=44290
DISTRO=RHEL4-U8-re20090128.1
ARCHITECTURE=x86_64
# cat /proc/version
Linux version 2.6.9-80.ELlargesmp (mockbuild.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Fri Jan 23 16:39:07 EST 2009
# gcc -o attach-ok attach-ok.c -Wall -g
# ./attach-ok
PASS - SIGSTOP
Optionally (it has no effect on the results): # setenforce 0
# /etc/init.d/auditd start
Starting auditd: [ OK ]
# ./attach-ok
FAIL - SIGTRAP
# /etc/init.d/auditd stop
Stopping auditd: [ OK ]
# ./attach-ok
FAIL - SIGTRAP
# rpm --qf '%{name}-%{version}-%{release}.%{arch}\n' -q audit kernel-largesmp
audit-1.0.16-4.el4.x86_64
kernel-largesmp-2.6.9-80.EL.x86_64
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
57G 1.4G 53G 3% /
/dev/sda1 99M 15M 80M 16% /boot
none 3.9G 0 3.9G 0% /dev/shm
The original RHTS test run had these messages, but according to my tests the disk-full state is not required for the `attach-ok' reproducer:
messages.gz:
Jan 15 18:48:28 dell-per905-01 rhts: /mnt/tests/kernel/security/audit/audit-test /
...
Jan 15 18:52:27 dell-per905-01 auditd[11029]: Audit daemon has no space left on logging partition
Jan 15 18:52:27 dell-per905-01 auditd[11029]: The audit daemon is now halting the system due to no space left on logging partition
Jan 15 18:52:27 dell-per905-01 auditd[11029]: Record was not written to disk (No space left on device)
Jan 15 18:52:27 dell-per905-01 auditd[11029]: write: Audit daemon detected an error writing an event to disk (No space left on device)
I can't see how the RHEL4 code might produce that SIGTRAP. Jan mentioned that maybe this doesn't reproduce with the same kernel on all machines. If that's so, it's especially weird, and the distinguishing factor should be figured out. But some hardware weirdness is almost easier to believe offhand than a plain bug, since I really can't see where it would come from in the code we have.

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.