Bug 480323

Summary:

RHEL 4.8 PTRACE_ATTACH failure after auditd start

Product:

Red Hat Enterprise Linux 4

Reporter:

Vivek Goyal <vgoyal>

Component:

kernel

Assignee:

Oleg Nesterov <onestero>

Status:

CLOSED WONTFIX

QA Contact:

Martin Jenner <mjenner>

Severity:

medium

Priority:

low

Version:

4.8

CC:

jan.kratochvil, jburke, roland

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2012-06-20 16:01:31 UTC

Attachments:

Description	Flags
PTRACE_ATTACH reproducer.	none

Description Vivek Goyal 2009-01-16 14:20:19 UTC

Description of problem:

The RHTS localwatchdog triggered because the vsyscall test did not finish in time.

http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/kernel/syscalls/vsyscall&result=Warn&rwhiteboard=kernel%202.6.9-78.30.EL%20largesmp&arch=x86_64&jobids=42002

Version-Release number of selected component (if applicable):

2.6.9-78.30.EL

How reproducible:
I have seen it 2-3 times now during various rhts runs.

Comment 1 Oleg Nesterov 2009-01-20 19:00:44 UTC

I spent a lot of time trying to reproduce this but failed. Because I know
nothing about gdb, I am not sure I understand from these logs what
really happens. I guess I have to dig into gdb's sources, but perhaps
Jan or Roland already have the answer.

What seems to happen is the following.
We have small.c:

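        #include <assert.h>
        #include <signal.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/time.h>
        #include <unistd.h>

        /* Nonzero when started with an argument: sleep forever after SIGALRM. */
        static int wait;

        static void handle_alrm (int signo)
        {
                if (wait)
                        for (;;)
                                pause ();
                /* Without an argument: crash with SIGSEGV. */
                kill (getpid(), SIGSEGV);
                abort ();
        }

        int main (int argc, char **argv)
        {
                struct itimerval itimerval;
                int i;

                wait = (argc > 1);
                signal (SIGALRM, handle_alrm);
                memset (&itimerval, 0, sizeof (itimerval));
                /* One-shot timer: SIGALRM after 0.1 s. */
                itimerval.it_value.tv_usec = 1000000 / 10;
                i = setitimer (ITIMER_REAL, &itimerval, NULL);
                assert (i == 0);
                pause ();
                abort ();
                return 0;
        }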

we have gdbinit-maps1:
        
        set width 0
        set height 0
        gcore core.gcore
        quit

and runtest.sh does:

        ./small wait &
        PID=$!
        sleep 5
        gdb -silent --command=./gdbinit-maps1 ./small $PID

and, according to http://rhts.redhat.com/testlogs/42002/145415/1207053/current.log
the last command "hangs" and outputs:

        Using host libthread_db library "/lib64/tls/libthread_db.so.1".
        Attaching to program: /mnt/tests/kernel/syscalls/vsyscall/small, process 14579
        Redelivering pending Trace/breakpoint trap.
        Redelivering pending Trace/breakpoint trap.
        Program process 0 exited: Unknown signal 0 (terminated)

        /mnt/tests/kernel/syscalls/vsyscall/14579: No such file or directory.
        ./gdbinit-maps1:3: Error in sourced command file:
        You can't do that without a process to debug.
        (gdb)

This is why the test did not finish: gdb cannot proceed and waits for
input. Note that the tracee has really exited; there is no "small" process
in the sysrq-t output.

What does this "Redelivering" mean? Google finds this patch:

        http://sourceware.org/ml/gdb-patches/2007-06/msg00059.html

Trace/breakpoint trap? strings `which gdb` shows this means SIGTRAP.

So it looks like gdb does PTRACE_ATTACH, then ptrace(PTRACE_CONT, SIGSTOP),
and gets a stop with WSTOPSIG() == SIGTRAP?

Currently, I don't see how this is possible. Perhaps gdb does something
strange. Will continue tomorrow, unless somebody knowledgeable can save
me from studying gdb's sources ;)


As for small.c, I think it could be just

        #include <unistd.h>

        int main(void)
        {
                pause();        /* sleep until a signal arrives */
                return 0;
        }

but again, since I can't reproduce the problem, I am not sure.

Comment 2 Jan Kratochvil 2009-01-20 22:31:29 UTC

The message `Redelivering pending ...' was present in RHEL-5.2; it was a bug in:
gdb-6.5-bz292971-attach-signalled-fix.patch

The defect-by-design of this message was found by Roland in:
  http://sourceware.org/ml/archer/2008-q3/msg00003.html

(I originally intended it for SIGSTOP, but it is wrong for other signals.)

This GDB defect is no longer present in RHEL-5.3.

I still have to find out how gdb can encounter SIGTRAP ("Trace/breakpoint trap") at all, since small.c does not generate any SIGTRAP (this looks suspicious).

Comment 3 Jan Kratochvil 2009-02-01 13:59:47 UTC

Created attachment 330548 [details]
PTRACE_ATTACH reproducer.

You can check that the vsyscall RHTS test was preceded by other tests on that host.
The problem is an interaction between auditd and kernel ptrace.
IIRC, according to Roland, the kernel switches the syscall enter/exit code to a slower path after auditd is started, and this can be undone only by a reset.
Once auditd has been started, a simple PTRACE_ATTACH generates SIGTRAP instead of SIGSTOP.
Curiously, it may be reproducible only on this RHTS host.
In any case it is unrelated to GDB, so I am giving this bug away.

HOSTNAME=dell-per905-01.rhts.bos.redhat.com
JOBID=44290
DISTRO=RHEL4-U8-re20090128.1
ARCHITECTURE=x86_64
# cat /proc/version
Linux version 2.6.9-80.ELlargesmp (mockbuild.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Fri Jan 23 16:39:07 EST 2009
# gcc -o attach-ok attach-ok.c -Wall -g
# ./attach-ok 
PASS - SIGSTOP
Optionally (it has no effect on the results): # setenforce 0
# /etc/init.d/auditd start
Starting auditd: [  OK  ]
# ./attach-ok 
FAIL - SIGTRAP
# /etc/init.d/auditd stop
Stopping auditd: [  OK  ]
# ./attach-ok 
FAIL - SIGTRAP
# rpm --qf '%{name}-%{version}-%{release}.%{arch}\n' -q audit kernel-largesmp
audit-1.0.16-4.el4.x86_64
kernel-largesmp-2.6.9-80.EL.x86_64
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       57G  1.4G   53G   3% /
/dev/sda1              99M   15M   80M  16% /boot
none                  3.9G     0  3.9G   0% /dev/shm


The original RHTS run had the messages below, but according to my tests the disk-full state is not required for the `attach-ok' reproducer:
messages.gz:
Jan 15 18:48:28 dell-per905-01 rhts: /mnt/tests/kernel/security/audit/audit-test /
...
Jan 15 18:52:27 dell-per905-01 auditd[11029]: Audit daemon has no space left on logging partition
Jan 15 18:52:27 dell-per905-01 auditd[11029]: The audit daemon is now halting the system due to no space left on logging partition
Jan 15 18:52:27 dell-per905-01 auditd[11029]: Record was not written to disk (No space left on device)
Jan 15 18:52:27 dell-per905-01 auditd[11029]: write: Audit daemon detected an error writing an event to disk (No space left on device)

Comment 4 Roland McGrath 2009-02-07 02:36:31 UTC

I can't see how the RHEL4 code might produce that SIGTRAP.  Jan mentioned that this may not reproduce on the same kernel on all machines.  If that's so, it's especially weird and the distinguishing factor should be figured out.  But offhand, some hardware weirdness is almost easier to believe than a plain bug, since I really can't see where it would come from in the code we have.

Comment 5 Jiri Pallich 2012-06-20 16:01:31 UTC

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.