Description of problem:
While running a long regression test with the latest Oracle database release
(11.1.0), we found that many processes hang in the T state when Oracle takes
diagnostics using gdb on the RHEL 3 U8 beta kernel (2.4.21-43.ELhugemem).
This results in higher memory usage, including swap, eventually leading to the
OOM killer terminating Oracle server processes. This did not happen in RHEL 3 U7.
Here is the kernel stack for a hung process:
gdb T 43300000 4 1334 1 1324 (NOTLB)
Call Trace: [<02137f64>] get_signal_to_deliver [kernel] 0xd4 (0x432ffef0)
[<0210c6d0>] do_signal [kernel] 0x0 (0x432fff1c)
[<0210c734>] do_signal [kernel] 0x64 (0x432fff20)
[<02163b8c>] put_user_size [kernel] 0x3c (0x432fff80)
[<02123ac0>] schedule_tail [kernel] 0xa0 (0x432fff9c)
Version-Release number of selected component (if applicable):
How reproducible:
Every time an Oracle process is attached to by gdb.
Steps to Reproduce:
1. Install Oracle 10g 10201 on the RHEL 3 U8 beta kernel.
2. Using gdb, debug a process and take a backtrace.
Actual results:
The gdb session hangs.
Expected results:
The gdb session should quit after pstack pid is run.
Additional info:
Start up Oracle 11.1.0.
Run pstack on any Oracle process (for example, the ora_q000_oastoltp process).
The pstack command will hang as described above.
The workaround is to run 'kill -CONT pid' on each process that is stuck in the
'T' state.
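The workaround above can be sketched as a small script. This is only a minimal illustration, assuming a modern Python 3 interpreter and the usual /proc/<pid>/stat layout (pid, then the command name in parentheses, then the one-letter state). The function names and the "gdb" name filter are hypothetical and are not part of the original report.

```python
import os
import signal


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def stopped_pids(proc_root="/proc"):
    """Yield (pid, comm) for every process whose state field is 'T'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_state(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "T":
            yield pid, comm


def resume_stopped(name_filter="gdb", dry_run=True):
    """Send SIGCONT (the same signal as 'kill -CONT') to matching processes.

    name_filter is a hypothetical convenience: only processes whose command
    name contains it are touched. With dry_run=True nothing is signalled.
    """
    matched = []
    for pid, comm in stopped_pids():
        if name_filter not in comm:
            continue
        if not dry_run:
            os.kill(pid, signal.SIGCONT)
        matched.append(pid)
    return matched
```

Calling resume_stopped(dry_run=True) only lists the candidate pids; passing dry_run=False sends SIGCONT to them. Restricting by name is a deliberate choice here, since a blanket SIGCONT would also resume processes that were stopped on purpose.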
Verified that this hang also occurs on the production version of 10.2.0.2 with
the same U8 beta kernel (2.4.21-43.ELhugemem).
So existing 10.2.0.2 customers may also run into this issue.
Problem still exists in 2.4.21-44.ELhugemem (RHEL 3 U8 Beta 2) kernel.
RHEL3 is now closed.
Guru, does the problem only occur while running under gdb or do you get random
hangs while running the regression tests outside the debugger? Also, did this
work in a previous update release?
We have not seen a hang outside the debugger yet.
Also, the problem was *not* present in RHEL 3 U7 (2.4.21-40.ELhugemem).
This problem started with the RHEL 3 U8 Beta 1 release (2.4.21-43.ELhugemem).
The problem is seen with any process that is being debugged. After gdb quits,
the gdb process stays in the 'T' state, as if gdb itself were being traced. It
is not a hang; rather, the process is stopped in the 'T' state. The problem only seen with
Guru -- I was able to reproduce this behavior in U8 and verified that it
does not occur in U7. That is: when debugging a process with gdb, after
quitting, the gdb process remains in the T state, while in U7 it terminates
successfully. However, this had no effect on the process that was being
debugged or on system performance. Are the test systems being impacted,
performance-wise, by this? Or is your intention for this issue only to
report that gdb should not remain in the T state?
SEG has identified one of the changes that could be responsible for this
behavior and has made Engineering aware of it. However, given that U8 is
scheduled to be released in less than two weeks, it will not be possible
for us to pursue a fix for U8. As you are probably aware, U8 is the last
official update scheduled for RHEL3. Management has yet to decide on how
outstanding RHEL3 issues, like this one, will be addressed. As soon as a
decision is made, we will be sharing it with you.
Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client
This event sent from IssueTracker by martinez
Thank you. As reported earlier, when the Oracle RDBMS perceives a hang, it
initiates diagnostic dumps, which include stack dumps taken by invoking gdb.
With gdb left in the 'T' state, its memory is not released, so over a period
of a few hours to a day the box runs out of memory after using all the swap.
We think this will have a tremendous impact on RAC customers, where such
diagnostics are often taken when a process or node is seen not responding.
This event sent from IssueTracker by martinez
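One way to watch the memory build-up described in this comment is to total the resident memory held by stopped processes. The following is only a rough sketch, assuming a modern Python 3 interpreter and the "State:" and "VmRSS:" lines of /proc/<pid>/status. The function names are hypothetical and do not come from Oracle or Red Hat tooling.

```python
import os


def parse_status_fields(status_text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def stopped_rss_kb(status_text):
    """Return VmRSS in kB if the process is in the 'T' state, else 0."""
    fields = parse_status_fields(status_text)
    if not fields.get("State", "").startswith("T"):
        return 0
    # Kernel threads have no VmRSS line, so fall back to zero.
    return int(fields.get("VmRSS", "0 kB").split()[0])


def total_stopped_rss_kb(proc_root="/proc"):
    """Sum VmRSS over all processes currently in the 'T' state."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                total += stopped_rss_kb(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
    return total
```

Sampling total_stopped_rss_kb() periodically during a regression run would show whether stopped gdb processes are accumulating. Note that VmRSS covers resident pages only, so memory already pushed out to swap is not counted in this figure.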
Note that this regression only occurs when "gdb" is used to attach to
a previously existing process (with the "attach" command). Also, in
both cases (of running the process under "gdb" or attaching to one),
the RHEL3 U8 kernel causes another probably related regression:
pesto 1# cat << EOF > xyz.c
? for (;;)
pesto 15# cc -o xyz xyz.c
pesto 16# gdb xyz
GNU gdb Red Hat Linux (188.8.131.52-1.62rh)
Starting program: /home/ernie/xyz
warning: linux_test_for_tracefork: unexpected result from waitpid (4994,
warning: linux_test_for_tracefork: failed to kill child
I'm assigning this to PeterS, who worked on the U8 zap-threads patch.
Patch posted for internal review on 5-Jul-2006.
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-47.EL).
I have verified that this issue is fixed in RHEL 3 U8 release. (
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.