Red Hat Bugzilla – Bug 101479
SYSTEM FREEZE FOR HALF HOUR DURING STRESS TEST.
Last modified: 2007-11-30 17:06:57 EST
Created attachment 93335 [details]
tar file includes meminfo, slabinfo and alt-sysrq output during freeze (files are bzipped)
What is the 'oast' test tool?
I'd like to try reproducing the bug here, so we can see exactly what's going on.
Martin Jenner has this tool. It is the "Oracle automated stress
tool". But you may not be able to reproduce this bug, as
you may not have a database as large as the one I have.
Here are the steps to reproduce this bug.
1) UP kernel
2) 10i database (I do not know whether Martin has that or
not. I can provide a tarball to Martin but I do not
3) 4 GB of memory
4) create a 200-warehouse database using the oast tool
(I am using 5 data disks of 18 GB)
5) use the directIO option in the init.ora file
6) run a 1000-user test with 1.8 GB of SGA
It may be difficult to reproduce this test at Red
Hat. I will do my best to help so that you can reproduce it
at Red Hat.
Please let me know if you need any other information.
I can reproduce it very easily at Oracle.
I have verified this bug on the 411 kernel. I collected meminfo, slabinfo and
top every minute. You can see in meminfo that nothing was collected for an
interval of 20-30 minutes. A profile was also collected every minute during
that time. This test was done on a UP kernel with O_DIRECT enabled.
Created attachment 93970 [details]
meminfo, slabinfo, top and profile collected during test.
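A periodic collection like the one described above could be scripted roughly as follows. This is only a sketch, not the reporter's actual script: the function names `snapshot_proc_files` and `find_gaps`, the log layout, and the default one-minute interval are assumptions made for illustration. The second helper shows how a freeze shows up in such logs, namely as a gap between consecutive sample timestamps.

```python
import time
from pathlib import Path


def snapshot_proc_files(sources, out_dir, interval_s=60.0, count=3):
    """Append timestamped copies of each source file to out_dir/<name>.log.

    Returns the list of sample timestamps (seconds since the epoch).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamps = []
    for i in range(count):
        now = time.time()
        stamps.append(now)
        for src in sources:
            src = Path(src)
            try:
                text = src.read_text()
            except OSError as exc:
                # Keep sampling even if one file cannot be read.
                text = f"<unreadable: {exc}>\n"
            with open(out / (src.name + ".log"), "a") as fh:
                fh.write(f"=== {time.ctime(now)} ===\n{text}")
        if i + 1 < count:
            time.sleep(interval_s)
    return stamps


def find_gaps(stamps, interval_s, slack=2.0):
    """Return (start, end) pairs of consecutive samples that are
    more than slack * interval_s apart (hypothetical freeze windows)."""
    return [(a, b) for a, b in zip(stamps, stamps[1:])
            if b - a > slack * interval_s]
```

For example, `snapshot_proc_files(["/proc/meminfo", "/proc/slabinfo"], "/tmp/freeze-logs", count=120)` would sample for about two hours, and passing the returned timestamps to `find_gaps(stamps, 60.0)` would list any windows in which the sampler itself did not get to run.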
I have verified this bug on the 2.EL kernel as per Larry's request. I
collected meminfo and slabinfo every minute and top every 10
minutes. You can see in meminfo that nothing was collected for an interval
of 20-30 minutes. This test was also done on a UP kernel with O_DIRECT enabled.
I was running the tpcc (oast) test on a 10g database with 1000 users. The test
should finish in about 45 to 50 minutes, but it took 1 hour and 40
minutes. Logs for meminfo, slabinfo and top are attached. Please let me know
if you need anything.
Created attachment 94613 [details]
meminfo, slabinfo, top collected during test.
Unfortunately, the attachment didn't show us what was going on. Can you
get us AltSysrq-M, AltSysrq-T and AltSysrq-W outputs when the system
is in the frozen state?
Thanks, Larry Woodman
Requested output for alt-sysrq m, p, t and w attached.
Created attachment 94646 [details]
alt sysrq output - bzipped
OK, we are making progress here (albeit slowly!). Please get the system into
this state and get several (hundreds of) "AltSysrq P" outputs so we can see if
one process is stuck in update_queue()/try_atomic_semop() or there is a bunch
of context switching between several processes in this state.
alt-sysrq-p output (hundreds of samples) attached. Output was taken
at two or three different times during the hang.
Created attachment 94657 [details]
alt sysrq p output - bzipped
I can reproduce this bug on both the smp and
hugemem kernels. I will update the bug
with alt-sysrq-p, m and w as soon as I
Please use this smp kernel to collect the AltSysrq data.
With the 3.EL kernel I reproduced the bug using
the hugemem kernel. The system hangs, but it did
not come out of the hang like it used to. When I tried
to take alt-sysrq-m it hung. So I have to
I will try the debug kernel given by you. I
will also attach the output of
meminfo, slabinfo and top, and the output of
alt-sysrq-m (taken after the machine hangs and
I have to switch off the machine)
Created attachment 94835 [details]
memory and top output when hugemem kernel hangs
Created attachment 94836 [details]
alt-sysrq-m output when hugemem hangs
Please retry this test with the kernel in:
We think we hit the same problem in another case and this
kernel fixed that hang. Please let us know how it worked ASAP.
I tried with kernel
2.4.21-3.EL.sock.kswapd.io.debug.20smp. I still got
similar behavior in the test. The machine froze (paused)
for a long time while running the same tests. I could not
take alt-sysrq output while the machine was paused.
Suhua, please try this new kernel. It has more debug output and
a panic if some bad things happen.
Attached is a low-budget simulation of the problem
seen in this case. Hundreds of processes are doing
sys_semtimedop() calls with a timeout of 10
milliseconds (HZ resolution). There is therefore
a scheduling storm from schedule_timeout() with each
The attached program replaces the sys_semtimedop()
calls with empty select() calls with a timeout of
10000 microseconds. When run with an argument of,
say, 2000 processes, the same system "freeze" occurs.
When run on AS2.1, no such problem occurs with
either the same Oracle test or running 2000
processes from this "select.c" program.
Dave Anderson (entered while at Oracle)
Created attachment 95624 [details]
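For readers who cannot open the attachment, the kind of program described above can be sketched as follows. This is not the attached select.c. It is a minimal Python approximation under stated assumptions: the names `timeout_loop` and `run_storm` are hypothetical, each child simply calls an empty select() with a 10 ms timeout in a loop, and the run is bounded by a duration so it always terminates. It needs a Unix-like system, since it relies on `os.fork()` and on select() accepting three empty lists.

```python
import os
import select
import time


def timeout_loop(duration_s, timeout_s=0.010):
    """Call an empty select() with a short timeout until duration_s
    has elapsed; return the number of calls made."""
    deadline = time.monotonic() + duration_s
    calls = 0
    while time.monotonic() < deadline:
        # No file descriptors: the call just sleeps for the timeout,
        # then returns to user space, and the loop re-enters the kernel.
        select.select([], [], [], timeout_s)
        calls += 1
    return calls


def run_storm(n_procs, duration_s):
    """Fork n_procs children that each run timeout_loop();
    return how many of them exited cleanly."""
    pids = []
    for _ in range(n_procs):
        pid = os.fork()
        if pid == 0:
            code = 1
            try:
                timeout_loop(duration_s)
                code = 0
            finally:
                os._exit(code)  # never fall back into the parent's code
        pids.append(pid)
    ok = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            ok += 1
    return ok
```

Calling `run_storm(2000, 60)` would approximate the 2000-process case mentioned in the thread. Given the behavior reported here, it is probably wise to try small process counts first and only on a machine that can afford to become unresponsive.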
A correction to my posting re: the select.c program, where I stated
that "When run on AS2.1, no such problem occurs".
If I'm not mistaken, the quick test of 2000 processes on an AS2.1
machine we did at Oracle was most likely done on an SMP machine.
When I run it here on a UP AS2.1 box, the same problem occurs as with
RHEL3 UP. If I run 2000 users on a UP using a 2.4.23-based kernel.org
kernel (which doesn't have the O(1) scheduler used in AS2.1 and RHEL3)
the performance is even worse. My guess is that when we ran the 2000
user test on that "other" OS while at Oracle, it was also an SMP
In any case, unless proven otherwise, it does not appear to be
a regression from AS2.1.
I verified that the select program you provided hangs only on the UP
kernel, on both AS 2.1 and RHEL 3.
The program does not hang on the smp kernel on AS 2.1 or RHEL 3.
But my original test freezes the machine for some time (10 minutes
to 3 hours) on up, smp and hugemem. On hugemem and smp it is a little
difficult to reproduce.
Right -- if you increase the number of users to the select program
(from 2000 upward) it will eventually hang on an SMP as well.
It's due to the in-kernel schedule_timeout() that gets done
by both your oracle stress test and the select.c program, which in
both programs is a timeout of 1 clock tick. In both cases, the
individual tasks timeout, return to user space, and then immediately
go back into the kernel for another 1-clock-tick timeout. The oracle
tasks do it via the new sys_semtimedop() system call, while the
"select" programs do it via the select() system call. The oracle
tasks do quite a bit more work in the kernel, and perhaps in user
space upon return from the sys_semtimedop with an EAGAIN errno,
before coming right back into the kernel and calling sys_semtimedop()
again. Eventually there are hundreds of these backed-up, constantly
timing-out, oracle tasks on the runqueue, each timing out for a clock
tick, getting scheduled to run, running and returning to user space,
calling sys_semtimedop again, etc. This floods the runqueues with
ready-to-run processes that make no progress, but end up flooding the
highest priority runqueue on each CPU, keeping other processes from
running for a long time. Eventually the job gets done, but the manner
in which the oracle stress tests use the new sys_semtimedop() (new in
RHEL3) ends up impeding their own progress.
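To put rough numbers on the explanation above, here is a back-of-the-envelope sketch. It assumes HZ=100 (a 10 ms tick, matching the 10 millisecond timeout mentioned earlier in this thread) and that every task re-arms a 1-tick timeout as soon as it wakes. The function names are hypothetical and the figures are upper bounds, not measurements.

```python
def wakeups_per_second(n_tasks, hz=100):
    """Upper bound on timer-driven wakeups per second if every task
    re-arms a 1-tick timeout immediately after waking."""
    return n_tasks * hz


def cpu_budget_per_wakeup_us(n_tasks, n_cpus=1, hz=100):
    """Microseconds of CPU time available per wakeup if every task
    is to be serviced once within each tick."""
    tick_us = 1_000_000 / hz
    return tick_us * n_cpus / n_tasks
```

Under these assumptions, 2000 tasks on a UP machine request up to `wakeups_per_second(2000)` = 200000 wakeups per second, leaving `cpu_budget_per_wakeup_us(2000)` = 5.0 microseconds for each wakeup, system call return and re-entry. That is consistent with the runqueue staying full of tasks that are runnable but make no progress.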
Not really. But unless #43793 shows several hundred
processes blocked in sys_semtimedop(), it's doubtful
that it's the same issue.
Actually, make that "several thousand" on an SMP machine...
Suhua, does this bugzilla need to remain private? If not, please uncheck
the "Oracle Confidential Group" box below. Thanks.
Anyone know whether this issue was resolved? And if so, by what patch?
Making this bug public as a whole, but marking initial comment private
(in lieu of Suhua's response to comment #30).
Is this problem still reproducible with the latest kernel?
I don't have a system on which to verify the latest kernel.
Larry, this bug may be closed.
Closing based on last comment.