Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel   
Version: 3.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
Depends On:
Blocks: 103278
Reported: 2003-08-01 17:30 UTC by Suhua Ding
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-12-02 00:07:56 UTC

Attachments
tar file includes meminfo, slabinfo and alt sysrq output during freeze (files are bzipped) (280.00 KB, application/octet-stream)
2003-08-01 17:36 UTC, Suhua Ding
meminfo, slabinfo, top and profile collected during test. (180.00 KB, application/octet-stream)
2003-08-27 06:53 UTC, Suhua Ding
meminfo, slabinfo, top collected during test. (40.00 KB, application/octet-stream)
2003-09-20 17:34 UTC, Suhua Ding
alt sysrq output - bzipped (164.56 KB, application/octet-stream)
2003-09-23 05:20 UTC, Suhua Ding
alt sysrq p output - bzipped (30.00 KB, text/plain)
2003-09-23 18:36 UTC, Suhua Ding
memory and top output when hugemem kernel hangs (80.00 KB, application/octet-stream)
2003-09-29 21:33 UTC, Suhua Ding
alt-sysrq-m output when hugemem hangs (1.40 KB, text/plain)
2003-09-29 21:35 UTC, Suhua Ding
select.c file (675 bytes, text/plain)
2003-10-31 04:37 UTC, Suhua Ding

Comment 1 Suhua Ding 2003-08-01 17:36:02 UTC
Created attachment 93335
tar file includes meminfo, slabinfo and alt sysrq output during freeze (files are bzipped)

Comment 2 Rik van Riel 2003-08-01 19:01:55 UTC
What is the 'oast' test tool?

I'd like to try reproducing the bug here, so we can see exactly what's going on.

Comment 3 Suhua Ding 2003-08-02 00:30:17 UTC
Martin Jenner has this tool, the "Oracle automated stress
tool". But you may not be able to reproduce this bug, as
you may not have a database of the size I have.

Here are the steps to reproduce this bug.

1) UP kernel
2) 10i database (I do not know whether Martin has that
or not. I can provide a tarball to Martin, but I do not
know how.)
3) 4 GB of memory
4) Create a 200-warehouse database using the oast tool
(I am using 5 data disks of 18 GB)
5) Use the directIO option in the init.ora file
6) Run a 1000-user test with 1.8 GB of SGA

It may be difficult to reproduce this test at Red
Hat. I will do my best to help so that you can reproduce
it at Red Hat.

Please let me know if you need any other information.
I can reproduce it very easily at Oracle.

Comment 4 Suhua Ding 2003-08-27 06:50:49 UTC
I have verified this bug on the 411 kernel. I collected meminfo, slabinfo and
top every minute. You can see in meminfo that nothing was collected during an
interval of 20-30 minutes. A profile was also collected every minute during
that time. This test was done on UP with O_DIRECT enabled.

Comment 5 Suhua Ding 2003-08-27 06:53:16 UTC
Created attachment 93970
meminfo, slabinfo, top and profile collected during test.

Comment 6 Suhua Ding 2003-09-20 17:32:36 UTC
I have verified this bug on the 2.EL kernel as per Larry's request. I
collected meminfo and slabinfo every minute and top every 10 minutes.
You can see in meminfo that nothing was collected during an interval
of 20-30 minutes. This test was done on UP with O_DIRECT enabled.
I was running the tpcc (oast) test on a 10g database with 1000 users.
The test should finish in about 45 to 50 minutes, but it took 1 hour
and 40 minutes. The log for meminfo, slabinfo and top is attached.
Please let me know if you need anything.

Comment 7 Suhua Ding 2003-09-20 17:34:08 UTC
Created attachment 94613
meminfo, slabinfo, top collected during test.

Comment 8 Larry Woodman 2003-09-22 19:15:24 UTC
Unfortunately, the attachment didn't show us what was going on.  Can you
get us AltSysrq-M, AltSysrq-T and AltSysrq-W outputs when the system
is in the frozen state?

Thanks, Larry Woodman

Comment 9 Suhua Ding 2003-09-23 05:19:43 UTC
Requested output for alt-sysrq m, p, t and w attached.

Comment 10 Suhua Ding 2003-09-23 05:20:54 UTC
Created attachment 94646
alt sysrq output - bzipped

Comment 11 Larry Woodman 2003-09-23 15:59:56 UTC
OK, we are making progress here (albeit slowly!).  Please get the system into
this state and get several (hundreds of) "AltSysrq P" outputs so we can see
whether one process is stuck in update_queue()/try_atomic_semop() or there is
a lot of context switching between several processes in this state.


Comment 12 Suhua Ding 2003-09-23 18:35:43 UTC
alt-sysrq-p output (hundreds of samples) attached. Output was
taken at two or three different times during the hang.

Comment 13 Suhua Ding 2003-09-23 18:36:58 UTC
Created attachment 94657
alt sysrq p output - bzipped

Comment 14 Suhua Ding 2003-09-29 18:08:01 UTC
I can reproduce this bug on both the smp
and hugemem kernels. I will update the bug
with alt-sysrq-p, m and w as soon as I

Comment 15 Larry Woodman 2003-09-29 19:27:59 UTC
Please use this smp kernel to collect the AltSysrq data.


Thanks, Larry

Comment 16 Suhua Ding 2003-09-29 21:32:26 UTC
With the 3.EL kernel I reproduced the bug
using the hugemem kernel. The system hangs,
but it did not come out of the hang like it
used to. When I tried to take alt-sysrq-m,
it hung, so I had to reboot the machine.

I will try the debug kernel you provided. I
will also attach the output of meminfo,
slabinfo and top, as well as the output of
alt-sysrq-m (after this the machine hangs
and I have to switch off the machine).

Comment 17 Suhua Ding 2003-09-29 21:33:36 UTC
Created attachment 94835
memory and top output when hugemem kernel hangs

Comment 18 Suhua Ding 2003-09-29 21:35:07 UTC
Created attachment 94836
alt-sysrq-m output when hugemem hangs

Comment 19 Larry Woodman 2003-10-18 22:18:46 UTC
Please retry this test with the kernel in:

We think we hit the same problem in another case and this
kernel fixed that hang.  Please let us know how it worked ASAP.

Larry Woodman

Comment 20 Suhua Ding 2003-10-20 18:56:27 UTC
I tried kernel
2.4.21-3.EL.sock.kswapd.io.debug.20smp. I still got
similar behavior in the test. The machine froze (paused)
for a long time while running the same tests. I could not
take alt-sysrq output while the machine was paused.

Comment 21 Larry Woodman 2003-10-21 21:15:49 UTC
Suhua, please try this new kernel.  It has more debug output and
a panic if some bad things happen.



Comment 22 Suhua Ding 2003-10-31 04:35:47 UTC
Attached is a low-budget simulation of the problem
seen in this case.  Hundreds of processes are doing
sys_semtimedop() calls with a timeout of 10 
milliseconds (HZ resolution).  There is therefore
a scheduling storm from schedule_timeout() with each 
clock tick.

The attached program replaces the sys_semtimedop()
calls with empty select() calls with a timeout of 
10000 microseconds.  When run with an argument of,
say, 2000 processes, the same system "freeze" occurs.

When run on AS2.1, no such problem occurs with 
either the same Oracle test or running 2000 
processes from this "select.c" program.

Dave Anderson (entered while at Oracle)
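
For readers without the attachment, a minimal sketch of the kind of program
described above might look like the following.  It is NOT the attached
select.c: the function names, the SELECT_STORM_MAIN guard, and the iteration
count are assumptions made for illustration.  Only the empty select() call
and the 10000-microsecond timeout come from the description.

```c
/* Hypothetical select()-timeout storm sketch; not the attached select.c. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call an empty select() `iterations` times, each with a timeout of `usec`
 * microseconds.  Returns the number of calls that returned 0 (timed out). */
long timeout_loop(long iterations, long usec)
{
    long timeouts = 0;
    for (long i = 0; i < iterations; i++) {
        struct timeval tv;
        /* Re-arm on every pass: Linux select() may modify the timeval. */
        tv.tv_sec = usec / 1000000;
        tv.tv_usec = usec % 1000000;
        if (select(0, NULL, NULL, NULL, &tv) == 0)
            timeouts++;
    }
    return timeouts;
}

/* Fork `nprocs` children that each run timeout_loop(), then wait for them.
 * Returns the number of children that exited with status 0. */
int spawn_workers(int nprocs, long iterations, long usec)
{
    int started = 0, clean = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0)
            _exit(timeout_loop(iterations, usec) == iterations ? 0 : 1);
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }
    return clean;
}

#ifdef SELECT_STORM_MAIN
int main(int argc, char **argv)
{
    /* Process count from argv[1]; the default of 100 is arbitrary. */
    int nprocs = argc > 1 ? atoi(argv[1]) : 100;
    /* 1000 iterations of 10000 us each (iteration count is an assumption). */
    int clean = spawn_workers(nprocs, 1000, 10000);
    printf("%d of %d workers finished\n", clean, nprocs);
    return clean == nprocs ? 0 : 1;
}
#endif
```

Compiled with -DSELECT_STORM_MAIN and run with an argument of 2000, this
would start 2000 such processes.  A run like that may make a test machine
unresponsive, so it is best tried on a box that can be power-cycled.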

Comment 23 Suhua Ding 2003-10-31 04:37:03 UTC
Created attachment 95624
select.c file

Comment 24 Dave Anderson 2003-11-05 15:06:39 UTC
A correction to my posting re: the select.c program, where I stated
that "When run on AS2.1, no such problem occurs".

If I'm not mistaken, the quick test of 2000 processes on an AS2.1
machine we did at Oracle was most likely done on an SMP machine.
When I run it here on a UP AS2.1 box, the same problem occurs as with
RHEL3 UP.  If I run 2000 users on a UP using a 2.4.23-based kernel.org
kernel (which doesn't have the O(1) scheduler used in AS2.1 and RHEL3)
the performance is even worse.  My guess is that when we ran the 2000
user test on that "other" OS while at Oracle, it was also an SMP machine.

In any case, unless proven otherwise, it does not appear to be
a regression from AS2.1.

Comment 25 Suhua Ding 2003-11-06 01:59:38 UTC
I verified that the select program you provided only hangs on the UP
kernel, on both AS 2.1 and RHEL 3.

The program does not hang on the SMP kernel on AS 2.1 or RHEL 3.

But my original test hangs/freezes the machine for some time (10 minutes
to 3 hours) on UP, SMP and hugemem. On hugemem and SMP it is a little
more difficult to reproduce.

Comment 26 Dave Anderson 2003-11-06 13:27:56 UTC
Right -- if you increase the number of users to the select program
(from 2000 upward) it will eventually hang on an SMP as well.

It's due to the in-kernel schedule_timeout() that gets done
by both your oracle stress test and the select.c program, which in
both programs is a timeout of 1 clock tick.  In both cases, the
individual tasks time out, return to user space, and then immediately
go back into the kernel for another 1-clock-tick timeout.  The oracle
tasks do it via the new sys_semtimedop() system call, while the
"select" programs do it via the select() system call.  The oracle
tasks do quite a bit more work in the kernel, and perhaps in user
space upon return from the sys_semtimedop with an EAGAIN errno,
before coming right back into the kernel and calling sys_semtimedop()
again.  Eventually there are hundreds of these backed-up, constantly
timing-out, oracle tasks on the runqueue, each timing out for a clock
tick, getting scheduled to run, running and returning to user space,
calling sys_semtimedop again, etc.  This floods the runqueues with
ready-to-run processes that make no progress, but end up flooding the
highest priority runqueue on each CPU, keeping other processes from
running for a long time.  Eventually the job gets done, but the manner
in which the oracle stress tests use the new sys_semtimedop() (new in
RHEL3) ends up impeding their own progress.
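
The per-task pattern described above can be sketched in user space roughly
as follows.  This is a simplified illustration, not Oracle's code: the
function name count_sem_timeouts() and the private, never-posted semaphore
are assumptions.  It only shows the shape of the loop, in which a
semtimedop() call with a short timeout fails with EAGAIN and is retried
at once.

```c
/* Hypothetical sketch of a semtimedop() timeout-and-retry loop. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* semtimedop() is a Linux extension */
#endif
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <time.h>

/* On Linux the caller must define union semun itself (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Try `attempts` times to decrement a semaphore that nobody ever posts,
 * giving up after `timeout_ns` nanoseconds each time.  Returns the number
 * of calls that failed with EAGAIN (timed out), or -1 on setup failure. */
long count_sem_timeouts(long attempts, long timeout_ns)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union semun arg;
    arg.val = 0;   /* value 0: every "decrement by 1" has to block */
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct timespec ts = {
        .tv_sec = timeout_ns / 1000000000L,
        .tv_nsec = timeout_ns % 1000000000L,
    };

    long timeouts = 0;
    for (long i = 0; i < attempts; i++) {
        /* Time out, return to user space, and go straight back in. */
        if (semtimedop(semid, &op, 1, &ts) < 0 && errno == EAGAIN)
            timeouts++;
    }

    semctl(semid, 0, IPC_RMID);   /* remove the semaphore set */
    return timeouts;
}
```

With a 10 ms timeout (10000000 ns), each call sleeps for about one clock
tick at the HZ resolution mentioned in comment 22, so many processes
running this loop at once would all become runnable on every tick.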

Comment 28 Dave Anderson 2004-07-27 18:19:04 UTC
Not really.  But unless #43793 shows several hundred
processes blocked in sys_semtimedop(), it's doubtful
that it's the same issue.  

Comment 29 Dave Anderson 2004-07-27 19:23:08 UTC
Actually, make that "several thousand" on an SMP machine...

Comment 30 Ernie Petrides 2005-10-10 23:39:56 UTC
Suhua, does this bugzilla need to remain private?  If not, please uncheck
the "Oracle Confidential Group" box below.  Thanks.

Anyone know whether this issue was resolved?  And if so, by what patch?

Comment 31 Ernie Petrides 2005-10-22 00:56:51 UTC
Making this bug public as a whole, but marking the initial comment private
(in the absence of a response from Suhua to comment #30).

Comment 32 Larry Woodman 2006-12-01 20:10:13 UTC
Is this problem still reproducible with the latest kernel?

Larry Woodman

Comment 33 Suhua Ding 2006-12-01 20:22:03 UTC
I don't have system to verify the latest kernel.

Comment 34 Suhua Ding 2006-12-01 23:32:20 UTC
Larry, this bug may be closed.

Comment 35 Ernie Petrides 2006-12-02 00:07:56 UTC
Closing based on last comment.
