Created attachment 93335 [details] tar file includes meminfo, slabinfo and alt sysrq output during freeze (files are bzipped)
What is the 'oast' test tool? I'd like to try reproducing the bug here, so we can see exactly what's going on.
Martin Jenner has this tool; it is the "Oracle automated stress tool". However, you may not be able to reproduce this bug, as you may not have a database of the size I have. Here are the steps to reproduce it:
1) UP kernel
2) 10i database (I do not know whether Martin has that or not. I can provide a tarball to Martin, but I do not know how.)
3) 4 GB of memory
4) Create a 200-warehouse database using the oast tool (I am using 5 data disks of 18 GB)
5) Use the directIO option in the init.ora file
6) Run a 1000-user test with 1.8 GB of SGA
It may be difficult to reproduce this test at Red Hat. I will do my best to help so that you can reproduce it there. Please let me know if you need any other information. I can reproduce it very easily at Oracle.
I have verified this bug on the 411 kernel. I collected meminfo, slabinfo and top every minute. You can see in meminfo that nothing was collected for an interval of 20-30 minutes. A profile was also collected every minute during that time. This test was done on UP with O_DIRECT enabled.
Created attachment 93970 [details] meminfo, slabinfo, top and profile collected during test.
I have verified this bug on the 2.EL kernel as per Larry's request. I collected meminfo and slabinfo every minute and top every 10 minutes. You can see in meminfo that nothing was collected for an interval of 20-30 minutes. This test was done on UP with O_DIRECT enabled. I was running the tpcc (oast) test on a 10g database with 1000 users. The test should finish in about 45 to 50 minutes, but it took 1 hour and 40 minutes. Logs for meminfo, slabinfo and top are attached. Please let me know if you need anything else.
Created attachment 94613 [details] meminfo, slabinfo, top collected during test.
The attachment didn't show us what was going on, unfortunately. Can you get us AltSysrq-M, AltSysrq-T and AltSysrq-W outputs when the system is in the frozen state? Thanks, Larry Woodman
Requested output for alt-sysrq m, p, t and w attached.
Created attachment 94646 [details] alt sysrq output - bzipped
OK, we are making progress here (albeit slowly!). Please get the system into this state and get several (hundreds of) "AltSysrq P" outputs so we can see whether one process is stuck in update_queue()/try_atomic_semop() or whether there is a lot of context switching between several processes in this state. Larry
alt-sysrq-p output (hundreds of samples) attached. The output was taken at two or three different times during the hang.
Created attachment 94657 [details] alt sysrq p output - bzipped
I can reproduce this bug on both the smp and hugemem kernels. I will update the bug with alt-sysrq-p, m and w as soon as I can.
Please use this smp kernel to collect the AltSysrq data. http://people.redhat.com/~lwoodman/.for_oracle/kernel-smp-2.4.21-3.EL.debug.i686.rpm Thanks, Larry
With the 3.EL kernel I reproduced the bug using the hugemem kernel. The system hangs, but it did not come out of the hang as it used to. When I tried to take alt-sysrq-m it hung, so I had to reboot the machine. I will try the debug kernel you gave me. I will also attach the output of meminfo, slabinfo and top, as well as the output of alt-sysrq-m (after this the machine hangs and I have to switch it off).
Created attachment 94835 [details] memory and top output when hugemem kernel hangs
Created attachment 94836 [details] alt-sysrq-m output when hugemem hangs
Please retry this test with the kernel in: http://people.redhat.com:/~lwoodman/.for_oracle We think we hit the same problem in another case and this kernel fixed that hang. Please let us know how it worked ASAP. Larry Woodman
I tried with kernel 2.4.21-3.EL.sock.kswapd.io.debug.20smp. I still got similar behavior in the test. The machine froze (paused) for a long time while running the same tests. I could not take alt-sysrq output while the machine was paused.
Suhua, please try this new kernel. It has more debug output and a panic if some bad things happen. http://people.redhat.com/~lwoodman/.for_oracle/ Larry
Attached is a low-budget simulation of the problem seen in this case. Hundreds of processes are doing sys_semtimedop() calls with a timeout of 10 milliseconds (HZ resolution). There is therefore a scheduling storm from schedule_timeout() with each clock tick. The attached program replaces the sys_semtimedop() calls with empty select() calls with a timeout of 10000 microseconds. When run with an argument of say, 2000 processes, the same system "freeze" occurs. When run on AS2.1, no such problem occurs with either the same Oracle test or running 2000 processes from this "select.c" program. Dave Anderson (entered while at Oracle)
Created attachment 95624 [details] select.c file
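For readers without access to the attachment, the behaviour described above can be approximated with a minimal sketch. This is not the attached select.c, whose contents are not reproduced in this thread; the function names `one_tick_sleep` and `run_select_storm` are hypothetical, and the only details taken from the description are the empty select() call, the 10000-microsecond timeout, and the idea of running many such processes at once.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sleep for roughly one 10 ms tick with an empty select().
 * Returns select()'s result: 0 on timeout, -1 on error. */
static int one_tick_sleep(void)
{
    struct timeval tv;

    /* Re-initialised on every call: Linux may modify the timeval. */
    tv.tv_sec = 0;
    tv.tv_usec = 10000;   /* 10000 microseconds, as in the description */
    return select(0, NULL, NULL, NULL, &tv);
}

/* Fork nprocs children that call one_tick_sleep() in a tight loop,
 * let them run for `seconds`, then kill and reap them.
 * Returns the number of children started, or -1 if allocation fails. */
static int run_select_storm(int nprocs, unsigned int seconds)
{
    pid_t *pids;
    int started = 0;
    int i;

    assert(nprocs > 0);
    pids = calloc((size_t)nprocs, sizeof(*pids));
    if (pids == NULL)
        return -1;

    for (i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* cannot fork any more: stop here */
        if (pid == 0) {
            for (;;)
                one_tick_sleep();   /* child loops until it is killed */
        }
        pids[started++] = pid;
    }

    sleep(seconds);                 /* let the wakeup storm run */

    for (i = 0; i < started; i++) {
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
    free(pids);
    return started;
}
```

Calling `run_select_storm(2000, 60)` from a small main() would correspond to the 2000-process run mentioned above; start with a much smaller count on a machine you cannot afford to stall.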
A correction to my posting re: the select.c program, where I stated that "When run on AS2.1, no such problem occurs". If I'm not mistaken, the quick test of 2000 processes on an AS2.1 machine we did at Oracle was most likely done on an SMP machine. When I run it here on a UP AS2.1 box, the same problem occurs as with RHEL3 UP. If I run 2000 users on a UP using a 2.4.23-based kernel.org kernel (which doesn't have the O(1) scheduler used in AS2.1 and RHEL3) the performance is even worse. My guess is that when we ran the 2000 user test on that "other" OS while at Oracle, it was also an SMP machine. In any case, unless proven otherwise, it does not appear to be a regression from AS2.1.
I verified that the select program you provided hangs only on the UP kernel, on both AS 2.1 and RHEL 3. The program does not hang on the smp kernel on AS 2.1 or RHEL 3. But my original test hangs/freezes the machine for some time (10 minutes to 3 hours) on up, smp and hugemem. On hugemem and smp it is a little more difficult to reproduce.
Right -- if you increase the number of users to the select program (from 2000 upward) it will eventually hang on an SMP as well. It's due to the in-kernel schedule_timeout() that gets done by both your Oracle stress test and the select.c program, which in both programs is a timeout of 1 clock tick. In both cases, the individual tasks time out, return to user space, and then immediately go back into the kernel for another 1-clock-tick timeout. The Oracle tasks do it via the new sys_semtimedop() system call, while the "select" programs do it via the select() system call. The Oracle tasks do quite a bit more work in the kernel, and perhaps in user space upon return from sys_semtimedop() with an EAGAIN errno, before coming right back into the kernel and calling sys_semtimedop() again. Eventually there are hundreds of these backed-up, constantly timing-out Oracle tasks on the runqueue, each timing out for a clock tick, getting scheduled to run, running and returning to user space, calling sys_semtimedop() again, etc. This floods the highest-priority runqueue on each CPU with ready-to-run processes that make no progress, keeping other processes from running for a long time. Eventually the job gets done, but the manner in which the Oracle stress tests use sys_semtimedop() (new in RHEL3) ends up impeding their own progress.
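The user-space side of that pattern can be sketched as follows. This is only an illustration of a one-tick semtimedop() wait that times out with EAGAIN, not Oracle's actual code; the helper names `wait_one_tick` and `count_timeouts`, the private semaphore set, and the 10 ms timeout value are assumptions chosen to match the description above.

```c
#define _GNU_SOURCE             /* needed for semtimedop() with glibc */
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <time.h>

/* On Linux the caller has to define the semctl() argument union. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Try to decrement semaphore 0 of semid, waiting at most 10 ms.
 * Returns 0 on success, or -1 with errno set (EAGAIN on timeout). */
static int wait_one_tick(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct timespec timeout = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    return semtimedop(semid, &op, 1, &timeout);
}

/* Create a private semaphore that nobody ever posts, retry the wait
 * `attempts` times, and return how many attempts ended in EAGAIN.
 * Returns -1 if the semaphore set cannot be created or initialised. */
static int count_timeouts(int attempts)
{
    union semun_arg arg;
    int semid, i, timeouts = 0;

    assert(attempts >= 0);
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    arg.val = 0;                            /* semaphore starts at 0 */
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    for (i = 0; i < attempts; i++) {
        /* Each failed wait returns to user space and is retried at
         * once, which is the retry loop described above. */
        if (wait_one_tick(semid) == -1 && errno == EAGAIN)
            timeouts++;
    }

    semctl(semid, 0, IPC_RMID);             /* remove the semaphore set */
    return timeouts;
}
```

Because the semaphore is never posted, every attempt is expected to time out. With hundreds of tasks doing this at once, each tick wakes all of them only for them to go straight back to sleep.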
Not really. But unless #43793 shows several hundred processes blocked in sys_semtimedop(), it's doubtful that it's the same issue.
Actually, make that "several thousand" on an SMP machine...
Suhua, does this bugzilla need to remain private? If not, please uncheck the "Oracle Confidential Group" box below. Thanks. Anyone know whether this issue was resolved? And if so, by what patch?
Making this bug public as a whole, but marking initial comment private (in lieu of Suhua's response to comment #30).
Is this problem still reproducible with the latest kernel? Larry Woodman
I don't have a system to verify the latest kernel.
Larry, this bug may be closed.
Closing based on last comment.