Bug 138349 - Oracle dbwr process I/O hang / slowness when writing in async mode
Summary: Oracle dbwr process I/O hang / slowness when writing in async mode
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
Depends On:
TreeView+ depends on / blocked
Reported: 2004-11-08 14:49 UTC by Peter Martuccelli
Modified: 2007-11-30 22:07 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2007-10-19 19:14:37 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Peter Martuccelli 2004-11-08 14:49:49 UTC
The Oracle database is running async i/o against an ext3 file system
and the problem reproduces on any of the following AS3.0 smp kernels:
2.4.21-15.EL, 2.4.21-20.EL, 2.4.21-21.EL, and 2.4.21-22.EL.  This did
not occur on the AS2.1 2.4.9-e34 kernel with the exact same setup.

What we see when tracing the Oracle dbwr process is that it is waiting
in io_getevents, as the example below shows:
1098080568.521062 io_getevents(0xb75ca000, 0x1, 0x200, 0xbfff6b14,
0xbfff6acc) = 1 <50.564850>

When the dbwr process is in this state it stops writing out dirty
blocks, which results in the database not being able to load any new
blocks.  This essentially causes the database to grind to a halt after
a short period of time.



Action by: nhorman
Well, if Larry asked for it, I'll just send it up to engineering.....

Issue escalated to Sustaining Engineering by: nhorman.
nhorman assigned to issue for Support Engineering Group.

Action by: jneedle
Larry and wli figured this out and Larry's building a new kernel with
the fix.  Assigning to Peter to track.

peterm assigned to issue for Sustaining Engineering.

Action by: lwoodman

With WLI's help at Amazon we determined that keventd was not running
for several seconds each time it yields the cpu in context_thread when
there is work to be done.  This is probably caused by several other
processes being runnable.  The context thread in the current RHEL3
kernel looks like this:
        for (;;) {
                set_task_state(curtask, TASK_INTERRUPTIBLE);
                add_wait_queue(&context_task_wq, &wait);
                if (TQ_ACTIVE(tq_context))
                        set_task_state(curtask, TASK_RUNNING);
                schedule();
                remove_wait_queue(&context_task_wq, &wait);

I have changed context_thread so that it will not reschedule if there
is work to do in the kernel.  Amazon is currently testing; this is
also what is done in the current 2.6 kernel:
        for (;;) {
                set_task_state(curtask, TASK_INTERRUPTIBLE);
                add_wait_queue(&context_task_wq, &wait);
                if (TQ_ACTIVE(tq_context))
                        set_task_state(curtask, TASK_RUNNING);
                else
                        schedule();     /* only yield when no work is queued */
                remove_wait_queue(&context_task_wq, &wait);

The kernel with these changes is located here:


So, we are currently waiting for Amazon to test this kernel and see if
this problem goes away.


Status set to: Waiting on Client

Action by: peterm
Escalated to Bugzilla

Comment 1 Larry Woodman 2004-11-09 15:55:42 UTC

1.) OK, after looking at this from a different angle, I think the real
    problem is that the priority of keventd gets lowered so it doesn't
    get the cpu when the load average exceeds the number of cpus by a
    significant margin.  Can you renice keventd's priority to -20 and
    re-run the test?

       renice -20 -p <keventd-pid>

2.) Also, for the kiobuf buffer headers I added code to
    __kmem_cache_free() to call the dtor when the per-cpu slabcache is
    full and the kiobuf is being freed to the global slabcache.  This
    should eliminate the concerns about performance degradation caused
    by the per-cpu cache staying full of kiobufs with a full kiovec
    and bufferheads.

    The source and binary rpms for this kernel are here:



Comment 2 Larry Woodman 2004-11-16 18:11:27 UTC
I received a couple of pieces of feedback from Amazon stating that the
"renice -20 -p <keventd-pid>" pretty much fixes the Oracle dbwr
process I/O hang / slowness when writing in async mode problem:



1)  I have been running my test with keventd set to a high priority
and also tried setting its scheduler to SCHED_RR.  With either of
these set I have not seen any large group of slow io_getevents; I
still had the odd one in the 10-20 second range, but I don't get a
whole pile-up where we see a group of io_getevents between two
io_submits take 600+ seconds.  I will continue to do more testing to
make sure this is really helping.


I have also tried adjusting keventd's priority on the older amzn4
kernel and that also seemed to work much better.  I have included two
images showing the difference (the highlighted stuff in the yellow
circle is what I believe to be the problem area).  The y axis is time
in seconds.  This is an aggregated sum of the io_getevents and
io_submits (I sum up the time they spend in the call until they switch
from submitting to getting and back again).


I think they are OK with the renicing as a fix for this problem.

Comment 3 RHEL Product and Program Management 2007-10-19 19:14:37 UTC
This bug is filed against RHEL 3, which is in its maintenance phase.
During the maintenance phase, only security errata and select
mission-critical bug fixes will be released for enterprise products.
Since this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative.  You may be asked to provide detailed
information on how this bug is affecting you.
