Description of Problem:
There are serious kernel I/O bottlenecks in the 2.4.x kernel
that are impacting performance for enterprise applications such
as Oracle. The result is high system time, etc. for enterprise workloads,
making the results unacceptable for TPC benchmarks, etc.
The bottlenecks are
a) Bounce buffer allocation for RAM <= 4GB. See kernel
profile for a test configuration.
b) __make_request, due to contention on the global io_request_lock
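To make bottleneck a) concrete, here is a minimal sketch (not kernel code) of the bounce decision. It assumes the usual 32-bit x86 layout, where memory above roughly 896 MB is highmem, and it assumes the controller can DMA to any address below 4 GB. The names needs_bounce, LOWMEM_LIMIT and DMA32_LIMIT are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: typical 32-bit x86 boundary between lowmem and highmem. */
#define LOWMEM_LIMIT (896ULL << 20)
/* Assumption: the controller can address the full 32-bit range. */
#define DMA32_LIMIT  (1ULL << 32)

/* A page needs a bounce buffer when its physical address is at or
 * above the limit the I/O path is willing to hand to the device. */
static inline bool needs_bounce(uint64_t phys_addr, uint64_t bounce_limit)
{
    return phys_addr >= bounce_limit;
}
```

Under these assumptions, a buffer at 1 GB is bounced when the limit is LOWMEM_LIMIT but not when it is DMA32_LIMIT, which is why a 4 GB machine can in principle avoid the copies entirely.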
Version-Release number of selected component (if applicable):
Kernel : 2.4.x
Steps to Reproduce:
1. System (4-Proc, 4 GB, 4 megaraid controllers-PERC3/DC)
2. Boot with profile=2
3. Run testdevices program
   The compressed tar attachment includes a Makefile, the
   source file tio.c and the executable 'tio'.
   The parameters to 'tio' are the size of the read and
   the time in seconds. 'testdevices' is the driver script for this.
   On line 11 of this driver script you can modify the
   size of the read (multiblock) and the time.
   Right now these are set to 512k and 5 minutes.
     ./testdevices /dev raw1 raw2 raw3 raw4
   where raw1, raw2, raw3 & raw4 are raw partitions created on 4
   different volumes (controllers), i.e., one process/controller
     ./testdevices /dev raw1 raw1 raw1 raw1
   four processes/controller doing reads
4. While testdevices is running, use iostat and readprofile to
identify I/O and kernel issues.
See attached kernel profiles.
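For readers without the attachment, the kind of load generated in step 3 can be approximated as follows. This is a hypothetical stand-in, not the attached tio.c: the function name timed_read, the 4096-byte buffer alignment, and the rewind at end of device are all assumptions made for the sketch.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for 'tio': issue reads of `size` bytes from
 * `path` for about `seconds` seconds.
 * Returns the total number of bytes read, or -1 on error. */
long long timed_read(const char *path, size_t size, int seconds)
{
    void *buf = NULL;
    long long total = 0;
    time_t end;
    int fd;

    /* Raw devices want aligned buffers; page alignment is assumed
     * to be sufficient here. */
    if (posix_memalign(&buf, 4096, size) != 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    end = time(NULL) + seconds;
    while (time(NULL) < end) {
        ssize_t n = read(fd, buf, size);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0) {
            /* End of device: rewind and keep reading. */
            if (lseek(fd, 0, SEEK_SET) < 0) {
                total = -1;
                break;
            }
            continue;
        }
        total += n;
    }

    close(fd);
    free(buf);
    return total;
}
```

With the settings mentioned above, one such reader per raw device would be called as timed_read("/dev/raw1", 512 * 1024, 300), where the device path is likewise only an example.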
Patches such as
a) Jens Axboe's bounce buffer patch seems to fix issue a) above
b) Experimental patches for the global io lock
Created attachment 32298
Created attachment 32299
Gunzip of test program
After poking around inside the SCSI layer, it appears that sd.c ends
up calling b_end_io with the io_request_lock held. As a result, any highmem
bounce buffer copies are serialized for SCSI requests. I don't think this
requires splitting the io_request_lock just yet; it only needs a bug fix.
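The effect can be pictured with a userspace analogy. This is not the sd.c code: a pthread mutex stands in for the single global lock, and struct io_req, complete_locked and complete_unlocked_copy are hypothetical names. The first variant copies with the lock held, so every completion on every CPU waits behind the copy. The second does the copy first and takes the lock only for the bookkeeping.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the single global lock (assumption). */
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

struct io_req {
    const char *bounce; /* low-memory bounce buffer holding the data */
    char *dest;         /* final destination of the data */
    size_t len;         /* number of bytes to copy back */
    int done;           /* completion flag, protected by io_lock */
};

/* Copy performed with the lock held: all completions serialize
 * behind the memcpy. */
void complete_locked(struct io_req *rq)
{
    pthread_mutex_lock(&io_lock);
    memcpy(rq->dest, rq->bounce, rq->len);
    rq->done = 1;
    pthread_mutex_unlock(&io_lock);
}

/* Copy performed without the lock: only the short bookkeeping
 * step is serialized. */
void complete_unlocked_copy(struct io_req *rq)
{
    memcpy(rq->dest, rq->bounce, rq->len);
    pthread_mutex_lock(&io_lock);
    rq->done = 1;
    pthread_mutex_unlock(&io_lock);
}
```

Both variants produce the same data; the difference is only how long the shared lock is held per request.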
In testing, we have found several bugs in Jens's highmem nobounce patch,
and we have made progress fixing them.
The io_request_lock stuff will take longer to fix because it requires
Created attachment 33661
Created attachment 33662
FYI: attached are the kernel profile and iostat logs for kernel 2.4.9-0.18smp.
a) has been dealt with.
b) is being worked on but is a longer-term effort, because the changes
are initially destabilizing and will require much more work,
not only to complete but also to stabilize.
Our Advanced Server release fixes most of these issues. How much is still
visible in that beta
(and with the latest kernel drop after that)?
This is closed based on feedback from
Oracle (TPC-R benchmarking efforts).