Description of Problem:
There are serious kernel I/O bottlenecks in the 2.4.x kernel that hurt performance for enterprise applications such as Oracle. The result is high system time for enterprise workloads, which makes the results unacceptable for TPC benchmarks and similar tests. The bottlenecks are:
a) Bounce buffer allocation for RAM <= 4GB. See the kernel profile for a test configuration.
b) __make_request, due to contention on the global io_request_lock.

Version-Release number of selected component (if applicable):
Kernel: 2.4.x

How Reproducible:

Steps to Reproduce:
1. System (4-Proc, 4 GB, 4 megaraid controllers-PERC3/DC)
2. Boot with profile=2
3. Run the testdevices program.
   The compressed tar attachment includes a Makefile, the source file tio.c and the executable 'tio'. The parameters to 'tio' are the size of the read and the time in seconds. 'testdevices' is the driver script for it. On line 11 of this driver script you can modify the size of the read (multiblock) and the time. Right now these are set to 512k and 5 minutes respectively.
   Usage:
   ./testdevices /dev raw1 raw2 raw3 raw4
      where raw1, raw2, raw3 & raw4 are raw partitions created on 4 different volumes (controllers), i.e., one process/controller
   ./testdevices /dev raw1 raw1 raw1 raw1
      four processes/controller doing reads
4. While testdevices is running, use iostat and readprofile to determine I/O and kernel issues.

Actual Results:
See attached kernel profiles.

Expected Results:

Additional Information:
Patches such as:
a) Jens Axboe's bounce buffer patch, which seems to fix issue a) above
b) Experimental patches for the global io lock
Created attachment 32298: Kernel profile
Created attachment 32299: Gunzip of test program
After doing some poking around inside the scsi layer, it appears that sd.c ends up calling b_end_io with the io_request_lock held. As a result, any highmem bounce buffer copies are serialized for scsi requests. I don't think this requires splitting the io_request_lock just yet, only a bug fix.
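To illustrate the point about the completion callback, here is a minimal userspace sketch, not kernel code. All names in it (queue_lock, demo_request, demo_end_io, complete_locked, complete_unlocked) are hypothetical stand-ins, and a pthread mutex plays the role of a global lock such as io_request_lock. It only shows the general pattern: if the bounce copy runs inside the locked region, every copy is serialized behind the lock; if the lock is dropped first, only the queue bookkeeping is serialized.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these are 2.4 kernel symbols. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_lock_held; /* demo bookkeeping, only meaningful single-threaded */

struct demo_request {
    char *bounce;          /* low-memory buffer the device wrote into */
    char *dest;            /* final destination (stands in for a highmem page) */
    size_t len;
    int copied_under_lock; /* records whether the copy ran with the lock held */
};

static void lock_queue(void)
{
    pthread_mutex_lock(&queue_lock);
    queue_lock_held = 1;
}

static void unlock_queue(void)
{
    queue_lock_held = 0;
    pthread_mutex_unlock(&queue_lock);
}

/* Completion callback: does the (potentially expensive) bounce copy. */
static void demo_end_io(struct demo_request *rq)
{
    rq->copied_under_lock = queue_lock_held;
    memcpy(rq->dest, rq->bounce, rq->len);
}

/* Problem pattern: the copy happens inside the locked region. */
static void complete_locked(struct demo_request *rq)
{
    lock_queue();
    /* ... queue bookkeeping would go here ... */
    demo_end_io(rq);
    unlock_queue();
}

/* Alternative pattern: finish the bookkeeping, drop the lock, then copy. */
static void complete_unlocked(struct demo_request *rq)
{
    lock_queue();
    /* ... queue bookkeeping would go here ... */
    unlock_queue();
    demo_end_io(rq);
}
```

Calling complete_locked() and then complete_unlocked() on the same request copies the same data both times; the only difference is whether copied_under_lock ends up 1 or 0. Whether it is safe to drop the lock at that point in the real sd.c path depends on what the surrounding code expects, which this sketch does not model.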
In testing we have found several bugs in Jens's highmem nobounce patch, and we have made progress fixing them. The io_request_lock work will take longer because it requires more auditing.
Created attachment 33661: kernelprofile
Created attachment 33662: iostat
FYI: attached are the kernel profile and iostat logs for kernel 2.4.9-0.18smp.
a) has been dealt with.
b) is being worked on, but it is a longer-term effort because the changes are initially destabilizing and will require much more work, not only to complete but also to stabilize.
Our Advanced Server release fixes most of these issues. How much is still visible in that beta (and with the latest kernel drop after that)?
This is closed based on feedback from Oracle (TPC-R benchmarking efforts).