Bug 497672 - Slow random read performance under high system load
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: dmraid
Hardware: x86_64 Linux
Priority: low  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: LVM and device-mapper development team
Cluster QE
Reported: 2009-04-25 20:35 EDT by Gavin Edwards
Modified: 2010-07-01 06:26 EDT
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2010-07-01 06:26:14 EDT

Attachments: None
Description Gavin Edwards 2009-04-25 20:35:48 EDT
Description of problem:

Additional info:

2x Quad core AMD opteron w/ 32GB memory
8 mpath devices, each with a single partition formatted with ext3 (mpath10p1, for example).  No LVM is in use on these devices.

We're seeing high random read times (25-30 ms average) on this system.

In previous testing with a similar configuration (the same read/write load without the additional CPU/memory load), we saw average response times of around 9 ms.  The process count was actually higher during that IO testing than it is now, but the testing processes were doing almost no work.

iostat reports await times of around 9 ms under load, and the SAN is not reporting high load.
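For reference, device-level await figures like the one above can be sampled with extended iostat statistics; something along these lines is what we've been watching (interval and device selection are illustrative):

```shell
# Extended per-device stats every 5 seconds; the "await" column is the
# average time (ms) a request spends queued plus being serviced.
iostat -dxk 5
```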

Differences between the first IO benchmarking and current poor response situation:
-First testing was done with ext3 filesystems on top of raw mpath devices rather than on a single partition per device (mpath10 vs mpath10p1)
-Hugepages are in use, since we run the same database and hit the same problems as bug #250155
-kswapd0 runs constantly at high CPU usage trying to keep memfree within the constraints defined by /proc/sys/vm/lowmem_reserve_ratio (increasing those values decreases the memfree target, allowing kswapd0 to sleep for a while before it resumes constantly freeing pages at the new level)
-CPU idle is now 10-15% vs 75-80% during the earlier testing.  IO wait time is about the same at 20-30%.
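For concreteness, this is roughly how we've been inspecting and adjusting the reserve (the ratio values shown are illustrative, not our production settings; larger ratios mean a smaller reserve):

```shell
# Current per-zone reserve ratios (reserve = zone pages / ratio, so
# raising a ratio shrinks the memfree floor kswapd0 has to maintain).
cat /proc/sys/vm/lowmem_reserve_ratio
# To shrink the reserve, raise the ratios (illustrative values):
# echo "512 512 64" > /proc/sys/vm/lowmem_reserve_ratio
```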

Other thoughts:

We tried several tweaks to attempt to decrease read time (as measured at the application layer).  Nothing that we tried seemed to have an impact.
-Tried deadline and noop schedulers
-Tried increasing/decreasing queue_depth and nr_requests for associated devices
-Tried decreasing read_ahead_kb
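As a concrete sketch of those tweaks, the dry run below prints the sysfs writes involved instead of performing them (sda/sdb stand in for whichever sd*/dm-* devices back the mpath maps, and the numeric values are illustrative, not what we settled on):

```shell
# Print (don't execute) the per-device tuning writes we experimented with.
print_tuning() {
  for dev in "$@"; do
    printf 'echo deadline > /sys/block/%s/queue/scheduler\n' "$dev"
    printf 'echo 32 > /sys/block/%s/device/queue_depth\n' "$dev"
    printf 'echo 128 > /sys/block/%s/queue/nr_requests\n' "$dev"
    printf 'echo 64 > /sys/block/%s/queue/read_ahead_kb\n' "$dev"
  done
}

print_tuning sda sdb
```

Piping the output through `sh` (as root) would apply the settings; printing first makes it easy to review what will change.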

It seems that you've done similar testing with the same Caché database application we're using.  Were there any IO latency related issues found during those tests?

Any ideas as to which layer the delay is being added at?  The device level actually looks pretty good, if iostat is to be believed.

Is there any other information that you would like me to provide?
