Bug 497672

Summary: Slow random read performance under high system load
Product: Red Hat Enterprise Linux 5
Component: dmraid
Version: 5.3
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: low
Target Milestone: rc
Target Release: ---
Reporter: Gavin Edwards <gedwards>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: Cluster QE <mspqa-list>
CC: agk, dwysocha, heinzm, mbroz, prockai
Doc Type: Bug Fix
Last Closed: 2010-07-01 10:26:14 UTC

Description Gavin Edwards 2009-04-26 00:35:48 UTC
Description of problem:
Slow random read performance under high system load.

Additional info:

Configuration:
2x quad-core AMD Opteron CPUs with 32 GB memory
8 mpath devices, each with a single partition formatted with ext3 (mpath10p1, for example).  No LVM is being used for these devices.


We're seeing high random read times (25-30 ms average) on this system.

In previous testing with a similar configuration (the same read/write load, without the additional CPU/memory load) we were seeing response times around 9 ms on average.  The process count was actually higher during that IO testing than it is now, but the testing processes were doing almost no work.

iostat is reporting await times around 9ms under load, and the SAN is not reporting high load.
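
(For reference, the await figure iostat reports can be approximated by sampling /proc/diskstats directly; a minimal Python sketch follows.  The dm device name is a placeholder for whichever node backs mpath10p1, not something taken from this report.)

#!/usr/bin/env python
# Rough reproduction of iostat's per-device "await" by sampling
# /proc/diskstats twice.  Line layout (1-based columns):
#   col 3 = device name, col 4 = reads completed, col 7 = ms spent reading,
#   col 8 = writes completed, col 11 = ms spent writing.
# DEVICE is a placeholder; substitute the dm-N node behind mpath10p1.

import time

DEVICE = "dm-12"      # hypothetical device name, not from this report
INTERVAL = 5.0        # seconds between the two samples

def sample(dev):
    for line in open("/proc/diskstats"):
        f = line.split()
        if f[2] == dev:
            ios = int(f[3]) + int(f[7])        # reads + writes completed
            ms = int(f[6]) + int(f[10])        # ms spent reading + writing
            return ios, ms
    raise ValueError("device %s not found in /proc/diskstats" % dev)

ios1, ms1 = sample(DEVICE)
time.sleep(INTERVAL)
ios2, ms2 = sample(DEVICE)

d_ios = ios2 - ios1
d_ms = ms2 - ms1
print("%s: %d I/Os, await ~ %.2f ms" % (DEVICE, d_ios, d_ms / float(d_ios or 1)))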

Differences between the first IO benchmarking and the current poor-response situation:
-The first round of testing used ext3 filesystems on top of the raw mpath devices rather than on a single partition per device (mpath10 vs mpath10p1)
-Hugepages are now in use, since we're running the same database and hit the same problems as bug #250155
-kswapd0 is running constantly at high CPU usage, trying to keep memfree at the level defined by /proc/sys/vm/lowmem_reserve_ratio (increasing those values decreases memfree, allowing kswapd0 to sleep for a while before coming back to constantly free pages at the new level); see the sketch after this list
-CPU idle is now 10-15% vs 75-80% during the earlier testing.  Wait time is about the same at 20-30%.
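
For illustration, the lowmem_reserve_ratio adjustment mentioned above amounts to rewriting the sysctl file; a minimal Python sketch, with placeholder ratios rather than the values we actually used:

#!/usr/bin/env python
# Minimal sketch of adjusting lowmem_reserve_ratio.  The replacement
# ratios below are placeholders, not the values from this report.
# Must run as root; record the original values so the change can be reverted.

PATH = "/proc/sys/vm/lowmem_reserve_ratio"
NEW_RATIOS = "512 512 64"   # hypothetical example values, one per memory zone

current = open(PATH).read().strip()
print("current lowmem_reserve_ratio: %s" % current)

out = open(PATH, "w")
out.write(NEW_RATIOS + "\n")
out.close()
print("new lowmem_reserve_ratio:     %s" % NEW_RATIOS)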

Other thoughts:

We tried several tweaks to decrease read time (as measured at the application layer); nothing that we tried seemed to have an impact.  The sysfs paths involved are sketched after this list.
-Tried the deadline and noop schedulers
-Tried increasing/decreasing queue_depth and nr_requests on the associated devices
-Tried decreasing read_ahead_kb
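
For concreteness, here is a rough Python sketch of where those knobs live in sysfs; the device names and numbers are placeholders, not the exact settings we tried:

#!/usr/bin/env python
# Sketch of the block-layer tweaks listed above, applied through sysfs.
# Device names and values are placeholders -- map SD_PATHS to the SCSI
# paths beneath mpath10p1 (multipath -ll shows them) and pick your own
# numbers.  Must run as root.

SD_PATHS = ["sdc", "sdk"]   # hypothetical underlying paths for one mpath device

def set_sysfs(path, value):
    print("%s <- %s" % (path, value))
    f = open(path, "w")
    f.write("%s\n" % value)
    f.close()

for sd in SD_PATHS:
    queue = "/sys/block/%s/queue" % sd
    set_sysfs(queue + "/scheduler", "deadline")   # or "noop"
    set_sysfs(queue + "/nr_requests", 128)
    set_sysfs(queue + "/read_ahead_kb", 16)
    # queue_depth lives under the SCSI device, not the request queue
    set_sysfs("/sys/block/%s/device/queue_depth" % sd, 16)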

It seems that you've done similar testing with the same Caché database application we're using.  Were any IO-latency-related issues found during those tests?
http://www.redhat.com/f/pdf/rhel/Cache_WhitePaper_Opteron_V1-1.pdf
http://www.redhat.com/f/pdf/rhel/Cache_WhitePaper_Xeon_V1.pdf

Any ideas as to which layer the delay is being added at?  The device level actually looks pretty good, if iostat is to be believed.

Is there any other information that you would like me to provide?