Bug 80735

Summary: software raid 5 sync causes incorrect load average
Product: [Retired] Red Hat Linux
Reporter: Steven Pritchard <steve>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 7.3
Hardware: i686
OS: Linux
Last Closed: 2004-09-30 15:40:21 UTC

Description Steven Pritchard 2002-12-30 16:51:29 UTC

Description of problem:
On a single-processor 1 GHz Celeron running kernel 2.4.18-18.7.x, with raid5syncd
resyncing a 6-disk SCSI array (on an Adaptec 29160N), top reports the following:

 10:45am  up  1:34,  7 users,  load average: 164.48, 164.36, 156.67
240 processes: 238 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 23.8% user, 25.7% system,  0.0% nice, 50.4% idle
Mem:   772644K av,  336508K used,  436136K free,       0K shrd,   26132K buff
Swap: 2046736K av,      20K used, 2046716K free                   71636K cached

Note the inflated load average with only two processes running.  (The system has
been running at 75% idle, give or take, for the last hour.)
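
For what it's worth, Linux counts tasks in uninterruptible (D) sleep toward the
load average, so the resync thread and anything waiting on the swamped SCSI bus
can inflate that number even though only two processes are actually runnable.
A quick way to see what is contributing (just a sketch; any equivalent ps
invocation works):

# ps -eo stat,pid,comm | awk '$1 ~ /^D/'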

/proc/mdstat:

Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      244227520 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [==========>..........]  resync = 53.0% (25909532/48845504) finish=60.3min speed=6330K/sec
unused devices: <none>

The system remains completely responsive, although the high load average causes
some daemons (such as sendmail) to stop accepting connections.
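
The sendmail behavior is presumably its load-average cutoff (RefuseLA, 12 by
default) kicking in rather than the machine actually being overloaded.  A quick
way to check which cutoffs the local sendmail.cf sets, just as a sketch:

# grep -E 'RefuseLA|QueueLA' /etc/mail/sendmail.cf

Raising those (confREFUSE_LA/confQUEUE_LA in sendmail.mc) would keep mail
flowing while a resync is running.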

Version-Release number of selected component (if applicable):


How reproducible:
Didn't try


Additional info:

Comment 1 Steven Pritchard 2002-12-30 17:59:05 UTC
After the resync finished, the load average dropped back to normal.

Comment 2 Steven Pritchard 2003-08-29 19:50:14 UTC
Bug is still present in i686 kernel-2.4.20-20.9 on the same box.

# uptime
  2:43pm  up  1:11,  2 users,  load average: 45.11, 42.69, 36.52
# cat /proc/mdstat 
Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      244227520 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [===========>.........]  resync = 55.1% (26955244/48845504) finish=56.2min speed=6481K/sec
unused devices: <none>
# uname -a
Linux hostname 2.4.20-20.9 #1 Mon Aug 18 11:45:58 EDT 2003 i686 unknown

Comment 3 Steven Pritchard 2004-04-07 17:51:11 UTC
Adding dev.raid.speed_limit_max = 6000 to /etc/sysctl.conf "fixed" 
the problem on that system, so apparently swamping the SCSI bus was 
making the load spike insanely. 
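
For reference, the same limit can also be changed at runtime without editing 
/etc/sysctl.conf (the value is in KB/sec per device; 6000 just mirrors the 
setting above): 

# sysctl -w dev.raid.speed_limit_max=6000 
# echo 6000 > /proc/sys/dev/raid/speed_limit_max 

Either form takes effect immediately, and "sysctl -p" re-reads /etc/sysctl.conf 
after editing it. 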
 
I haven't needed to rebuild the RAID lately to see if recent Fedora 
kernels behave any differently. 

Comment 4 Bugzilla owner 2004-09-30 15:40:21 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is relevant to one of those releases, please report the
problem in their bug tracker at: http://bugzilla.fedora.us/