Bug 80735 - software raid 5 sync causes incorrect load average
Summary: software raid 5 sync causes incorrect load average
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-12-30 16:51 UTC by Steven Pritchard
Modified: 2007-04-18 16:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:40:21 UTC
Embargoed:



Description Steven Pritchard 2002-12-30 16:51:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
On a single-processor 1GHz Celeron running kernel 2.4.18-18.7.x, while raid5syncd
is resyncing a 6-disk SCSI array (on an Adaptec 29160N), top reports the following:

 10:45am  up  1:34,  7 users,  load average: 164.48, 164.36, 156.67
240 processes: 238 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 23.8% user, 25.7% system,  0.0% nice, 50.4% idle
Mem:   772644K av,  336508K used,  436136K free,       0K shrd,   26132K buff
Swap: 2046736K av,      20K used, 2046716K free                   71636K cached

Note the inflated load average with only two processes running.  (The system has
been running at 75% idle, give or take, for the last hour.)

/proc/mdstat:

Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      244227520 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [==========>..........]  resync = 53.0% (25909532/48845504) finish=60.3min speed=6330K/sec
unused devices: <none>

The system is totally responsive, although the high load average makes some
daemons (such as sendmail) stop accepting connections.
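
The inflated number is consistent with how Linux computes the load average: tasks
in uninterruptible (D-state) sleep count toward it alongside runnable ones, so
I/O-bound threads stalled on a saturated bus push the load up even while the CPU
sits mostly idle. A minimal way to check this (a sketch, assuming a typical
procps ps) is to count D-state tasks:

# show tasks in uninterruptible (D) sleep; Linux counts these toward
# the load average even though they consume no CPU
ps -eo stat,comm | awk '$1 ~ /^D/ { n++; print } END { print n+0, "tasks in D state" }'

During a resync like the one above, this count would be expected to roughly match
the gap between the load average and the number of running processes.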

Version-Release number of selected component (if applicable):


How reproducible:
Didn't try


Additional info:

Comment 1 Steven Pritchard 2002-12-30 17:59:05 UTC
After the resync finished, the load average dropped back to normal.

Comment 2 Steven Pritchard 2003-08-29 19:50:14 UTC
Bug is still present in i686 kernel-2.4.20-20.9 on the same box.

# uptime
  2:43pm  up  1:11,  2 users,  load average: 45.11, 42.69, 36.52
# cat /proc/mdstat 
Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      244227520 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [===========>.........]  resync = 55.1% (26955244/48845504) finish=56.2min speed=6481K/sec
unused devices: <none>
# uname -a
Linux hostname 2.4.20-20.9 #1 Mon Aug 18 11:45:58 EDT 2003 i686 unknown

Comment 3 Steven Pritchard 2004-04-07 17:51:11 UTC
Adding dev.raid.speed_limit_max = 6000 to /etc/sysctl.conf "fixed" 
the problem on that system, so apparently swamping the SCSI bus was 
making the load spike insanely. 
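
For reference, a minimal sketch of applying the same cap (speed_limit_max is in
KB/sec, so 6000 throttles the resync just below the ~6330K/sec observed above;
the paths used are the standard md sysctl interface):

# persist the cap across reboots
echo "dev.raid.speed_limit_max = 6000" >> /etc/sysctl.conf
# apply the settings from /etc/sysctl.conf immediately
sysctl -p
# verify the running value
cat /proc/sys/dev/raid/speed_limit_max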
 
I haven't needed to rebuild the RAID lately to see if recent Fedora 
kernels behave any differently. 

Comment 4 Bugzilla owner 2004-09-30 15:40:21 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


