82687 – Raid5 mke2fs is exceptionally slow and occasionally hangs the system (DEFER RHEL3)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	90006

Reported:	2003-01-24 21:34 UTC by Glen A. Foster
Modified:	2007-11-30 22:06 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-06-17 16:05:03 UTC
Target Upstream Version:
Embargoed:

Description Glen A. Foster 2003-01-24 21:34:04 UTC

Description of problem: This is a collaborative-effort defect report.  Recently
we received a shipment of HP2300 disk array controllers, which house 1-14 SCSI
drives.  They are controlled by the LSI1030 chip (mptscsih|mptbase).

What we are seeing is that when sw-RAID5 partitions are set up on the drives
controlled by the array, it takes an exceedingly long time to create file
systems.  Exceedingly slow == over 10 minutes to make a 300GB file system (7
drives in the RAID5 partition, 2 used as spares, 73GB per drive).  For
comparison, mke2fs on RAID0 and RAID1 devices takes ~1 minute.
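The ~300GB figure is consistent with the 7 drives including the 2 spares. A minimal sketch of the nominal RAID5 capacity arithmetic, assuming the spares are counted among the 7 drives and ignoring file system and formatting overhead (the function name is hypothetical):

```python
def raid5_usable_gb(total_drives: int, spares: int, drive_gb: int) -> int:
    """Nominal usable capacity of a RAID5 set, in GB.

    Hot spares hold no data, and one drive's worth of space in the
    active set is consumed by distributed parity.
    """
    active = total_drives - spares
    if active < 3:
        raise ValueError("RAID5 needs at least 3 active drives")
    return (active - 1) * drive_gb


# 7 drives, 2 of them hot spares, 73GB each (assumed reading of the report).
print(raid5_usable_gb(7, 2, 73))  # 292
```

This agrees with the "~290GB" reported for the same 7-drive RAID5 layout in Comment 1.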

It does seem isolated to RAID5.  Watching the progress on VC5 (during a
text-mode install) shows that mke2fs progresses in "bursts": out of 2188
block groups to format, anywhere from 200-300 block groups at a time are
formatted (the count increases monotonically and quite rapidly), and then
it stalls for at least 1-2 minutes.

Version-Release number of selected component (if applicable):
Stock AS2.1 (kernel-2.4.18-e.12)

How reproducible: 100% (always)
... although I haven't yet tried this with an HP2100 4-drive array enclosure.
I will try one and report the results soon.

Steps to Reproduce:
1. Create a RAID5 partition on an HP disk array using the LSI1030 controller
2. Watch mke2fs generate the file system slowly, in sporadic bursts

Additional Information:

This is likely NOT fixable for the first errata kernel, but it really needs to be
addressed in the following errata kernel (schedule date removed so this defect
can be made public if necessary).

Comment 1 Glen A. Foster 2003-01-24 23:18:34 UTC

Other timing information observed, all with the HP2300 14-drive disk array and 
14 73GB HP drives (mptscsih|mptbase):

14 drives w/ RAID 0 (~980GB): 02:48
-----------------------------------
 7 drives w/ RAID 0 (~490GB): 01:16
 7 drives w/ RAID 5 (~290GB): 09:22

... the ---- line delineates two different installations.  In case it matters,
in the second install I formatted the RAID5 device as /usr and the RAID0 device
as /extra.  In the first install, /extra was the RAID0 device.

FWIW, I also saw "bursts" of progress with RAID0 installs, but the pauses
between bursts of activity were noticeably shorter.  I know RAID0 does not
use parity, but an 8x multiplier for SW-RAID5 still seems a bit
excessive.
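The "8x multiplier" can be checked against the two 7-drive timings in the table above. A small sketch, assuming the times are MM:SS wall-clock durations and using the approximate sizes listed there (the helper name is hypothetical):

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' elapsed time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings and approximate sizes from the second installation above.
raid0_secs, raid0_gb = to_seconds("01:16"), 490  # 7 drives, RAID 0
raid5_secs, raid5_gb = to_seconds("09:22"), 290  # 7 drives, RAID 5

# Wall-clock ratio between the two mke2fs runs.
ratio = raid5_secs / raid0_secs
print(f"wall-clock ratio: {ratio:.1f}x")  # wall-clock ratio: 7.4x

# The RAID5 file system is also smaller, so per GB the gap is wider.
per_gb_ratio = (raid5_secs / raid5_gb) / (raid0_secs / raid0_gb)
print(f"seconds-per-GB ratio: {per_gb_ratio:.1f}x")  # seconds-per-GB ratio: 12.5x
```

The raw ratio rounds to roughly the 8x quoted; normalizing by file system size makes the RAID5 run look even slower.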

Also, because /usr was RAID5 in the second install, that install took
twice as long as the first (both were TUI/NFS installs): the first
install took 26:35 and the second took 52:38.

Comment 2 Glen A. Foster 2003-01-28 21:00:23 UTC

A team-mate has noticed that making a file system on a RAID5 device OUTSIDE
of anaconda is also quite slow.  He has created a 12-drive RAID5 device
(plus 2 spare drives).

The mke2fs has been running for over 3 hours now, and the system has very poor
response time; it's as if mke2fs is taking all available CPU.  If you
receive this notice soon (i.e., in the next hour or so), please let me know
what information would help troubleshoot this.

Comment 3 Larry Troan 2003-05-11 15:16:03 UTC

FROM FEATUREZILLA......
------- Additional Comment #2 From Tim Burke on 2003-05-05 10:20 -------          
Weekly project meeting: this should be retried on RHEL3, where we are focusing
more on RAID and other storage management.  Defer from the AS2.1 errata.

RAID5 is more expensive than RAID0 and RAID1.
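The main extra cost of RAID5 writes is parity maintenance. A minimal sketch, assuming simple byte-wise XOR parity over a single stripe (a generic illustration, not the 2.4 md raid5 driver's code; all names are hypothetical): updating one data block requires a read-modify-write of both the data and the parity block, whereas a full-stripe write can compute parity from the new data alone. Whether this by itself explains the bursty mke2fs behavior reported here is not established in this bug.

```python
import os
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(stripe, parity, index, new_block):
    """Read-modify-write update of one data block in a RAID5 stripe.

    Returns (new_stripe, new_parity, disk_ios). Parity is updated without
    reading the other data blocks, but it still costs two reads
    (old data, old parity) and two writes (new data, new parity).
    """
    old_block = stripe[index]  # disk read 1: old data block
    old_parity = parity        # disk read 2: old parity block
    new_parity = xor_blocks(old_parity, old_block, new_block)
    new_stripe = list(stripe)
    new_stripe[index] = new_block  # disk write 1: new data block
    # disk write 2: new parity block
    return new_stripe, new_parity, 4


# Hypothetical stripe: 4 data blocks of 8 bytes (e.g. a 5-drive RAID5 set).
stripe = [os.urandom(8) for _ in range(4)]
parity = xor_blocks(*stripe)

new_stripe, new_parity, ios = small_write(stripe, parity, 2, os.urandom(8))

# The incrementally updated parity matches a full recomputation.
assert new_parity == xor_blocks(*new_stripe)
print(ios)  # 4 disk I/Os for one logical block write; RAID0 needs 1, 2-way RAID1 needs 2
```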

Comment 4 Larry Troan 2003-06-17 16:05:03 UTC

FeatureZilla 90006 Closed=WONTFIX
Closing this Bugzilla as well to mirror FeatureZilla.

Note You need to log in before you can comment on or make changes to this bug.