Bug 82687

Summary: Raid5 mke2fs is exceptionally slow and occasionally hangs the system (DEFER RHEL3)
Product: Red Hat Enterprise Linux 2.1 Reporter: Glen A. Foster <glen.foster>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.1 CC: tao
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-06-17 16:05:03 UTC Type: ---
Bug Depends On:    
Bug Blocks: 90006    

Description Glen A. Foster 2003-01-24 21:34:04 UTC
Description of problem: This is a collaborative-effort defect report.  Recently 
we received a shipment of HP2300 disk array controllers, which house 1-14 SCSI 
drives.  Each array is controlled by the LSI1030 chip (mptscsih|mptbase).

What we are seeing is that when we have sw-RAID5 partitions set up on the drives 
controlled by the array, it takes an exceedingly long time to create file 
systems.  Exceedingly slow == over 10 minutes to make a 300GB file system (7 
drives in the RAID5 partition, 2 used as spares, 73GB per drive).  Comparable 
times for a RAID0 and RAID1 mke2fs are ~1 minute.
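
As a sanity check on the sizes quoted above, the usable capacity can be estimated with simple arithmetic. This is only a sketch: it assumes the 7 drives include the 2 spares (leaving 5 active members) and that RAID5 gives up one active member's worth of space to parity. The function name is made up for illustration.

```python
def raid5_usable_gb(total_drives, spares, drive_gb):
    """Estimate usable RAID5 capacity in GB.

    Assumption: spares hold no data, and one active member's worth of
    space is consumed by parity.
    """
    active = total_drives - spares
    if active < 3:
        raise ValueError("RAID5 needs at least 3 active members")
    return (active - 1) * drive_gb


# 7 drives, 2 of them spares, 73GB each (figures from the description)
print(raid5_usable_gb(7, 2, 73.0))  # 292.0
```

Under that reading the array holds about 292GB, which is in line with the "300GB" file system mentioned in the description.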

It does seem isolated to RAID5.  Watching the progress on VC5 (during a 
text-mode install) shows that mke2fs seems to progress in "bursts"... i.e., out 
of 2188 block groups to format, anywhere from 200-300 block groups at a time 
are formatted (the numbers increase monotonically quite rapidly) and then it 
hangs for at least 1-2 minutes.
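
To put the block-group counts in perspective, they can be converted into device sizes. The sketch below assumes a 4096-byte ext2 block size (the report does not state it) and uses the ext2 layout in which one bitmap block tracks 8 * block_size blocks per group. The constant and function names are illustrative only.

```python
BLOCK_SIZE = 4096                            # assumed; not stated in the report
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE            # one bitmap block, 8 bits per byte
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP  # 128 MiB per block group


def groups_to_gib(groups):
    """Size of the device region spanned by `groups` block groups, in GiB."""
    return groups * GROUP_BYTES / 2**30


print(groups_to_gib(2188))  # 273.5  (whole file system)
print(groups_to_gib(200))   # 25.0   (smaller burst)
print(groups_to_gib(300))   # 37.5   (larger burst)
```

With these assumptions, 2188 groups come to 273.5 GiB (roughly 294GB decimal), which matches the array size, and each burst spans roughly 25 to 37.5 GiB of the device. Note that mke2fs writes only the per-group metadata (bitmaps, inode tables, superblock copies), so the amount actually written per burst is much smaller than the region spanned.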

Version-Release number of selected component (if applicable):
Stock AS2.1 (kernel-2.4.18-e.12)

How reproducible: 100% (always)
... although I haven't yet tried this with an HP2100 4-drive array enclosure.
I will try one and report the results soon.

Steps to Reproduce:
1. Create RAID5 partition on HP disk array using LSI1030 controller
2. Watch mke2fs generate the file system sporadically, slowly, in bursts

Additional Information:

This is likely NOT fixable for the first errata kernel, but really needs to be 
addressed for the following errata kernel (schedule date removed so this defect 
can be made public if necessary).

Comment 1 Glen A. Foster 2003-01-24 23:18:34 UTC
Other timing information observed, all with the HP2300 14-drive disk array and 
14 73GB HP drives (mptscsih|mptbase):

14 drives w/ RAID 0 (~980GB): 02:48
----
 7 drives w/ RAID 0 (~490GB): 01:16
 7 drives w/ RAID 5 (~290GB): 09:22

... the ---- delineates two different installations.  If it matters, I 
formatted the RAID5 as /usr and RAID0 was /extra in the second install.
/extra was used for the RAID0 device in the first.

FWIW, I also saw "bursts" of progress with RAID0 installs but the time
lapse between the 'spurts of busy-ness' was noticeably shorter.  I know
RAID0 does not use parity, but it still seems that an 8x multiplier
for SW-RAID5 is a bit excessive.

Also, due to /usr being RAID5 in the second install, the install was
twice as slow as the first (both installs were TUI/NFS installs) --
first install was 26:35 and the second install was 52:38.
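
The ratios implied by the timings in this comment can be checked with a few lines of arithmetic. This is a rough sketch that uses only the sizes and MM:SS times quoted above; the dictionary keys and helper names are made up for illustration.

```python
def mmss_to_seconds(text):
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_per_gb(size_gb, elapsed):
    """Elapsed mke2fs time per GB of file system."""
    return mmss_to_seconds(elapsed) / size_gb


# (approximate size in GB, elapsed MM:SS), copied from the comment
MKE2FS_RUNS = {
    "raid0_14_drives": (980, "02:48"),
    "raid0_7_drives": (490, "01:16"),
    "raid5_7_drives": (290, "09:22"),
}

raid0 = MKE2FS_RUNS["raid0_7_drives"]
raid5 = MKE2FS_RUNS["raid5_7_drives"]

wall_clock_ratio = mmss_to_seconds(raid5[1]) / mmss_to_seconds(raid0[1])
per_gb_ratio = seconds_per_gb(*raid5) / seconds_per_gb(*raid0)
install_ratio = mmss_to_seconds("52:38") / mmss_to_seconds("26:35")

print(f"RAID5 vs RAID0 wall clock: {wall_clock_ratio:.1f}x")  # 7.4x
print(f"RAID5 vs RAID0 per GB:     {per_gb_ratio:.1f}x")      # 12.5x
print(f"second vs first install:   {install_ratio:.2f}x")     # 1.98x
```

On these figures the 7-drive RAID5 mke2fs took about 7.4 times as long as the 7-drive RAID0 run, and about 12.5 times as long per GB because the RAID5 file system is smaller. The second install took just under twice as long as the first.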

Comment 2 Glen A. Foster 2003-01-28 21:00:23 UTC
A team-mate has noticed that making a file system on a RAID-5 device, OUTSIDE 
the realm of anaconda, is also quite slow.  He has created a 12-drive RAID5 
device (plus 2 spare drives).

The mke2fs has been running for over 3 hours now, and the system has very poor 
response time... it's as if the mke2fs is taking all available CPU.  If you 
receive this notice soon (i.e., in the next hour or so), please let me know 
what information would be helpful to obtain to help troubleshoot this.

Comment 3 Larry Troan 2003-05-11 15:16:03 UTC
------- Additional Comment #2 From Tim Burke on 2003-05-05 10:20 -------          
Weekly project meeting - should retry on RHEL3 where we are focusing more on
raid and other storage management. Defer from AS2.1 errata.

RAID5 is more expensive than RAID0 and RAID1.
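
For context on the extra cost, RAID5 has to maintain a parity chunk for every stripe, which RAID0 and RAID1 do not. The sketch below illustrates XOR parity in general terms; it is not the md driver's code, the helper name is hypothetical, and it does not by itself explain the multi-minute stalls reported above.

```python
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length byte chunks together, byte by byte."""
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("all chunks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe across 4 data members (toy 2-byte chunks).
data = [b"\x01\x02", b"\x04\x08", b"\x10\x20", b"\x40\x80"]
parity = xor_parity(data)

# A lost chunk can be rebuilt from the survivors plus parity.
rebuilt = xor_parity(data[:1] + data[2:] + [parity])
assert rebuilt == data[1]

# A small write needs the old chunk and old parity to compute the new
# parity: extra reads and an extra write compared with RAID0.
new_chunk = b"\xff\x00"
new_parity = xor_parity([parity, data[1], new_chunk])
assert new_parity == xor_parity([data[0], new_chunk, data[2], data[3]])
```

The point of the sketch is only that every RAID5 write touches parity as well as data, so some slowdown relative to RAID0 and RAID1 is expected.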

Comment 4 Larry Troan 2003-06-17 16:05:03 UTC
FeatureZilla 90006 Closed=WONTFIX
Closing this Bugzilla as well to mirror FeatureZilla.