Bug 40290

Summary: raid 5 causes 15+ minute boot time (when used on only one drive)
Product: [Retired] Red Hat Linux
Reporter: Ben Levenson <benl>
Component: kernel
Assignee: Ingo Molnar <mingo>
Status: CLOSED NOTABUG
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 7.3
CC: notting
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2001-06-27 20:34:09 UTC

Description Ben Levenson 2001-05-11 20:22:38 UTC
Description of Problem:
Creating a raid 5 device causes long boot times.  All of the testing was
done on workstation-class systems with one hard drive.
The system booted as expected until the message "Freeing unused kernel
memory" was displayed on the screen; the remainder of the boot process took
about 15 minutes.
I verified this with /opt and /usr mounted on the raid device, and I was
able to read and write to the raid device without any issues.

How Reproducible:
100% -- I will verify this against a system with multiple drives, but
the only other system that was available for testing was the Compaq.

Steps to Reproduce:
1. create raid5 device on a single drive
2. reboot
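
For anyone reproducing this by hand rather than through the installer, a rough
sketch with mdadm (not necessarily the tool the installer uses; the partition
names and 64k chunk size are taken from the mdstat output further down) would be:

   # build a 3-member raid5 entirely from partitions on the single drive
   mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=64 \
         /dev/sda7 /dev/sda8 /dev/sda9
   # the initial resync starts immediately and competes with itself for the
   # one spindle, which is presumably what drags the first boot out
   cat /proc/mdstat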


Actual Results:
The boot stalls for roughly 15 minutes after "Freeing unused kernel memory"
while the raid 5 array resyncs.

Expected Results:
Normal boot time; the raid resync should not hold up the boot for this long.
Additional Information:
I received a kernel panic when I recreated the above scenario on the 
BigSur located at my desk.  /proc/cpuinfo indicates that it is running
with a revision 4 processor.

cpuinfo: 
   1 CPU
   vendor     : GenuineIntel
   family     : IA-64
   model      : Itanium
   revision   : 6
scsi info:
   QLogic PCI to SCSI Adapter for ISP 1280/12160:
           Firmware version:  8.13.08, Driver version 3.24 Beta
   SCSI Host Adapter Information: QLA1280 
   Request Queue = 0x0000000008bb8000, Response Queue = 0x0000000008bc8000
   Request Queue count= 0x100, Response Queue count= 0x10
   Number of pending commands = 0x8b
   Number of queued commands = 0x0
   Number of free request entries = 38
   Attached devices: 
   Host: scsi0 Channel: 00 Id: 02 Lun: 00
     Vendor: QUANTUM  Model: ATLAS IV 9 SCA   Rev: 0B0B
mdstat:
   Personalities : [raid5] 
   read_ahead 1024 sectors
   md0 : active raid5 sda9[1] sda8[0] sda7[2]
      2056064 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
      [=============>.......]  resync = 66.5% (684384/1028032) finish=12.5min speed=456K/sec
   unused devices: <none>
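
As a back-of-envelope check, the remaining resync work at the speed reported
above lines up with the long boot (numbers taken straight from the mdstat
output; the awk call is only an illustration):

   # (total - done) blocks remaining, divided by the reported resync speed
   awk 'BEGIN { printf "%.1f min\n", (1028032 - 684384) / 456 / 60 }'   # ~12.6 min

which matches the finish=12.5min estimate; with all three members on one
spindle the resync is presumably seek-bound, hence the ~456K/sec.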

Comment 1 Ben Levenson 2001-05-11 20:23:09 UTC
adding Bill to Cc:


Comment 2 Preston Brown 2001-05-21 15:45:21 UTC
This sounds like pretty bad news.  Ben, what was the result of a two- or
three-disk raid5?  A one-disk raid5 isn't a real-world test.



Comment 3 Ben Levenson 2001-05-22 20:43:36 UTC
build qa0522.0
I performed a raid 5 install on the Compaq in the test lab this morning.
The system has 3 drives on a cpqarray controller.
raid 5 doesn't look very good here either, although at least the system boot
time wasn't abnormally long.
Based on the dmesg output, the raid setup is not working as expected: /dev/md0
(/opt) was mounted after the system came up, but apparently there were "not
enough operational devices for md0 (2/3 failed)".

From dmesg:
cpqarray: Device 0x1000 has been found at bus 25 dev 8 func 0                  
Compaq SMART2 Driver (v 2.4.4)                                                 
Found 1 controller(s)                                                          
cpqarray: Finding drives on ida0 (Integrated Array)                            
cpqarray ida/c0d0: blksz=512 nr_blks=35553120                                  
cpqarray ida/c0d1: blksz=512 nr_blks=35561280                                  
cpqarray ida/c0d2: blksz=512 nr_blks=35561280                                  
cpqarray: Starting firmware's background processing                            
Partition check:                                                                
ida/c0d0: p1 p2 p3 < p5 p6 >                                                   
ida/c0d1: p1 p2                                                                
ida/c0d2: p1 p2 p3                                                            
raid5: measuring checksumming speed
   ia64      :    81.920 MB/sec
raid5: using function: ia64 (81.920 MB/sec)
raid5 personality registered as nr 4
autodetecting RAID arrays
(read) ida/c0d0p2's sb offset: 3072128 [events: 00000002]
autorun ...
considering ida/c0d0p2 ...
  adding ida/c0d0p2 ...
created md0
bind<ida/c0d0p2,1>                                                             
running: <ida/c0d0p2>                                                          
ida/c0d0p2's event counter: 00000002                                           
md0: former device ida/c0d1p2 is unavailable, removing from array!             
md0: former device ida/c0d2p1 is unavailable, removing from array!             
md: md0: raid array is not clean -- starting background reconstruction         
md0: max total readahead window set to 512k                                    
md0: 2 data-disks, max readahead per data-disk: 256k                           
raid5: device ida/c0d0p2 operational as raid disk 0                            
raid5: not enough operational devices for md0 (2/3 failed)                     
RAID5 conf printout:                                                            
--- rd:3 wd:1 fd:2                                                             
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ida/c0d0p2                                 
disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]                                
disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]                               
raid5: failed to run raid set md0                                              
pers->run() failed ...                                                         
do_md_run() returned -22                                                       
md0 stopped.                                                                   
unbind<ida/c0d0p2,0>                                                           
export_rdev(ida/c0d0p2)
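
Since the autodetect pass above only ever reads ida/c0d0p2 and never considers
the members on the other two logical drives, one thing worth checking (a
sketch, assuming the device names from the dmesg and that mdadm is available;
lsraid from raidtools would do the same job) is whether those partitions carry
the 0xfd "Linux raid autodetect" type and a valid md superblock:

   # confirm the partition types on the other two logical drives
   fdisk -l /dev/ida/c0d1
   fdisk -l /dev/ida/c0d2
   # inspect the md superblocks on the members the array says it lost
   mdadm --examine /dev/ida/c0d1p2
   mdadm --examine /dev/ida/c0d2p1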

FWIW, I was able to read and write to /dev/md0 (/opt). I know that there were
some issues with cpqarray for RC1, so let me know if you would like this tested
on another multi-disk system. (Is there another semi-recent piece of IA64
hardware with multiple drives to test on?)

Comment 4 Bill Nottingham 2001-05-22 21:01:54 UTC
You can use the Dell box, if you can find caddies & drives for it.

Comment 5 Ben Levenson 2001-05-23 21:13:53 UTC
No luck with the Dell. The system hard-locks while installing packages.
(qa0522.0)

Comment 6 Ben Levenson 2001-05-29 22:56:17 UTC
I crammed a couple more drives into the BigSur in the test lab.
The raid 5 install completed successfully! System boot time was just a few
seconds shy of normal -- many times faster than the first attempt.
from dmesg:
<snip>
raid5: measuring checksumming speed
   ia64      :   114.688 MB/sec
raid5: using function: ia64 (114.688 MB/sec)
raid5 personality registered as nr 4
autodetecting RAID arrays
(read) sda2's sb offset: 2048192 [events: 00000002]
(read) sdb5's sb offset: 2048192 [events: 00000002]
(read) sdc1's sb offset: 2048192 [events: 00000002]
autorun ...
considering sdc1 ...
  adding sdc1 ...
  adding sdb5 ...
  adding sda2 ...
created md0
bind<sda2,1>
bind<sdb5,2>
bind<sdc1,3>
running: <sdc1><sdb5><sda2>
sdc1's event counter: 00000002
sdb5's event counter: 00000002
sda2's event counter: 00000002
md: md0: raid array is not clean -- starting background reconstruction
md0: max total readahead window set to 512k
md0: 2 data-disks, max readahead per data-disk: 256k
raid5: device sdc1 operational as raid disk 1
raid5: device sdb5 operational as raid disk 2
raid5: device sda2 operational as raid disk 0
raid5: allocated 12656kB for md0
raid5: raid level 5 set md0 active with 3 out of 3 devices, algorithm 0
raid5: raid set md0 not clean; reconstructing parity
...

Is this a typical completion time for the resync on a 4 GB raid5 set? It
doesn't appear to be speeding up; I've checked on it several times in the
last few minutes:
cat /proc/mdstat:
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdc1[1] sdb5[2] sda2[0]
      4096384 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
      [==>..................]  resync = 12.9% (265216/2048192) finish=69.9min speed=424K/sec
unused devices: <none>
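
On the resync speed question: one thing that could be ruled out (a sketch,
assuming the 2.4 md sysctls are exposed on this kernel; the value below is
only an illustration) is the md rate limiter, i.e. whether the background
reconstruction is being throttled rather than held back by the disks:

   # current floor/ceiling for background resync throughput (KB/sec)
   cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
   # temporarily raise the floor and watch whether the resync speeds up
   echo 10000 > /proc/sys/dev/raid/speed_limit_min
   watch cat /proc/mdstat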



Comment 7 Ben Levenson 2001-06-27 20:34:04 UTC
raid 5 worked without issue for the gold release.
Marking this resolved NOTABUG since the problem did not surface
when a realistic RAID 5 scenario was tested.



Comment 8 Ben Levenson 2001-06-27 20:34:24 UTC
Closing