Bug 118296 - raid5: multiple requests... and crash
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i686 Linux
Priority: medium   Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2004-03-15 02:54 EST by Mogens Kjaer
Modified: 2007-04-18 13:04 EDT
CC List: 1 user

Doc Type: Bug Fix
Last Closed: 2004-09-30 11:41:51 EDT

Attachments: None
Description Mogens Kjaer 2004-03-15 02:54:47 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031030

Description of problem:
I get the following in the log file:
Mar 13 21:57:30 mail kernel: raid5: multiple 0 requests for sector 211943424
Mar 13 21:57:30 mail kernel: raid5: multiple 1 requests for sector 211943424
Mar 13 22:00:33 mail kernel: raid5: multiple 0 requests for sector 103284744
Mar 14 04:06:12 mail kernel: raid5: multiple 1 requests for sector 260046944
Mar 14 04:20:07 mail kernel: raid5: multiple 0 requests for sector 171966472
Mar 14 04:20:08 mail kernel: raid5: multiple 1 requests for sector 171966472
...
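
For reference, this message comes from the 2.4 raid5 driver, which
prints it when a second request is queued against a stripe/disk slot
that already has one outstanding (the add_stripe_bh path in
drivers/md/raid5.c, if memory serves). The following is a simplified,
runnable userspace model of that check, not the kernel code itself;
all struct and function names here are illustrative:

/* Simplified model of the duplicate-request check in the 2.4 raid5
 * driver. Each stripe keeps one pending-request chain per member
 * disk (dd_idx); queueing into an already-occupied slot triggers
 * the "multiple N requests" message. */
#include <stdio.h>

struct req {
        unsigned long sector;
        struct req *next;          /* plays the role of b_reqnext */
};

struct stripe {
        unsigned long sector;
        struct req *pending[3];    /* one slot per raid disk */
};

static void add_stripe_req(struct stripe *sh, struct req *r, int dd_idx)
{
        struct req **rp = &sh->pending[dd_idx];

        /* A non-empty slot means another request is already
         * outstanding for this disk of this stripe. */
        while (*rp) {
                printf("raid5: multiple %d requests for sector %lu\n",
                       dd_idx, sh->sector);
                rp = &(*rp)->next;
        }
        r->next = NULL;
        *rp = r;                   /* chain the new request anyway */
}

int main(void)
{
        struct stripe sh = { .sector = 211943424UL };
        struct req a = { 211943424UL, NULL };
        struct req b = { 211943424UL, NULL };

        add_stripe_req(&sh, &a, 0); /* first request: silent */
        add_stripe_req(&sh, &b, 0); /* second: prints the warning */
        return 0;
}

In other words, the message flags duplicate requests reaching the
same stripe; it does not by itself explain why they were issued.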

Then, 1-2 days after these messages start, the machine hangs, with
nothing in the log file except these messages. I've seen this problem
twice on one machine and once on another. Both are production
servers, so...

Both machines run software RAID1 and RAID5; both are HP ProLiant ML370 servers.

lsmod gives:

Module                  Size  Used by    Tainted: P  
cpqasm                335776  20 
cpqevt                  9248   2  [cpqasm]
autofs                 13684   1  (autoclean)
iptable_filter          2412   0  (autoclean) (unused)
ip_tables              15864   1  [iptable_filter]
tg3                    53064   1 
keybdev                 2976   0  (unused)
mousedev                5688   0  (unused)
hid                    22404   0  (unused)
input                   6208   0  [keybdev mousedev hid]
usb-ohci               22248   0  (unused)
usbcore                82816   1  [hid usb-ohci]
ext3                   73408   2 
jbd                    56368   2  [ext3]
raid5                  20072   1 
xor                     9064   0  [raid5]
raid1                  16076   2 
aic7xxx               142516  10 
sd_mod                 13452  20 
scsi_mod              110872   2  [aic7xxx sd_mod]

The machine that has crashed twice has the following RAID
setup:
# cat /etc/raidtab
raiddev             /dev/md2
raid-level                  5
nr-raid-disks               3
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              1
    device          /dev/sda3
    raid-disk     0
    device          /dev/sdb3
    raid-disk     1
    device          /dev/sdc3
    raid-disk     2
    device          /dev/sdd3
    spare-disk     0
raiddev             /dev/md0
raid-level                  1
nr-raid-disks               2
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              1
    device          /dev/sda1
    raid-disk     0
    device          /dev/sdb1
    raid-disk     1
    device          /dev/sdd1
    spare-disk     0
raiddev             /dev/md1
raid-level                  1
nr-raid-disks               2
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              1
    device          /dev/sda2
    raid-disk     0
    device          /dev/sdb2
    raid-disk     1
    device          /dev/sdd2
    spare-disk     0
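
(A quick consistency check on this layout: RAID5 stores one chunk of
parity per stripe, so usable capacity is (nr-raid-disks - 1) times
the per-member partition size. The mdstat output below reports a
per-member resync total of 140014400 blocks and an md2 size of
280028800 blocks, i.e. exactly 2 x 140014400, as expected for a
3-disk RAID5.)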

And:

# cat /proc/mdstat 
Personalities : [raid1] [raid5] 
read_ahead 1024 sectors
md0 : active raid1 sdd1[2] sdb1[1] sda1[0]
      208704 blocks [2/2] [UU]
        resync=DELAYED
md1 : active raid1 sdd2[2] sdb2[1] sda2[0]
      3148672 blocks [2/2] [UU]
        resync=DELAYED
md2 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      280028800 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
      [=>...................]  resync =  7.7% (10871328/140014400) finish=208.8min speed=10303K/sec

(This is after the last crash, which is why the array is resyncing.)

/boot is on md0, swap on md1, and / on md2.
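
The resync figures in that mdstat output are internally consistent;
here is the arithmetic as a small standalone C check, with the
constants copied from the output above:

#include <stdio.h>

int main(void)
{
        const double done  = 10871328.0;  /* 1K blocks already resynced */
        const double total = 140014400.0; /* 1K blocks per member disk */
        const double speed = 10303.0;     /* resync speed in K/sec */

        /* 7.76%; mdstat truncates to one decimal and shows 7.7% */
        printf("progress: %.2f%%\n", 100.0 * done / total);

        /* (total - done) / speed = ~12534 s, i.e. ~208.9 min;
         * mdstat reported finish=208.8min */
        printf("finish:   %.1f min\n", (total - done) / speed / 60.0);
        return 0;
}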

The other machine has 8 disks, /boot on RAID1, swap
on RAID5, and / on RAID5.

This machine has crashed twice in three months; the other machine
has crashed once in eight months.

It is really frustrating; I hope someone can help...

Mogens

Version-Release number of selected component (if applicable):
2.4.20-30.9smp

How reproducible:
Sometimes

Steps to Reproduce:
1. Install rh9 on raid
2. wait...
3. crash!
    

Actual Results:  crash

Expected Results:  no crash

Additional info:
Comment 1 Mogens Kjaer 2004-08-19 09:21:43 EDT
The problem has been solved by installing a vanilla 2.4.26
kernel.
Comment 2 Bugzilla owner 2004-09-30 11:41:51 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
