Bug 65621 - MegaRAID has scsi timeout errors/hangs with 2.4.9-XX kernels
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
Reported: 2002-05-28 15:21 EDT by Joe Rhett
Modified: 2008-08-01 12:22 EDT
Doc Type: Bug Fix
Last Closed: 2004-09-30 11:39:38 EDT


Description Joe Rhett 2002-05-28 15:21:40 EDT

Description of problem:
Red Hat Linux 7.1 on an LPR2000 server with a NetRAID-1M controller, using the
megaraid kernel driver.  Kernel 2.4.2-2 (from installation) works perfectly.
With any of the upgrade kernels (the 2.4.9 series), the server locks up about
once per day.  The lockups are not related to CPU or disk activity and occur
even at zero load.

Version-Release number of selected component (if applicable): 7.1


How reproducible:
Always

Steps to Reproduce:
1. Install Red Hat Linux 7.1 on a system with a NetRAID-1M controller
2. Upgrade to any 2.4.9 kernel
3. Wait for 24-28 hours.

Actual Results:  System hangs once a day with the timeout messages shown below.

Expected Results:  System remains up and stable.

Additional info:

The console shows the following:

scsi: aborting command due to timeout: pid 0, scsi 0, channel 1, id 0, lun 0 
Write (10) 00 01 5f 76 4e 00 00 70 00
scsi0 channel 1: resetting for second half of retries.
SCSI bus is being reset for host 0 channel 1.
megaraid_RESET: 00000000 cmd=2a <c=1.t=0.1=0>, flag = 1
scsi0: device driver called scsi_done() for a synchronous reset.
SCSI host 0 channel 1 reset (pid 0) timed out - trying harder
SCSI bus is being reset for host 0 channel 1.
megaraid_RESET: 0003495e cmd=2a <c=1.t=0.1=0>, flag = 6
SCSI host 0 reset (pid 0) timed out again - probably an unrecoverable SCSI bus 
or device hang.

From /var/log/messages:

May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a0 04 00 03 18 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a3 1c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a3 ec 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a4 bc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a5 8c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a6 5c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a7 2c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a7 fc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a8 cc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f a9 9c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f aa 6c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ab 3c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ac 0c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ac dc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ad ac 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ae 7c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f af 4c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b0 1c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b0 ec 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b1 bc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b2 8c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b3 5c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b4 2c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b4 fc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b5 cc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b6 9c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b7 6c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b8 3c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b9 0c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f b9 dc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ba ac 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f bb 7c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f bc 4c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f bd 1c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f bd ec 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f be bc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f bf 8c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c0 5c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c1 2c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c1 fc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c2 cc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c3 9c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c4 6c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c5 3c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c6 0c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c6 dc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c7 ac 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c8 7c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f c9 4c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ca 1c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ca ec 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f cb bc 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f cc 8c 00 00 d0 00
May 20 15:36:32 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f cd 5c 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ce 2c 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ce fc 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f cf cc 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f d0 9c 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f d1 6c 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f d2 3c 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f d3 0c 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f d3 dc 00 00 d0 00
May 20 15:36:33 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f d4 ac 00 00 d0 00
May 20 15:37:59 abq-bizapps1 kernel: scsi : aborting command due to timeout : 
pid 0, scsi0, channel 1, id 0, lun 0 Write (10) 00 09 3f ae 7c 00 00 d0 00
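
As an aid to reading the log above, here is a minimal Python sketch that decodes
the Write (10) bytes into a starting block address and a block count. It assumes
the nine hex bytes printed after "Write (10)" are CDB bytes 1-9, with the opcode
shown as the command name; that reading of the 2.4 log format is an assumption,
and the function names are made up for illustration.

```python
import re

# Assumption: the nine hex bytes printed after "Write (10)" are CDB bytes 1-9
# (the opcode itself is printed as the name "Write (10)").
WRITE10_RE = re.compile(r"Write \(10\)((?: [0-9a-f]{2}){9})")


def decode_write10(line):
    """Return (lba, blocks) for a logged Write (10) command, or None."""
    match = WRITE10_RE.search(line)
    if match is None:
        return None
    cdb = bytes.fromhex(match.group(1).replace(" ", ""))
    lba = int.from_bytes(cdb[1:5], "big")     # CDB bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[6:8], "big")  # CDB bytes 7-8: transfer length
    return lba, blocks


def is_contiguous(first, second):
    """True if the second write starts exactly where the first one ends."""
    return first[0] + first[1] == second[0]


# The first two commands from the /var/log/messages excerpt.
a = decode_write10("lun 0 Write (10) 00 09 3f a0 04 00 03 18 00")
b = decode_write10("lun 0 Write (10) 00 09 3f a3 1c 00 00 d0 00")
print(a, b, is_contiguous(a, b))
```

Under that assumption the first two logged writes are back to back
(0x093fa004 + 0x318 = 0x093fa31c), which would point to one sequential write
burst timing out rather than scattered I/O.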

The exact same hardware running Red Hat Linux 7.2 with the 2.4.9-31 kernel has
no problems.  Downgrading to the 2.4.2-2 kernel also eliminates the problem.
We have only experienced this problem with 2.4.9 kernels on Red Hat Linux 7.1.

We tried the first update kernel, which was 2.4.9-9 or thereabouts, hit the
lockup problems, and downgraded.  After 8 months of problem-free operation on
2.4.2-2, upgrading to 2.4.9-31 brought the problem back.
Comment 1 Adi Linden 2003-05-27 11:39:16 EDT
I am experiencing the same problem using Red Hat Linux 7.1 and the 2.4.20-13.7
kernel. I will try going back to the 2.4.2-2 kernel to see if it eliminates the
problem. One further observation: this problem first occurred when we started
using the second SCSI channel on the PERC2/DC card. Everything had been working
fine using only SCSI channel 0.
Comment 2 EZ 2004-02-21 12:41:00 EST
Same problem with RH9 with the latest updates and kernel.
Comment 3 Bugzilla owner 2004-09-30 11:39:38 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
