153307 – SCSI driver problem with aic7902 and RAID system Infotrend A16U-G2421

Summary: SCSI driver problem with aic7902 and RAID system Infotrend A16U-G2421

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-04-04 17:20 UTC by Edgar Hoch
Modified:	2015-01-04 22:18 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-05-04 13:37:32 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
The messages in /var/log/messages from boot to crash (101.63 KB, text/plain) 2005-04-04 17:26 UTC, Edgar Hoch
The last console messages before the kernel panic (5.70 KB, text/plain) 2005-04-04 17:27 UTC, Edgar Hoch
The output of lsmod. (1.30 KB, text/plain) 2005-04-04 17:28 UTC, Edgar Hoch
The output of 'cat /proc/partitions'. (270 bytes, text/plain) 2005-04-04 17:29 UTC, Edgar Hoch
Kernel messages from /var/log/messages. (7.03 KB, text/plain) 2005-06-23 23:26 UTC, Edgar Hoch

Description Edgar Hoch 2005-04-04 17:20:39 UTC

Description of problem:

We have a dual AMD Opteron server with a Tyan Thunder K8S Pro (S2882) motherboard
and an onboard Adaptec 7902 Ultra320 SCSI controller. An Infotrend A16U-G2421
RAID system with S-ATA disks is connected to it via U320 SCSI on the host side.
All have the current BIOS installed (3.01 on Tyan motherboard,
3.42A.05 on Infotrend RAID system).

The server without the RAID system works fine with Linux FC3 i386 with kernel
2.6.10-1.770_FC3smp and other FC3 kernels and Linux FC4test1 x86_64 kernel 
2.6.11-1.1177_FC4smp and 2.6.11-1.1219_FC4smp.
The RAID system firmware reports no errors and normal operation.

The RAID system volume size is 5.2 TByte, but I have divided it into several
logical partitions, all less than 2 TByte, and mapped the partitions to SCSI IDs.
I also tried a small partition with a size of 30 GByte.
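
The 2 TByte ceiling is presumably the limit of 32-bit sector addressing with 512-byte sectors; that is an assumption, as the report does not state the reason. A minimal sketch of the arithmetic in shell, where the function name `max_bytes_32bit_lba` is made up for illustration:

```shell
#!/bin/sh
# Largest size addressable with a 32-bit sector number.
# Assumption: 512-byte sectors unless another size is passed as $1.
max_bytes_32bit_lba() {
    sector_size=${1:-512}
    echo $(( 4294967296 * sector_size ))   # 4294967296 = 2^32 sectors
}

max_bytes_32bit_lba 512
```

With 512-byte sectors this evaluates to 2^41 bytes, i.e. 2 TiB, which is why each logical partition was kept below that size.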

But when I try to create a filesystem with the command
  mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1
then the kernel reports SCSI errors with all kernels on both FC3 and FC4test1.
The aic79xx driver reports 'Card was paused', dumps some SCSI controller driver
values again and again, and then reports
  'lost page write due to I/O error on sdb1'
  'scsi5 (1:0): rejecting I/O to offline device'

I will attach the complete error messages as attachments.
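
To check quickly whether a given log contains the messages quoted above, something like the following sketch could be used. It only searches for the three strings quoted in this report; the function name `count_scsi_errors` and the default log path are assumptions for illustration.

```shell
#!/bin/sh
# Count log lines that contain any of the error strings quoted in this report.
# Assumption: the log is a plain-text file such as /var/log/messages.
count_scsi_errors() {
    logfile=${1:-/var/log/messages}
    # -F: fixed strings, -c: print the number of matching lines
    grep -c -F \
        -e 'Card was paused' \
        -e 'lost page write due to I/O error' \
        -e 'rejecting I/O to offline device' \
        "$logfile"
}
```

Calling `count_scsi_errors /var/log/messages` before and after the mke2fs run shows whether new errors were logged. Reading that file usually requires root.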


Version-Release number of selected component (if applicable):
kernel: 2.6.10-1.770_FC3smp, 2.6.11-1.7_FC3
2.6.11-1.1177_FC4smp, 2.6.11-1.1219_FC4smp
FC3, FC4test1

How reproducible:
Always.

Steps to Reproduce:
1. Connect, boot and configure the server and RAID system as described above.
2. Create a file system on the RAID system with
  mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1
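
For reference, in this command -j adds a journal (ext3), -L sets the volume label, -i sets the bytes-per-inode ratio and -m the percentage of reserved blocks. A rough sketch of how many inodes -i 8192 implies for the 30 GByte test partition, assuming plain division; mke2fs itself rounds per block group, so the real number will differ slightly. The function name `estimate_inodes` is hypothetical.

```shell
#!/bin/sh
# Rough inode count for a given size and bytes-per-inode ratio (mke2fs -i).
# Assumption: plain integer division; mke2fs itself rounds per block group.
estimate_inodes() {
    size_bytes=$1
    bytes_per_inode=$2
    echo $(( size_bytes / bytes_per_inode ))
}

# 30 GByte taken as 30 * 2^30 bytes (an assumption about the unit).
estimate_inodes $(( 30 * 1073741824 )) 8192
```

So even the small partition needs inode tables for a few million inodes, which fits the observation that the problem shows up during the heavy write phase of mke2fs.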

  
Actual results:
Error messages, see attachments.

Expected results:
No error messages, file system should be created.


Additional info:
I tried two different SCSI cables and both SCSI channels on both the server
and the RAID system. The result is always the same. I also tried different
partitions and different SCSI IDs. There is only one SCSI U320 cable between the
server and the RAID system. The RAID system is configured to terminate the
SCSI bus.

I also tried a single 9 GByte SCSI disk instead of the RAID system.
With this single 'normal' disk I could create a file system on that disk
without error, and could mount and write on it.

I also noticed that 'fdisk' works on both the single disk and the RAID
system partitions. Creating a 'dos/linux' partition table in the
boot block of a RAID system partition seems to work, and /proc/partitions
shows the right partitions even after reboot.
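
The partition check can be repeated after each reboot with a small helper like the sketch below. It assumes the usual /proc/partitions layout (columns: major, minor, #blocks in 1 KiB units, name); the function name `list_disk_partitions` is made up for illustration.

```shell
#!/bin/sh
# List the partitions of one disk with their sizes, read from a
# /proc/partitions-style table (columns: major minor #blocks name).
# Assumption: #blocks is in 1 KiB units, as on Linux.
list_disk_partitions() {
    disk=$1
    table=${2:-/proc/partitions}
    awk -v d="$disk" '
        # match names like sdb1, sdb2 (disk name followed by digits)
        $4 ~ ("^" d "[0-9]+$") { printf "%s %d MiB\n", $4, $3 / 1024 }
    ' "$table"
}
```

For example, `list_disk_partitions sdb` prints one line per partition on sdb, so a partition that vanished after a reboot is easy to spot.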

The first part of 'mke2fs' also seems to work, but when it writes the
superblock and filesystem information, it takes a long time ('hangs').
At this point the kernel produces the error messages in /var/log/messages
and on the console. Then sometimes the system crashes with a kernel panic, and
sometimes 'mke2fs' finishes with no error message! In that case the only error
messages are in /var/log/messages and on the console.

I don't know whether the cause of the problem is in the kernel (e.g. module aic79xx)
or in the RAID system controller. I can't interpret the SCSI driver dump
messages myself to find the cause. Maybe someone with more internal knowledge
of the aic79xx driver can see the cause of the problem?

I would be happy if there were a solution,
because we can't use the very large disk space of the RAID system!

Comment 1 Edgar Hoch 2005-04-04 17:26:09 UTC

Created attachment 112669
The messages in /var/log/messages from boot to crash

This is the part of /var/log/messages from one boot to a crash
while creating a file system on the RAID system.

Please have a look at the error messages after the normal boot messages!

For context, I have included the messages for the whole cycle.
The host names are changed for security reasons.

Comment 2 Edgar Hoch 2005-04-04 17:27:34 UTC

Created attachment 112671
The last console messages before the kernel panic

Comment 3 Edgar Hoch 2005-04-04 17:28:32 UTC

Created attachment 112672
The output of lsmod.

Comment 4 Edgar Hoch 2005-04-04 17:29:18 UTC

Created attachment 112673
The output of 'cat /proc/partitions'.

Comment 5 Edgar Hoch 2005-06-23 23:24:00 UTC

The problem still exists in FC4 x86_64. The messages are similar, I will attach
them.

Comment 6 Edgar Hoch 2005-06-23 23:26:02 UTC

Created attachment 115909
Kernel messages from /var/log/messages.

Comment 7 Dave Jones 2005-07-15 21:38:25 UTC

[This comment has been added as a mass update for all FC4 kernel bugs.
 If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update.

If your problem involved being unable to boot, or some hardware not being
detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE*
installing any kernel updates.
If in doubt, you can recreate this file using:

mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
kudzu


Thank you.

Comment 8 Dave Jones 2005-09-30 07:00:43 UTC

Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.

Comment 9 Dave Jones 2005-11-10 20:05:56 UTC

2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.

Comment 10 Dave Jones 2006-02-03 05:58:15 UTC

This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.

Comment 11 John Thacker 2006-05-04 13:37:32 UTC

Closing per previous comment.
