Description of problem:
We have a dual AMD Opteron server with a Tyan Thunder K8S Pro (S2882)
motherboard and an onboard Adaptec 7902 Ultra320 SCSI controller. Attached is
an Infotrend A16U-G2421 RAID system with S-ATA disks and U320 SCSI on the host side.
Both have current firmware installed (BIOS 3.01 on the Tyan motherboard,
3.42A.05 on the Infotrend RAID system).
Without the RAID system, the server works fine under FC3 i386 with kernel
2.6.10-1.770_FC3smp and other FC3 kernels, and under FC4test1 x86_64 with
kernels 2.6.11-1.1177_FC4smp and 2.6.11-1.1219_FC4smp.
The RAID system firmware reports no errors and normal operation.
The RAID volume is 5.2 TByte in size, but I have divided it into several
logical partitions, each smaller than 2 TByte, and mapped them to SCSI IDs.
I also tried a small 30 GByte partition.
But when I try to create a filesystem with the command
mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1
the kernel reports SCSI errors, with every kernel tested on both FC3 and FC4test1.
The aic79xx driver reports 'Card was paused', repeatedly dumps SCSI controller
state, and then reports
'lost page write due to I/O error on sdb1'
'scsi5 (1:0): rejecting I/O to offline device'
I will attach the complete error messages.
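For anyone comparing their own logs against this report, here is a minimal sketch of how the three messages quoted above could be counted in a saved log. The function name count_scsi_errors is made up for illustration, and the log path is whatever copy of /var/log/messages you have at hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): count the lines in a log
# file that contain one of the three messages quoted in this report.
count_scsi_errors() {
    # $1: path to a log file, e.g. a copy of /var/log/messages
    grep -c -E 'Card was paused|lost page write due to I/O error|rejecting I/O to offline device' "$1"
}
```

Calling count_scsi_errors /var/log/messages prints the number of matching lines. Note that grep exits with status 1 when there are no matches, even though it still prints 0.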
Version-Release number of selected component (if applicable):
kernel: 2.6.10-1.770_FC3smp, 2.6.11-1.7_FC3
Steps to Reproduce:
1. Connect, boot and configure the server and RAID system as described above.
2. Create a file system on the RAID system with
mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1
Actual results:
Error messages, see attachments.
Expected results:
No error messages; the file system should be created.
Additional info:
I tried two different SCSI cables and both SCSI channels on both the server
and the RAID system. The results are always the same. I also tried different
partitions and different SCSI IDs. There is only one SCSI U320 cable between the
server and the RAID system. The RAID system is configured to terminate the
I also tried a single 9 GByte SCSI disk instead of the RAID system.
With this single 'normal' disk I could create a file system without error,
and could mount it and write to it.
I also noticed that 'fdisk' works on both the single disk and the RAID
system partitions. Creating a 'dos/linux' partition table in the
boot block of a RAID system partition seems to work, and /proc/partitions
shows the right partitions even after a reboot.
The first part of 'mke2fs' also seems to work, but when it writes the
superblock and filesystem information, it takes a very long time ('hangs').
During this time the kernel writes the error messages to /var/log/messages
and the console. Sometimes the system then crashes with a kernel panic;
sometimes 'mke2fs' finishes with no error message at all! In that case the
only error messages are in /var/log/messages and on the console.
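Because 'mke2fs' can finish without an error while the kernel log fills with I/O errors, its exit status alone is not a reliable check. A rough sketch of a wrapper that runs a command and then looks only at the log lines written during the run could look like this. The name run_and_check_log and the grep patterns are assumptions for illustration, and it assumes the log file is only appended to while the command runs.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, then scan only the log lines
# that were appended while it was running.
run_and_check_log() {
    log=$1; shift                      # $1: log file, rest: command to run
    before=$(wc -l < "$log")           # number of log lines before the run
    status=0
    "$@" || status=$?                  # run the command, remember its exit status
    # tail -n +K starts printing at line K, i.e. one past the old end of the log.
    new_errors=$(tail -n +"$((before + 1))" "$log" |
        grep -c -E 'I/O error|offline device|Card was paused' || true)
    echo "exit status: $status, new error lines: $new_errors"
    # Succeed only if the command succeeded AND no new error lines appeared.
    [ "$status" -eq 0 ] && [ "$new_errors" -eq 0 ]
}
```

It would be called as, for example, run_and_check_log /var/log/messages mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1 (as root, and only on a device whose contents may be destroyed).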
I don't know whether the cause of the problem is in the kernel (e.g. the
aic79xx module) or in the RAID system controller. I can't interpret the SCSI
driver dump messages myself. Maybe someone with more internal knowledge
of the aic79xx driver can see the reason for the problem?
I would be happy if there were a solution,
because we can't use the very large disk space of the RAID system!
Created attachment 112669
The messages in /var/log/messages from boot to crash
This is the part of /var/log/messages from one boot to a crash
while creating a file system on the RAID system.
Please look at the error messages that follow the normal boot messages!
For context, I included the messages from the whole cycle.
The host names are changed for security reasons.
Created attachment 112671
The last console messages before the kernel panic
Created attachment 112672
The output of lsmod.
Created attachment 112673
The output of 'cat /proc/partitions'.
The problem still exists in FC4 x86_64. The messages are similar, I will attach
Created attachment 115909
Kernel messages from /var/log/messages.
[This comment has been added as a mass update for all FC4 kernel bugs.
If you have migrated this bug from an FC3 bug today, ignore this comment.]
Please retest your problem with today's 2.6.12-1.1398_FC4 update.
If your problem involved being unable to boot, or some hardware not being
detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE*
installing any kernel updates.
If in doubt, you can recreate this file using..
mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
Mass update to all FC4 bugs:
An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (220.127.116.11). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.
Please retest with this update, and update this bug if necessary.
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.
This is a mass-update to all currently open kernel bugs.
A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
Closing per previous comment.