Description of problem:

We have a dual AMD Opteron server with a Tyan Thunder K8S Pro (S2882) motherboard and an onboard Adaptec 7902 Ultra320 SCSI controller. Connected to it is an Infotrend A16U-G2421 RAID system with S-ATA disks, attached to the host via U320 SCSI. Both have the current firmware installed (BIOS 3.01 on the Tyan motherboard, 3.42A.05 on the Infotrend RAID system).

Without the RAID system, the server works fine with Linux FC3 i386 (kernel 2.6.10-1.770_FC3smp and other FC3 kernels) and with Linux FC4test1 x86_64 (kernels 2.6.11-1.1177_FC4smp and 2.6.11-1.1219_FC4smp). The RAID system firmware reports no errors and normal operation.

The RAID system volume size is 5.2 TByte, but I have divided it into several logical partitions, each smaller than 2 TByte, and mapped the partitions to SCSI IDs. I also tried a small partition of 30 GByte.

When I try to create a filesystem with the command

mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1

the kernel reports SCSI errors, with all kernels on both FC3 and FC4test1. The aic79xx driver reports 'Card was paused', dumps some SCSI controller driver values again and again, and then reports

'lost page write due to I/O error on sdb1'
'scsi5 (1:0): rejecting I/O to offline device'

I will attach the complete error messages.

Version-Release number of selected component (if applicable):

kernel: 2.6.10-1.770_FC3smp, 2.6.11-1.7_FC3
2.6.11-1.1177_FC4smp, 2.6.11-1.1219_FC4smp
FC3, FC4test1

How reproducible:

Always.

Steps to Reproduce:
1. Connect, boot and configure the server and RAID system as described above.
2. Create a file system on the RAID system with
   mke2fs -j -L /labelname -i 8192 -m 1 /dev/sdb1

Actual results:

Error messages, see attachments.

Expected results:

No error messages; the file system should be created.

Additional info:

I tried two different SCSI cables and both SCSI channels, on the server as well as on the RAID system. The result is always the same. I also tried different partitions and different SCSI IDs.
There is only one SCSI U320 cable between the server and the RAID system. The RAID system is configured to terminate the SCSI bus.

I also tried a single 9 GByte SCSI disk instead of the RAID system. With this single 'normal' disk I could create a file system without error, and could mount it and write to it.

I also noticed that 'fdisk' works on both the single disk and the RAID system partitions. Creating a 'dos/linux' partition table in the boot block of a RAID system partition seems to work, and /proc/partitions shows the right partitions even after a reboot. The first part of 'mke2fs' also seems to work, but when it writes the superblock and filesystem information, it takes a very long time ('hangs'). During this time the kernel produces the error messages in /var/log/messages and on the console. Then sometimes the system crashes with a kernel panic, and sometimes 'mke2fs' finishes with no error message! In that case the only error messages are in /var/log/messages and on the console.

I don't know whether the cause of the problem is in the kernel (e.g. the aic79xx module) or in the RAID system controller. I can't interpret the SCSI driver dump messages myself. Maybe someone with more internal knowledge of the aic79xx driver can see the cause? I would be happy if there were a solution, because at the moment we can't use the very large disk space of the RAID system!
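For anyone reading the attached logs, here is a minimal sketch of how the driver messages could be separated from the normal boot messages. The function name filter_scsi_errors is hypothetical, the patterns are only the three strings quoted in this report, and the default log path /var/log/messages is an assumption about where the messages end up; the full aic79xx dump contains many more lines than these three.

```shell
#!/bin/sh
# Hypothetical helper: print only the log lines that contain one of the
# three error strings quoted in this report, prefixed with line numbers.
# Usage: filter_scsi_errors [logfile]   (assumed default: /var/log/messages)
filter_scsi_errors() {
    logfile="${1:-/var/log/messages}"
    grep -n -E 'Card was paused|lost page write due to I/O error|rejecting I/O to offline device' "$logfile"
}
```

Calling filter_scsi_errors with a log file prints the matching lines together with their line numbers, which should make it easier to find where the driver dump starts relative to the mke2fs run.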
Created attachment 112669 [details] The messages in /var/log/messages from boot to crash

This is the part of /var/log/messages from one boot to a crash while creating a file system on the RAID system. Please have a look at the error messages after the normal boot messages! To give the full context, I included the messages for the whole cycle. The host names have been changed for security reasons.
Created attachment 112671 [details] The last console messages before the kernel panic
Created attachment 112672 [details] The output of lsmod.
Created attachment 112673 [details] The output of 'cat /proc/partitions'.
The problem still exists in FC4 x86_64. The messages are similar; I will attach them.
Created attachment 115909 [details] Kernel messages from /var/log/messages.
[This comment has been added as a mass update for all FC4 kernel bugs. If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update.

If your problem involved being unable to boot, or some hardware not being detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE* installing any kernel updates. If in doubt, you can recreate this file using:

mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
kudzu

Thank you.
Mass update to all FC4 bugs: An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary. Thanks.
2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.
Closing per previous comment.