Bug 114425

Summary: boot hangs when loading aic7xxx module with device attached to Adaptec 29160N Ultra160 card
Product: Red Hat Enterprise Linux 3
Reporter: Need Real Name <irina>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 3.0
CC: bugzilla, k.georgiou, petrides, riel
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-09-19 18:43:38 UTC

Description Need Real Name 2004-01-27 23:16:14 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
Booting the SMP kernel on a Dell Precision 650N with a tape drive attached
to an Adaptec 29160N Ultra160 SCSI card hangs after the aic7xxx module
is loaded. The system can be rebooted only by power-cycling it.

The UP kernel boots fine with the same device.

kernel-smp-2.4.21-9.EL has aic7xxx Rev 6.2.36.

Upgrading the aic7xxx module to Rev 6.3.4 (from
/people.freebsd.org/~gibbs/linux/) fixed the problem for the SMP kernel.
Could this version be included in the next kernel?

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-9.EL

How reproducible:
Always

Steps to Reproduce:
1. Connect an Exabyte tape drive (narrow SCSI device) to the Adaptec 29160N
Ultra160 SCSI card.
2. Reboot with the SMP kernel (kernel-smp-2.4.21-9.EL).

Actual Results:  
 The computer freezes after aic7xxx is loaded.

Expected Results:   
 normal boot

Additional info:
 
 Tried this on two Dells with identical hardware. Both locked up.

Comment 3 Tom Coughlan 2005-01-14 18:29:53 UTC
Sorry for the delay in looking at this. 

We will not use aic7xxx Rev 6.3.4 (from
/people.freebsd.org/~gibbs/linux/) because a number of
changes in that driver are not acceptable. We will need to
identify the specific fix for this problem.

Have you tried to reproduce this problem on a recent RHEL 3 kernel? If
so, please post the console messages leading up to the hang. If you
are still willing to pursue this, I will give you a debug driver to
try to identify the problem. 



Comment 4 Scott Russell 2005-06-22 22:29:46 UTC
Tom - 

I've seen this problem, or something very similar, on all RHEL3 SMP i686
kernels. My hardware is an IBM x236 with an Adaptec 29160B Ultra160 SCSI
adapter. The only device attached to the adapter is an IBM Tape library:

Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 3AY4
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 3AY4
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 3AY4
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 3AY4
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: IBM      Model: 4560SLX          Rev: 0425
  Type:   Medium Changer                   ANSI SCSI revision: 02

I'm more than happy to test using the current RHEL3 AS U5 errata kernel
kernel-smp-2.4.21-32.0.1.EL (or newer) and would like to get a hold of the debug
module you mention above. Are you still willing to supply the debug aic7xxx
module and assist in resolving this issue?

Comment 5 Tom Coughlan 2005-06-23 12:54:55 UTC
Yes, I would like to get this fixed.

Please post /var/log/messages, showing the aic driver being loaded, and any
other messages up to the time of the hang. Does the system hang right after the
driver loads, or is there some I/O involved? Also try booting the SMP kernel with
the noapic kernel parameter.
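
One way to pull the relevant lines out of the log before posting is sketched below. The function name aic_messages is made up for illustration, and the pattern is only a guess at what the driver and the SCSI layer print. It also assumes the messages reached the log file before the hang.

```shell
# Print log lines that mention the aic7xxx driver or the SCSI layer.
# $1 = log file to search (defaults to /var/log/messages, which usually
#      needs root to read).
# The name aic_messages and the search pattern are illustrative only.
aic_messages() {
    grep -i -E 'aic7xxx|scsi' "${1:-/var/log/messages}"
}
```

Running "aic_messages > aic.log" as root would save the matching lines for attaching to the bug.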



Comment 6 Scott Russell 2005-06-23 13:05:23 UTC
Capturing the boot log messages will require a serial terminal, since once the
system hangs it tends not to boot again and the forced reset loses the boot log. In
other words, give me a day or two to recreate the problem and provide what you want.

Can you explain why you think the noapic kernel parameter is relevant? I know that
everything boots fine if the tape devices are turned off or disconnected from
the SCSI chain. I'm not sure I see how the noapic option would make a difference.

Thanks for the quick response.

Comment 7 Tom Coughlan 2005-06-23 13:40:10 UTC
APIC is one of the things that is implicated when there is a hard hang like
this. It is also one of the things that behaves differently on UP vs. SMP.

While you have the serial console, please try to get some alt-sysrq output.

Before the hang: 

echo 1 > /proc/sys/kernel/sysrq

Then after the hang try alt-sysrq-t, alt-sysrq-m.

Also, try turning on the nmi watchdog timer. It will hopefully cause a panic
after the hang. The console output from that would be a big help. 

On the kernel command line, add: "nmi_watchdog=1".
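
Before reproducing the hang, it may help to confirm that both settings above actually took effect. The following is a minimal sketch, not a tested procedure for this kernel. The function names cmdline_has and check_debug_setup are hypothetical. The file arguments exist only so the check can be pointed at copies of the files, and they default to the live /proc entries.

```shell
# Succeeds if the kernel command line in $1 contains the exact parameter $2.
# (cmdline_has is a hypothetical helper name.)
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints one status line for sysrq and one for the NMI watchdog parameter.
# $1 = file holding the kernel command line (default: /proc/cmdline)
# $2 = file holding the sysrq setting       (default: /proc/sys/kernel/sysrq)
check_debug_setup() {
    cmdline=$(cat "${1:-/proc/cmdline}" 2>/dev/null) || cmdline=""
    sysrq=$(cat "${2:-/proc/sys/kernel/sysrq}" 2>/dev/null) || sysrq=""

    # Any non-zero value means at least some sysrq functions are enabled.
    if [ -n "$sysrq" ] && [ "$sysrq" != "0" ]; then
        echo "sysrq: enabled"
    else
        echo "sysrq: disabled"
    fi

    # This only shows that the parameter was passed at boot, not that the
    # watchdog is actually firing.
    if cmdline_has "$cmdline" "nmi_watchdog=1"; then
        echo "nmi_watchdog=1: present"
    else
        echo "nmi_watchdog=1: missing"
    fi
}
```

Calling check_debug_setup with no arguments reads the running system. A rising NMI count in /proc/interrupts is a further hint that the watchdog is active.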

Comment 8 Tom Coughlan 2005-09-19 18:43:38 UTC
This has been in NEEDINFO for nearly three months. We will assume the problem was
not reproducible or has been fixed in a later RHEL 3 update. If this problem
still exists, please reopen and provide the requested info.