Bug 77861

Summary: Kernel lockup in qlogicfc0 driver
Product: [Retired] Red Hat Linux
Reporter: Hrunting Johnson <hrunting>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.2
CC: pfrields, sct
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-12-17 13:26:22 UTC
Bug Blocks: 77803
Attachments: output from readprofile -v (flags: none)

Description Hrunting Johnson 2002-11-14 15:10:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)

Description of problem:
Compaq 8500, 8 P3 CPU, 4GB RAM, QLA2200 FC
RH 7.2, all errata, kernel 2.4.18-17.7.x

Under heavy load across the fibre-channel card (backups, a real-time 
monitoring system, and lookupd data), the system locks up.  After 
reboot, /var/log/messages contains 36 lines like:

kernel: qlogicfc0 : no handle slots, this should not happen
kernel: hostdata->queued is 4d, in_ptr: 38

The 'queued' and 'in_ptr' values (4d and 38 here) vary from line to line.
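
To compare the values across the repeated messages, the second line of each pair can be pulled out of the log with a short script. This is a minimal sketch, not part of the original report: the function name parse_queue_state is hypothetical, and it assumes both numbers are printed in hexadecimal, which the '4d' value suggests but the report does not confirm.

```python
import re

# Matches the second line of each message pair quoted in the report.
QUEUED_RE = re.compile(
    r"hostdata->queued is ([0-9a-fA-F]+), in_ptr: ([0-9a-fA-F]+)"
)


def parse_queue_state(lines):
    """Return (queued, in_ptr) integer pairs.

    Assumption: both values are hexadecimal.
    """
    pairs = []
    for line in lines:
        match = QUEUED_RE.search(line)
        if match:
            pairs.append((int(match.group(1), 16), int(match.group(2), 16)))
    return pairs


# The two lines quoted in the report.
sample = [
    "kernel: qlogicfc0 : no handle slots, this should not happen",
    "kernel: hostdata->queued is 4d, in_ptr: 38",
]
print(parse_queue_state(sample))  # [(77, 56)]
```

On the affected machine the same function could be fed the lines of /var/log/messages instead of the sample list.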

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. boot system
2. run under heavy load
3. wait

Actual Results:  system locks up

Expected Results:  system does not lock up

Additional info:

http://ldm.bkbits.net:8080/linux-2.5-cpu/cset@1.621.1.10?nav=index.html%7CChangeSet@-4w

This URL contains information about changes made in 2.5 that supposedly fix 
this problem.  Looks like a change was made in the way drivers need to handle 
locks (per device vs. global).

I consider 2.4.18-17.7.x to be an extremely buggy kernel.  This is the third 
bug related to this kernel I've filed since I upgraded an RH7.2 box to this 
kernel yesterday.  Was any QA done on this kernel at all?

Comment 1 Arjan van de Ven 2002-11-14 15:12:37 UTC
please use the qla2200 driver instead; that one is actually supported

Comment 2 Hrunting Johnson 2002-11-14 15:26:26 UTC
Will do.  Under 2.4.9-31, I was using the qla2x00 driver without incident, but 
that driver disappeared in the new release.  The qla2200 driver under that 
kernel never worked for us, so I didn't bother to try it again.

Comment 3 Hrunting Johnson 2002-11-14 23:01:38 UTC
Okay, switching to the supported qla2200 driver appears to fix the machine 
lockup (and another bug, 77803, though I have no idea why or how).  However, 
the kjournald thread for the ext3 partition on the RAID accessed through that 
card now takes around 11% of the total CPU on the box, whereas before it took 
around 2%.  Why the increase?  Is the qla2200 driver that poor?

Under 2.4.9-31 and the qla2x00 driver, we didn't have that much journal 
activity, but we were also running under a different VM.  Under the qlogicfc0 
driver and the new VM, we had basically the same system usage as with the 
qla2x00 driver.

Comment 4 Stephen Tweedie 2002-11-15 10:15:48 UTC
The 77803 bug is likely due to dropped interrupts if a driver change fixes it.

As for the kjournald overhead, that could be a number of things, including
bounce buffer overhead.  We'd need to see a kernel profile to have any hope of
diagnosing it.  (Boot with the kernel parameter "profile=2"; see "man
readprofile" for how to extract the info.)

Comment 5 Hrunting Johnson 2002-11-22 14:14:49 UTC
At the risk of being taken for an idiot, when I enable profiling (with 
profile=2), no matter what, I always get:

# readprofile -m /boot/System.map-2.4.18-18.7.xbigmem 
     4 _stext                                     0.0500
     4 total                                      0.0000

No matter what.  Do I need to do something else to enable accurate profiling on 
this machine?  The system is under heavy load.  The /proc/profile file is 
constantly being updated (according to its timestamp), but it's always the same 
size, and it always contains that same data (in -v, everything is set to 0).

This is with 2.4.18-18.7.xbigmem.
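
For reference, the readprofile output quoted above has three columns (ticks, symbol, load), so it can be ranked with a small script once real samples are collected. This is a minimal sketch: the function names parse_readprofile and hottest are hypothetical, and the sample is just the two lines from this comment.

```python
def parse_readprofile(text):
    """Parse 'ticks symbol load' rows into (symbol, ticks, load) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ticks, symbol, load = parts
        try:
            rows.append((symbol, int(ticks), float(load)))
        except ValueError:
            continue  # skip rows whose numeric columns do not parse
    return rows


def hottest(rows, n=10):
    """Return the n symbols with the most ticks, ignoring the 'total' row."""
    symbols = [row for row in rows if row[0] != "total"]
    return sorted(symbols, key=lambda row: row[1], reverse=True)[:n]


# The output quoted in this comment.
sample = """\
     4 _stext                                     0.0500
     4 total                                      0.0000
"""
print(hottest(parse_readprofile(sample)))  # [('_stext', 4, 0.05)]
```

With only _stext carrying ticks, the ranking makes the problem obvious: no per-function samples are being recorded.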

Comment 6 Arjan van de Ven 2002-11-22 14:20:19 UTC
you need to ALSO specify nmi_watchdog=1 in addition to profile=
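
A quick way to confirm that both parameters actually reached the kernel is to inspect the boot command line. This is a minimal sketch, assuming the command line is available as a string (on a running Linux system it can be read from /proc/cmdline); the function names and the example command line are hypothetical.

```python
def profiling_params(cmdline):
    """Return the profile= and nmi_watchdog= values found on a kernel command line."""
    found = {"profile": None, "nmi_watchdog": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in found:
            found[key] = value
    return found


def missing_params(cmdline):
    """List which of the two parameters are absent."""
    return sorted(k for k, v in profiling_params(cmdline).items() if v is None)


# Hypothetical command line with only profile= set, as in comment 5.
example = "ro root=/dev/sda1 profile=2"
print(missing_params(example))  # ['nmi_watchdog']
```

An empty list means both profile= and nmi_watchdog= were passed at boot.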

Comment 7 Hrunting Johnson 2002-11-22 17:25:38 UTC
Created attachment 86072
output from readprofile -v

Comment 8 Dave Jones 2003-12-17 02:34:44 UTC
Is this fixed in the 2.4.20-20 errata?

Comment 9 Hrunting Johnson 2003-12-17 13:11:11 UTC
Yes.