77861 – Kernel lockup in qlogicfc0 driver

Summary: Kernel lockup in qlogicfc0 driver

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.2
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	77803

Reported:	2002-11-14 15:10 UTC by Hrunting Johnson
Modified:	2015-01-04 22:02 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-12-17 13:26:22 UTC
Embargoed:

Attachments
output from readprofile -v (53.26 KB, text/plain) 2002-11-22 17:25 UTC, Hrunting Johnson	no flags

Description Hrunting Johnson 2002-11-14 15:10:28 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)

Description of problem:
Compaq 8500, 8 P3 CPU, 4GB RAM, QLA2200 FC
RH 7.2, all errata, kernel 2.4.18-17.7.x

Under heavy load (backups, running real-time monitoring system, and lookupd 
data) across the fibre-channel card, the system locks up.  Upon 
reboot, /var/log/messages contains 36 lines like:

kernel: qlogicfc0 : no handle slots, this should not happen
kernel: hostdata->queued is 4d, in_ptr: 38

The '4d' and 'in_ptr' values will vary.
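For counting these messages after a reboot, a minimal Python sketch follows. It assumes the message format is exactly as in the two lines quoted above. The queued value is read as hexadecimal because '4d' cannot be decimal; the base of in_ptr is not stated in this report, so it is kept as raw text. The function name summarize is hypothetical.

```python
import re

# Patterns copied from the two log lines quoted in this report (assumed format).
NO_SLOTS = re.compile(r"qlogicfc\d+ : no handle slots")
QUEUED = re.compile(r"hostdata->queued is ([0-9a-fA-F]+), in_ptr: (\S+)")


def summarize(lines):
    """Count 'no handle slots' events and collect (queued, in_ptr) pairs."""
    events = 0
    samples = []
    for line in lines:
        if NO_SLOTS.search(line):
            events += 1
        match = QUEUED.search(line)
        if match:
            # queued is parsed as hex (assumption based on the '4d' value);
            # in_ptr is left as text because its base is not documented here.
            samples.append((int(match.group(1), 16), match.group(2)))
    return events, samples


log_sample = [
    "kernel: qlogicfc0 : no handle slots, this should not happen",
    "kernel: hostdata->queued is 4d, in_ptr: 38",
]
print(summarize(log_sample))
```

To run it against a real log, pass an open file instead of the list, for example summarize(open("/var/log/messages")).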

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot system
2. Run under heavy load
3. Wait

Actual Results:  system locks up

Expected Results:  system does not lock up

Additional info:

http://ldm.bkbits.net:8080/linux-2.5-cpu/cset@1.621.1.10?nav=index.html%7CChangeSet@-4w

This URL contains information about changes made in 2.5 that supposedly fix 
this problem.  Looks like a change was made in the way drivers need to handle 
locks (per device vs. global).

I consider 2.4.18-17.7.x to be an extremely buggy kernel.  This is the third 
bug related to this kernel I've filed since I upgraded an RH7.2 box to this 
kernel yesterday.  Was any QA done on this kernel at all?

Comment 1 Arjan van de Ven 2002-11-14 15:12:37 UTC

Please use the qla2200 driver instead; that one is actually supported.

Comment 2 Hrunting Johnson 2002-11-14 15:26:26 UTC

Will do.  Under 2.4.9-31, I was using the qla2x00 without incident, but that 
disappeared in the new release.  The qla2200 driver under that kernel never 
worked for us, so I didn't bother to try it again.

Comment 3 Hrunting Johnson 2002-11-14 23:01:38 UTC

Okay, switching to the supported qla2200 driver appears to fix the machine 
lockup (and another bug, 77803, though I have no idea why or how).  However, 
the kjournald thread for the ext3 partition on the RAID accessed through that 
card now takes around 11% of the total CPU on the box, whereas before it took 
around 2%.  Why the increase?  Is the qla2200 driver that poor?

Under 2.4.9-31 and the qla2x00 driver, we didn't have that much journal 
activity, but we were also running under a different VM.  Under the qlogicfc0 
driver and the new VM, we had basically the same system usage as with the 
qla2x00 driver.

Comment 4 Stephen Tweedie 2002-11-15 10:15:48 UTC

The 77803 bug is likely due to dropped interrupts if a driver change fixes it.

As for the kjournald overhead, that could be a number of things, including
bounce buffer overhead.  We'd need to see a kernel profile to have any hope of
diagnosing it.  (Boot with the kernel parameter "profile=2"; see man readprofile
for how to extract the info.)

Comment 5 Hrunting Johnson 2002-11-22 14:14:49 UTC

At the risk of being taken for an idiot, when I enable profiling (with 
profile=2), no matter what, I always get:

# readprofile -m /boot/System.map-2.4.18-18.7.xbigmem 
     4 _stext                                     0.0500
     4 total                                      0.0000

Do I need to do something else to enable accurate profiling on this machine?  
The system is under heavy load.  The /proc/profile file is constantly being 
updated (according to its timestamp), but it is always the same size, and it 
always contains the same data (with -v, every counter is 0).

This is with 2.4.18-18.7.xbigmem.
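Once readprofile does return samples, its three-column output (tick count, symbol name, normalized load, as in the lines quoted above) can be ranked with a short Python sketch like the one below. The function names parse_readprofile and hottest are hypothetical, and the sketch assumes the default output format rather than the -v format.

```python
def parse_readprofile(text):
    """Parse default readprofile output into (ticks, symbol, load) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        ticks, symbol, load = parts
        try:
            rows.append((int(ticks), symbol, float(load)))
        except ValueError:
            continue  # skip lines whose numeric columns do not parse
    return rows


def hottest(rows, n=10):
    """Return the n symbols with the most ticks, excluding the 'total' row."""
    body = [row for row in rows if row[1] != "total"]
    return sorted(body, key=lambda row: row[0], reverse=True)[:n]


profile_sample = """\
     4 _stext                                     0.0500
     4 total                                      0.0000
"""
print(hottest(parse_readprofile(profile_sample)))
```

With the output quoted in this comment, the only row left after dropping the total is _stext, which matches the observation that no useful samples were being collected.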

Comment 6 Arjan van de Ven 2002-11-22 14:20:19 UTC

You also need to specify nmi_watchdog=1 in addition to profile=
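Before collecting a profile, it may help to confirm that both parameters named in this thread actually reached the kernel. The Python sketch below checks a kernel command line string for profile= and nmi_watchdog=. The function names and the example command line are hypothetical; on a running system the string would typically come from /proc/cmdline.

```python
def profiling_params(cmdline):
    """Return the values of profile= and nmi_watchdog= found on a command line."""
    found = {"profile": None, "nmi_watchdog": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in found:
            found[key] = value
    return found


def missing_params(cmdline):
    """List which of the two profiling parameters are absent."""
    return [key for key, value in profiling_params(cmdline).items() if value is None]


# Hypothetical command line for illustration only.
example_cmdline = "ro root=/dev/sda1 profile=2 nmi_watchdog=1"
print(profiling_params(example_cmdline))
print(missing_params("ro root=/dev/sda1 profile=2"))
```

To check the running kernel, call missing_params(open("/proc/cmdline").read()); an empty list means both parameters are present.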

Comment 7 Hrunting Johnson 2002-11-22 17:25:38 UTC

Created attachment 86072
output from readprofile -v

Comment 8 Dave Jones 2003-12-17 02:34:44 UTC

Fixed in the 2.4.20-20 errata?

Comment 9 Hrunting Johnson 2003-12-17 13:11:11 UTC

Yes.
