Bug 77861

Summary: Kernel lockup in qlogicfc0 driver
Product: [Retired] Red Hat Linux
Reporter: Hrunting Johnson <hrunting>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.2
CC: pfrields, sct
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-12-17 13:26:22 UTC
Bug Blocks: 77803
Attachments: output from readprofile -v (attachment 86072)

Description Hrunting Johnson 2002-11-14 15:10:28 UTC

Description of problem:
Compaq 8500, 8 x Pentium III CPUs, 4 GB RAM, QLogic QLA2200 Fibre Channel HBA
RH 7.2 with all errata, kernel 2.4.18-17.7.x

Under heavy load across the Fibre Channel card (backups, a real-time
monitoring system, and lookupd data), the system locks up. After reboot,
/var/log/messages contains 36 lines like:

kernel: qlogicfc0 : no handle slots, this should not happen
kernel: hostdata->queued is 4d, in_ptr: 38

The queued value ('4d' above) and the in_ptr value ('38' above) vary.


How reproducible:
Always

Steps to Reproduce:
1. Boot the system.
2. Run it under heavy load.
3. Wait.

Actual Results: The system locks up.

Expected Results: The system does not lock up.

Additional info:

http://ldm.bkbits.net:8080/linux-2.5-cpu/cset@1.621.1.10?nav=index.html%7CChangeSet@-4w

This changeset describes changes made in 2.5 that reportedly fix this
problem. It looks like the way drivers need to handle locking was changed
(per-device versus global locks).

I consider 2.4.18-17.7.x to be an extremely buggy kernel. This is the third
bug I've filed against it since I upgraded an RH 7.2 box to this kernel
yesterday. Was any QA done on this kernel at all?

Comment 1 Arjan van de Ven 2002-11-14 15:12:37 UTC

Please use the qla2200 driver instead; that one is actually supported.

Comment 2 Hrunting Johnson 2002-11-14 15:26:26 UTC

Will do. Under 2.4.9-31 I was using the qla2x00 driver without incident, but
that driver disappeared in the new release. The qla2200 driver never worked
for us under that kernel, so I didn't bother trying it again.

Comment 3 Hrunting Johnson 2002-11-14 23:01:38 UTC

Okay, switching to the supported qla2200 driver appears to fix the machine
lockups (and also another bug, 77803, though I have no idea why or how).
However, the kjournald thread for the ext3 partition on the RAID accessed
through that card now takes around 11% of the total CPU on the box, versus
around 2% before. Why the increase? Is the qla2200 driver really that poor?

Under 2.4.9-31 with the qla2x00 driver we didn't see that much journal
activity, but we were also running a different VM. With the qlogicfc0
driver and the new VM, system usage was basically the same as with the
qla2x00 driver.

Comment 4 Stephen Tweedie 2002-11-15 10:15:48 UTC

If a driver change fixes bug 77803, it is likely due to dropped interrupts.

As for the kjournald overhead, that could be a number of things, including
bounce buffer overhead. We'd need to see a kernel profile to have any hope of
diagnosing it. (Boot with the kernel parameter "profile=2"; see man
readprofile for how to extract the data.)

Comment 5 Hrunting Johnson 2002-11-22 14:14:49 UTC

At the risk of being taken for an idiot: when I enable profiling (with
profile=2), I always get:

# readprofile -m /boot/System.map-2.4.18-18.7.xbigmem 
     4 _stext                                     0.0500
     4 total                                      0.0000

No matter what I do. Do I need to do something else to get accurate
profiling on this machine? The system is under heavy load. The /proc/profile
file is constantly being updated (according to its timestamp), but it is
always the same size and always contains the same data (with -v, every count
is 0).

This is with 2.4.18-18.7.xbigmem.

Comment 6 Arjan van de Ven 2002-11-22 14:20:19 UTC

You need to ALSO specify nmi_watchdog=1, not just profile=.
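A minimal sketch of applying the two boot parameters named in Comments 4 and 6, assuming the box boots via GRUB with Red Hat's /etc/grub.conf. The title, root (hd0,0), root=LABEL=/ and initrd values below are hypothetical placeholders; keep whatever your existing entry uses and only append the two parameters to the kernel line.

```
# /etc/grub.conf: hypothetical entry; only "profile=2 nmi_watchdog=1" is new
title Red Hat Linux (2.4.18-18.7.xbigmem)
        root (hd0,0)
        kernel /vmlinuz-2.4.18-18.7.xbigmem ro root=LABEL=/ profile=2 nmi_watchdog=1
        initrd /initrd-2.4.18-18.7.xbigmem.img
```

After rebooting, running readprofile -r as root clears the counters so the sample covers only the load of interest. Once the load has run for a while, readprofile -m /boot/System.map-2.4.18-18.7.xbigmem | sort -nr | head -20 lists the functions with the most ticks first.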

Comment 7 Hrunting Johnson 2002-11-22 17:25:38 UTC

Created attachment 86072
output from readprofile -v

Comment 8 Dave Jones 2003-12-17 02:34:44 UTC

Fixed in the 2.4.20-20 errata?

Comment 9 Hrunting Johnson 2003-12-17 13:11:11 UTC

Yes.