Bug 221328

Summary: [RHEL 5.0] qla4xxx/qla3xxx: co-existence issues during load/unload of either interface
Product: Red Hat Enterprise Linux 5
Reporter: David Somayajulu <david.somayajulu>
Component: kernel
Assignee: Mike Christie <mchristi>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 5.0
CC: andriusb, coughlan, dwm, konradr, mbarrow, rkenna
Hardware: All
OS: Linux
Fixed In Version: RC
Doc Type: Bug Fix
Last Closed: 2007-02-08 02:07:48 UTC
Bug Blocks: 216985
Attachments:
Description                                                                    Flags
qla4xxx/qla3xxx: co-existence issues during load/unload of either interface    none
update patch.net                                                               none

Description David Somayajulu 2007-01-03 20:50:25 UTC
Description of problem:

qla4xxx/qla3xxx: co-existence issues during load/unload of either interface

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. With the qla4xxx and qla3xxx modules loaded, unloading/loading qla3xxx or
bringing one of the qla3xxx interfaces down/up can sometimes lock up the qla4xxx
driver.
2. With the qla4xxx and qla3xxx modules loaded, unloading the qla4xxx module on a
4052 (the two-port model) can lock up the qla4xxx driver.
3. Another simple test is to run traffic on the iSCSI side and unload the
qla3xxx module. This causes the iSCSI traffic to stop.
  
Actual results:


Expected results:

One should be able to unload/load either module, or bring the Ethernet
interfaces up/down, without affecting the other module.

Additional info:


The reason is as follows:
qla4xxx supports one Ethernet and one iSCSI interface per port.
The qla4052 has two physical ports and the qla4050 has one port.
The qla4052 therefore presents four PCI devices and the qla4050 presents two.

Irrespective of how many PCI devices are presented, they all share the same
hardware state machine. When one of the interfaces resets the hardware, all
other interfaces are notified of this event via an interrupt. All interfaces
need to acknowledge the interrupt before the reset actually completes (note
that a timeout mechanism handles the case where one or more of the interfaces
do not acknowledge). To maintain atomicity during resets/initialization of the
hardware, a semaphore register is provided. The individual driver instance for
each interface needs to grab this common semaphore before performing a
reset/initialization. As part of module unload, we leave the chip in a known
state by doing a reset.
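A minimal user-space sketch of the locking rule described above, assuming the semaphore register can be modeled as a single atomic word that holds 0 when free or the owning function's (nonzero) id otherwise. The names `hw_sem`, `sem_lock_wait`, and `reset_chip` are hypothetical and are not taken from the qla4xxx driver; the real driver reads and writes a hardware register and delays between retries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the shared semaphore register:
 * 0 = free, otherwise the nonzero id of the PCI function that owns it. */
static atomic_int hw_sem = 0;

/* Try once to take the semaphore for function fn_id. */
static bool sem_try_lock(int fn_id)
{
    int expected = 0;
    assert(fn_id != 0);
    return atomic_compare_exchange_strong(&hw_sem, &expected, fn_id);
}

/* Bounded wait: give up after max_tries attempts instead of spinning forever. */
static bool sem_lock_wait(int fn_id, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (sem_try_lock(fn_id))
            return true;
        /* A real driver would delay here between attempts. */
    }
    return false;
}

static void sem_unlock(int fn_id)
{
    /* Only the current owner may release the semaphore. */
    assert(atomic_load(&hw_sem) == fn_id);
    atomic_store(&hw_sem, 0);
}

/* Stands in for "the chip was reset". */
static int reset_count = 0;

/* Reset the shared chip only while holding the semaphore. */
static bool reset_chip(int fn_id)
{
    if (!sem_lock_wait(fn_id, 100))
        return false;      /* no exclusive access: do not touch the hardware */
    reset_count++;         /* placeholder for the actual reset sequence */
    sem_unlock(fn_id);
    return true;
}
```

If another function (say id 2) currently holds the semaphore, `reset_chip(1)` backs off and returns false rather than resetting hardware that another driver instance is in the middle of initializing.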

When the Ethernet interface is going down, or when qla3xxx is unloaded, the
qla4xxx driver is posted an interrupt, which kicks off the initialization
process. However, the hardware semaphore was not grabbed beforehand, which
results in the lockup.
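The fix described above amounts to: the interrupt only records that a reset occurred, and the deferred handler that re-initializes the adapter must hold the hardware semaphore first. The following is a hedged user-space sketch of that pattern; `RESET_INTR_PENDING`, `dpc_run`, and the atomic model of the semaphore are illustrative assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RESET_INTR_PENDING 0x1u     /* hypothetical flag bit */

static atomic_uint dpc_flags = 0;   /* work posted by the interrupt handler */
static atomic_int hw_sem = 0;       /* 0 = free, else owner function id */
static int init_count = 0;          /* stands in for "adapter re-initialized" */

/* Interrupt context: only record that a reset happened; no hardware work here. */
static void isr_reset_notification(void)
{
    atomic_fetch_or(&dpc_flags, RESET_INTR_PENDING);
}

/* Deferred context: re-initialize, but only while holding the semaphore.
 * Returns true if the initialization ran. */
static bool dpc_run(int fn_id)
{
    int expected = 0;

    if (!(atomic_load(&dpc_flags) & RESET_INTR_PENDING))
        return false;                   /* nothing pending */
    if (!atomic_compare_exchange_strong(&hw_sem, &expected, fn_id))
        return false;                   /* another function owns the hw; flag stays set for a retry */

    /* Clear before initializing: a notification that arrives after this
     * point stays pending and triggers another pass. */
    atomic_fetch_and(&dpc_flags, ~RESET_INTR_PENDING);
    init_count++;                       /* placeholder for the init sequence */
    assert(atomic_load(&hw_sem) == fn_id);
    atomic_store(&hw_sem, 0);           /* release for the other functions */
    return true;
}
```

The design point is that the pending flag survives a failed semaphore attempt, so the initialization is deferred rather than performed without exclusive access to the shared state machine.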

The other case is on a qla4052. When the qla4xxx module is unloaded,
qla4xxx_remove_adapter() is invoked for the first iSCSI function, and the
resulting reset triggers a notification to the second iSCSI function, which
begins its initialization. Meanwhile, qla4xxx_remove_adapter() completes for the
first function and is then called for the second one while that function is
still initializing, which breaks everything. So I added a ql4_mod_unload flag
to make sure the second function does not start initializing while the module
is being unloaded.
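A sketch of how a module-wide unload flag such as ql4_mod_unload can keep the second function from starting initialization during unload. Only the flag name comes from the description above; `handle_reset_notification` and `module_exit_two_functions` are hypothetical stand-ins, and modeling the flag as a C11 atomic is an assumption (the driver may use a plain variable).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Module-wide flag, named after the ql4_mod_unload flag described above. */
static atomic_bool ql4_mod_unload = false;

/* Counts re-initializations that actually began. */
static int init_started = 0;

/* Hypothetical reset-notification handler for one iSCSI function. */
static void handle_reset_notification(void)
{
    if (atomic_load(&ql4_mod_unload))
        return;             /* module is going away: do not start initializing */
    init_started++;         /* placeholder for starting the init sequence */
}

/* Hypothetical module-exit path for a two-function adapter such as a 4052. */
static void module_exit_two_functions(void)
{
    /* Set the flag before any function is removed. */
    atomic_store(&ql4_mod_unload, true);

    /* Removing function 0 resets the chip, which notifies function 1 ... */
    handle_reset_notification();
    /* ... so function 1 is removed later without having begun an init. */
}
```

Setting the flag before the first removal is what matters: the notification caused by the first function's reset then finds the flag already set.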

This patch has been verified by IBM as well.

Comment 1 David Somayajulu 2007-01-03 20:50:26 UTC
Created attachment 144735
qla4xxx/qla3xxx: co-existence issues during load/unload of either interface

Comment 2 Andrius Benokraitis 2007-01-03 21:08:41 UTC
How different is this bug from the following previously filed bugs:

Bug 215641
Bug 216255
Bug 217546
Bug 220246


Comment 3 Mike Christie 2007-01-04 20:12:24 UTC
Adding devel_ack.

This is related to the listed bugs, which are hit because the hardware state
machine is shared by qla3xxx and qla4xxx. That applies to all of them except
216255, which is a firmware timing issue unrelated to another driver using the
hardware.

The patch fixes my soft lockup, but now it causes the machine to barf out a lot
of soft lockup warnings. Which is worse? :) I am still looking into this, and so
is QLogic.

Comment 4 RHEL Program Management 2007-01-04 20:14:22 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Jay Turner 2007-01-04 21:06:13 UTC
QE ack, on the condition that a mere up/down of the interface causes the
hang. I'm less concerned if the problem only occurs on removal of the modules.

Comment 6 Mike Christie 2007-01-05 00:08:29 UTC
(In reply to comment #5)
> QE ack, on the condition that a mere up/down of the interface causes the
> hang. I'm less concerned if the problem only occurs on removal of the modules.

It occurs with either a rmmod or up/down of the interface.

Comment 8 David Somayajulu 2007-01-05 21:17:56 UTC
For some reason I haven't been receiving any emails corresponding to the
comments above, like I do for the bugs in Bugzilla I am cc'ed on. - david S.

Comment 9 Andrius Benokraitis 2007-01-05 21:35:55 UTC
Since you are the bug reporter, you should see all comments via email...

Comment 10 Tom Coughlan 2007-01-05 22:04:56 UTC
We will need a fix for this by the end of the day Monday. 

Comment 11 Andrius Benokraitis 2007-01-10 04:29:04 UTC
Since no fix was proposed by EOB 8-Jan-07, this is now deferred to RHEL 5.1. The
interim solution is to disable the qla3xxx driver in 5.0 GA; this bugzilla will
be used in 5.1 to re-enable the driver and find a permanent fix so that both
drivers can be used together.

Comment 12 Mike Christie 2007-01-10 04:45:08 UTC
Andrius, I just got an updated patch from QLogic. It is still EOB for them since
they are on the west coast :) Can I still send this for 5.0?

Comment 15 Jay Turner 2007-01-10 15:27:25 UTC
Built into 2.6.18-1.3002.el5.

Comment 16 Mike Christie 2007-01-10 18:22:29 UTC
Created attachment 145271
update patch.net

Update patch.net

- Fix race in test and clear of DPC_RESET_HA_INTR.
- Fix coding style nits to match what will be upstream.
- Bump driver version.
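The changelog does not show the exact change for the DPC_RESET_HA_INTR race, but one common way to close a "test, then clear" race on a flag bit is to make the test and the clear a single atomic operation (in the Linux kernel, test_and_clear_bit() provides this). A user-space C11 sketch follows; the bit number and the helper names are hypothetical, and only the name DPC_RESET_HA_INTR comes from the patch notes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number is hypothetical; only the name comes from the patch notes. */
#define DPC_RESET_HA_INTR 2

static atomic_ulong dpc_flags = 0;

static void set_flag_bit(int nr)
{
    atomic_fetch_or(&dpc_flags, 1UL << nr);
}

/* Atomically clear bit nr and report whether it was set. Because the read
 * and the clear happen in one atomic operation, only one caller can observe
 * a given set bit; two contexts cannot both see it set and both act on it. */
static bool test_and_clear_flag_bit(int nr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(&dpc_flags, ~mask) & mask) != 0;
}
```

With a separate load followed by a separate clear, two contexts can both read the bit as set before either clears it, and both then start handling the same reset.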

Comment 17 Andrius Benokraitis 2007-01-24 17:05:18 UTC
Patch in Comment #16 should move to a new bug to be proposed in 5.1. Doing that now.

Comment 18 Andrius Benokraitis 2007-01-24 17:21:19 UTC
See bug 224203 for patch in Comment #16 to be proposed in RHEL 5.1.

Comment 19 RHEL Program Management 2007-02-08 02:07:49 UTC
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.