Bug 222460

Summary:

qla4xxx/qla3xxx: co-existence issues during load/unload of either interface

Product:

Red Hat Enterprise Linux 4

Reporter:

Mike Christie <mchristi>

Component:

kernel

Assignee:

Mike Christie <mchristi>

Status:

CLOSED DUPLICATE

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

high

Version:

4.0

CC:

andriusb, coughlan, dwm, karen.higgins, konradr, mbarrow, ravi.anand, rkenna

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2007-02-26 21:08:18 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

209341, 216986

Attachments:

Description	Flags
v5.01.00-d5	none
Fix MII register access wait.	none

Description Mike Christie 2007-01-12 16:59:40 UTC

+++ This bug was initially created as a clone of Bug #221328 +++

Description of problem:

qla4xxx/qla3xxx: co-existence issues during load/unload of either interface

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. With the qla4xxx and qla3xxx modules loaded, unloading/loading or bringing
down/up one of the qla3xxx interfaces can sometimes lock up the qla4xxx driver.
2. With the qla4xxx and qla3xxx modules loaded, unloading the qla4xxx module on a
4052 (one with two ports) can lock up the qla4xxx driver.
3. Another simple test is to run traffic on the iSCSI side and simply unload the
qla3xxx module. This causes the iSCSI traffic to stop.
  
Actual results:


Expected results:

One should be able to unload/load either of the modules or bring up/down the
Ethernet interfaces without the other module knowing about it.

Additional info:


The reason is as follows:
qla4xxx supports one Ethernet and one iSCSI interface per port.
qla4052 has two physical ports and qla4050 has one port.
qla4052 presents four PCI devices and qla4050 presents two PCI devices.

Irrespective of how many PCI devices are presented, they all share the same
hardware state machine. When one of the interfaces resets the hardware, all other
interfaces are notified of this event via an interrupt. All interfaces need to
acknowledge the interrupt before the reset actually completes. (Please note
there is a timeout mechanism which takes care of the case when one or more of
the interfaces do not acknowledge.) To maintain atomicity during
resets/initialization of the hardware, a semaphore register is provided. The
individual driver instances corresponding to each interface need to grab this
common semaphore before performing a reset/initialization. As part of the module
unload we leave the chip in a known state by doing a reset.

When the Ethernet interface is going down or when qla3xxx is unloaded, an
interrupt is posted to the qla4xxx driver, which kicks off its initialization
process. However, the hardware semaphore was not grabbed before that
initialization, which results in the lockup.
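
For illustration only, here is a small user-space C model of the rule described
above: take the shared semaphore before resetting, and release it afterwards.
It is a sketch under assumptions, not qla4xxx code. The struct and every
function name (chip_init, hw_sem_try_lock, hw_sem_unlock, chip_do_reset,
guarded_reset) are hypothetical, and the real semaphore lives in a hardware
register rather than in a struct field.

```c
#include <assert.h>
#include <stdbool.h>

#define SEM_FREE (-1)

/* Hypothetical model of the shared chip state seen by every PCI function. */
struct chip {
    int sem_owner;      /* SEM_FREE, or the function number holding the semaphore */
    int resets_done;    /* total resets issued */
    int unsafe_resets;  /* resets issued without holding the semaphore */
};

static void chip_init(struct chip *c)
{
    c->sem_owner = SEM_FREE;
    c->resets_done = 0;
    c->unsafe_resets = 0;
}

/* Try to take the semaphore for function fn; true on success. */
static bool hw_sem_try_lock(struct chip *c, int fn)
{
    if (c->sem_owner != SEM_FREE)
        return false;
    c->sem_owner = fn;
    return true;
}

/* Release the semaphore; only the current owner may do so. */
static void hw_sem_unlock(struct chip *c, int fn)
{
    assert(c->sem_owner == fn);
    c->sem_owner = SEM_FREE;
}

/* Raw reset. Records a violation if the caller does not own the semaphore. */
static void chip_do_reset(struct chip *c, int fn)
{
    if (c->sem_owner != fn)
        c->unsafe_resets++;
    c->resets_done++;
}

/* Reset guarded by the semaphore: 0 on success, -1 if the semaphore is busy. */
static int guarded_reset(struct chip *c, int fn)
{
    if (!hw_sem_try_lock(c, fn))
        return -1;
    chip_do_reset(c, fn);
    hw_sem_unlock(c, fn);
    return 0;
}
```

In this model a caller that finds the semaphore busy gets -1 back and can retry
later, whereas calling chip_do_reset() directly corresponds to the unguarded
path described above.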

The other case is on a qla4052. When the qla4xxx module is unloaded,
qla4xxx_remove_adapter() gets invoked for the first iSCSI function, and the
resulting reset triggers a notification to the second iSCSI function, which
begins its initialization. During this time, qla4xxx_remove_adapter()
completes for the first function and gets called for the second one. This screws
up everything. So I added the ql4_mod_unload flag to make sure the second function
does not start initializing when the module is getting unloaded.
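
As a rough user-space sketch of the idea behind the ql4_mod_unload flag (not
the actual patch): the flag is set before any function is removed, and the
reset-notification handler refuses to re-initialize while it is set. The
variable mod_unloading stands in for ql4_mod_unload; struct iscsi_fn and the
functions on_reset_notification, remove_function and module_unload are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands in for the ql4_mod_unload flag mentioned above. */
static bool mod_unloading;

/* Hypothetical per-function state for one iSCSI PCI function. */
struct iscsi_fn {
    int id;
    bool initialized;
    bool removed;
    int init_runs;   /* how many times initialization was started */
};

/* Called when a peer function resets the chip. Returns true if this
 * function started re-initializing, false if it declined to. */
static bool on_reset_notification(struct iscsi_fn *f)
{
    if (mod_unloading || f->removed)
        return false;   /* do not start initializing during unload */
    f->initialized = true;
    f->init_runs++;
    return true;
}

/* Tear down one function; the reset done here notifies the peer. */
static void remove_function(struct iscsi_fn *f, struct iscsi_fn *peer)
{
    f->initialized = false;
    f->removed = true;
    on_reset_notification(peer);
}

/* Module unload: set the flag first, then remove both functions in turn. */
static void module_unload(struct iscsi_fn *a, struct iscsi_fn *b)
{
    mod_unloading = true;
    remove_function(a, b);
    remove_function(b, a);
}
```

With the flag set first, removing the first function no longer causes the
second one to begin initializing just before it is removed itself.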

Comment 1 Karen Higgins 2007-01-13 01:04:24 UTC

Created attachment 145510
v5.01.00-d5

Bug fix to acquire the driver semaphore prior to resetting the card during driver
unload. Reproduced the bug and verified the fix on a kernel-2.6.9-42.40
dual-processor x86_64 system.

Comment 3 Mike Christie 2007-01-16 17:31:50 UTC

(In reply to comment #1)
> Created an attachment (id=145510)
> v5.01.00-d5
> 
> Bug fix to acquire drvr semaphore prior to resetting card during driver unload.
>  Reproduced bug and verified fix on kernel-2.6.9-42.40 dual proc x86_64 system.

Thanks for the patch. It fixes the hard lockup found with just the last patch,
but the soft lockup is still there. It is harder to hit now, though. I am in the
middle of recompiling the kernel with some debugging to see if there is anything
detectable.

Comment 4 Karen Higgins 2007-01-16 19:35:31 UTC

Mike,

From the following steps:
1. With the qla4xxx and qla3xxx modules loaded, unloading/loading or bringing
down/up one of the qla3xxx interfaces can sometimes lock up the qla4xxx driver.
2. With the qla4xxx and qla3xxx modules loaded, unloading the qla4xxx module on a
4052 (one with two ports) can lock up the qla4xxx driver.
3. Another simple test is to run traffic on the iSCSI side and simply unload the
qla3xxx module. This causes the iSCSI traffic to stop.

Which one is causing the soft lockup to happen?

Comment 5 Mike Christie 2007-01-16 19:51:00 UTC

With a 4052, I load qla3xxx and qla4xxx. qla3xxx is set up as eth0 and eth1.
qla4xxx is set up with a session in the db, but the target is not connected.
There is no traffic on the iSCSI or network interface. When I do ifdown on eth0
or eth1, the box locks up. I cannot move the mouse or type anything in the
console. Then, maybe a minute or two later, the box unfreezes and everything
works again. I think this is your #1.

Comment 7 Ron Mercer 2007-01-19 22:11:44 UTC

Created attachment 146040
Fix MII register access wait.

This patch fixes a condition where the network driver was busy waiting for the
MII register to become ready.  It was looping without giving up the processor.
The loop duration is still 10ms, but a schedule_timeout is not called via
mdelay() instead of udelay().

Comment 8 Ron Mercer 2007-01-19 22:13:28 UTC

I meant to say "but a schedule_timeout is "now" called via
mdelay() instead of udelay()."

Comment 12 ravi anand 2007-01-30 17:50:22 UTC

Seems like it should really be msleep() and not mdelay(). msleep() invokes schedule_timeout() while
mdelay() is a busy-waiting function.

Ravi
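
To make the distinction concrete, here is a user-space analogy in C, offered
as a sketch only. In the kernel the sleeping call would be msleep(); this
model uses POSIX nanosleep() as a stand-in so it can run outside the kernel.
The names sleep_ms, wait_ready_sleeping, struct fake_reg and fake_reg_ready
are hypothetical and have nothing to do with the real MII register access
code in qla3xxx.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Callback that reports whether the (modelled) register is ready. */
typedef bool (*ready_fn)(void *ctx);

/* Sleep for ms milliseconds, giving up the CPU instead of spinning. */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Poll up to max_polls times, sleeping interval_ms between attempts.
 * Returns 0 once ready() reports true, -1 if every poll failed. */
static int wait_ready_sleeping(ready_fn ready, void *ctx,
                               int max_polls, long interval_ms)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (ready(ctx))
            return 0;
        sleep_ms(interval_ms);   /* yield the processor between polls */
    }
    return -1;
}

/* Fake register that becomes ready after a fixed number of reads. */
struct fake_reg {
    int reads_until_ready;
    int reads;
};

static bool fake_reg_ready(void *ctx)
{
    struct fake_reg *r = ctx;

    r->reads++;
    return r->reads > r->reads_until_ready;
}
```

A busy-waiting version would replace sleep_ms() with a spin loop, which keeps
the total wait the same but never lets anything else run on that CPU.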

Comment 13 Mike Christie 2007-01-30 18:17:32 UTC

If you are trying to give up the processor, use msleep; I agree with Ravi.

I think, though, that in some of the places where you use msleep and ssleep today
you are holding a spin lock with irqs off, which you should not do. There is a kernel compile time
debug option to check for this, DEBUG_SPINLOCK_SLEEP. It is not a default option
for RHEL, but you can run with it yourself if you compile your own kernel.
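
A toy user-space model in C of the pattern at issue, for illustration only: a
sleep attempted while a spin lock is held gets flagged, and the lock is
dropped before sleeping to avoid that. This is not how the kernel implements
the DEBUG_SPINLOCK_SLEEP check; struct lock_ctx and every function here
(model_spin_lock, model_spin_unlock, model_might_sleep, wait_with_lock_held,
wait_with_lock_dropped) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping: how many locks are held, how many bad sleeps. */
struct lock_ctx {
    int locks_held;
    int violations;
};

static void model_spin_lock(struct lock_ctx *c)
{
    c->locks_held++;
}

static void model_spin_unlock(struct lock_ctx *c)
{
    assert(c->locks_held > 0);
    c->locks_held--;
}

/* Returns true if sleeping is allowed here; records a violation otherwise. */
static bool model_might_sleep(struct lock_ctx *c)
{
    if (c->locks_held > 0) {
        c->violations++;
        return false;
    }
    return true;
}

/* Problem pattern: the sleep happens while the lock is still held. */
static void wait_with_lock_held(struct lock_ctx *c)
{
    model_spin_lock(c);
    model_might_sleep(c);    /* flagged: lock is held */
    model_spin_unlock(c);
}

/* Safer pattern: drop the lock, sleep, then retake it if still needed. */
static void wait_with_lock_dropped(struct lock_ctx *c)
{
    model_spin_lock(c);
    /* ... touch shared state ... */
    model_spin_unlock(c);
    model_might_sleep(c);    /* fine: no lock held */
    model_spin_lock(c);
    /* ... re-check shared state after waking ... */
    model_spin_unlock(c);
}
```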

Comment 14 Mike Christie 2007-01-30 18:36:38 UTC

(In reply to comment #13)
> debug option to check for this, DEBUG_SPINLOCK_SLEEP. It is not a default option
> for RHEL, but you can run with it yourself if you compile your own kernel.

Actually, it should be compiled in your kernel already.

Comment 15 Ron Mercer 2007-01-30 19:59:10 UTC

I am running with CONFIG_DEBUG_SPINLOCK_SLEEP already.  I will look into the 
areas where sleep is called with irqs off.

I haven't pushed the patch upstream so will change delay to sleep when I do.

Comment 17 Andrius Benokraitis 2007-02-26 21:08:18 UTC


*** This bug has been marked as a duplicate of 228416 ***