Red Hat Bugzilla – Bug 504181
[Broadcom 5.4 bug] Include fixes/cleanups for bnx2i
Last modified: 2009-09-03 09:55:29 EDT
Description of problem:
RHEL reviewers found some issues with the bnx2i driver including
- rhel patch had a ifdef LINUX_VERSION that is of course not needed
- rhel patch had implemented a callout that is not called and not upstream so it should be removed
- rhel patch had some bnx2i rhel compat stuff that was not needed because in another patch I added it to a common iscsi compat.h file.
- Remove BKL use if possible
While reviewing their review comments and testing new patches I found the following issues:
- wait_event_interruptible is called with spin locks held in a couple places.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
There is also a fix in the handling of the expected statsn in the upstream bnx2i version. Please include it.
Could you send me a patch for all the bnx2i changes you need/want. I thought you attached that to 441979.
I was attaching the cleanup/lock fix/bug fix patches to 441979 to bring everything inline with upstream this morning, but Andrius stopped me. Should I attach those patches here? The patches will address the problems in the Description plus the fix for expected statsn.
Yeah, please do.
Sorry for the confusion. I thought you had uploaded everything before Andrius busted you :)
Created attachment 347090 [details]
cnic cleanup patch (1/4)
Patch for -152.el5 kernel to remove LINUX_VERSION_CODE and to change nl_msg_recv() to return int.
Created attachment 347091 [details]
uio BKL removal patch (2/4)
Backported uio BKL removal patch for -152.el5 kernel
Created attachment 347092 [details]
bnx2i/cnic cleanup patch 3/4
Remove unused cnic/bnx2i inetevent handler for -152.el5.
Created attachment 347093 [details]
bnx2i bug fix and cleanup (4/4)
bnx2i bug fixes and cleanup for -152.el5:
- Locking bugfix in ep_connect()/ep_disconnect() that was noticed by Mike.
- expected statsn handling fix.
- other minor cleanups.
A big thanks to Mike Christie for helping us get this merged upstream and into RHEL5.4.
What is up with the hba->lock around some bnx2i_reg_device accesses? It seems useless since different hbas could have their own lock and increment/decrement it?
Was supposed to be a per hba variable? If not maybe you want to make it atomic?
Also bnx2i_init_one always returns 0. Was it supposed to return rc in the failure path? The only caller checks for a non zero return value.
Yeah, we've actually discussed this when Anil sent out his patch for internal review, but all of us failed to realize that bnx2i_reg_device is global. I think it is meant to be a global counter but it is not very clear to me the purpose of this counter. It is only checked in bnx2i_init_one(). It should either be protected by the bnx2i_dev_lock or converted to atomic.
Yeah, bnx2i_init_one() looks buggy to me in the error path.
Adding Anil to CC.
Yeah, 'bnx2i_reg_device' should be defined as atomic. bnx2i assumes register_device() will fail only in case of duplicate registration, but there could be other cases like failed request_irq() and other cases where bnx2i_init_one() should return error code. I update this and send a patch.
Created attachment 347665 [details]
New bnx2i cleanup patch (4/4)
New bnx2i cleanup patch to replace the older one. Mike found problems with the bnx2i_init_one() error handling path the locking around bnx2i_reg_device.
Created attachment 349194 [details]
Newest bnx2i cleanup patch (4/4)
Newest bnx2i bug fix and cleanup patch to match upstream.
Created attachment 351009 [details]
Fix ep_connect() if shost is specified
mchan - the patch in Comment #19 will need to be included in a new BZ since mchristi already sent up a patchset (hence it is in POST) for review.
I made a new bugzilla for this and a bug in libiscsi I found while testing it. The bz is here https://bugzilla.redhat.com/show_bug.cgi?id=511096
You can download this test kernel from http://people.redhat.com/dzickus/el5
Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However feel free
to provide a comment indicating that this fix has been verified.
Verified that the patch is included in kernel-2.6.18-160.el5
~~ Attention Partners - RHEL 5.4 Snapshot 5 Released! ~~
RHEL 5.4 Snapshot 5 is the FINAL snapshot to be release before RC. It has been
released on partners.redhat.com. If you have already reported your test results,
you can safely ignore this request. Otherwise, please notice that there should be
a fix available now that addresses this particular issue. Please test and report
back your results here, at your earliest convenience.
If you encounter any issues while testing Beta, please describe the
issues you have encountered and set the bug into NEED_INFO. If you
encounter new issues, please clone this bug to open a new issue and
request it be reviewed for inclusion in RHEL 5.4 or a later update, if it
is not of urgent severity. If it is urgent, escalate the issue to your partner manager as soon as possible. There is /very/ little time left to get additional code into 5.4 before GA.
Partners, after you have verified, do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue.
Further questions can be directed to your Red Hat Partner Manager or other
appropriate customer representative.
Can Broadcom please test this functionality ASAP?
Testing it right now. I'll complete testing as fast as possible.
Thanks Greg. Your testing feedback is greatly appreciated.
Greg, and updates?
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.