Bug 470610

Summary: [Emulex 5.3 bug] Update lpfc to version 8.2.0.33.3p
Product: Red Hat Enterprise Linux 5
Reporter: Jamie Wellnitz <jamie.wellnitz>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED ERRATA
QA Contact: Mike Gahagan <mgahagan>
Severity: urgent
Priority: high
Version: 5.3
CC: andriusb, berthiaume_wayne, bino.sebastian, coughlan, laurie.barry, marting, phinchman, rpacheco, rsarraf, syeghiay, xdl-redhat-bugzilla
Target Milestone: rc
Keywords: OtherQA
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-20 19:48:54 UTC
Bug Blocks: 357171, 373081, 431464    
Attachments:
patch for Emulex lpfc 8.2.0.33.3p - applies on top of 8.2.0.33.2p (flags: none)

Description Jamie Wellnitz 2008-11-07 23:14:03 UTC
Created attachment 322912
patch for Emulex lpfc 8.2.0.33.3p - applies on top of 8.2.0.33.2p

There's a bug in 8.2.0.33.2p (in kernel 2.6.18-122.el5) involving Fibre Channel discovery.

Symptoms:
lpfc HBAs connected to a QLogic switch fail to discover all targets after the switch reboots. The fix changes two lines (in effect, a single line moved down by a few lines), plus the version change.

The following sequence of events causes this issue:

- After a switch reboot, the HBA receives a link up event.
- The driver starts discovery by querying the name server and sending PLOGI to the targets.
- The PLOGI for one target fails with LS_RJT.
- The driver starts a delay timer to retry the PLOGI after one second. The NLP_NPR_2B_DISC flag of this target is set and the target is now in NPR state.
- Before the delay timer expires, the HBA receives an RSCN for the target and the lpfc_device_recov_npr_node state machine function is called. This function clears the NLP_NPR_2B_DISC flag and then calls lpfc_cancel_retry_delay_tmo.

The counters that track the number of targets in discovery are updated in lpfc_cancel_retry_delay_tmo, based on the NLP_NPR_2B_DISC flag. Because the flag is cleared before lpfc_cancel_retry_delay_tmo is called, the counters are not updated. This leaves the HBA in the FC_NDISC_ACTIVE state, and the driver stops responding to RSCN events.

Fix:
Moved the clearing of the NLP_NPR_2B_DISC flag to after the call to lpfc_cancel_retry_delay_tmo in the lpfc_device_recov_npr_node function.
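
The ordering problem and the fix can be illustrated with a small stand-alone C model. This is only a sketch, not the driver source: the model_* names, the struct layouts, the flag bit values, and the exact counter arithmetic are assumptions made for illustration. Only the flag names and the order of operations are taken from the description above.

```c
/* Hypothetical bit values; the real definitions live in the lpfc headers. */
#define NLP_DELAY_TMO    0x1u  /* node: PLOGI retry delay timer is pending  */
#define NLP_NPR_2B_DISC  0x2u  /* node: still counted as "to be discovered" */
#define FC_NDISC_ACTIVE  0x1u  /* port: node discovery is in progress       */

/* Hypothetical, heavily reduced stand-ins for the node and port structures. */
struct model_node { unsigned int nlp_flag; };
struct model_port { unsigned int fc_flag; unsigned int num_disc_nodes; };

/*
 * Simplified stand-in for lpfc_cancel_retry_delay_tmo(): the discovery
 * counter is adjusted only if NLP_NPR_2B_DISC is still set on entry.
 */
static void model_cancel_retry_delay_tmo(struct model_port *port,
                                         struct model_node *node)
{
    if (!(node->nlp_flag & NLP_DELAY_TMO))
        return;
    node->nlp_flag &= ~NLP_DELAY_TMO;

    if (node->nlp_flag & NLP_NPR_2B_DISC) {
        node->nlp_flag &= ~NLP_NPR_2B_DISC;
        /* Assumed bookkeeping: the last node leaving discovery ends it. */
        if (port->num_disc_nodes && --port->num_disc_nodes == 0)
            port->fc_flag &= ~FC_NDISC_ACTIVE;
    }
}

/* Ordering described for 8.2.0.33.2p: clear the flag, then cancel. */
static void model_recov_npr_node_buggy(struct model_port *port,
                                       struct model_node *node)
{
    node->nlp_flag &= ~NLP_NPR_2B_DISC;
    model_cancel_retry_delay_tmo(port, node);
}

/* Ordering described for 8.2.0.33.3p: cancel first, then clear the flag. */
static void model_recov_npr_node_fixed(struct model_port *port,
                                       struct model_node *node)
{
    model_cancel_retry_delay_tmo(port, node);
    node->nlp_flag &= ~NLP_NPR_2B_DISC;
}

/*
 * Replays the scenario: one target whose PLOGI got LS_RJT is waiting on the
 * retry timer when an RSCN arrives. Returns 1 if the port is still marked
 * FC_NDISC_ACTIVE afterwards, 0 if discovery was wound down.
 */
static int model_stuck_after_rscn(int use_fixed_order)
{
    struct model_port port = { FC_NDISC_ACTIVE, 1 };
    struct model_node node = { NLP_DELAY_TMO | NLP_NPR_2B_DISC };

    if (use_fixed_order)
        model_recov_npr_node_fixed(&port, &node);
    else
        model_recov_npr_node_buggy(&port, &node);

    return (port.fc_flag & FC_NDISC_ACTIVE) != 0;
}
```

In this model, model_stuck_after_rscn(0) leaves the port flagged as still discovering, while model_stuck_after_rscn(1) clears the flag, which mirrors the behaviour change the patch is meant to produce.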

Comment 5 Don Zickus 2008-12-09 21:04:53 UTC
The patch is in kernel-2.6.18-126.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 7 Chris Ward 2008-12-16 16:29:26 UTC
~~~ Attention Partners ~~~ The *last* RHEL 5.3 Snapshot 6 is now available at partners.redhat.com. A fix for this bug should be present. Please test and update this bug with test results as soon as possible.  If the fix present in Snap6 meets all the expected requirements for this bug, please add the keyword PartnerVerified. If any new bugs are discovered, please CLONE this bug and describe the issues encountered there.

Comment 8 Jamie Wellnitz 2008-12-17 02:38:08 UTC
2.6.18-126.el5 has 8.2.0.33.3p and looks good.

Comment 10 errata-xmlrpc 2009-01-20 19:48:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html