Bug 548496

Summary: [Emulex 4.9 bug] lpfc driver doesn't acquire lock when searching hba for target
Product: Red Hat Enterprise Linux 4
Reporter: Casey Dahlin <cdahlin>
Component: kernel
Assignee: Rob Evers <revers>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: urgent
Version: 4.8
CC: andriusb, coughlan, cward, dhoward, laurie.barry, moshiro, phan, qcai, tao, vaios.papadimitriou, vanhoof
Target Milestone: rc
Keywords: OtherQA, ZStream
Target Release: 4.9
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-02-16 15:23:18 UTC
Bug Blocks: 561453, 626414    
Attachments:
 Patch to add lock protection to lpfc_find_target (flags: none)

Description Casey Dahlin 2009-12-17 16:15:17 UTC
Created attachment 379032
Patch to add lock protection to lpfc_find_target

lpfc_find_target needs to acquire the host lock before it begins iterating the HBA's node lists, to avoid a potential hang.

Patch attached. A z-stream request should be made shortly.

Comment 4 Andrius Benokraitis 2010-01-07 15:27:20 UTC
Casey - will you be able to test this on behalf of Emulex, or will you require Emulex to test this as well?

Comment 5 RHEL Program Management 2010-01-07 15:30:42 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Rob Evers 2010-01-07 15:44:58 UTC
Laurie,

Can someone at Emulex review the attached patch?

Thanks, Rob

Comment 7 Tom Coughlan 2010-01-07 16:53:05 UTC
Description of Problem:
 If multiple hosts connected to an FC switch are rebooted simultaneously,
 the boot sequence stops while the lpfc driver is loading.
 -------------------------------
 ELILO boot: Uncompressing Linux... done
 Loading initrd initrd-2.6.9-67.EL.img...done
 i8042.c: No controller found.
 Red Hat nash version 4.2.1.13 starting
 lpfc 0005:0b:01.0: 0:1303 Link Up Event x1 received Data: x1 x0 x10 x0
 lpfc 0005:0b:01.1: 1:1303 Link Up Event x1 received Data: x1 x0 x10 x0
 (*** stops here ***)
 -------------------------------

 The lpfc driver appears to loop forever in the code that scans the
 node lists, because that code is not protected by a spinlock.

 This problem has only been observed on RHEL 4.6, but the same unprotected
 code may cause problems in other versions of the lpfc driver.

Version-Release number of selected component:
 Red Hat Enterprise Linux Version Number: RHEL4
 Release Number: 4.6
 Architecture: ia64
 Kernel Version: 2.6.9-67.EL
 Related Package Version: lpfc driver v8.0.16.40
 Related Middleware / Application: None

Drivers or hardware or architecture dependency:
 lpfc driver for RHEL4

How reproducible:
 Unclear. Our customer reports about 1 in 10 boots, although that environment
 is somewhat unusual: it uses SAN boot and all hosts are rebooted simultaneously.
 In our test environment it occurred about 1 in 5000 tries.

Steps to Reproduce:
 Set up multiple nodes in a SAN boot environment and keep rebooting them
 simultaneously until the problem occurs.

Actual Results:
 Boot sequence stops while loading the lpfc driver

Expected Results:
 Boot sequence completes normally

Summary of actions taken to resolve issue:
 Reset the system.

Location of diagnostic data:
 While scanning listp, the driver encountered a NULL value and looped forever,
 even though the preceding list_empty() test had passed. We suspect the contents
 of listp changed after the list_empty() test, since the scan is not protected
 by a spinlock. Access to the node lists should be protected by spinlocks.

 ============================
 struct lpfc_target *
 lpfc_find_target(struct lpfc_hba * phba, uint32_t tgt,
                  struct lpfc_nodelist *nlp)
 {
         struct lpfc_target *targetp = NULL;
         int found = 0, i;
         struct list_head *listp;
         struct list_head *node_list[6];
         ...
 
         if(!nlp) {
                 // spin_lock_irqsave(phba->host->host_lock, iflag); Need to get spinlock
 
                 /* Search over all lists other than fc_nlpunmap_list */
                 node_list[0] = &phba->fc_npr_list;
                 node_list[1] = &phba->fc_nlpmap_list; /* Skip fc_nlpunmap */
                 node_list[2] = &phba->fc_prli_list;
                 node_list[3] = &phba->fc_reglogin_list;
                 node_list[4] = &phba->fc_adisc_list;
                 node_list[5] = &phba->fc_plogi_list;
 
                 for (i=0; i < 6 && !found; i++) {
                         listp = node_list[i];
                         if (list_empty(listp))
                                 continue;
                         list_for_each_entry(nlp, listp, nlp_listp) { // loop here
                                 if (tgt == nlp->nlp_sid) {
                                         found = 1;
                                         break;
                                 }
                         }
                 }
 
                 // spin_unlock_irqrestore(phba->host->host_lock, iflag); Need to unlock spinlock
 ============================

Comment 8 Vaios Papadimitriou 2010-01-07 22:54:40 UTC
We reviewed the patch, and it looks good.

Thank you.

Comment 9 Rob Evers 2010-01-07 23:07:22 UTC
(In reply to comment #4)
> Casey - will you be able to test this on behalf of Emulex, or will you require
> Emulex to test this as well?  

Casey,

I need to know the status of testing in order to post this.

Rob

Comment 11 Rob Evers 2010-01-08 15:33:42 UTC
Confirmed that the patch fixes the problem reported in the issue tracker.

Comment 14 Chris Ward 2010-01-21 09:21:28 UTC
@CAI. See Comment #13.

Comment 21 Vivek Goyal 2010-02-17 16:22:17 UTC
Committed in 89.20.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 32 errata-xmlrpc 2011-02-16 15:23:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html