548496 – [Emulex 4.9 bug] lpfc driver doesn't acquire lock when searching hba for target

Summary: [Emulex 4.9 bug] lpfc driver doesn't acquire lock when searching hba for target

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.8
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	4.9
Assignee:	Rob Evers
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	561453 626414

Reported:	2009-12-17 16:15 UTC by Casey Dahlin
Modified:	2018-10-27 15:08 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-02-16 15:23:18 UTC
Target Upstream Version:
Embargoed:

Attachments
Patch to add lock protection to lpfc_find_target (793 bytes, application/octet-stream) 2009-12-17 16:15 UTC, Casey Dahlin	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0263	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update	2011-02-16 15:14:55 UTC

Description Casey Dahlin 2009-12-17 16:15:17 UTC

Created attachment 379032
Patch to add lock protection to lpfc_find_target

lpfc_find_target needs to acquire the host lock before it begins iterating the lists to avoid a potential hang.

Patch enclosed. Z-stream request should be made shortly.

Comment 4 Andrius Benokraitis 2010-01-07 15:27:20 UTC

Casey - will you be able to test this on behalf of Emulex, or will you require Emulex to test this as well?

Comment 5 RHEL Program Management 2010-01-07 15:30:42 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Rob Evers 2010-01-07 15:44:58 UTC

Laurie,

Can someone at Emulex review the attached patch?

Thanks, Rob

Comment 7 Tom Coughlan 2010-01-07 16:53:05 UTC

Description of Problem:
 If multiple hosts connected to an FC switch are rebooted synchronously,
 the boot sequence stops while loading the lpfc driver.
 -------------------------------
 ELILO boot: Uncompressing Linux... done
 Loading initrd initrd-2.6.9-67.EL.img...done
 i8042.c: No controller found.
 Red Hat nash version 4.2.1.13 starting
 lpfc 0005:0b:01.0: 0:1303 Link Up Event x1 received Data: x1 x0 x10 x0
 lpfc 0005:0b:01.1: 1:1303 Link Up Event x1 received Data: x1 x0 x10 x0
 (*** stops here ***)
 -------------------------------

 It seems the lpfc driver is looping forever in the code that scans
 through the lists of nodes, because that code is not protected
 by a spinlock.

 This problem has only been seen in RHEL4.6, but the same potential bug
 may exist in other versions of the lpfc driver.

Version-Release number of selected component:
 Red Hat Enterprise Linux Version Number: RHEL4
 Release Number: 4.6
 Architecture: ia64
 Kernel Version: 2.6.9-67.EL
 Related Package Version: lpfc driver v8.0.16.40
 Related Middleware / Application: None

Drivers or hardware or architecture dependency:
 lpfc driver for RHEL4

How reproducible:
 Unclear. Our customer says about 1 out of 10 times, although that environment
 is somewhat special since it is SAN boot and all hosts are rebooted synchronously.
 In our test environment it was about 1 out of 5000 tries.

Steps to Reproduce:
 Prepare multiple nodes in a SAN boot environment, and keep rebooting the nodes
 synchronously until the problem occurs.

Actual Results:
 Boot sequence stops while loading the lpfc driver

Expected Results:
 Boot sequence completes normally

Summary of actions taken to resolve issue:
 Reset the system.

Location of diagnostic data:
 When scanning through listp, listp had a NULL value and the scan ended up looping forever,
 although the preceding list_empty() test had passed because listp existed. I guess the
 content of listp changed after the list_empty() test passed, since the scan is not
 protected by a spinlock. Handling of the node lists should be protected by spinlocks.

 ============================
 struct lpfc_target *
 lpfc_find_target(struct lpfc_hba * phba, uint32_t tgt,
                  struct lpfc_nodelist *nlp)
 {
         struct lpfc_target *targetp = NULL;
         int found = 0, i;
         struct list_head *listp;
         struct list_head *node_list[6];
         ...
 
         if(!nlp) {
                 // spin_lock_irqsave(phba->host->host_lock, iflag); Need to get spinlock
 
                 /* Search over all lists other than fc_nlpunmap_list */
                 node_list[0] = &phba->fc_npr_list;
                 node_list[1] = &phba->fc_nlpmap_list; /* Skip fc_nlpunmap */
                 node_list[2] = &phba->fc_prli_list;
                 node_list[3] = &phba->fc_reglogin_list;
                 node_list[4] = &phba->fc_adisc_list;
                 node_list[5] = &phba->fc_plogi_list;
 
                 for (i=0; i < 6 && !found; i++) {
                         listp = node_list[i];
                         if (list_empty(listp))
                                 continue;
                         list_for_each_entry(nlp, listp, nlp_listp) { // loop here
                                 if (tgt == nlp->nlp_sid) {
                                         found = 1;
                                         break;
                                 }
                         }
                 }
 
                 // spin_unlock_irqrestore(phba->host->host_lock, iflag); Need to unlock spinlock
 ============================
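
 The locking pattern suggested above can be illustrated outside the kernel
 with a minimal userspace sketch. This is not the attached patch and not
 driver code: demo_hba, demo_node, demo_find_node and the atomic_flag-based
 spinlock are hypothetical stand-ins for struct lpfc_hba, struct lpfc_nodelist,
 lpfc_find_target and host_lock. It only shows the general idea that the
 reader and every writer take the same lock, so a scan never observes a
 half-updated list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_LISTS 6

/* Hypothetical stand-in for a node list entry; sid plays the role of nlp_sid. */
struct demo_node {
    uint32_t sid;
    struct demo_node *next;
};

/* Hypothetical stand-in for the HBA: one lock guarding all node lists. */
struct demo_hba {
    atomic_flag lock;                         /* plays the role of host_lock */
    struct demo_node *lists[DEMO_NUM_LISTS];  /* NULL-terminated singly linked lists */
};

static void demo_lock(struct demo_hba *hba)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&hba->lock, memory_order_acquire))
        ;
}

static void demo_unlock(struct demo_hba *hba)
{
    atomic_flag_clear_explicit(&hba->lock, memory_order_release);
}

void demo_hba_init(struct demo_hba *hba)
{
    atomic_flag_clear(&hba->lock);
    for (int i = 0; i < DEMO_NUM_LISTS; i++)
        hba->lists[i] = NULL;
}

/* Writers take the lock too; locking only the reader would not help. */
void demo_add_node(struct demo_hba *hba, int idx, struct demo_node *node)
{
    assert(idx >= 0 && idx < DEMO_NUM_LISTS);
    demo_lock(hba);
    node->next = hba->lists[idx];
    hba->lists[idx] = node;
    demo_unlock(hba);
}

/* Search all lists for tgt, holding the lock for the whole scan. */
struct demo_node *demo_find_node(struct demo_hba *hba, uint32_t tgt)
{
    struct demo_node *found = NULL;

    demo_lock(hba);
    for (int i = 0; i < DEMO_NUM_LISTS && !found; i++) {
        for (struct demo_node *n = hba->lists[i]; n != NULL; n = n->next) {
            if (n->sid == tgt) {
                found = n;
                break;
            }
        }
    }
    demo_unlock(hba);
    return found;
}
```

 Note that the sketch returns the node pointer after dropping the lock, so a
 caller would still need some guarantee that the node stays valid. It also
 does not model interrupt context, which is what the irqsave/irqrestore
 variants in the commented-out lines above deal with in the kernel.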

Comment 8 Vaios Papadimitriou 2010-01-07 22:54:40 UTC

We reviewed the patch, and it looks good.

Thank you.

Comment 9 Rob Evers 2010-01-07 23:07:22 UTC

(In reply to comment #4)
> Casey - will you be able to test this on behalf of Emulex, or will you require
> Emulex to test this as well?  

Casey,

I need to know the status of testing in order to post this.

Rob

Comment 11 Rob Evers 2010-01-08 15:33:42 UTC

Confirmed that the patch fixed the problem reported in the issue tracker.

Comment 14 Chris Ward 2010-01-21 09:21:28 UTC

@CAI. See Comment #13.

Comment 21 Vivek Goyal 2010-02-17 16:22:17 UTC

Committed in 89.20.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 32 errata-xmlrpc 2011-02-16 15:23:18 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html
