
Bug 515176

Summary: scsi_transport_fc: fc_user_scan can loop forever, needs mutex with rport list changes
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Keywords: ZStream
Reporter: Mark Goodwin <mgoodwin>
Assignee: David Milburn <dmilburn>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: coughlan, cward, dhoward, dzickus, jpirko, jtluka, moshiro, tao
Doc Type: Bug Fix
Last Closed: 2010-03-30 06:52:15 UTC
Bug Blocks: 499522, 521239, 651455
Attachments: upstream patch: scsi_transport_fc: fc_user_scan correction (flags: none)

Description Mark Goodwin 2009-08-03 06:14:26 UTC
Created attachment 355956
upstream patch: scsi_transport_fc: fc_user_scan correction

Description of problem:
A customer reported scsi_scan looping forever, eventually causing a soft lockup:

BUG: soft lockup detected on CPU#1!
 [<c044d21c>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<f88aeccd>] fc_user_scan+0x69/0x72 [scsi_transport_fc]
 [<f88aec64>] fc_user_scan+0x0/0x72 [scsi_transport_fc]
 [<f88704bb>] store_scan+0x83/0xab [scsi_mod]
 [<f8870438>] store_scan+0x0/0xab [scsi_mod]
 [<c054cd24>] class_device_attr_store+0x1b/0x1f
 [<c04a4a3c>] sysfs_write_file+0x91/0xbb
 [<c04a49ab>] sysfs_write_file+0x0/0xbb
 [<c0470254>] vfs_write+0xa1/0x143
 [<c0470846>] sys_write+0x3c/0x63
 [<c0404eff>] syscall_call+0x7/0xb

Version-Release number of selected component (if applicable):
Reported on RHEL 5.3, but all versions of RHEL are believed to be affected.

Steps to Reproduce:
 # echo "- - -" > /sys/class/scsi_host/hostN/scan
(for HBA number N)

Actual Results:
The scan sometimes loops forever, hanging the system because no I/O is possible on that HBA.

Expected Results:
 The scan should complete normally within a reasonable time frame.

Summary of actions taken to resolve issue:
 Reboot the system.

Additional info:
We have identified an upstream patch (attached) and built a test kernel
with it; the customer has verified that it resolves the issue (see
associated IT). The patch reintroduces IRQ locking to guard against
rport list changes during the main loop of the scan.

See the attached patch, or the upstream commit: bda232531f0c117921690ee3c060953c8f12e5a1
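
To illustrate the locking idea described above, here is a minimal user-space
sketch. It is not the attached patch: a pthread mutex stands in for the
kernel's IRQ-safe host lock, and every name in it (struct rport, struct host,
scan_target, user_scan_tgt, user_scan) is hypothetical. It assumes the fix
has roughly this shape: the rport list is walked only while the lock is
held, the lock is dropped before the slow per-target scan, and the walk is
never resumed afterwards, so a concurrent list change cannot make the loop
run forever. Consult the attached patch for the actual change.

```c
/* User-space sketch only: a pthread mutex stands in for the kernel's
 * IRQ-safe host lock, and all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum port_state { PORT_OFFLINE, PORT_ONLINE };

struct rport {
    int channel;
    int target_id;          /* -1 means no SCSI target is bound */
    enum port_state state;
    int scans;              /* how many times this port was scanned */
    struct rport *next;
};

struct host {
    pthread_mutex_t lock;   /* guards the rport list */
    struct rport *rports;
};

/* Stand-in for the slow per-target scan; runs with the lock dropped. */
static void scan_target(struct rport *rp)
{
    rp->scans++;
}

/* Scan the one target matching (channel, id). Returns 1 if a scan was
 * issued, 0 otherwise. The list is walked only while the lock is held,
 * and the walk is not resumed after the lock is released.
 * Assumption: object lifetime is ignored here; real code would need to
 * keep rp valid (for example via a reference) after unlocking. */
static int user_scan_tgt(struct host *h, int channel, int id)
{
    struct rport *rp;

    assert(h != NULL);
    pthread_mutex_lock(&h->lock);
    for (rp = h->rports; rp != NULL; rp = rp->next) {
        if (rp->target_id == -1 || rp->state != PORT_ONLINE)
            continue;
        if (rp->channel == channel && rp->target_id == id) {
            pthread_mutex_unlock(&h->lock);
            scan_target(rp);
            return 1;
        }
    }
    pthread_mutex_unlock(&h->lock);
    return 0;
}

/* Wildcard scan: visit every (channel, id) pair in a bounded range, so
 * termination does not depend on the list staying unchanged. */
static int user_scan(struct host *h, int max_channel, int max_id)
{
    int scanned = 0;

    for (int ch = 0; ch <= max_channel; ch++)
        for (int id = 0; id < max_id; id++)
            scanned += user_scan_tgt(h, ch, id);
    return scanned;
}
```

The outer loops in user_scan are bounded by fixed limits rather than by the
list itself, which is the property that matters for this bug: each list walk
starts fresh under the lock and either finds its target or reaches the end.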

Thanks
-- Mark Goodwin

Comment 1 David Milburn 2009-08-05 22:41:25 UTC
I realize that you have tested the upstream patch on a -53 test kernel. Would
you please try this test kernel, built from the latest RHEL 5.4 sources? Thanks.

http://people.redhat.com/dmilburn/.bz515176/

Comment 10 Don Zickus 2009-09-04 18:45:54 UTC
The fix is in kernel-2.6.18-165.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 19 errata-xmlrpc 2010-03-30 06:52:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html