Bug 515176 - scsi_transport_fc: fc_user_scan can loop forever, needs mutex with rport list changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: David Milburn
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 499522 521239 651455
Reported: 2009-08-03 06:14 UTC by Mark Goodwin
Modified: 2018-11-27 19:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 06:52:15 UTC
Target Upstream Version:
Embargoed:


Attachments
upstream patch: scsi_transport_fc: fc_user_scan correction (3.16 KB, patch)
2009-08-03 06:14 UTC, Mark Goodwin


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Mark Goodwin 2009-08-03 06:14:26 UTC
Created attachment 355956
upstream patch: scsi_transport_fc: fc_user_scan correction

Description of problem:
A customer reported scsi_scan looping forever, eventually causing a soft lockup:

BUG: soft lockup detected on CPU#1!
 [<c044d21c>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<f88aeccd>] fc_user_scan+0x69/0x72 [scsi_transport_fc]
 [<f88aec64>] fc_user_scan+0x0/0x72 [scsi_transport_fc]
 [<f88704bb>] store_scan+0x83/0xab [scsi_mod]
 [<f8870438>] store_scan+0x0/0xab [scsi_mod]
 [<c054cd24>] class_device_attr_store+0x1b/0x1f
 [<c04a4a3c>] sysfs_write_file+0x91/0xbb
 [<c04a49ab>] sysfs_write_file+0x0/0xbb
 [<c0470254>] vfs_write+0xa1/0x143
 [<c0470846>] sys_write+0x3c/0x63
 [<c0404eff>] syscall_call+0x7/0xb

Version-Release number of selected component (if applicable):
Reported on RHEL 5.3, but all versions of RHEL are believed to be affected.

Steps to Reproduce:
 # echo "- - -" > /sys/class/scsi_host/hostN/scan
(for HBA number N)
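
For readers unfamiliar with the scan string: the three fields are channel, target id and LUN, and "-" is a wildcard meaning "all". The sketch below is a userspace illustration of that parsing, not the kernel's code. The names parse_scan_request, parse_field and WILDCARD are hypothetical, and WILDCARD is only a stand-in for the kernel's wildcard value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WILDCARD (~0u) /* hypothetical stand-in for the kernel's scan wildcard */

/* Convert one field: "-" means "all", anything else must be a full number. */
static int parse_field(unsigned int *val, const char *src)
{
    char *end;

    if (strcmp(src, "-") == 0) {
        *val = WILDCARD;
        return 0;
    }
    *val = (unsigned int)strtoul(src, &end, 0);
    return (end == src || *end != '\0') ? -1 : 0;
}

/* Split "channel id lun" into three values. Returns 0 on success, -1 on error. */
int parse_scan_request(const char *str, unsigned int *channel,
                       unsigned int *id, unsigned int *lun)
{
    char s1[16], s2[16], s3[16], junk;

    /* Exactly three whitespace-separated fields; a fourth token is an error. */
    if (sscanf(str, "%15s %15s %15s %c", s1, s2, s3, &junk) != 3)
        return -1;
    if (parse_field(channel, s1) || parse_field(id, s2) || parse_field(lun, s3))
        return -1;
    return 0;
}
```

With this sketch, the "- - -" written in the step above yields a wildcard in all three fields, i.e. a request to scan every channel, target and LUN on the host.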

Actual Results:
The scan sometimes loops forever, hanging the system because no I/O is possible on that HBA.

Expected Results:
 The scan should complete normally in a reasonable time frame.

Summary of actions taken to resolve issue:
 Reboot the system.

Additional info:
We have identified an upstream patch (attached) and built a test kernel,
and the customer has verified that it resolves the issue (see associated IT).
Basically, the patch re-introduces some irq locking to guard against
rport list changes during the main loop of the scan.

See the attached patch, or the upstream commit: bda232531f0c117921690ee3c060953c8f12e5a1
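
To illustrate the general idea of guarding a list walk against concurrent changes, here is a minimal userspace model. It is not the attached patch: the struct layouts and the names user_scan_tgt, user_scan_all and scan_target are hypothetical, and a pthread mutex stands in for the kernel lock taken with interrupts disabled. Dropping the lock before the slow per-target step is an assumption of this sketch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a remote port entry on a host's rport list. */
struct rport {
    unsigned int channel;
    unsigned int target_id;
    int online;
    struct rport *next;
};

struct host {
    pthread_mutex_t lock;    /* stand-in for the host lock */
    struct rport *rports;    /* list that other threads may modify */
    unsigned int max_channel;
    unsigned int max_id;
    unsigned int scans_done; /* demo counter; only safe single-threaded */
};

/* Placeholder for the slow per-target scan. Real code would also need a
 * reference on rp so it cannot be freed once the lock is dropped. */
static void scan_target(struct host *h, struct rport *rp)
{
    (void)rp;
    h->scans_done++;
}

/* Scan the one rport matching (channel, id), if it exists and is online. */
static void user_scan_tgt(struct host *h, unsigned int channel, unsigned int id)
{
    struct rport *rp;

    pthread_mutex_lock(&h->lock);
    for (rp = h->rports; rp != NULL; rp = rp->next) {
        if (!rp->online)
            continue;
        if (rp->channel == channel && rp->target_id == id) {
            /* Drop the lock before the slow step, then stop walking. */
            pthread_mutex_unlock(&h->lock);
            scan_target(h, rp);
            return;
        }
    }
    pthread_mutex_unlock(&h->lock);
}

/* Visit every (channel, id) pair in range; each lookup re-takes the lock. */
void user_scan_all(struct host *h)
{
    unsigned int ch, id;

    for (ch = 0; ch <= h->max_channel; ch++)
        for (id = 0; id < h->max_id; id++)
            user_scan_tgt(h, ch, id);
}
```

The point of the design is that the list is never traversed while another thread can relink it, and the traversal never resumes from a pointer that was obtained before the lock was released.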

Thanks
-- Mark Goodwin

Comment 1 David Milburn 2009-08-05 22:41:25 UTC
I realize that you have tested the upstream patch on a -53 test kernel. Would
you please also try this test kernel built from the latest RHEL 5.4 sources? Thanks.

http://people.redhat.com/dmilburn/.bz515176/

Comment 10 Don Zickus 2009-09-04 18:45:54 UTC
in kernel-2.6.18-165.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 19 errata-xmlrpc 2010-03-30 06:52:15 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

