Created attachment 355956
upstream patch: scsi_transport_fc: fc_user_scan correction

Description of problem:
Customer reported scsi_scan looping forever, eventually causing a soft lockup:

BUG: soft lockup detected on CPU#1!
 [<c044d21c>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<f88aeccd>] fc_user_scan+0x69/0x72 [scsi_transport_fc]
 [<f88aec64>] fc_user_scan+0x0/0x72 [scsi_transport_fc]
 [<f88704bb>] store_scan+0x83/0xab [scsi_mod]
 [<f8870438>] store_scan+0x0/0xab [scsi_mod]
 [<c054cd24>] class_device_attr_store+0x1b/0x1f
 [<c04a4a3c>] sysfs_write_file+0x91/0xbb
 [<c04a49ab>] sysfs_write_file+0x0/0xbb
 [<c0470254>] vfs_write+0xa1/0x143
 [<c0470846>] sys_write+0x3c/0x63
 [<c0404eff>] syscall_call+0x7/0xb

Version-Release number of selected component (if applicable):
Reported on RHEL5.3, but all versions of RHEL are believed to be affected.

Steps to Reproduce:
# echo "- - -" > /sys/class/scsi_host/hostN/scan
(for HBA number N)

Actual Results:
The scan sometimes loops forever, leaving the system hung because no I/O is possible on that HBA.

Expected Results:
The scan should complete normally in a reasonable time frame.

Summary of actions taken to resolve issue:
Reboot the system.

Additional info:
We have identified an upstream patch (attached) and built a test kernel, and the customer has verified that it resolves the issue (see associated IT). In short, the patch re-introduces some IRQ locking to guard against rport list changes during the main loop of the scan. See the attached patch, or the upstream commit:
bda232531f0c117921690ee3c060953c8f12e5a1

Thanks
-- Mark Goodwin
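For readers unfamiliar with the locking pattern the description refers to, here is a minimal userspace sketch. It is not the attached patch and not kernel code. The names struct rport, struct host, scan_target and user_scan are hypothetical, and a pthread mutex stands in for the host lock that the kernel would take with IRQs disabled. It only illustrates walking a port list under a lock so that concurrent list changes cannot corrupt the traversal, and dropping the lock before the slow per-target scan.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a remote port entry (not the kernel's struct fc_rport). */
struct rport {
    int channel;
    int target_id;          /* -1 means no SCSI target is bound to this port */
    int online;             /* nonzero when the port is usable */
    struct rport *next;
};

/* Hypothetical stand-in for the host; the mutex plays the role of the host lock. */
struct host {
    pthread_mutex_t lock;
    struct rport *rports;   /* list that other threads may modify under the lock */
    int scans_done;         /* protected by the lock */
};

/*
 * Stand-in for the slow per-target scan. It takes the lock itself to update
 * the counter, so the caller must NOT hold the lock when calling it.
 */
static void scan_target(struct host *h, const struct rport *rp)
{
    (void)rp;
    pthread_mutex_lock(&h->lock);
    h->scans_done++;
    pthread_mutex_unlock(&h->lock);
}

/*
 * Walk the port list with the lock held. On a match, release the lock first
 * and then scan. Returns 1 if a matching online port was scanned, else 0.
 */
static int user_scan(struct host *h, int channel, int id)
{
    struct rport *rp;

    assert(h != NULL);
    pthread_mutex_lock(&h->lock);
    for (rp = h->rports; rp != NULL; rp = rp->next) {
        if (rp->target_id == -1 || !rp->online)
            continue;       /* skip ports with no target or not online */
        if (rp->channel == channel && rp->target_id == id) {
            pthread_mutex_unlock(&h->lock);
            scan_target(h, rp);
            return 1;
        }
    }
    pthread_mutex_unlock(&h->lock);
    return 0;
}
```

Without the lock around the loop, another thread unlinking or relinking entries during the walk could leave the traversal following stale pointers, which is one way a list walk can fail to terminate.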
I realize that you have tested the upstream patch on a -53 test kernel, but would you please try this test kernel built from the latest RHEL 5.4 sources? Thanks.

http://people.redhat.com/dmilburn/.bz515176/
in kernel-2.6.18-165.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html