Bug 463416 - RHEL 5.3: fix scsi regression causing udev to hang loading sr_mod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: ---
Assignee: Mike Christie
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-09-23 09:56 UTC by Mark McLoughlin
Modified: 2009-01-20 20:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 20:16:30 UTC
Target Upstream Version:
Embargoed:


Attachments
'lspci -v' output (7.48 KB, text/plain)
2008-09-23 09:57 UTC, Mark McLoughlin
2.6.18.4-116.el5-udev-hang.long (501.57 KB, text/plain)
2008-09-23 09:57 UTC, Mark McLoughlin
scsi-fix-regression-introduced-by-typo-in-failfast.patch (1.26 KB, patch)
2008-09-23 10:00 UTC, Mark McLoughlin
scsi-fix-regression-introduced-by-typo-in-failfast.patch (1.33 KB, patch)
2008-09-23 10:03 UTC, Mark McLoughlin


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description Mark McLoughlin 2008-09-23 09:56:09 UTC
With 2.6.18.4-116, udev hangs when loading sr_mod on my machine.

I bisected the issue down to:

http://git.engineering.redhat.com/?p=users/dzickus/rhel5/kernel;a=commit;h=325d5462da6613a1353fa8cbc4603e8f056e67b1

  commit 325d5462da6613a1353fa8cbc4603e8f056e67b1
  [scsi] modify failfast so it does not always fail fast

Attaching 'lspci -v' output and a log from booting with 'udevdebug'.

The log shows that 'modprobe sr_mod' is the first command to hang; also, at the end it shows:

  sr 0:0:1:0: timing out command, waited 120s
  sr 0:0:1:0: timing out command, waited 120s
  sr 0:0:1:0: timing out command, waited 120s
  sr 0:0:1:0: timing out command, waited 120s
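
To check a saved boot log for the same symptom, a minimal sketch along these lines may help. It only counts lines shaped like the four above; the function name `count_sr_timeouts` and the `sample` text are made up for illustration and are not part of any Red Hat tooling.

```python
import re
from collections import Counter

# Matches kernel messages shaped like the ones quoted in this report, e.g.
#   sr 0:0:1:0: timing out command, waited 120s
# The capture groups are the SCSI address (host:channel:id:lun) and the wait in seconds.
TIMEOUT_RE = re.compile(
    r"\bsr (\d+:\d+:\d+:\d+): timing out command, waited (\d+)s"
)


def count_sr_timeouts(log_text):
    """Return a Counter mapping SCSI address -> number of timeout messages."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample built from the lines quoted in the description.
sample = """\
  sr 0:0:1:0: timing out command, waited 120s
  sr 0:0:1:0: timing out command, waited 120s
  sr 0:0:1:0: timing out command, waited 120s
  sr 0:0:1:0: timing out command, waited 120s
"""

counts = count_sr_timeouts(sample)
print(dict(counts))
```

For a real log, read the file contents and pass them to `count_sr_timeouts` in place of `sample`.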

The issue seems to be caused by a fairly simple typo; attaching a patch below

Comment 1 Mark McLoughlin 2008-09-23 09:57:02 UTC
Created attachment 317454
'lspci -v' output

Comment 2 Mark McLoughlin 2008-09-23 09:57:41 UTC
Created attachment 317456
2.6.18.4-116.el5-udev-hang.long

Comment 3 Mark McLoughlin 2008-09-23 10:00:36 UTC
Created attachment 317458
scsi-fix-regression-introduced-by-typo-in-failfast.patch

Comment 4 Mark McLoughlin 2008-09-23 10:03:21 UTC
Created attachment 317459
scsi-fix-regression-introduced-by-typo-in-failfast.patch

Wow - emacs really screwed that up ...

Comment 7 Jeff Moyer 2008-09-24 19:18:28 UTC
This is keeping our performance team from running their standard battery of tests.

Comment 9 Jay Turner 2008-09-30 11:54:21 UTC
Looks like this patch actually made it into -117.el5 and I'm no longer seeing the hang.  Will continue to test.

Comment 10 Don Zickus 2008-09-30 16:01:58 UTC
The fix is in kernel-2.6.18-117.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 13 errata-xmlrpc 2009-01-20 20:16:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

