Bug 726099 - __scsi_add_device+0xc8/0x170 has a problem when there is scsi enclosure
Summary: __scsi_add_device+0xc8/0x170 has a problem when there is scsi enclosure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: ppc64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Vivek Goyal
QA Contact: Gris Ge
URL:
Whiteboard:
Duplicates: 728424
Depends On:
Blocks:
Reported: 2011-07-27 14:25 UTC by Steve Best
Modified: 2011-12-06 14:10 UTC (History)

Fixed In Version: kernel-2.6.32-192.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 13:55:00 UTC
Target Upstream Version:


Attachments
kernel log (71.47 KB, application/octet-stream)
2011-07-27 14:25 UTC, Steve Best
sosreport (916.81 KB, application/x-xz)
2011-08-11 12:55 UTC, IBM Bug Proxy
systcl -a output (2.15 MB, application/octet-stream)
2011-08-11 12:55 UTC, IBM Bug Proxy
Console log (147.78 KB, text/x-log)
2011-08-11 12:55 UTC, IBM Bug Proxy
console log showing the issue is fixed (139.20 KB, text/plain)
2011-08-16 15:11 UTC, IBM Bug Proxy


Links
System: Red Hat Product Errata
ID: RHSA-2011:1530
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update
Last Updated: 2011-12-06 01:45:35 UTC

Description Steve Best 2011-07-27 14:25:27 UTC
Created attachment 515535 [details]
kernel log

Description of problem:
It looks like the change for bz https://bugzilla.redhat.com/show_bug.cgi?id=707755
caused the following issue.
Jul 27 09:44:36 ibm-js22-06 kernel: dracut: rd_NO_MD: removing MD RAID activation
Jul 27 09:44:36 ibm-js22-06 kernel: ipr: IBM Power RAID SCSI Device Driver version: 2.5.1 (August 10, 2010)
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: Found IOA with IRQ: 289
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: Starting IOA initialization sequence.
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: Adapter firmware version: 03200048
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: IOA initialized.
Jul 27 09:44:36 ibm-js22-06 kernel: scsi0 : IBM 572C Storage Adapter
Jul 27 09:44:36 ibm-js22-06 kernel: scsi 0:3:0:0: Direct-Access     IBM-ESXS ST973402SS       B529 PQ: 0 ANSI: 5
Jul 27 09:44:36 ibm-js22-06 kernel: scsi 0:8:0:0: Enclosure         IBM      VSBPD1BB   SAS     01 PQ: 0 ANSI: 2
Jul 27 09:44:36 ibm-js22-06 kernel: INFO: task modprobe:250 blocked for more than 120 seconds.
Jul 27 09:44:36 ibm-js22-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 27 09:44:36 ibm-js22-06 kernel: modprobe      D 0000008030329488     0   250      1 0x00008000
Jul 27 09:44:36 ibm-js22-06 kernel: Call Trace:
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685aee40] [c0000000685aeef0] 0xc0000000685aeef0 (unreliable)
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af010] [c000000000014278] .__switch_to+0xf8/0x1d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af0a0] [c00000000059dc0c] .schedule+0x3fc/0xd30
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af390] [c00000000059f6ac] .__mutex_lock_slowpath+0x1bc/0x2d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af470] [c00000000059ff8c] .mutex_lock+0x5c/0x60
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af4f0] [c0000000003d6b38] .__scsi_add_device+0xc8/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af5a0] [c0000000003d6bf4] .scsi_add_device+0x14/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af610] [d000000001a81808] .ipr_probe+0x328/0x404 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af6f0] [c000000000317b24] .local_pci_probe+0x34/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af760] [c0000000003189d8] .pci_device_probe+0x158/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af810] [c0000000003afa38] .driver_probe_device+0xe8/0x350
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af8b0] [c0000000003afdac] .__driver_attach+0x10c/0x110
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af940] [c0000000003aec28] .bus_for_each_dev+0x98/0xf0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af9f0] [c0000000003af6c8] .driver_attach+0x28/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afa70] [c0000000003ae198] .bus_add_driver+0x2a8/0x3e0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afb20] [c0000000003b023c] .driver_register+0x9c/0x1b0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afbc0] [c000000000318d24] .__pci_register_driver+0x64/0x140
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afc60] [d000000001a81930] .ipr_init+0x4c/0x68 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afd90] [c0000000000dd79c] .SyS_init_module+0x14c/0x2c0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afe30] [c000000000008564] syscall_exit+0x0/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: INFO: task modprobe:250 blocked for more than 120 seconds.
Jul 27 09:44:36 ibm-js22-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 27 09:44:36 ibm-js22-06 kernel: modprobe      D 0000008030329488     0   250      1 0x00008000
Jul 27 09:44:36 ibm-js22-06 kernel: Call Trace:
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685aee40] [c0000000685aeef0] 0xc0000000685aeef0 (unreliable)
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af010] [c000000000014278] .__switch_to+0xf8/0x1d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af0a0] [c00000000059dc0c] .schedule+0x3fc/0xd30
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af390] [c00000000059f6ac] .__mutex_lock_slowpath+0x1bc/0x2d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af470] [c00000000059ff8c] .mutex_lock+0x5c/0x60
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af4f0] [c0000000003d6b38] .__scsi_add_device+0xc8/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af5a0] [c0000000003d6bf4] .scsi_add_device+0x14/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af610] [d000000001a81808] .ipr_probe+0x328/0x404 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af6f0] [c000000000317b24] .local_pci_probe+0x34/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af760] [c0000000003189d8] .pci_device_probe+0x158/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af810] [c0000000003afa38] .driver_probe_device+0xe8/0x350
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af8b0] [c0000000003afdac] .__driver_attach+0x10c/0x110
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af940] [c0000000003aec28] .bus_for_each_dev+0x98/0xf0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af9f0] [c0000000003af6c8] .driver_attach+0x28/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afa70] [c0000000003ae198] .bus_add_driver+0x2a8/0x3e0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afb20] [c0000000003b023c] .driver_register+0x9c/0x1b0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afbc0] [c000000000318d24] .__pci_register_driver+0x64/0x140
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afc60] [d000000001a81930] .ipr_init+0x4c/0x68 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afd90] [c0000000000dd79c] .SyS_init_module+0x14c/0x2c0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afe30] [c000000000008564] syscall_exit+0x0/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: INFO: task modprobe:301 blocked for more than 120 seconds.
Jul 27 09:44:36 ibm-js22-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 27 09:44:36 ibm-js22-06 kernel: modprobe      D 0000008030329488     0   301      1 0x00008000
Jul 27 09:44:36 ibm-js22-06 kernel: Call Trace:
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7660] [c0000000688e7710] 0xc0000000688e7710 (unreliable)
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7830] [c000000000014278] .__switch_to+0xf8/0x1d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e78c0] [c00000000059dc0c] .schedule+0x3fc/0xd30
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7bb0] [c0000000003afff4] .wait_for_device_probe+0x64/0xc0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7c70] [d000000001b10014] .wait_scan_init+0x10/0xc4 [scsi_wait_scan]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7ce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7d90] [c0000000000dd79c] .SyS_init_module+0x14c/0x2c0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7e30] [c000000000008564] syscall_exit+0x0/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] Write Protect is off
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
Jul 27 09:44:36 ibm-js22-06 kernel: sda: sda1 sda2 sda3
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] Attached SCSI disk
Jul 27 09:44:36 ibm-js22-06 kernel: scsi: unknown device type 31
Jul 27 09:44:36 ibm-js22-06 kernel: scsi 0:255:255:255: No Device         IBM      572C001SISIOA    0150 PQ: 0 ANSI: 0
Jul 27 09:44:36 ibm-js22-06 kernel: dracut: Scanning devices sda3  for LVM logical volumes vg_ibmjs2206/lv_root vg_ibmjs2206/lv_swap 
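
To summarize hung-task reports like the ones in the log above, a small script can pull out the task name, the PID, and the symbols from each call trace. The following is only a minimal sketch in Python and is not part of the original report; the function name `hung_tasks` and both regular expressions are assumptions tuned to the ppc64 trace format shown here and may need adjusting for other architectures.

```python
import re

# "INFO: task modprobe:250 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# "[c0000000685af470] [c00000000059ff8c] .mutex_lock+0x5c/0x60"
# The leading dot on the symbol is optional (syscall_exit has none).
FRAME_RE = re.compile(
    r"\[[0-9a-f]+\] \[[0-9a-f]+\] \.?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def hung_tasks(lines):
    """Return a list of (task, pid, [symbols]) tuples, one per hung-task report.

    Frames are attached to the most recent "blocked for more than" line;
    frames seen before any such line are ignored.
    """
    reports = []
    current = None
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            current = (hung.group(1), int(hung.group(2)), [])
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current[2].append(frame.group(1))
    return reports
```

Run over the log above, this would show modprobe:250 sleeping in `mutex_lock` called from `__scsi_add_device`, and modprobe:301 sleeping in `wait_for_device_probe`.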

Version-Release number of selected component (if applicable):
6.2

How reproducible:
Every time, if the system has a SCSI enclosure.

Steps to Reproduce:
1. Install the 158 kernel: it works.
2. Install the 158 kernel plus the patches for bug 707755: the problem occurs.
Actual results:


Expected results:


Additional info:

I've narrowed this down by finding the latest 6.2 kernel that works, which was 158. I took 158 and added the patches for 707755, and the problem happens. I've seen this on at least two systems so far: one at RH and the other at IBM.

I'm not sure how this patchset would cause this; I'm still looking at it more closely.

Comment 2 Steve Best 2011-08-03 11:04:57 UTC
Vivek,

I haven't found a good way to bisect this patchset. Any ideas on how to break up the patchset to narrow this down, or have you started to debug this yet?

-Steve

Comment 3 Steve Best 2011-08-11 12:47:08 UTC
*** Bug 728424 has been marked as a duplicate of this bug. ***

Comment 4 IBM Bug Proxy 2011-08-11 12:55:21 UTC
Created attachment 517798 [details]
sosreport

Comment 5 IBM Bug Proxy 2011-08-11 12:55:28 UTC
Created attachment 517799 [details]
systcl -a output

Comment 6 IBM Bug Proxy 2011-08-11 12:55:33 UTC
Created attachment 517800 [details]
Console log

Comment 7 Vivek Goyal 2011-08-11 14:31:22 UTC
Hmm, so with the above patch series the system still works; it just takes a
long time to boot.

I am wondering if it is related to the issue where queue deletion gets slow
with CFQ. I posted a patch for that upstream.

I think a quick way to verify that would be to use any scheduler other than cfq.
How about using elevator=deadline on the kernel command line and seeing whether
the issue still occurs?
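
One way to confirm which scheduler a disk is actually using is to read `/sys/block/<dev>/queue/scheduler`, where the kernel lists the available schedulers and marks the active one in square brackets. The following is a minimal sketch in Python; the helper names `parse_scheduler` and `active_scheduler` are hypothetical, and the default device name `sda` is taken from the log in the description.

```python
def parse_scheduler(text):
    """Parse the contents of a queue/scheduler file, e.g. 'noop deadline [cfq]'.

    Returns (active, available). active is None if no entry is bracketed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets around the active entry
            active = token
        available.append(token)
    return active, available


def active_scheduler(device="sda"):
    """Read the active I/O scheduler for a block device from sysfs."""
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return parse_scheduler(f.read())[0]
```

If `active_scheduler()` still reports `cfq` after booting with `elevator=deadline`, the option probably did not take effect and the test says nothing about CFQ.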

Comment 8 Vivek Goyal 2011-08-11 14:36:06 UTC
Can you please try the following CFQ patch I posted upstream and see if the issue is resolved?

https://lkml.org/lkml/2011/8/1/259

Comment 9 Steve Best 2011-08-11 15:26:25 UTC
(In reply to comment #8)
> Can you please try following CFQ patch I posted upstream and see if issue is
> resolved.
> 
> https://lkml.org/lkml/2011/8/1/259

I'll build this patch and see if it fixes the problem.

Comment 10 Vivek Goyal 2011-08-11 15:42:18 UTC
OK. Before building the patch, can you please do a quick test using elevator=deadline on the kernel command line? This will at least confirm that the problem is with CFQ. If that's not the case, then the above patch will most likely not help.
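
After rebooting for such a test, it can be worth checking that the running kernel really was booted with the option. `/proc/cmdline` holds the boot command line. The following is a minimal sketch in Python; the helper names `boot_param` and `running_elevator` are hypothetical, and the parser assumes plain space-separated `name=value` tokens (it does not handle quoted values that contain spaces).

```python
def boot_param(cmdline, name):
    """Return the value of name=value from a kernel command line, or None.

    If the parameter appears more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


def running_elevator():
    """Look up elevator= on the command line of the running kernel."""
    with open("/proc/cmdline") as f:
        return boot_param(f.read(), "elevator")
```

A result of `None` from `running_elevator()` means no `elevator=` option was passed, so the kernel is using its default scheduler.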

Comment 11 Steve Best 2011-08-12 12:50:06 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Can you please try following CFQ patch I posted upstream and see if issue is
> > resolved.
> > 
> > https://lkml.org/lkml/2011/8/1/259
> 
> I'll build this patch and see if it fixes the problem

I tested this patch on a system here and it fixes the issue. Thanks for the help.

-Steve

Comment 12 Vivek Goyal 2011-08-12 12:56:20 UTC
(In reply to comment #11)
> > > Can you please try following CFQ patch I posted upstream and see if issue is
> > > resolved.
> > > 
> > > https://lkml.org/lkml/2011/8/1/259
> > 
> > I'll build this patch and see if it fixes the problem
> 
> I tested this patch on a system here and it fixes the issue. thanks for the
> help.
> 

Great. Did you do a brew build across all arches or just a local build for ppc64? If you did a brew build across all arches, I could skip that step. Otherwise I will do a brew build and post the patch to rhkernel-list.

Comment 13 Steve Best 2011-08-12 13:00:43 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > > > Can you please try following CFQ patch I posted upstream and see if issue is
> > > > resolved.
> > > > 
> > > > https://lkml.org/lkml/2011/8/1/259
> > > 
> > > I'll build this patch and see if it fixes the problem
> > 
> > I tested this patch on a system here and it fixes the issue. thanks for the
> > help.
> > 
> 
> Great. Did you do a brew build across all arches or just a local build for
> ppc64? If you did brew build across all arches, i could skip that step.
> Otherwise i will do a brew build, and post the patch to rhkernel-list.

Yes, I have a brew build:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3549946
I have only tested it on powerpc.

Comment 15 IBM Bug Proxy 2011-08-16 15:11:17 UTC
Created attachment 518515 [details]
console log showing the issue is fixed


------- Comment (attachment only) From  2011-08-16 11:06 EDT-------

Comment 16 RHEL Program Management 2011-08-18 04:49:30 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 18 Aristeu Rozanski 2011-08-24 15:53:57 UTC
Patch(es) available on kernel-2.6.32-192.el6

Comment 20 Steve Best 2011-08-24 17:44:52 UTC
I installed the 192 kernel on my system ibm-js22-06 and the fix works.

Comment 22 IBM Bug Proxy 2011-10-17 18:42:31 UTC
------- Comment From sbest.com 2011-10-17 14:31 EDT-------
Making Naveed's comment external:

This issue is no longer reproducible in RHEL6.2Snap1.
Closing as it is fixed.

Comment 23 Gris Ge 2011-11-04 09:12:36 UTC
Per comment #20, changing the status to VERIFIED.

Comment 24 errata-xmlrpc 2011-12-06 13:55:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

