Created attachment 515535 [details] kernel log

Description of problem:
It looks like the change for bz https://bugzilla.redhat.com/show_bug.cgi?id=707755 caused the following issue.

Jul 27 09:44:36 ibm-js22-06 kernel: dracut: rd_NO_MD: removing MD RAID activation
Jul 27 09:44:36 ibm-js22-06 kernel: ipr: IBM Power RAID SCSI Device Driver version: 2.5.1 (August 10, 2010)
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: Found IOA with IRQ: 289
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: Starting IOA initialization sequence.
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: Adapter firmware version: 03200048
Jul 27 09:44:36 ibm-js22-06 kernel: ipr 0000:00:01.0: IOA initialized.
Jul 27 09:44:36 ibm-js22-06 kernel: scsi0 : IBM 572C Storage Adapter
Jul 27 09:44:36 ibm-js22-06 kernel: scsi 0:3:0:0: Direct-Access IBM-ESXS ST973402SS B529 PQ: 0 ANSI: 5
Jul 27 09:44:36 ibm-js22-06 kernel: scsi 0:8:0:0: Enclosure IBM VSBPD1BB SAS 01 PQ: 0 ANSI: 2
Jul 27 09:44:36 ibm-js22-06 kernel: INFO: task modprobe:250 blocked for more than 120 seconds.
Jul 27 09:44:36 ibm-js22-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 27 09:44:36 ibm-js22-06 kernel: modprobe D 0000008030329488 0 250 1 0x00008000
Jul 27 09:44:36 ibm-js22-06 kernel: Call Trace:
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685aee40] [c0000000685aeef0] 0xc0000000685aeef0 (unreliable)
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af010] [c000000000014278] .__switch_to+0xf8/0x1d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af0a0] [c00000000059dc0c] .schedule+0x3fc/0xd30
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af390] [c00000000059f6ac] .__mutex_lock_slowpath+0x1bc/0x2d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af470] [c00000000059ff8c] .mutex_lock+0x5c/0x60
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af4f0] [c0000000003d6b38] .__scsi_add_device+0xc8/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af5a0] [c0000000003d6bf4] .scsi_add_device+0x14/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af610] [d000000001a81808] .ipr_probe+0x328/0x404 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af6f0] [c000000000317b24] .local_pci_probe+0x34/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af760] [c0000000003189d8] .pci_device_probe+0x158/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af810] [c0000000003afa38] .driver_probe_device+0xe8/0x350
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af8b0] [c0000000003afdac] .__driver_attach+0x10c/0x110
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af940] [c0000000003aec28] .bus_for_each_dev+0x98/0xf0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af9f0] [c0000000003af6c8] .driver_attach+0x28/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afa70] [c0000000003ae198] .bus_add_driver+0x2a8/0x3e0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afb20] [c0000000003b023c] .driver_register+0x9c/0x1b0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afbc0] [c000000000318d24] .__pci_register_driver+0x64/0x140
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afc60] [d000000001a81930] .ipr_init+0x4c/0x68 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afd90] [c0000000000dd79c] .SyS_init_module+0x14c/0x2c0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afe30] [c000000000008564] syscall_exit+0x0/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: INFO: task modprobe:250 blocked for more than 120 seconds.
Jul 27 09:44:36 ibm-js22-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 27 09:44:36 ibm-js22-06 kernel: modprobe D 0000008030329488 0 250 1 0x00008000
Jul 27 09:44:36 ibm-js22-06 kernel: Call Trace:
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685aee40] [c0000000685aeef0] 0xc0000000685aeef0 (unreliable)
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af010] [c000000000014278] .__switch_to+0xf8/0x1d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af0a0] [c00000000059dc0c] .schedule+0x3fc/0xd30
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af390] [c00000000059f6ac] .__mutex_lock_slowpath+0x1bc/0x2d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af470] [c00000000059ff8c] .mutex_lock+0x5c/0x60
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af4f0] [c0000000003d6b38] .__scsi_add_device+0xc8/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af5a0] [c0000000003d6bf4] .scsi_add_device+0x14/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af610] [d000000001a81808] .ipr_probe+0x328/0x404 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af6f0] [c000000000317b24] .local_pci_probe+0x34/0x50
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af760] [c0000000003189d8] .pci_device_probe+0x158/0x170
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af810] [c0000000003afa38] .driver_probe_device+0xe8/0x350
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af8b0] [c0000000003afdac] .__driver_attach+0x10c/0x110
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af940] [c0000000003aec28] .bus_for_each_dev+0x98/0xf0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685af9f0] [c0000000003af6c8] .driver_attach+0x28/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afa70] [c0000000003ae198] .bus_add_driver+0x2a8/0x3e0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afb20] [c0000000003b023c] .driver_register+0x9c/0x1b0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afbc0] [c000000000318d24] .__pci_register_driver+0x64/0x140
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afc60] [d000000001a81930] .ipr_init+0x4c/0x68 [ipr]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afd90] [c0000000000dd79c] .SyS_init_module+0x14c/0x2c0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000685afe30] [c000000000008564] syscall_exit+0x0/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: INFO: task modprobe:301 blocked for more than 120 seconds.
Jul 27 09:44:36 ibm-js22-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 27 09:44:36 ibm-js22-06 kernel: modprobe D 0000008030329488 0 301 1 0x00008000
Jul 27 09:44:36 ibm-js22-06 kernel: Call Trace:
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7660] [c0000000688e7710] 0xc0000000688e7710 (unreliable)
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7830] [c000000000014278] .__switch_to+0xf8/0x1d0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e78c0] [c00000000059dc0c] .schedule+0x3fc/0xd30
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7bb0] [c0000000003afff4] .wait_for_device_probe+0x64/0xc0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7c70] [d000000001b10014] .wait_scan_init+0x10/0xc4 [scsi_wait_scan]
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7ce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7d90] [c0000000000dd79c] .SyS_init_module+0x14c/0x2c0
Jul 27 09:44:36 ibm-js22-06 kernel: [c0000000688e7e30] [c000000000008564] syscall_exit+0x0/0x40
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] Write Protect is off
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
Jul 27 09:44:36 ibm-js22-06 kernel: sda: sda1 sda2 sda3
Jul 27 09:44:36 ibm-js22-06 kernel: sd 0:3:0:0: [sda] Attached SCSI disk
Jul 27 09:44:36 ibm-js22-06 kernel: scsi: unknown device type 31
Jul 27 09:44:36 ibm-js22-06 kernel: scsi 0:255:255:255: No Device IBM 572C001SISIOA 0150 PQ: 0 ANSI: 0
Jul 27 09:44:36 ibm-js22-06 kernel: dracut: Scanning devices sda3 for LVM logical volumes vg_ibmjs2206/lv_root vg_ibmjs2206/lv_swap

Version-Release number of selected component (if applicable):
6.2

How reproducible:
Every time, if the system has a SCSI enclosure.

Steps to Reproduce:
1. Install the 158 kernel: it works.
2. Install 158 plus the patches for 707755: the problem occurs.
3.

Actual results:

Expected results:

Additional info:
I've narrowed this down by finding the latest 6.2 kernel that works, which was 158. I took 158 and added the patches for 707755, and the problem happens. I've seen this on at least 2 systems so far: 1 at RH and the other at IBM. I'm not sure how this patchset would cause this; I'm still looking at it more closely.
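For triage, the hung-task reports in a log like the one above can be summarized with a short script. This is only a sketch, assuming the log has one entry per line; the function name `summarize_hung_tasks` and the `SCHED_FRAMES` set are made up for illustration, and the sample lines are abbreviated from the attached log (date and host prefix dropped).

```python
import re

# Hung-task detector line, e.g.
# "INFO: task modprobe:250 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# ppc64 stack frame, e.g. "] .mutex_lock+0x5c/0x60" (the leading dot is optional).
FRAME_RE = re.compile(r"\]\s+\.?([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames that only show the task going to sleep, not what it is waiting for.
# This set is an assumption chosen for this sketch.
SCHED_FRAMES = {"__switch_to", "schedule"}


def summarize_hung_tasks(log_text):
    """Return a list of (task, pid, frames) tuples, one per hung-task report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = (hung.group(1), int(hung.group(2)), [])
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and frame.group(1) not in SCHED_FRAMES:
            current[2].append(frame.group(1))
    return reports


sample = """\
kernel: INFO: task modprobe:250 blocked for more than 120 seconds.
kernel: Call Trace:
kernel: [c0000000685af0a0] [c00000000059dc0c] .schedule+0x3fc/0xd30
kernel: [c0000000685af390] [c00000000059f6ac] .__mutex_lock_slowpath+0x1bc/0x2d0
kernel: [c0000000685af470] [c00000000059ff8c] .mutex_lock+0x5c/0x60
kernel: [c0000000685af4f0] [c0000000003d6b38] .__scsi_add_device+0xc8/0x170
kernel: INFO: task modprobe:301 blocked for more than 120 seconds.
kernel: [c0000000688e78c0] [c00000000059dc0c] .schedule+0x3fc/0xd30
kernel: [c0000000688e7bb0] [c0000000003afff4] .wait_for_device_probe+0x64/0xc0
"""

for task, pid, frames in summarize_hung_tasks(sample):
    print(task, pid, frames[:3])
```

On the sample, the first modprobe is reported waiting on a mutex under `__scsi_add_device`, and the second is waiting in `wait_for_device_probe`, which matches what the full traces show.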
Vivek, I haven't found a good way to bisect this patchset. Any ideas on how to break up the patchset to narrow this down, or have you started to debug this yet? -Steve
*** Bug 728424 has been marked as a duplicate of this bug. ***
Created attachment 517798 [details] sosreport
Created attachment 517799 [details] sysctl -a output
Created attachment 517800 [details] Console log
Hmm, so with the above patch series the system still works; it just takes a long time to boot. I am wondering if it is related to the issue where queue deletion gets slow with CFQ. I had posted a patch for that upstream. I think a quick way to verify that would be to use any scheduler other than CFQ. How about using elevator=deadline on the kernel command line to see if you still see the issue?
Can you please try the following CFQ patch I posted upstream and see if the issue is resolved? https://lkml.org/lkml/2011/8/1/259
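Before comparing boot times, it may help to confirm that the test boot really carried the elevator= option. A minimal sketch, assuming the command line is read from /proc/cmdline on the booted system; the helper name `elevator_from_cmdline` and the example command line are hypothetical and not taken from this report.

```python
def elevator_from_cmdline(cmdline):
    """Return the value of the last elevator= parameter, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("elevator="):
            value = token[len("elevator="):]
    return value


# On a running system the string would come from /proc/cmdline, e.g.:
#   with open("/proc/cmdline") as fh:
#       cmdline = fh.read()
# The command line below is a made-up example.
example = "ro root=/dev/mapper/vg-lv_root rd_NO_MD elevator=deadline"
print(elevator_from_cmdline(example))
```

If this returns None on the test boot, the option was not passed and the result says nothing about CFQ.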
(In reply to comment #8)
> Can you please try following CFQ patch I posted upstream and see if issue is
> resolved.
>
> https://lkml.org/lkml/2011/8/1/259

I'll build this patch and see if it fixes the problem.
OK, before building the patch, can you please do a quick test using elevator=deadline on the kernel command line? This will at least confirm that the problem is with CFQ. If that's not the case, then the above patch will most likely not help.
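To check which scheduler a disk actually ended up with, the kernel exposes the list in /sys/block/<dev>/queue/scheduler and marks the active one with square brackets. A rough sketch of reading it; the function names are made up for illustration, and the device name sda is taken from the log in the description.

```python
def active_scheduler(sysfs_text):
    """Return the bracketed scheduler name, e.g. 'cfq' for 'noop deadline [cfq]'."""
    for name in sysfs_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_active_scheduler(device="sda"):
    """Read the active scheduler for a block device from sysfs."""
    with open(f"/sys/block/{device}/queue/scheduler") as fh:
        return active_scheduler(fh.read())


# Typical file contents on a kernel of this era; the strings are examples.
print(active_scheduler("noop anticipatory deadline [cfq]"))
print(active_scheduler("noop anticipatory [deadline] cfq"))
```

If `read_active_scheduler("sda")` still reports cfq after booting with elevator=deadline, the option did not take effect and the test does not isolate CFQ.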
(In reply to comment #9)
> (In reply to comment #8)
> > Can you please try following CFQ patch I posted upstream and see if issue is
> > resolved.
> >
> > https://lkml.org/lkml/2011/8/1/259
>
> I'll build this patch and see if it fixes the problem

I tested this patch on a system here and it fixes the issue. Thanks for the help.
-Steve
(In reply to comment #11)
> > > Can you please try following CFQ patch I posted upstream and see if issue is
> > > resolved.
> > >
> > > https://lkml.org/lkml/2011/8/1/259
> >
> > I'll build this patch and see if it fixes the problem
>
> I tested this patch on a system here and it fixes the issue. thanks for the
> help.
>

Great. Did you do a brew build across all arches or just a local build for ppc64? If you did a brew build across all arches, I could skip that step. Otherwise I will do a brew build and post the patch to rhkernel-list.
(In reply to comment #12)
> (In reply to comment #11)
> > > > Can you please try following CFQ patch I posted upstream and see if issue is
> > > > resolved.
> > > >
> > > > https://lkml.org/lkml/2011/8/1/259
> > >
> > > I'll build this patch and see if it fixes the problem
> >
> > I tested this patch on a system here and it fixes the issue. thanks for the
> > help.
> >
>
> Great. Did you do a brew build across all arches or just a local build for
> ppc64? If you did brew build across all arches, i could skip that step.
> Otherwise i will do a brew build, and post the patch to rhkernel-list.

Yes, I have a brew build: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3549946
I have only tested it on powerpc.
Created attachment 518515 [details] console log showing the issue is fixed
------- Comment (attachment only) From 2011-08-16 11:06 EDT-------
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-192.el6
On my system ibm-js22-06 I installed the 192 kernel and the fix works.
------- Comment From sbest.com 2011-10-17 14:31 EDT-------
Making Naveed's comment external: this issue is no longer reproducible in RHEL6.2 Snap1. Closing as it is fixed.
Per comment #20, changing status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1530.html