Red Hat Bugzilla – Bug 674064
[RHEL6] panic in scsi_init_io() during connectathon
Last modified: 2011-08-05 17:08:10 EDT
Created attachment 476199 [details]
full console log

Description of problem:
During a -109.el6 scratch test, we hit this:

SELinux: initialized (dev 0:13, type nfs), uses genfs_contexts
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu5/cache/index2/shared_cpu_map
CPU 1
Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss bluetooth rfkill cryptd aes_x86_64 aes_generic ts_kmp nls_koi8_u nls_cp932 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 dm_mirror dm_region_hash dm_log serio_raw edac_core edac_mce_amd k10temp broadcom tg3 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 sg shpchp ext4 mbcache jbd2 firewire_ohci firewire_core crc_itu_t sr_mod cdrom sd_mod crc_t10dif mvsas libsas scsi_transport_sas pata_acpi ata_generic pata_atiixp ahci radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: stap_abc29162d8f4841d4587c8bcc7eb8338_880]

Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss bluetooth rfkill cryptd aes_x86_64 aes_generic ts_kmp nls_koi8_u nls_cp932 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 dm_mirror dm_region_hash dm_log serio_raw edac_core edac_mce_amd k10temp broadcom tg3 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 sg shpchp ext4 mbcache jbd2 firewire_ohci firewire_core crc_itu_t sr_mod cdrom sd_mod crc_t10dif mvsas libsas scsi_transport_sas pata_acpi ata_generic pata_atiixp ahci radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: stap_abc29162d8f4841d4587c8bcc7eb8338_880]
Pid: 15524, comm: dumpcap Not tainted 2.6.32-109.el6scratch.x86_64.debug #1 Snook
RIP: 0010:[<ffffffff8137c295>] [<ffffffff8137c295>] scsi_init_io+0xc5/0x170
RSP: 0018:ffff880005003b50 EFLAGS: 00010082
RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800764550b8 RCX: 0000000000000000
RDX: ffff88007ac10c00 RSI: ffffffff812840c0 RDI: ffff8800789fa1e8
RBP: ffff880005003b80 R08: 0000000000000000 R09: ffff8800764550b8
R10: 09f911029d74e35b R11: 0000000000000000 R12: 0000000000000002
R13: 0000000000000020 R14: 0000000000000002 R15: ffff880077730a20
FS: 00007f9bf8a91700(0000) GS:ffff880005000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffa05e1ccc1 CR3: 0000000078b00000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dumpcap (pid: 15524, threadinfo ffff880079768000, task ffff88007ac10c00)
Stack:
 ffff880005003b90 ffff8800773f3c20 ffff8800789fa000 ffff8800775e30a8
<0> ffff8800789fa000 ffff880077730a20 ffff880005003ba0 ffffffff8137c3b5
<0> ffff8800773f3c20 ffff8800775e30a8 ffff880005003c30 ffffffffa021cd5f
Call Trace:
 <IRQ>
 [<ffffffff8137c3b5>] scsi_setup_fs_cmnd+0x75/0xe0
 [<ffffffffa021cd5f>] sd_prep_fn+0x1df/0xea0 [sd_mod]
 [<ffffffff8137bacc>] ? scsi_request_fn+0x43c/0x5b0
 [<ffffffff81268316>] blk_peek_request+0xd6/0x210
 [<ffffffff8137b6fb>] scsi_request_fn+0x6b/0x5b0
 [<ffffffff81268e0a>] __blk_run_queue+0x7a/0x160
 [<ffffffff81268fd0>] blk_run_queue+0x30/0x50
 [<ffffffff8137ac1a>] scsi_run_queue+0xda/0x3c0
 [<ffffffff81374760>] ? __scsi_put_command+0x60/0xa0
 [<ffffffff8137be52>] scsi_next_command+0x42/0x60
 [<ffffffff8137cc5e>] scsi_io_completion+0x35e/0x550
 [<ffffffff81373b02>] scsi_finish_command+0xc2/0x130
 [<ffffffff8137cfbd>] scsi_softirq_done+0x14d/0x170
 [<ffffffff8126db3d>] blk_done_softirq+0x8d/0xa0
 [<ffffffff81072193>] __do_softirq+0xd3/0x220
 [<ffffffff8100c3cc>] call_softirq+0x1c/0x30
 [<ffffffff8100e09d>] do_softirq+0xad/0xe0
 [<ffffffff81071d85>] irq_exit+0x95/0xa0
 [<ffffffff8150ee45>] do_IRQ+0x75/0xf0
 [<ffffffff8100bb53>] ret_from_intr+0x0/0x16
 <EOI>
 [<ffffffff81508a35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff812936db>] __debug_object_init+0x9b/0x3d0
 [<ffffffff81293a32>] debug_object_init_on_stack+0x22/0x30
 [<ffffffff8109659e>] hrtimer_init_on_stack+0x2e/0x50
 [<ffffffff8150726f>] schedule_hrtimeout_range+0x5f/0x170
 [<ffffffff814db080>] ? packet_poll+0x80/0xe0
 [<ffffffff814db0b9>] ? packet_poll+0xb9/0xe0
 [<ffffffff810aa28d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81072ce7>] ? local_bh_enable_ip+0x97/0x100
 [<ffffffff815089a4>] ? _spin_unlock_bh+0x34/0x40
 [<ffffffff811a06f9>] poll_schedule_timeout+0x39/0x60
 [<ffffffff811a0db9>] do_select+0x5c9/0x6f0
 [<ffffffff811a07f0>] ? do_select+0x0/0x6f0
 [<ffffffff811700bd>] ? cache_free_debugcheck+0x1ad/0x270
 [<ffffffff811a0ee0>] ? __pollwait+0x0/0xf0
 [<ffffffff811a0fd0>] ? pollwake+0x0/0x60
 [<ffffffffa02abf09>] ? ext4_da_write_end+0xf9/0x330 [ext4]
 [<ffffffff81097a03>] ? up_read+0x23/0x40
 [<ffffffff81120dfe>] ? generic_file_buffered_write+0x1de/0x2c0
 [<ffffffff81070727>] ? current_fs_time+0x27/0x30
 [<ffffffff81146f5c>] ? might_fault+0x5c/0xb0
 [<ffffffff8112303b>] ? generic_file_aio_write+0x5b/0xe0
 [<ffffffff811a1889>] ? core_sys_select+0x49/0x310
 [<ffffffff81146fa5>] ? might_fault+0xa5/0xb0
 [<ffffffff81146f5c>] ? might_fault+0x5c/0xb0
 [<ffffffff811a1a18>] core_sys_select+0x1d8/0x310
 [<ffffffff811a1889>] ? core_sys_select+0x49/0x310
 [<ffffffff81091e70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810127c9>] ? read_tsc+0x9/0x20
 [<ffffffff8109e429>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff810127c9>] ? read_tsc+0x9/0x20
 [<ffffffff8109e429>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff811a1da7>] sys_select+0x47/0x110
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: ba 20 00 00 00 e8 5c fc ff ff 85 c0 41 89 c6 74 45 48 89 df 45 89 f4 e8 8a fe ff ff 48 89 df e8 12 85 ff ff 48 8b 83 80 00 00 00 <48> c7 80 d8 00 00 00 00 00 00 00 44 89 e0 48 8b 5d d8 4c 8b 65
RIP [<ffffffff8137c295>] scsi_init_io+0xc5/0x170
 RSP <ffff880005003b50>
---[ end trace d57daefeda8b7968 ]---

Notice that 109 didn't have any specific scsi layer changes:

[aris@napanee rhel6]$ git diff kernel-2.6.32-108.el6..kernel-2.6.32-109.el6 | diffstat | grep scsi
 b/drivers/firmware/iscsi_ibft.c | 740 +++-----
 b/drivers/firmware/iscsi_ibft_find.c | 61
 b/drivers/s390/scsi/zfcp_aux.c | 6
 b/drivers/s390/scsi/zfcp_def.h | 26
 b/drivers/s390/scsi/zfcp_ext.h | 4
 b/drivers/s390/scsi/zfcp_fc.c | 54
 b/drivers/s390/scsi/zfcp_fsf.c | 10
 b/drivers/s390/scsi/zfcp_qdio.c | 41
 b/drivers/scsi/Kconfig | 10
 b/drivers/scsi/Makefile | 4
 b/drivers/scsi/be2iscsi/Kconfig | 3
 b/drivers/scsi/be2iscsi/be.h | 6
 b/drivers/scsi/be2iscsi/be_cmds.c | 118 +
 b/drivers/scsi/be2iscsi/be_cmds.h | 174 +-
 b/drivers/scsi/be2iscsi/be_iscsi.c | 267 +--
 b/drivers/scsi/be2iscsi/be_iscsi.h | 4
 b/drivers/scsi/be2iscsi/be_main.c | 643 ++++++-
 b/drivers/scsi/be2iscsi/be_main.h | 36
 b/drivers/scsi/be2iscsi/be_mgmt.c | 136 +
 b/drivers/scsi/be2iscsi/be_mgmt.h | 19
 b/drivers/scsi/cxgbi/Kconfig | 2
 b/drivers/scsi/cxgbi/Makefile | 2
 b/drivers/scsi/cxgbi/cxgb3i/Kbuild | 3
 b/drivers/scsi/cxgbi/cxgb3i/Kconfig | 7
 b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 1465 +++++++++++++++++
 b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.h | 62
 b/drivers/scsi/cxgbi/cxgb4i/Kbuild | 3
 b/drivers/scsi/cxgbi/cxgb4i/Kconfig | 7
 b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 1607 +++++++++++++++++++
 b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.h | 43
 b/drivers/scsi/cxgbi/libcxgbi.c | 2615 +++++++++++++++++++++++++++++++
 b/drivers/scsi/cxgbi/libcxgbi.h | 745 ++++++++
 b/drivers/scsi/fcoe/fcoe.c | 42
 b/drivers/scsi/fcoe/libfcoe.c | 1576 ++++++++++++++++--
 b/drivers/scsi/fnic/fnic_main.c | 11
 b/drivers/scsi/iscsi_boot_sysfs.c | 481 +++++
 b/drivers/scsi/libfc/fc_disc.c | 44
 b/drivers/scsi/libfc/fc_elsct.c | 2
 b/drivers/scsi/libfc/fc_exch.c | 261 +--
 b/drivers/scsi/libfc/fc_fcp.c | 163 +
 b/drivers/scsi/libfc/fc_libfc.c | 78
 b/drivers/scsi/libfc/fc_libfc.h | 18
 b/drivers/scsi/libfc/fc_lport.c | 163 -
 b/drivers/scsi/libfc/fc_rport.c | 574 ++++--
 b/drivers/scsi/libiscsi.c | 10
 b/drivers/scsi/scsi_transport_iscsi.c | 2
 b/include/linux/iscsi_boot_sysfs.h | 123 +
 b/include/linux/iscsi_ibft.h | 20
 b/include/scsi/fc/fc_els.h | 2
 b/include/scsi/fc/fc_fip.h | 46
 b/include/scsi/fc/fc_ns.h | 7
 b/include/scsi/fc_encode.h | 7
 b/include/scsi/fc_frame.h | 52
 b/include/scsi/iscsi_if.h | 1
 b/include/scsi/libfc.h | 80
 b/include/scsi/libfcoe.h | 70
 b/include/scsi/scsi_transport_iscsi.h | 1
 drivers/scsi/cxgb3i/Kbuild | 4
 drivers/scsi/cxgb3i/Kconfig | 7
 drivers/scsi/cxgb3i/cxgb3i.h | 161 -
 drivers/scsi/cxgb3i/cxgb3i_ddp.c | 770 ---------
 drivers/scsi/cxgb3i/cxgb3i_ddp.h | 311 ---
 drivers/scsi/cxgb3i/cxgb3i_init.c | 132 -
 drivers/scsi/cxgb3i/cxgb3i_iscsi.c | 1017 ------------
 drivers/scsi/cxgb3i/cxgb3i_offload.c | 1938 ----------------------
 drivers/scsi/cxgb3i/cxgb3i_offload.h | 243 --
 drivers/scsi/cxgb3i/cxgb3i_pdu.c | 494 -----
 drivers/scsi/cxgb3i/cxgb3i_pdu.h | 59

and the machine is using pata_atiixp. While 108 has:

904f7d52d1ee8295b319f9d7a7cfbf5a4742aa03 [scsi] fix id computation in scsi_eh_target_reset
d48b1e1fc588f13cd2f6817ae176c2acedd9a6e2 [scsi] fix the return value of scsi_target_queue_read()
326f45f33c122105a1e1975b38472ada7eb31416 [scsi] fix locking around blk_abort_request()

Attached the complete log. The machine in use was amd-snook-01.lab.bos.redhat.com (https://beaker.engineering.redhat.com/view/amd-snook-01.lab.bos.redhat.com).
Recipe: http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=475254

I didn't do any further analysis.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
I am not sure why it would start showing up now, but a recent oops in scsi_init_io has been fixed upstream:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9ade720..ee02d38 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1011,8 +1011,8 @@ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)

 err_exit:
 	scsi_release_buffers(cmd);
-	scsi_put_command(cmd);
 	cmd->request->special = NULL;
+	scsi_put_command(cmd);
 	return error;
 }
 EXPORT_SYMBOL(scsi_init_io);
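A note on why the ordering in the patch above matters, as far as it can be read from the oops: RAX holds 6b6b6b6b6b6b6b6b, which is the 0x6b byte that slab debugging writes over freed objects (POISON_FREE), and the faulting instruction in the Code: line appears to be a store of zero through RAX right after RAX was loaded from an offset of RBX. That is consistent with cmd->request being read after scsi_put_command() had already released cmd. The following is only a userspace sketch of that ordering problem, not kernel code; the mock_* types, the helper names, and the choice to poison a buffer instead of freeing it are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b /* byte that slab debugging writes over freed objects */

/* Hypothetical, stripped-down stand-ins for struct request / struct scsi_cmnd. */
struct mock_request { void *special; };
struct mock_cmnd { struct mock_request *request; };

/* Stand-in for scsi_put_command(): poison instead of freeing, so the
 * "freed" bytes can be inspected without undefined behaviour. */
static void mock_put_command(struct mock_cmnd *cmd)
{
    memset(cmd, POISON_FREE, sizeof(*cmd));
}

/* Old order: release first, then read cmd->request. Returns the pointer
 * value the kernel would then have tried to write through. */
static uintptr_t old_order_request_ptr(struct mock_cmnd *cmd)
{
    uintptr_t stale = 0;

    mock_put_command(cmd);
    memcpy(&stale, &cmd->request, sizeof(cmd->request)); /* read of "freed" memory */
    return stale;
}

/* Patched order: clear request->special while cmd is still valid. */
static void new_order(struct mock_cmnd *cmd)
{
    cmd->request->special = NULL;
    mock_put_command(cmd);
}

/* Value a pointer-sized field holds once every byte has been poisoned. */
static uintptr_t poisoned_ptr(void)
{
    uintptr_t p = 0;

    memset(&p, POISON_FREE, sizeof(void *));
    return p;
}

/* Exercise both orderings against the same mock objects. */
static void self_check(void)
{
    struct mock_request rq = { .special = &rq };
    struct mock_cmnd cmd = { .request = &rq };

    /* Old order: the pointer read back is pure poison, and the real
     * request was never updated. */
    assert(old_order_request_ptr(&cmd) == poisoned_ptr());
    assert(rq.special == &rq);

    /* Patched order: the request is updated before cmd goes away. */
    cmd.request = &rq;
    new_order(&cmd);
    assert(rq.special == NULL);
}
```

Calling self_check() runs both orderings. On a 64-bit build the poisoned pointer is 0x6b6b6b6b6b6b6b6b, the same value seen in RAX above; it is not a canonical x86_64 address, which would explain a general protection fault rather than an ordinary page fault.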
I have not been able to replicate this on that box. But there is definitely a place where we can oops, at the end of scsi_init_io. Should I just send the patch in comment #4? We need it either way.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This wasn't included in any kernel yet, moving back to POST.
Patch(es) available on kernel-2.6.32-114.el6
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html