Bug 436791
| Summary: | Kernel BUG at drivers/scsi/iscsi_tcp.c:387 - invalid opcode: 0000 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | James Laska <jlaska> | ||||
| Component: | kernel | Assignee: | Mike Christie <mchristi> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 5.2 | CC: | coughlan, dzickus, jturner, mchristi | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2009-09-02 08:59:59 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 501988 | ||||||
| Attachments: |
|
||||||
Created attachment 297446 [details]
boot.log
Panic'd again, but the call trace looks *slightly* different
+---------------------+ Formatting +----------------------+
| |
| Formatting / file system... |
| |
| 98% Unable to
handle kernel paging request at 0000000000003a98 RIP:
[<ffffffff80237fae>] tcp_sendpage+0x38a/0x64c |
PGD 13dfe067 PUD 13ebc067 PMD 0 ------------------------------------+
Oops: 0000 [1] SMP
last sysfs file: /devices/platform/host4/session1/target4:0:0/4:0:0:0/model
CPU 0
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 ata_piix
libata e1000e us
Pid: 611, comm: scsi_wq_4 Tainted: G M 2.6.18-84.el5 #1
RIP: 0010:[<ffffffff80237fae>] [<ffffffff80237fae>] tcp_sendpage+0x38a/0x64c
RSP: 0018:ffff81001432fc60 EFLAGS: 00010246
RAX: ffffffff80331fa0 RBX: ffff810014fd6700 RCX: 0000000000000110
RDX: 0000000000003a98 RSI: ffff810014128400 RDI: ffff810004a475c8xt screen
RBP: ffff810004a47500 R08: 0000000000000000 R09: 0000000000000000
R10: ffff810004a47500 R11: 00000000000000c8 R12: 0000000000000110
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002000
FS: 0000000000000000(0000) GS:ffffffff8039d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000003a98 CR3: 0000000013dff000 CR4: 00000000000006e0
Process scsi_wq_4 (pid: 611, threadinfo ffff81001432e000, task ffff81001edea100)
Stack: 00008000141284d8 fffffffffffde000 000005a800002000 00000000000005a8
0000000000000001 0000000000003a98 ffff810000795308 ffff810014128400
0000000000003a98 ffff8100143090d0 ffff8100143090f8 00000000fffe0000
Call Trace:
[<ffffffff881c945e>] :iscsi_tcp:iscsi_send_data+0x9b/0x527
[<ffffffff881ca1b9>] :iscsi_tcp:iscsi_tcp_ctask_xmit+0x82f/0xba1
[<ffffffff881b909d>] :libiscsi:iscsi_xmitworker+0x0/0x1d5
[<ffffffff881b891d>] :libiscsi:iscsi_xmit_ctask+0x39/0x66
[<ffffffff881b9209>] :libiscsi:iscsi_xmitworker+0x16c/0x1d5
[<ffffffff8004ce1f>] run_workqueue+0x94/0xe4
[<ffffffff80049734>] worker_thread+0x0/0x122
[<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
[<ffffffff80049824>] worker_thread+0xf0/0x122
[<ffffffff8008ab24>] default_wake_function+0x0/0xe
[<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
[<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032518>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
[<ffffffff8003241a>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Hitting this panic on systems *without* iBFT hardware
sh-3.2# ------------[ cut here ]------------
kernel BUG at drivers/scsi/iscsi_tcp.c:387!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/platform/host6/session1/target6:0:0/6:0:0:0/model
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 sata_nv
libata mptspi scs
CPU: 0
EIP: 0060:[<f893fc5d>] Not tainted VLI
EFLAGS: 00010246 (2.6.18-84.el5 #1)
EIP is at iscsi_tcp_data_recv+0x563/0x11fb [iscsi_tcp]
eax: 00000000 ebx: e79fb852 ecx: 00000000 edx: 00000000
esi: 00000000 edi: 00000004 ebp: f4f82ae4 esp: c072adbc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c072a000 task=c06713c0 task.ti=c06e7000)
Stack: 00000000 00000000 c072ae5c f7ceb940 00000000 f493dd6c f493f780 0020a8c0
f4f82ae4 e79fb852 f4974800 f7fae9c0 c072ae6c 00000004 f4974844 00010000
d45cb1c0 0000e540 c05bab67 c072ae24 c072aebc c072ae6c c072ae30 f7faecc0
Call Trace:
[<c05bab67>] fib_rules_lookup+0x37/0x61
[<c042d9fa>] lock_timer_base+0x15/0x2f
[<c05d54e4>] tcp_cong_avoid+0x15/0x25
[<c05d7517>] tcp_ack+0x15a0/0x16d2
[<c05d3209>] tcp_read_sock+0x5e/0x163
[<f893f6fa>] iscsi_tcp_data_recv+0x0/0x11fb [iscsi_tcp]
[<f8940bee>] iscsi_tcp_data_ready+0x32/0x3f [iscsi_tcp]
[<c05da2da>] tcp_rcv_established+0x529/0x7c7
[<c05df49f>] tcp_v4_do_rcv+0x25/0x2bc
[<c05e6be2>] arp_process+0x4c6/0x4d0
[<c05e199a>] tcp_v4_rcv+0x8dd/0x92f
[<c0407406>] do_IRQ+0xa5/0xae
[<c05c942b>] ip_local_deliver+0x159/0x202
[<c05c9298>] ip_rcv+0x475/0x4af
[<c05aeae2>] netif_receive_skb+0x2b1/0x325
[<c05b02db>] process_backlog+0x7a/0xcd
[<c05b0482>] net_rx_action+0x92/0x175
[<c042a7e6>] __do_softirq+0x5a/0xbb
[<c0407461>] do_softirq+0x52/0x9d
[<c0407406>] do_IRQ+0xa5/0xae
[<c040592e>] common_interrupt+0x1a/0x20
[<c0403b98>] default_idle+0x0/0x59
[<c0403bc9>] default_idle+0x31/0x59
[<c0403c90>] cpu_idle+0x9f/0xb9
[<c06ec9ee>] start_kernel+0x379/0x380
=======================
Code: b0 01 86 45 48 58 5a e9 af 03 00 00 8b 54 24 38 b9 04 00 00 00 8b 82 94 00
00 00 8d 94 24 94 00 00 00 e8 63 66 af c7 85 c0 75 08 <0f> 0b 83 01 a8 0f 94 f8
8b 53 18 8
EIP: [<f893fc5d>] iscsi_tcp_data_recv+0x563/0x11fb [iscsi_tcp] SS:ESP 0068:c072adbc
<0>Kernel panic - not syncing: Fatal exception in interrupt
Hey James, We should never hit this :( We can only hit this when the target sends more data than it should or due to some really crazy locking/race bug that should not be possible. The iscsi driver should not bug out, and should handle this gracefully either way though. This might be something like what IBM reported where they said we need to add some barriers to the send and recv path or we get some weird bugs. Or it might be a bad target. What target are you using, and is it in westford or RDU? And do you only hit this with ibft setups? Do you only hit this with e1000? Greetings Mike, I'm hitting this using an IBM DS300 target that I use in RDU. I'm hitting this on the initiator on all architectures even against a known working tree (RHEL5.2-Server-20080303.0). I'm hitting it on systems with (comment#0) and without (comment#3) iBFT. I'm rebooting the target and retrying ... hopefully something got wedged on the target. I can also try a software target if I continue to hit this. I've just rebooted my iSCSI IBM DS300 target, and now installs complete without error. Very strange. Definitely something crufty on the target that linux didn't/couldn't handle. So this appears to no longer be an issue for me since rebooting my target. I can close this defect out as WORKSFORME with the noted workaround of rebooting the target. Or, would you prefer leaving this open to address any similar issues noted by IBM? Leave this open for me. For RHEL 5.2 I am going to add some code so we handle this error more gracefully. We do not need to BUG out here. If the target is doing something bad and sending more data than we negotiated for we do not have to oops the box. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. This did not make 5.3, due to a lack of resources. Proposing for 5.4. I have a patch for this in my tree for 5.4. It is mixed up with the code for offload though, so as long as that goes in this will get in too. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Updating PM score. in kernel-2.6.18-144.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html |
Description of problem: iSCSI iBFT install on RHEL5.2-Server-20080306.0 panics during disk format. Version-Release number of selected component (if applicable): How reproducible: Seems to occur every time now Steps to Reproduce: 1. Initiate install of RHEL5.2-Server-20080306.0 on an iBFT enabled system 2. During disk format step of the installation, the system panics 3. Actual results: +---------------------+ Formatting +----------------------+| | | Formatting / file system... | | | | 0% | | | +---------------------------------------------------------+ 1 23 45 67 89 10 12 34 56 78 9 201 23 45 67 89 30 12 34 56 78 9 401 423 445 467 48950123%45% 67% 89% 60% 12 34 56 78 970 1 23 45 67 89 80 12 34 56 78 990 1 23 45 6----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at drivers/scsi/iscsi_tcp.c:387 invalid opcode: 0000 [1] SMP last sysfs file: /devices/platform/host4/session1/target4:0:0/4:0:0:0/model CPU 0 Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 ata_piix libata e1000e uhci_hcd ehci_hcd iscsi_ibft iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom squashfs pcspkr edd floppy loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs Pid: 0, comm: swapper Tainted: G M 2.6.18-84.el5 #1 RIP: 0010:[<ffffffff881cb20c>] [<ffffffff881cb20c>] :iscsi_tcp:iscsi_tcp_data_recv+0x639/0x1476 RSP: 0018:ffffffff80416a40 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8100179d7054 RCX: ffffffff80416b10 RDX: ffff810013c2de00 RSI: ffff810013cf3380 RDI: ffffffff80416b10 RBP: ffff810013c2de00 R08: ffffffff80416b10 R09: ffff810013cf3360 R10: ffffffff80416c10 R11: 0000000000000030 R12: 0000000000000000 R13: ffff810013c2de68 R14: ffff810014265d00 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffffffff8039d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000011c69690 CR3: 0000000013cc5000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffffffff803ce000, task ffffffff802e2ae0) Stack: ffff81001e5c04e0 ffffffff80416b20 ffff81001e5c04c0 ffffffff80031c26 ffff810017045670 0000000000000030 00000000143b4180 ffff810013c27a90 ffff8100143a9980 ffff810014265d00 ffff8100179d7054 0000000000000000 Call Trace: <IRQ> [<ffffffff80031c26>] ip_output+0x29a/0x2dd [<ffffffff80076eb6>] smp_apic_timer_interrupt+0x41/0x47 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c [<ffffffff8003d613>] lock_timer_base+0x1b/0x3c [<ffffffff800359d1>] tcp_read_sock+0x6b/0x190 [<ffffffff881cabd3>] :iscsi_tcp:iscsi_tcp_data_recv+0x0/0x1476 [<ffffffff881cc28e>] :iscsi_tcp:iscsi_tcp_data_ready+0x3d/0x4c [<ffffffff8001b6d1>] tcp_rcv_established+0x63a/0x925 [<ffffffff8003b0d0>] tcp_v4_do_rcv+0x2a/0x300 [<ffffffff8002710f>] tcp_v4_rcv+0x9f6/0xa5f [<ffffffff80034386>] ip_local_deliver+0x19d/0x263 [<ffffffff80035519>] ip_rcv+0x540/0x583 [<ffffffff8002015f>] netif_receive_skb+0x330/0x3ae [<ffffffff882106f5>] :e1000e:e1000_receive_skb+0x143/0x164 [<ffffffff88213af4>] :e1000e:e1000_clean_rx_irq+0x24d/0x2ee [<ffffffff88211f75>] :e1000e:e1000_clean+0x77/0x246 [<ffffffff8000c4c1>] net_rx_action+0xa4/0x1a4 [<ffffffff882122c7>] :e1000e:e1000_intr_msi+0xd4/0xe7 [<ffffffff80011e47>] __do_softirq+0x5e/0xd6 [<ffffffff80154719>] end_msi_irq_wo_maskbit+0x9/0x16 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff8006c55e>] do_softirq+0x2c/0x85 [<ffffffff8006c3e6>] do_IRQ+0xec/0xf5 [<ffffffff80056bda>] mwait_idle+0x0/0x4a [<ffffffff8005d615>] ret_from_intr+0x0/0xa <EOI> [<ffffffff80056c10>] mwait_idle+0x36/0x4a [<ffffffff80048a93>] cpu_idle+0x95/0xb8 [<ffffffff803d9801>] start_kernel+0x220/0x225 [<ffffffff803d922f>] _sinittext+0x22f/0x236 Code: 0f 0b 68 7c c7 1c 88 c2 83 01 8b 53 18 48 8b 84 24 d0 00 00 RIP [<ffffffff881cb20c>] :iscsi_tcp:iscsi_tcp_data_recv+0x639/0x1476 RSP <ffffffff80416a40> <0>Kernel panic - not syncing: Fatal exception Expected results: No panic Additional info: