Bug 436791 - Kernel BUG at drivers/scsi/iscsi_tcp.c:387 - invalid opcode: 0000
Kernel BUG at drivers/scsi/iscsi_tcp.c:387 - invalid opcode: 0000
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.2
All Linux
low Severity medium
: rc
: ---
Assigned To: Mike Christie
Red Hat Kernel QE team
:
Depends On:
Blocks: 501988
  Show dependency treegraph
 
Reported: 2008-03-10 10:35 EDT by James Laska
Modified: 2013-09-02 02:24 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 04:59:59 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
boot.log (23.80 KB, text/plain)
2008-03-10 10:48 EDT, James Laska
no flags Details

  None (edit)
Description James Laska 2008-03-10 10:35:08 EDT
Description of problem:

iSCSI iBFT install on RHEL5.2-Server-20080306.0 panics during disk format.

Version-Release number of selected component (if applicable):


How reproducible:
Seems to occur every time now

Steps to Reproduce:
1. Initiate install of RHEL5.2-Server-20080306.0 on an iBFT enabled system
2. During disk format step of the installation, the system panics
3.
  
Actual results:
+---------------------+ Formatting +----------------------+|                   
                                     | | Formatting / file system...           
                 | |                                                         | |
0% | |                                                         |
+---------------------------------------------------------+                    
                                       1 23 45 67 89 10 12 34 56 78 9 201 23 45
67 89 30 12 34 56 78 9     401    423   445  467 48950123%45% 67%  89%   60%   
12 34 56 78 970 1 23 45 67 89 80 12 34 56 78 990 1 23 45 6----------- [cut here
] --------- [please bite here ] ---------
Kernel BUG at drivers/scsi/iscsi_tcp.c:387
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/platform/host4/session1/target4:0:0/4:0:0:0/model
CPU 0 
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 ata_piix
libata e1000e uhci_hcd ehci_hcd iscsi_ibft iscsi_tcp libiscsi
scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom squashfs pcspkr edd
floppy loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs
Pid: 0, comm: swapper Tainted: G   M  2.6.18-84.el5 #1
RIP: 0010:[<ffffffff881cb20c>]  [<ffffffff881cb20c>]
:iscsi_tcp:iscsi_tcp_data_recv+0x639/0x1476
RSP: 0018:ffffffff80416a40  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8100179d7054 RCX: ffffffff80416b10
RDX: ffff810013c2de00 RSI: ffff810013cf3380 RDI: ffffffff80416b10
RBP: ffff810013c2de00 R08: ffffffff80416b10 R09: ffff810013cf3360
R10: ffffffff80416c10 R11: 0000000000000030 R12: 0000000000000000
R13: ffff810013c2de68 R14: ffff810014265d00 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffffffff8039d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000011c69690 CR3: 0000000013cc5000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803ce000, task ffffffff802e2ae0)
Stack:  ffff81001e5c04e0 ffffffff80416b20 ffff81001e5c04c0 ffffffff80031c26
 ffff810017045670 0000000000000030 00000000143b4180 ffff810013c27a90
 ffff8100143a9980 ffff810014265d00 ffff8100179d7054 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff80031c26>] ip_output+0x29a/0x2dd
 [<ffffffff80076eb6>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 [<ffffffff8003d613>] lock_timer_base+0x1b/0x3c
 [<ffffffff800359d1>] tcp_read_sock+0x6b/0x190
 [<ffffffff881cabd3>] :iscsi_tcp:iscsi_tcp_data_recv+0x0/0x1476
 [<ffffffff881cc28e>] :iscsi_tcp:iscsi_tcp_data_ready+0x3d/0x4c
 [<ffffffff8001b6d1>] tcp_rcv_established+0x63a/0x925
 [<ffffffff8003b0d0>] tcp_v4_do_rcv+0x2a/0x300
 [<ffffffff8002710f>] tcp_v4_rcv+0x9f6/0xa5f
 [<ffffffff80034386>] ip_local_deliver+0x19d/0x263
 [<ffffffff80035519>] ip_rcv+0x540/0x583
 [<ffffffff8002015f>] netif_receive_skb+0x330/0x3ae
 [<ffffffff882106f5>] :e1000e:e1000_receive_skb+0x143/0x164
 [<ffffffff88213af4>] :e1000e:e1000_clean_rx_irq+0x24d/0x2ee
 [<ffffffff88211f75>] :e1000e:e1000_clean+0x77/0x246
 [<ffffffff8000c4c1>] net_rx_action+0xa4/0x1a4
 [<ffffffff882122c7>] :e1000e:e1000_intr_msi+0xd4/0xe7
 [<ffffffff80011e47>] __do_softirq+0x5e/0xd6
 [<ffffffff80154719>] end_msi_irq_wo_maskbit+0x9/0x16
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006c55e>] do_softirq+0x2c/0x85
 [<ffffffff8006c3e6>] do_IRQ+0xec/0xf5
 [<ffffffff80056bda>] mwait_idle+0x0/0x4a
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80056c10>] mwait_idle+0x36/0x4a
 [<ffffffff80048a93>] cpu_idle+0x95/0xb8
 [<ffffffff803d9801>] start_kernel+0x220/0x225
 [<ffffffff803d922f>] _sinittext+0x22f/0x236


Code: 0f 0b 68 7c c7 1c 88 c2 83 01 8b 53 18 48 8b 84 24 d0 00 00 
RIP  [<ffffffff881cb20c>] :iscsi_tcp:iscsi_tcp_data_recv+0x639/0x1476
 RSP <ffffffff80416a40>
 <0>Kernel panic - not syncing: Fatal exception

Expected results:
No panic

Additional info:
Comment 1 James Laska 2008-03-10 10:48:03 EDT
Created attachment 297446 [details]
boot.log
Comment 2 James Laska 2008-03-10 11:59:38 EDT
Panic'd again, but the call trace looks *slightly* different

          +---------------------+ Formatting +----------------------+
          |                                                         |
          | Formatting / file system...                             |
          |                                                         |
          |                           98%                          Unable to
handle kernel paging request at 0000000000003a98 RIP:
 [<ffffffff80237fae>] tcp_sendpage+0x38a/0x64c                      |
PGD 13dfe067 PUD 13ebc067 PMD 0 ------------------------------------+
Oops: 0000 [1] SMP
last sysfs file: /devices/platform/host4/session1/target4:0:0/4:0:0:0/model
CPU 0
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 ata_piix
libata e1000e us
Pid: 611, comm: scsi_wq_4 Tainted: G   M  2.6.18-84.el5 #1
RIP: 0010:[<ffffffff80237fae>]  [<ffffffff80237fae>] tcp_sendpage+0x38a/0x64c
RSP: 0018:ffff81001432fc60  EFLAGS: 00010246
RAX: ffffffff80331fa0 RBX: ffff810014fd6700 RCX: 0000000000000110
RDX: 0000000000003a98 RSI: ffff810014128400 RDI: ffff810004a475c8xt screen
RBP: ffff810004a47500 R08: 0000000000000000 R09: 0000000000000000
R10: ffff810004a47500 R11: 00000000000000c8 R12: 0000000000000110
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002000
FS:  0000000000000000(0000) GS:ffffffff8039d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000003a98 CR3: 0000000013dff000 CR4: 00000000000006e0
Process scsi_wq_4 (pid: 611, threadinfo ffff81001432e000, task ffff81001edea100)
Stack:  00008000141284d8 fffffffffffde000 000005a800002000 00000000000005a8
 0000000000000001 0000000000003a98 ffff810000795308 ffff810014128400
 0000000000003a98 ffff8100143090d0 ffff8100143090f8 00000000fffe0000
Call Trace:
 [<ffffffff881c945e>] :iscsi_tcp:iscsi_send_data+0x9b/0x527
 [<ffffffff881ca1b9>] :iscsi_tcp:iscsi_tcp_ctask_xmit+0x82f/0xba1
 [<ffffffff881b909d>] :libiscsi:iscsi_xmitworker+0x0/0x1d5
 [<ffffffff881b891d>] :libiscsi:iscsi_xmit_ctask+0x39/0x66
 [<ffffffff881b9209>] :libiscsi:iscsi_xmitworker+0x16c/0x1d5
 [<ffffffff8004ce1f>] run_workqueue+0x94/0xe4
 [<ffffffff80049734>] worker_thread+0x0/0x122
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049824>] worker_thread+0xf0/0x122
 [<ffffffff8008ab24>] default_wake_function+0x0/0xe
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032518>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003241a>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
Comment 3 James Laska 2008-03-10 13:20:12 EDT
Hitting this panic on systems *without* iBFT hardware

sh-3.2# ------------[ cut here ]------------
kernel BUG at drivers/scsi/iscsi_tcp.c:387!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/platform/host6/session1/target6:0:0/6:0:0:0/model
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 sata_nv
libata mptspi scs
CPU:    0
EIP:    0060:[<f893fc5d>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-84.el5 #1)
EIP is at iscsi_tcp_data_recv+0x563/0x11fb [iscsi_tcp]
eax: 00000000   ebx: e79fb852   ecx: 00000000   edx: 00000000
esi: 00000000   edi: 00000004   ebp: f4f82ae4   esp: c072adbc
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=c072a000 task=c06713c0 task.ti=c06e7000)
Stack: 00000000 00000000 c072ae5c f7ceb940 00000000 f493dd6c f493f780 0020a8c0
       f4f82ae4 e79fb852 f4974800 f7fae9c0 c072ae6c 00000004 f4974844 00010000
       d45cb1c0 0000e540 c05bab67 c072ae24 c072aebc c072ae6c c072ae30 f7faecc0
Call Trace:
 [<c05bab67>] fib_rules_lookup+0x37/0x61
 [<c042d9fa>] lock_timer_base+0x15/0x2f
 [<c05d54e4>] tcp_cong_avoid+0x15/0x25
 [<c05d7517>] tcp_ack+0x15a0/0x16d2
 [<c05d3209>] tcp_read_sock+0x5e/0x163
 [<f893f6fa>] iscsi_tcp_data_recv+0x0/0x11fb [iscsi_tcp]
 [<f8940bee>] iscsi_tcp_data_ready+0x32/0x3f [iscsi_tcp]
 [<c05da2da>] tcp_rcv_established+0x529/0x7c7
 [<c05df49f>] tcp_v4_do_rcv+0x25/0x2bc
 [<c05e6be2>] arp_process+0x4c6/0x4d0
 [<c05e199a>] tcp_v4_rcv+0x8dd/0x92f
 [<c0407406>] do_IRQ+0xa5/0xae
 [<c05c942b>] ip_local_deliver+0x159/0x202
 [<c05c9298>] ip_rcv+0x475/0x4af
 [<c05aeae2>] netif_receive_skb+0x2b1/0x325
 [<c05b02db>] process_backlog+0x7a/0xcd
 [<c05b0482>] net_rx_action+0x92/0x175
 [<c042a7e6>] __do_softirq+0x5a/0xbb
 [<c0407461>] do_softirq+0x52/0x9d
 [<c0407406>] do_IRQ+0xa5/0xae
 [<c040592e>] common_interrupt+0x1a/0x20
 [<c0403b98>] default_idle+0x0/0x59
 [<c0403bc9>] default_idle+0x31/0x59
 [<c0403c90>] cpu_idle+0x9f/0xb9
 [<c06ec9ee>] start_kernel+0x379/0x380
 =======================
Code: b0 01 86 45 48 58 5a e9 af 03 00 00 8b 54 24 38 b9 04 00 00 00 8b 82 94 00
00 00 8d 94 24 94 00 00 00 e8 63 66 af c7 85 c0 75 08 <0f> 0b 83 01 a8 0f 94 f8
8b 53 18 8 
EIP: [<f893fc5d>] iscsi_tcp_data_recv+0x563/0x11fb [iscsi_tcp] SS:ESP 0068:c072adbc
 <0>Kernel panic - not syncing: Fatal exception in interrupt
Comment 4 Mike Christie 2008-03-10 13:40:45 EDT
Hey James,

We should never hit this :( We can only hit this when the target sends more data
than it should or due to some really crazy locking/race bug that should not be
possible. The iscsi driver should not bug out, and should handle this gracefully
either way though.

This might be something like what IBM reported where they said we need to add
some barriers to the send and recv path or we get some weird bugs.

Or it might be a bad target. What target are you using, and is it in westford or
RDU?

And do you only hit this with ibft setups? Do you only hit this with e1000?
Comment 5 James Laska 2008-03-10 14:15:25 EDT
Greetings Mike,

I'm hitting this using an IBM DS300 target that I use in RDU.  I'm hitting this
on the initiator on all architectures even against a known working tree
(RHEL5.2-Server-20080303.0).  I'm hitting it on systems with (comment#0) and
without (comment#3) iBFT.

I'm rebooting the target and retrying ... hopefully something got wedged on the
target.  I can also try a software target if I continue to hit this.
Comment 6 James Laska 2008-03-10 14:16:11 EDT
I've just rebooted my iSCSI IBM DS300 target, and now installs complete without
error.  Very strange.

Definitely something crufty on the target that linux didn't/couldn't handle.
Comment 7 James Laska 2008-03-11 07:32:36 EDT
So this appears to no longer be an issue for me since rebooting my target.  I
can close this defect out as WORKSFORME with the noted workaround of rebooting
the target.

Or, would you prefer leaving this open to address any similar issues noted by IBM?
Comment 8 Mike Christie 2008-03-11 12:33:11 EDT
Leave this open for me. For RHEL 5.2 I am going to add some code so we handle
this error more gracefully. We do not need to BUG out here. If the target is
doing something bad and sending more data than we negotiated for we do not have
to oops the box.
Comment 9 RHEL Product and Program Management 2008-06-27 22:39:10 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 10 Tom Coughlan 2008-10-28 14:43:48 EDT
This did not make 5.3, due to a lack of resources. Proposing for 5.4.
Comment 11 Mike Christie 2009-02-12 12:41:35 EST
I have a patch for this in my tree for 5.4. It is mixed up with the code for offload though, so as long as that goes in this will get in too.
Comment 12 RHEL Product and Program Management 2009-02-12 13:19:45 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 13 RHEL Product and Program Management 2009-02-16 10:43:41 EST
Updating PM score.
Comment 14 Don Zickus 2009-05-06 13:13:50 EDT
in kernel-2.6.18-144.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 17 errata-xmlrpc 2009-09-02 04:59:59 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Note You need to log in before you can comment on or make changes to this bug.