Bug 759430

Summary: be2iscsi: kernel panic at "be2iscsi:beiscsi_process_cq+0x155/0x689" after iscsi login
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.7
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: Gris Ge <fge>
Assignee: Mike Christie <mchristi>
QA Contact: Gris Ge <fge>
CC: bgoncalv, ccui, jayamohan.kallickal, mchristi
Doc Type: Bug Fix
Last Closed: 2011-12-09 09:00:28 UTC

Description Gris Ge 2011-12-02 11:21:46 UTC
Description of problem:

Got a kernel panic after iSCSI login.
=====
(beiscsi_process_cq():1740):CQ Error notification for cmd.. code 19 cid 0x0
logger: 2011-12-02 05:42:49 /usr/bin/rhts-test-runner.sh 28896 240 hearbeat...
Unable to handle kernel NULL pointer dereference at 0000000000000170 RIP:
 [<ffffffff8859e2cf>] :be2iscsi:beiscsi_process_cq+0x155/0x689
PGD 1a0982067 PUD 18e863067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:01.0/0000:0e:00.3/host48/session2/target48:0:1/48:0:1:0/state
CPU 0
Modules linked in: sg be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic cxgb3i autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 xfrm_nalgo crypto_api uio libcxgbi cxgb3 libiscsi_tcp dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ide_cd i5k_amb cdrom hwmon hpilo tpm_tis tpm libiscsi2 pcspkr serio_raw be2net 8021q bnx2 i5000_edac tpm_bios scsi_transport_iscsi2 scsi_transport_iscsi edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 9135, comm: dt Not tainted 2.6.18-298.el5 #1
RIP: 0010:[<ffffffff8859e2cf>]  [<ffffffff8859e2cf>] :be2iscsi:beiscsi_process_cq+0x155/0x689
RSP: 0000:ffffffff804afe80  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8101a5ccc5c0 RCX: 0000000080000001
RDX: 0000000000000000 RSI: ffff8100cfdb4180 RDI: 0000000080000001
RBP: 0000000080280001 R08: ffff8101a5ccc640 R09: ffff8101a6221290
R10: 000000000000003c R11: 0000000000000246 R12: 00000000000005c0
R13: ffff8100cfdd9f60 R14: 0000000000000001 R15: ffff810196c985a0
FS:  00002ae22393d070(0000) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000170 CR3: 000000019be9c000 CR4: 00000000000006e0
Process dt (pid: 9135, threadinfo ffff810192d6c000, task ffff8101a851a820)
Stack:  ffff8100cf7f0ca0 0000000000000000 ffff8101a8b7ad40 0000000002bbf20e
 000000000000a800 000000000000000a ffff8100cf7f0040 000000000000000a
 0000000000000100 0000000000000000 00000000ffffe2e7 ffffffff8859e817
Call Trace:
 <IRQ>  [<ffffffff8859e817>] :be2iscsi:be_iopoll+0x14/0x50
 [<ffffffff8859ff6a>] :be2iscsi:be_isr_msix+0x1cf/0x1e1
 [<ffffffff8014be3a>] blk_iopoll_softirq+0x60/0xe0
 [<ffffffff8001251d>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d646>] do_softirq+0x2c/0x7d
 [<ffffffff8006d4d6>] do_IRQ+0xee/0xf7
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>

Code: 89 90 70 01 00 00 89 c8 c1 ef 08 25 00 00 ff 00 c1 e8 10 74
RIP  [<ffffffff8859e2cf>] :be2iscsi:beiscsi_process_cq+0x155/0x689
 RSP <ffffffff804afe80>
CR2: 0000000000000170
 <0>Kernel panic - not syncing: Fatal exception
=====
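
As a reading aid for the trace above, here is a minimal Python sketch that decodes the "Oops: 0002" error code using the standard x86 page-fault error-code bits and checks that CR2 matches the faulting store. It assumes the Code bytes start at the faulting instruction, in which case "89 90 70 01 00 00" is "mov %edx,0x170(%rax)"; with RAX = 0 that is a write to address 0x170, which is consistent with the reported CR2. The function names are illustrative only and do not come from the kernel or the driver.

```python
# Sketch only: helper names are hypothetical, values are copied from the oops.

def decode_oops_error_code(code_hex):
    """Decode the low three bits of an x86 page-fault error code."""
    code = int(code_hex, 16)
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


def faulting_address(base_register, displacement):
    """Effective address of a base+displacement memory operand."""
    return base_register + displacement


info = decode_oops_error_code("0002")

# RAX and CR2 are taken from the register dump; 0x170 is the disp32 in the
# first Code bytes (assumed to be the faulting instruction).
rax = 0x0000000000000000
cr2 = 0x0000000000000170
addr = faulting_address(rax, 0x170)

print(info)
print(hex(addr), "matches CR2:", addr == cr2)
```

Read this way, the oops is a kernel-mode write through a NULL structure pointer at member offset 0x170 inside beiscsi_process_cq. Which structure that is cannot be determined from the trace alone.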

Version-Release number of selected component (if applicable):
kernel-2.6.18-298.el5
be2iscsi without CHAP

How reproducible:
Not sure. Hit in a Beaker job:
https://beaker.engineering.redhat.com/recipes/347188.

Retrying at https://beaker.engineering.redhat.com/jobs/165698

Steps to Reproduce:
1. Run iSCSI discovery through the be2iscsi interface.
2. Log in to the target.
3. Run I/O stress with dt.
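
Steps 1 and 2 can be sketched with the standard open-iscsi iscsiadm invocations. This Python snippet only builds and prints the command lines; it does not execute them. The portal address, iface name, and target IQN are placeholders and not values from this report. The dt invocation for step 3 is left out because the exact options used by the test are not recorded here.

```python
# Sketch only: portal, iface and target below are hypothetical placeholders.

def build_iscsi_commands(portal, iface, target):
    """Return (discovery, login) argv lists for iscsiadm."""
    discovery = [
        "iscsiadm", "-m", "discovery", "-t", "sendtargets",
        "-p", portal, "-I", iface,
    ]
    login = [
        "iscsiadm", "-m", "node", "-T", target,
        "-p", portal, "-I", iface, "--login",
    ]
    return discovery, login


discovery, login = build_iscsi_commands(
    portal="192.0.2.10:3260",                       # placeholder portal
    iface="be2iscsi.00:11:22:33:44:55",             # placeholder offload iface
    target="iqn.2001-04.com.example:storage.lun1",  # placeholder IQN
)

for argv in (discovery, login):
    print(" ".join(argv))
```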
  
Actual results:
kernel panic

Expected results:
I/O stress finished correctly.

Additional info:
Sorry, no kdump was captured; I plan to add kdump setup to the automated test.

This looks like Bug #738934, where a customer said they hit it without CHAP. I have now hit it on RHEL 5, also without CHAP.

Comment 1 Gris Ge 2011-12-02 11:44:15 UTC
When shutting down the OS, I can reliably reproduce (2 out of 2) this panic:
====
Kernel BUG at lib/iomap.c:93
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq
CPU 0
Modules linked in: sg lockd sunrpc acpi_cpufreq freq_table mperf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 libiscsi_tcp dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport be2iscsi i5k_amb libiscsi2 be2net ide_cd hwmon scsi_transport_iscsi2 tpm_tis cdrom bnx2 8021q scsi_transport_iscsi pcspkr tpm tpm_bios serio_raw i5000_edac hpilo edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 3100, comm: reboot Not tainted 2.6.18-298.el5 #1
RIP: 0010:[<ffffffff8015a776>]  [<ffffffff8015a776>] iowrite32+0x17/0x2d
RSP: 0018:ffffffff804afdc8  EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff810005eb4840 RCX: ffff8100cfa2d400
RDX: 0000000000000021 RSI: 0000000000000040 RDI: 0000000001210002
RBP: 0000000001210002 R08: 000000006a072100 R09: ffff8101a6764d80
R10: 0000000000000000 R11: 0000000000000100 R12: ffff8101a08ba1c0
R13: ffff8101a08ba240 R14: 0000000000000000 R15: ffff8101a889c000
FS:  00002ad044e196e0(0000) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff211c3ff0 CR3: 00000001a0b0c000 CR4: 00000000000006e0
Process reboot (pid: 3100, threadinfo ffff8101a06e2000, task ffff8101a5a337e0)
Stack:  ffffffff882e57d6 0000000180062bb4 ffffffff804aff08 0000000037fe3000
 ffff8101a08c7a90 0000000000000000 ffffffff804afe70 ffff8101a08b8c88
 0000000000000000 ffff8101a889c000 ffffffff882c6140 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff882e57d6>] :be2iscsi:beiscsi_task_xmit+0x4d2/0x51d
 [<ffffffff882c6140>] :libiscsi2:__iscsi_conn_send_pdu+0x1dd/0x241
 [<ffffffff882c7264>] :libiscsi2:iscsi_check_transport_timeouts+0x0/0x19b
 [<ffffffff882c666d>] :libiscsi2:iscsi_send_nopout+0x5b/0xc3
 [<ffffffff8006ec13>] do_timer_jiffy+0x9/0x1d
 [<ffffffff882c738d>] :libiscsi2:iscsi_check_transport_timeouts+0x129/0x19b
 [<ffffffff882c7264>] :libiscsi2:iscsi_check_transport_timeouts+0x0/0x19b
 [<ffffffff8009a091>] run_timer_softirq+0x18d/0x23a
 [<ffffffff8001251d>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d646>] do_softirq+0x2c/0x7d
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8000caf6>] __delay+0x8/0x10
 [<ffffffff882eb37c>] :be2iscsi:be_mbox_db_ready_wait+0xcd/0xf3
 [<ffffffff882eb402>] :be2iscsi:be_mbox_notify+0x60/0x141
 [<ffffffff882eb5a7>] :be2iscsi:beiscsi_cmd_q_destroy+0xc4/0xd4
 [<ffffffff882e5d8c>] :be2iscsi:hwi_cleanup+0x48/0x195
 [<ffffffff882e6015>] :be2iscsi:beiscsi_clean_port+0x13c/0x17a
 [<ffffffff882e616b>] :be2iscsi:beiscsi_quiesce+0x118/0x17c
 [<ffffffff882e6206>] :be2iscsi:beiscsi_shutdown+0x37/0x40
 [<ffffffff801d1bf0>] device_shutdown+0x56/0x88
 [<ffffffff8009da2e>] kernel_restart+0x9/0x46
 [<ffffffff8009dbb8>] sys_reboot+0x146/0x1c7
 [<ffffffff8003ac75>] hrtimer_try_to_cancel+0x4a/0x53
 [<ffffffff8005a085>] hrtimer_cancel+0xc/0x16
 [<ffffffff80063cf9>] do_nanosleep+0x47/0x70
 [<ffffffff80059f72>] hrtimer_nanosleep+0x58/0x118
 [<ffffffff800ba605>] audit_syscall_entry+0x1a8/0x1d3
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 b1 9b 2c 80 c2 5d 00 eb fe 0f b7 d6 89 f8 ef c3 89
RIP  [<ffffffff8015a776>] iowrite32+0x17/0x2d
 RSP <ffffffff804afdc8>
 <0>Kernel panic - not syncing: Fatal exception
======
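
To make the trace above easier to read, the following Python sketch parses oops call-trace lines into (module, function, offset) tuples and lists the be2iscsi frames. The sample is a subset of lines copied from the trace; the regular expression and helper name are illustrative and not part of any kernel tooling.

```python
import re

# Matches frames such as:
#   [<ffffffff882e57d6>] :be2iscsi:beiscsi_task_xmit+0x4d2/0x51d
#   [<ffffffff8009a091>] run_timer_softirq+0x18d/0x23a
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<func>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return a list of (module, function, offset) for every frame in text."""
    frames = []
    for match in FRAME_RE.finditer(text):
        module = match.group("module") or "kernel"  # no prefix: core kernel
        frames.append((module, match.group("func"), int(match.group("offset"), 16)))
    return frames


# Subset of the call trace above, copied verbatim.
SAMPLE = """\
 <IRQ>  [<ffffffff882e57d6>] :be2iscsi:beiscsi_task_xmit+0x4d2/0x51d
 [<ffffffff882c6140>] :libiscsi2:__iscsi_conn_send_pdu+0x1dd/0x241
 [<ffffffff8009a091>] run_timer_softirq+0x18d/0x23a
 <EOI>  [<ffffffff8000caf6>] __delay+0x8/0x10
 [<ffffffff882eb37c>] :be2iscsi:be_mbox_db_ready_wait+0xcd/0xf3
 [<ffffffff882e6206>] :be2iscsi:beiscsi_shutdown+0x37/0x40
"""

frames = parse_frames(SAMPLE)
be2iscsi_funcs = [func for module, func, _ in frames if module == "be2iscsi"]
print(be2iscsi_funcs)
```

The frames before <EOI> show the timer softirq sending a nop-out through beiscsi_task_xmit, while the frames after it show the interrupted task inside beiscsi_shutdown. Whether that overlap is the root cause of the BUG is not established in this report.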

If this is a separate issue, let me know and I will file a new bug.

Comment 2 Gris Ge 2011-12-06 03:30:17 UTC
Reproduced reliably (5 out of 5):

https://beaker.engineering.redhat.com/jobs/166657

The same task is still running on RHEL 5.7 GA; I will set the Regression flag later.

Requesting blocker because this bug is not a race condition and causes a kernel panic.

Mike,

Can you take a look at this bug? It looks the same as Bug #738934 in RHEL 6.
Let me know if you need the core dump.

Comment 3 Mike Christie 2011-12-06 22:42:45 UTC
Ah shoot, missed it by 1. Could you try the -299 kernel? The fix we did for Bug #738934 went in under bz
https://bugzilla.redhat.com/show_bug.cgi?id=744343
in kernel-2.6.18-299.el5.
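
A small Python sketch of the "missed it by 1" check: it compares only the release number after the first hyphen and assumes both kernels share the 2.6.18 base version and the el5 stream. The helper names are illustrative.

```python
import re

FIXED_IN = "2.6.18-299.el5"  # kernel named in this comment


def release_number(kernel_version):
    """Extract the integer release (e.g. 298) from '2.6.18-298.el5'."""
    match = re.search(r"-(\d+)", kernel_version)
    if match is None:
        raise ValueError("no release number in %r" % kernel_version)
    return int(match.group(1))


def has_fix(kernel_version, fixed_in=FIXED_IN):
    """True if kernel_version is at or after the release carrying the fix."""
    return release_number(kernel_version) >= release_number(fixed_in)


tested = "2.6.18-298.el5"  # kernel from the oops in the description
print(tested, "has fix:", has_fix(tested))
print(FIXED_IN, "has fix:", has_fix(FIXED_IN))
```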

Comment 4 Gris Ge 2011-12-09 09:00:28 UTC
Ran I/O stress on be2iscsi 3 times with kernel -300.

No kernel panic found.

Closing as CURRENTRELEASE.