Bug 1071340

Summary:

FCoE target: kernel panic when initiator connects to target

Product:

Red Hat Enterprise Linux 7

Reporter:

Bruno Goncalves <bgoncalv>

Component:

kernel

Assignee:

Andy Grover <agrover>

kernel sub component:

Storage

QA Contact:

Bruno Goncalves <bgoncalv>

Status:

CLOSED ERRATA

Docs Contact:

Severity:

high

Priority:

high

CC:

agrover, coughlan, dhoward, jkurik, qcai, revers

Version:

7.0

Keywords:

Reopened, Triaged, ZStream

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

kernel-3.10.0-125.el7

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Clones:

1084646

Environment:

Last Closed:

2015-03-05 11:41:04 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1070517, 1070921, 1073810, 1077078, 1083244, 1084646, 1086308, 1088110, 1094654

Attachments:

Description	Flags
vmcore-dmesg	none
Comment	none

Description Bruno Goncalves 2014-02-28 14:32:45 UTC

Description of problem:
When the FCoE initiator server boots, it causes a kernel panic on the FCoE target server.

Version-Release number of selected component (if applicable):
3.10.0-97.el7.x86_64

targetcli-2.1.fb34-1.el7.noarch

# modinfo ixgbe
filename:       /lib/modules/3.10.0-97.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version:        3.15.1-k


How reproducible:
sometimes

Steps to Reproduce:
1. Configure the FCoE target to present a LUN to the initiator.
2. Power on the initiator.
3. The target server panics:

[ 2457.927134] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2 
[ 2457.962440] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1 
[ 2457.997955] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 
[ 2458.029072]  0000000000000000 ffff88020b446c68 ffffffff815c2e83 ffff88020b446ce0 
[ 2458.063218]  ffffffff815bcc4e 0000000000000010 ffff88020b446cf0 ffff88020b446c90 
[ 2458.099154]  0000000000000000 0000000000000002 0000000000000261 0000000000000002 
[ 2458.133645] Call Trace: 
[ 2458.145929]  <NMI>  [<ffffffff815c2e83>] dump_stack+0x19/0x1b 
[ 2458.173613]  [<ffffffff815bcc4e>] panic+0xc8/0x1d7 
[ 2458.196216]  [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40 
[ 2458.229922]  [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0 
[ 2458.259868]  [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230 
[ 2458.292100]  [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100 
[ 2458.324039]  [<ffffffff8112e094>] perf_event_overflow+0x14/0x20 
[ 2458.355086]  [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0 
[ 2458.384155]  [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50 
[ 2458.415420]  [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90 
[ 2458.444751]  [<ffffffff815cb8c9>] do_nmi+0x169/0x340 
[ 2458.469814]  [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e 
[ 2458.499774]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.529466]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.560565]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.591734]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] 
[ 2458.626535]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] 
[ 2458.656563]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] 
[ 2458.689046]  [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc] 
[ 2458.724200]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80 
[ 2458.753588]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10 
[ 2458.782632]  [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc] 
[ 2458.818103]  [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc] 
[ 2458.851128]  [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc] 
[ 2458.880624]  [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe] 
[ 2458.915983]  [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe] 
[ 2458.945653]  [<ffffffff8107fc10>] kthread+0xc0/0xd0 
[ 2458.968750]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2459.000551]  [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0 
[ 2459.025325]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2459.056905] drm_kms_helper: panic occurred, switching back to text console 
[ 2459.092846] ------------[ cut here ]------------ 
[ 2459.114816] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70() 
[ 2459.151968] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_service_time bnx2fc cnic uio fcoe 8021q garp libfcoe stp mrp libfc llc scsi_transport_fc scsi_tgt sg coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support aesni_intel lrw gf128mul glue_helper ablk_helper cryptd microcode serio_raw e1000e pcspkr ixgbe lpc_ich mfd_core ptp mdio hpilo hpwdt pps_core dca shpchp ipmi_si ipmi_msghandler mperf nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci libahci libata i2c_core hpsa dm_mirror dm_region_hash dm_log dm_mod 
[ 2459.508241] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1 
[ 2459.546639] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 
[ 2459.579601]  0000000000000009 ffff88020b443d98 ffffffff815c2e83 ffff88020b443dd0 
[ 2459.614167]  ffffffff81059bd1 0000000000000000 ffff88020b454540 000000010020ecbc 
[ 2459.649782]  ffff88020b414540 0000000000000002 ffff88020b443de0 ffffffff81059caa 
[ 2459.684606] Call Trace: 
[ 2459.696722]  <IRQ>  [<ffffffff815c2e83>] dump_stack+0x19/0x1b 
[ 2459.723685]  [<ffffffff81059bd1>] warn_slowpath_common+0x61/0x80 
[ 2459.751411]  [<ffffffff81059caa>] warn_slowpath_null+0x1a/0x20 
[ 2459.779036]  [<ffffffff81036e5f>] native_smp_send_reschedule+0x5f/0x70 
[ 2459.808782]  [<ffffffff8109dd5d>] trigger_load_balance+0x16d/0x200 
[ 2459.838727]  [<ffffffff8108fe03>] scheduler_tick+0x103/0x150 
[ 2459.864726]  [<ffffffff8106aee6>] update_process_times+0x66/0x80 
[ 2459.893545]  [<ffffffff810b6835>] tick_sched_handle.isra.16+0x25/0x60 
[ 2459.923929]  [<ffffffff810b68b1>] tick_sched_timer+0x41/0x60 
[ 2459.950436]  [<ffffffff81083887>] __run_hrtimer+0x77/0x1d0 
[ 2459.976394]  [<ffffffff810b6870>] ? tick_sched_handle.isra.16+0x60/0x60 
[ 2460.007303]  [<ffffffff8108408f>] hrtimer_interrupt+0xef/0x230 
[ 2460.035256]  [<ffffffff81037f57>] local_apic_timer_interrupt+0x37/0x60 
[ 2460.065892]  [<ffffffff815d4faf>] smp_apic_timer_interrupt+0x3f/0x60 
[ 2460.096689]  [<ffffffff815d391d>] apic_timer_interrupt+0x6d/0x80 
[ 2460.125365]  <EOI>  <NMI>  [<ffffffff81085772>] ? up+0x32/0x50 
[ 2460.153627]  [<ffffffff815bcd19>] ? panic+0x193/0x1d7 
[ 2460.177540]  [<ffffffff815bcc83>] ? panic+0xfd/0x1d7 
[ 2460.200556]  [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40 
[ 2460.234351]  [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0 
[ 2460.264699]  [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230 
[ 2460.296476]  [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100 
[ 2460.327933]  [<ffffffff8112e094>] perf_event_overflow+0x14/0x20 
[ 2460.358240]  [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0 
[ 2460.387838]  [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50 
[ 2460.418081]  [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90 
[ 2460.447745]  [<ffffffff815cb8c9>] do_nmi+0x169/0x340 
[ 2460.470891]  [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e 
[ 2460.499971]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.530675]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.561530]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.593590]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] 
[ 2460.628198]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] 
[ 2460.659372]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] 
[ 2460.692845]  [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc] 
[ 2460.729646]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80 
[ 2460.759043]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10 
[ 2460.786126]  [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc] 
[ 2460.822090]  [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc] 
[ 2460.855206]  [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc] 
[ 2460.885149]  [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe] 
[ 2460.921712]  [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe] 
[ 2460.953981]  [<ffffffff8107fc10>] kthread+0xc0/0xd0 
[ 2460.976836]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2461.010859]  [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0 
[ 2461.038076]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2461.072842] ---[ end trace 874881bfbaa680ef ]---
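Reading the trace: the NMI watchdog fired while CPU 2 was spinning in `_raw_spin_lock_irq`. The first reliable frame after the `<<EOE>>` marker (printed when the x86-64 unwinder leaves the NMI exception stack) is `ft_acl_get` in `tcm_fc`, called from `ft_prli` while libfc processed an incoming PRLI. Frames prefixed with `?` are addresses the unwinder found on the stack but could not confirm as part of the call chain, and `+0x30/0x160` means offset 0x30 into a 0x160-byte function. The Python sketch below is a hypothetical helper, not part of any kernel or Red Hat tooling. It parses frame lines in the format shown in this report and picks that first reliable post-`<<EOE>>` frame. The names `parse_trace`, `first_interrupted_frame`, `Frame` and the regex are illustrative assumptions.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical parser for frames such as:
#   [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
#   [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
# Requiring "[<" keeps the "[ 2458.62]" timestamp from matching.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    func: str
    offset: int              # offset of the printed address within func
    size: int                # length of func in bytes
    module: Optional[str]    # None for built-in (vmlinux) symbols
    reliable: bool           # False when the kernel printed "?"
    after_eoe: bool          # True once the <<EOE>> marker has been seen


def parse_trace(text: str) -> List[Frame]:
    """Extract stack frames, noting which side of <<EOE>> each is on."""
    frames = []
    after_eoe = False
    for line in text.splitlines():
        if "<<EOE>>" in line:
            after_eoe = True  # marker precedes the frame on the same line
        m = FRAME_RE.search(line)
        if not m:
            continue
        frames.append(Frame(
            func=m["func"],
            offset=int(m["off"], 16),
            size=int(m["size"], 16),
            module=m["module"],
            reliable=m["guess"] is None,
            after_eoe=after_eoe,
        ))
    return frames


def first_interrupted_frame(frames: List[Frame]) -> Optional[Frame]:
    """First reliable frame on the stack the NMI interrupted."""
    for f in frames:
        if f.after_eoe and f.reliable:
            return f
    return None


# Excerpt copied from the first call trace in this report.
TRACE = """\
[ 2458.529466]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2458.591734]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
[ 2458.626535]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc]
[ 2458.656563]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc]
"""

if __name__ == "__main__":
    top = first_interrupted_frame(parse_trace(TRACE))
    print(top.func, top.module, hex(top.offset), hex(top.size))
```

Run against the excerpt, the script prints `ft_acl_get tcm_fc 0x30 0x160`. Matching on the `[<address>]` bracket lets the `<NMI>` and `<<EOE>>` context markers share a line with a frame without confusing the parser.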

Comment 5 Bruno Goncalves 2014-03-03 16:20:44 UTC

Created attachment 870027
vmcore-dmesg

Comment 8 Bruno Goncalves 2014-03-19 10:48:43 UTC

Created attachment 915871
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 12 Andy Grover 2014-04-04 17:32:53 UTC

I have a proposed fix and am pushing it upstream.

Comment 17 Jarod Wilson 2014-06-04 15:51:27 UTC

Patch(es) available on kernel-3.10.0-125.el7

Comment 19 Bruno Goncalves 2014-06-09 13:23:08 UTC

Reproduced on kernel 3.10.0-97.el7.

Verified on 3.10.0-125.el7: more than 10 reboots with no crash.

Comment 20 Ludek Smid 2014-06-13 10:14:19 UTC

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 21 Bruno Goncalves 2014-07-08 09:22:05 UTC

*** Bug 1099051 has been marked as a duplicate of this bug. ***

Comment 22 Jarod Wilson 2014-09-29 19:10:31 UTC

(In reply to Ludek Smid from comment #20)
> This request was resolved in Red Hat Enterprise Linux 7.0.
> 
> Contact your manager or support representative in case you have further
> questions about the request.

No, it wasn't. 123.el7 was the 7.0 kernel; this fix went into a 7.1 kernel (125.el7).

Comment 24 Bruno Goncalves 2014-09-30 11:56:43 UTC

Verified since kernel 3.10.0-125.el7.

Comment 26 errata-xmlrpc 2015-03-05 11:41:04 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html