Bug 1071340 - FCoE target: kernel panic when initiator connects to target
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Andy Grover
QA Contact: Bruno Goncalves
Keywords: Reopened, ZStream
Duplicates: 1099051
Depends On:
Blocks: 1070921 1073810 1077078 1094654 1070517 1083244 1084646 1086308 1088110
Reported: 2014-02-28 09:32 EST by Bruno Goncalves
Modified: 2015-03-05 06:41 EST

See Also:
Fixed In Version: kernel-3.10.0-125.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1084646 (view as bug list)
Environment:
Last Closed: 2015-03-05 06:41:04 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
vmcore-dmesg (92.79 KB, text/plain), 2014-03-03 11:20 EST, Bruno Goncalves
Comment (84.44 KB, text/plain), 2014-03-19 06:48 EDT, Bruno Goncalves


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0290 normal SHIPPED_LIVE Important: kernel security, bug fix, and enhancement update 2015-03-05 11:13:58 EST

Description Bruno Goncalves 2014-02-28 09:32:45 EST
Description of problem:
When the FCoE initiator server boots, the FCoE target server hits a kernel panic.

Version-Release number of selected component (if applicable):
3.10.0-97.el7.x86_64

targetcli-2.1.fb34-1.el7.noarch

# modinfo ixgbe
filename:       /lib/modules/3.10.0-97.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version:        3.15.1-k


How reproducible:
sometimes

Steps to Reproduce:
1. Configure the FCoE target to present a LUN to the initiator.
2. Power on the initiator.
3. The target server hits a kernel panic.

[ 2457.927134] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2 
[ 2457.962440] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1 
[ 2457.997955] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 
[ 2458.029072]  0000000000000000 ffff88020b446c68 ffffffff815c2e83 ffff88020b446ce0 
[ 2458.063218]  ffffffff815bcc4e 0000000000000010 ffff88020b446cf0 ffff88020b446c90 
[ 2458.099154]  0000000000000000 0000000000000002 0000000000000261 0000000000000002 
[ 2458.133645] Call Trace: 
[ 2458.145929]  <NMI>  [<ffffffff815c2e83>] dump_stack+0x19/0x1b 
[ 2458.173613]  [<ffffffff815bcc4e>] panic+0xc8/0x1d7 
[ 2458.196216]  [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40 
[ 2458.229922]  [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0 
[ 2458.259868]  [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230 
[ 2458.292100]  [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100 
[ 2458.324039]  [<ffffffff8112e094>] perf_event_overflow+0x14/0x20 
[ 2458.355086]  [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0 
[ 2458.384155]  [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50 
[ 2458.415420]  [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90 
[ 2458.444751]  [<ffffffff815cb8c9>] do_nmi+0x169/0x340 
[ 2458.469814]  [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e 
[ 2458.499774]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.529466]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.560565]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.591734]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] 
[ 2458.626535]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] 
[ 2458.656563]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] 
[ 2458.689046]  [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc] 
[ 2458.724200]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80 
[ 2458.753588]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10 
[ 2458.782632]  [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc] 
[ 2458.818103]  [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc] 
[ 2458.851128]  [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc] 
[ 2458.880624]  [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe] 
[ 2458.915983]  [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe] 
[ 2458.945653]  [<ffffffff8107fc10>] kthread+0xc0/0xd0 
[ 2458.968750]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2459.000551]  [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0 
[ 2459.025325]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2459.056905] drm_kms_helper: panic occurred, switching back to text console 
[ 2459.092846] ------------[ cut here ]------------ 
[ 2459.114816] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70() 
[ 2459.151968] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_service_time bnx2fc cnic uio fcoe 8021q garp libfcoe stp mrp libfc llc scsi_transport_fc scsi_tgt sg coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support aesni_intel lrw gf128mul glue_helper ablk_helper cryptd microcode serio_raw e1000e pcspkr ixgbe lpc_ich mfd_core ptp mdio hpilo hpwdt pps_core dca shpchp ipmi_si ipmi_msghandler mperf nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci libahci libata i2c_core hpsa dm_mirror dm_region_hash dm_log dm_mod 
[ 2459.508241] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1 
[ 2459.546639] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 
[ 2459.579601]  0000000000000009 ffff88020b443d98 ffffffff815c2e83 ffff88020b443dd0 
[ 2459.614167]  ffffffff81059bd1 0000000000000000 ffff88020b454540 000000010020ecbc 
[ 2459.649782]  ffff88020b414540 0000000000000002 ffff88020b443de0 ffffffff81059caa 
[ 2459.684606] Call Trace: 
[ 2459.696722]  <IRQ>  [<ffffffff815c2e83>] dump_stack+0x19/0x1b 
[ 2459.723685]  [<ffffffff81059bd1>] warn_slowpath_common+0x61/0x80 
[ 2459.751411]  [<ffffffff81059caa>] warn_slowpath_null+0x1a/0x20 
[ 2459.779036]  [<ffffffff81036e5f>] native_smp_send_reschedule+0x5f/0x70 
[ 2459.808782]  [<ffffffff8109dd5d>] trigger_load_balance+0x16d/0x200 
[ 2459.838727]  [<ffffffff8108fe03>] scheduler_tick+0x103/0x150 
[ 2459.864726]  [<ffffffff8106aee6>] update_process_times+0x66/0x80 
[ 2459.893545]  [<ffffffff810b6835>] tick_sched_handle.isra.16+0x25/0x60 
[ 2459.923929]  [<ffffffff810b68b1>] tick_sched_timer+0x41/0x60 
[ 2459.950436]  [<ffffffff81083887>] __run_hrtimer+0x77/0x1d0 
[ 2459.976394]  [<ffffffff810b6870>] ? tick_sched_handle.isra.16+0x60/0x60 
[ 2460.007303]  [<ffffffff8108408f>] hrtimer_interrupt+0xef/0x230 
[ 2460.035256]  [<ffffffff81037f57>] local_apic_timer_interrupt+0x37/0x60 
[ 2460.065892]  [<ffffffff815d4faf>] smp_apic_timer_interrupt+0x3f/0x60 
[ 2460.096689]  [<ffffffff815d391d>] apic_timer_interrupt+0x6d/0x80 
[ 2460.125365]  <EOI>  <NMI>  [<ffffffff81085772>] ? up+0x32/0x50 
[ 2460.153627]  [<ffffffff815bcd19>] ? panic+0x193/0x1d7 
[ 2460.177540]  [<ffffffff815bcc83>] ? panic+0xfd/0x1d7 
[ 2460.200556]  [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40 
[ 2460.234351]  [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0 
[ 2460.264699]  [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230 
[ 2460.296476]  [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100 
[ 2460.327933]  [<ffffffff8112e094>] perf_event_overflow+0x14/0x20 
[ 2460.358240]  [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0 
[ 2460.387838]  [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50 
[ 2460.418081]  [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90 
[ 2460.447745]  [<ffffffff815cb8c9>] do_nmi+0x169/0x340 
[ 2460.470891]  [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e 
[ 2460.499971]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.530675]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.561530]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.593590]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] 
[ 2460.628198]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] 
[ 2460.659372]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] 
[ 2460.692845]  [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc] 
[ 2460.729646]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80 
[ 2460.759043]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10 
[ 2460.786126]  [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc] 
[ 2460.822090]  [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc] 
[ 2460.855206]  [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc] 
[ 2460.885149]  [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe] 
[ 2460.921712]  [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe] 
[ 2460.953981]  [<ffffffff8107fc10>] kthread+0xc0/0xd0 
[ 2460.976836]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2461.010859]  [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0 
[ 2461.038076]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2461.072842] ---[ end trace 874881bfbaa680ef ]---
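
For triage, the trace above can be reduced to its frames. The following is a minimal sketch, not part of the original report: the helper names `parse_frames` and `first_module_frame` are hypothetical, and the label "vmlinux" for frames without a module tag is an assumption made here for readability. It picks out the first reliable frame that belongs to a loadable module, which in this trace is the tcm_fc function that was spinning on the lock when the watchdog fired.

```python
import re

# Matches frames such as "[<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]".
# A "?" before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module, reliable) tuples in trace order."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append((
            match.group("func"),
            # Assumption: frames without a "[module]" tag are built into the kernel.
            match.group("module") or "vmlinux",
            match.group("unreliable") is None,
        ))
    return frames


def first_module_frame(frames):
    """Return (function, module) for the first reliable module frame, or None."""
    for func, module, reliable in frames:
        if reliable and module != "vmlinux":
            return func, module
    return None


# A few lines copied from the trace in this report.
SAMPLE = """\
[ 2458.529466]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2458.591734]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
[ 2458.626535]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc]
[ 2458.656563]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc]
"""

frames = parse_frames(SAMPLE)
print(first_module_frame(frames))
```

Run against the full dmesg, the same helpers would list the NMI watchdog frames first; filtering on module frames skips those and lands on the ft_prli path.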
Comment 5 Bruno Goncalves 2014-03-03 11:20:44 EST
Created attachment 870027
vmcore-dmesg
Comment 8 Bruno Goncalves 2014-03-19 06:48:43 EDT
Created attachment 915871
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Comment 12 Andy Grover 2014-04-04 13:32:53 EDT
I have a proposed fix and am pushing it upstream.
Comment 17 Jarod Wilson 2014-06-04 11:51:27 EDT
Patch(es) available on kernel-3.10.0-125.el7
Comment 19 Bruno Goncalves 2014-06-09 09:23:08 EDT
Reproduced on kernel 3.10.0-97.el7



Verified on 3.10.0-125.el7: no crash after more than 10 reboots.
Comment 20 Ludek Smid 2014-06-13 06:14:19 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
Comment 21 Bruno Goncalves 2014-07-08 05:22:05 EDT
*** Bug 1099051 has been marked as a duplicate of this bug. ***
Comment 22 Jarod Wilson 2014-09-29 15:10:31 EDT
(In reply to Ludek Smid from comment #20)
> This request was resolved in Red Hat Enterprise Linux 7.0.
> 
> Contact your manager or support representative in case you have further
> questions about the request.

No, it wasn't. 123.el7 was the 7.0 kernel; this went into a 7.1 kernel (125.el7).
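
The build-number comparison behind this correction can be sketched as follows. This is an illustration only: `base_build` is a hypothetical helper, and it looks solely at the base build number (97, 123, 125), so it cannot tell whether a 7.0 z-stream kernel built on 123 carries a backport of the fix.

```python
import re


def base_build(release):
    """Extract the base build number from a string such as '3.10.0-125.el7'."""
    # Optional ".N" groups after the build number cover z-stream style
    # releases; only the first number (the base build) is returned.
    match = re.search(r"-(\d+)(?:\.\d+)*\.el7", release)
    if match is None:
        raise ValueError(f"not an el7 kernel release: {release!r}")
    return int(match.group(1))


REPRODUCED_ON = base_build("3.10.0-97.el7.x86_64")  # kernel from the description
GA_7_0 = base_build("3.10.0-123.el7")               # 7.0 kernel per this comment
FIXED_IN = base_build("kernel-3.10.0-125.el7")      # "Fixed In Version" field

# The fix landed after the 7.0 kernel was built.
print(REPRODUCED_ON < GA_7_0 < FIXED_IN)
```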
Comment 24 Bruno Goncalves 2014-09-30 07:56:43 EDT
Verified since kernel -125
Comment 26 errata-xmlrpc 2015-03-05 06:41:04 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html
