Bug 1273894 - [UPSTREAM] bnx2x_config_vlan_mac called a NULL function pointer
Summary: [UPSTREAM] bnx2x_config_vlan_mac called a NULL function pointer
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Michal Schmidt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-10-21 13:10 UTC by Otto Sabart
Modified: 2016-05-03 12:26 UTC
CC List: 18 users

Fixed In Version: 4.4-rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-03 12:26:18 UTC
Type: Bug
Embargoed:


Attachments
console log (45.90 KB, text/plain)
2015-10-21 13:10 UTC, Otto Sabart
vmcore-dmesg-kernel-4.3-rc7 (96.94 KB, text/plain)
2015-10-29 15:10 UTC, Otto Sabart
objdump -S bnx2x.ko (6.00 MB, text/plain)
2015-10-29 15:15 UTC, Otto Sabart
objdump -S bnx2x.ko from latest net-next (9.65 MB, text/plain)
2015-11-06 15:25 UTC, Otto Sabart
lspci -vv (91.11 KB, text/plain)
2015-11-13 17:22 UTC, Otto Sabart
boot with bnx2x.debug=0x110032 parameter (444.21 KB, text/plain)
2015-11-13 17:25 UTC, Otto Sabart

Description Otto Sabart 2015-10-21 13:10:17 UTC
Created attachment 1085127
console log

Description of problem:
I observe this problem only on our hp-dl360g6 machines; other (non-hp-dl360g6)
machines work fine.

I am attaching a console log (from conserver) showing how I reproduced this
bug. At the beginning there are some basic outputs (cmdline, uname, etc.).
After that I run our network performance test suite, so you can see what the
suite does. At the end is the kernel panic trace (right after adding a new VLAN
on bnx2x).

It is possible to _prepare_ and lend our hp-dl360 machine for testing.

Version-Release number of selected component (if applicable):
$ uname -a
Linux hp-dl360g6-01.rhts.eng.brq.redhat.com 4.3.0-0.rc5.git0.1.el7.x86_64 #1 SMP Mon Oct 12 12:39:06 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

$ yum info linux-firmware
Loaded plugins: product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : linux-firmware
Arch        : noarch
Version     : 20151012
Release     : 57.gitd82d3c1e.el7
Size        : 82 M
Repo        : installed
From repo   : /linux-firmware-20151012-57.gitd82d3c1e.el7.noarch
Summary     : Firmware files used by the Linux kernel
URL         : http://www.kernel.org/
License     : GPL+ and GPLv2+ and MIT and Redistributable, no modification
            : permitted
Description : This package includes firmware files required for some devices to
            : operate.

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)


How reproducible:
100%


Steps to Reproduce:
1. provision RHEL-7.1 or RHEL-7.2
2. install 4.3.0-0.rc5.git0.1.el7.x86_64 and latest linux-firmware (linux-firmware-20151012-57.gitd82d3c1e.el7)
3. reboot
4. run our test suite (./runtest -c Setup --nosubmit --nosync) - I can prepare it for you


Actual results:
Kernel panic. It begins with:
BUG: unable to handle kernel NULL pointer dereference at (null)


Expected results:
No kernel panic. Working network.


Additional info:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.3.0-0.rc5.git0.1.el7.x86_64 root=/dev/mapper/rhel_hp--dl360g6--01-root ro rd.lvm.lv=rhel_hp-dl360g6-01/swap rd.lvm.lv=rhel_hp-dl360g6-01/root crashkernel=auto console=ttyS1,115200 LANG=en_US.UTF-8 systemd.debug

The problem starts with 4.3.0-0.rc4.

I also tried the upstream kernel on RHEL-7.2-20151015.0 (Snap 5) with the latest linux-firmware installed. The problem still persists.

Comment 2 Michal Schmidt 2015-10-21 13:50:13 UTC
First of all, thank you for testing current upstream kernels! It's great to catch bugs early.
On the other hand, filing such bugs under the RHEL product is bound to cause some confusion.
In the future, getting involved directly in upstream by reporting such bugs to the netdev mailing list and CCing the relevant maintainers would be better.
With that said...

> 2. install 4.3.0-0.rc5.git0.1.el7.x86_64

Where exactly did you get the package from?

> Problem starts from 4.3.0-0.rc4.

Does this mean 4.3.0-rc3 is not affected?
There were no bnx2x changes between -rc3 and -rc4.

The most likely commit to blame is from 4.3.0-rc1:

commit 05cc5a39ddb74dd81a716a45e67b938d8ebed463
Author: Yuval Mintz <Yuval.Mintz>
Date:   Wed Jul 29 15:52:46 2015 +0300

    bnx2x: add vlan filtering offload

Comment 3 Otto Sabart 2015-10-22 12:02:51 UTC
> In the future, getting involved directly in upstream by reporting such bugs to the netdev mailing list and CCing the relevant maintainers would be better.

OK, I will try to follow this procedure in the future.


>> 2. install 4.3.0-0.rc5.git0.1.el7.x86_64
> 
> Where exactly did you get the package from?

Justin Forbes creates upstream kernel scratch builds for us. Every time he
releases a new build, we copy it to our server. You can find all of the builds
here [0].

[0] http://perf-desktop.brq.redhat.com/Kernel/repo_urls


>> Problem starts from 4.3.0-0.rc4.
> 
> Does this mean 4.3.0-rc3 is not affected?
> There were no bnx2x changes between -rc3 and -rc4.
> 
> The most likely commit to blame is from 4.3.0-rc1:
> 
> commit 05cc5a39ddb74dd81a716a45e67b938d8ebed463
> Author: Yuval Mintz <Yuval.Mintz>
> Date:   Wed Jul 29 15:52:46 2015 +0300
> 
>     bnx2x: add vlan filtering offload

I've just successfully reproduced this bug on kernel-4.3.0-0.rc2.git0.1.

I can't test -rc1 due to BZ1264579, where the system does not boot at all. With
kernel-4.2.0-1 all our tests passed.

Maybe we can try reverting the 05cc5a patch and see whether bnx2x works?

Comment 4 Adam Okuliar 2015-10-22 13:19:16 UTC
I discovered that adding any VLAN sub-interface to a bnx2x interface triggers this problem. The simplest reproducer is:

1) Boot the machine with the stock configuration (everything as Anaconda pre-configured it)
2) Add a VLAN sub-interface to the bnx2x interface:
ip link add link ens1f0 name ens1f0.10 type vlan id 10

...and the kernel panics:

[root@hp-dl360g6-01 ~]# ip link add link ens1f0 name ens1f0.10 type vlan id 10
[  324.393759] 8021q: 802.1Q VLAN Support v1.8
[  324.411218] 8021q: adding VLAN 0 to HW filter on device ens2
[  324.438256] 8021q: adding VLAN 0 to HW filter on device ens1f0
[  324.466220] INFO: trying to register non-static key.
[  324.491168] the code is fine but needs lockdep annotation.
[  324.517617] turning off the locking correctness validator.
[  324.540242] CPU: 4 PID: 15349 Comm: modprobe Tainted: G          I     4.3.0-0.rc5.git0.1.el7.x86_64+debug #1
[  324.585158] Hardware name: Hewlett-Packard ProLiant DL360 G6, BIOS P64 06/02/2009
[  324.611197]  0000000000000000 0000000064852ed5 ffff88048633f978 ffffffff81390173
[  324.645130]  ffff88047e938000 ffff88048633f988 ffffffff811bbf37 ffff88048633fa10
[  324.680229]  ffffffff810e7cb5 ffffffff810e35a4 ffff88047e938000 0000000000000000
[  324.702269] Call Trace:
[  324.715463]  [<ffffffff81390173>] dump_stack+0x4b/0x68
[  324.729266]  [<ffffffff811bbf37>] register_lock_class.part.26+0x38/0x3c
[  324.738132]  [<ffffffff810e7cb5>] __lock_acquire+0x985/0xcc0
[  324.741161]  [<ffffffff810e35a4>] ? __lock_is_held+0x54/0x70
[  324.746240]  [<ffffffff810e8893>] lock_acquire+0xd3/0x1d0
[  324.757269]  [<ffffffffa0805841>] ? bnx2x_config_vlan_mac+0x211/0x3e0 [bnx2x]
[  324.786176]  [<ffffffff8177edc4>] _raw_spin_lock_bh+0x44/0x80
[  324.786376]  [<ffffffffa0805841>] ? bnx2x_config_vlan_mac+0x211/0x3e0 [bnx2x]
[  324.801193]  [<ffffffffa0805841>] bnx2x_config_vlan_mac+0x211/0x3e0 [bnx2x]
[  324.811206]  [<ffffffffa07b43cc>] bnx2x_set_vlan_one+0x5c/0x120 [bnx2x]
[  324.839137]  [<ffffffffa07b4502>] __bnx2x_vlan_configure_vid+0x72/0x80 [bnx2x]
[  324.855189]  [<ffffffffa07bbe22>] bnx2x_vlan_rx_add_vid+0xa2/0x220 [bnx2x]
[  324.879179]  [<ffffffff817607c2>] vlan_vid_add+0x1f2/0x280
[  324.888200]  [<ffffffffa01ef799>] vlan_device_event+0x139/0x6b0 [8021q]
[  324.918471]  [<ffffffff81635b6e>] register_netdevice_notifier+0x1ae/0x1f0
[  324.926148]  [<ffffffffa01f9000>] ? 0xffffffffa01f9000
[  324.947159]  [<ffffffffa01f903d>] vlan_proto_init+0x3d/0xb3 [8021q]
[  324.962224]  [<ffffffff8100213d>] do_one_initcall+0xcd/0x200
[  324.979179]  [<ffffffff81104a23>] ? rcu_read_lock_sched_held+0x93/0xa0
[  324.983148]  [<ffffffff812245c9>] ? kmem_cache_alloc_trace+0x239/0x310
[  325.016193]  [<ffffffff811bcfcb>] do_init_module+0x60/0x1ea
[  325.040244]  [<ffffffff8112f05b>] load_module+0x133b/0x1aa0
[  325.063184]  [<ffffffff8112b0b0>] ? __symbol_put+0x70/0x70
[  325.068180]  [<ffffffff81026909>] ? sched_clock+0x9/0x10
[  325.090228]  [<ffffffff810c520c>] ? local_clock+0x1c/0x30
[  325.092273]  [<ffffffff8112f918>] SyS_init_module+0x158/0x1a0
[  325.106518]  [<ffffffff8177fb72>] entry_SYSCALL_64_fastpath+0x12/0x76
[  325.133570] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  325.163181] IP: [<          (null)>]           (null)
[  325.166278] PGD 480dc0067 PUD 488d68067 PMD 0 
[  325.188316] Oops: 0010 [#1] SMP 
[  325.193617] Modules linked in: 8021q(+) garp mrp stp llc bnx2x coretemp ipmi_ssif iTCO_wdt gpio_ich iTCO_vendor_support kvm_intel kvm ipmi_si crc32c_intel serio_raw hpilo ipmi_msghandler hpwdt shpchp i7core_edac lpc_ich edac_core acpi_cpufreq xfs radeon ixgbe dca i2c_algo_bit mdio drm_kms_helper syscopyarea vxlan sysfillrect ip6_udp_tunnel sysimgblt fb_sys_fops udp_tunnel ttm sd_mod ptp drm bnx2 hpsa libcrc32c pps_core fjes dm_mirror dm_region_hash dm_log dm_mod [last unloaded: bnx2x]
[  325.356183] CPU: 4 PID: 15349 Comm: modprobe Tainted: G          I     4.3.0-0.rc5.git0.1.el7.x86_64+debug #1
[  325.374199] Hardware name: Hewlett-Packard ProLiant DL360 G6, BIOS P64 06/02/2009
[  325.392289] task: ffff88047e938000 ti: ffff88048633c000 task.ti: ffff88048633c000
[  325.397275] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
[  325.420230] RSP: 0018:ffff88048633fab0  EFLAGS: 00010246
[  325.428645] RAX: ffff88047e938000 RBX: ffff8804849201f8 RCX: 0000000000000000
[  325.442259] RDX: ffff8809046aab40 RSI: 0000000000000000 RDI: ffff880484850b00
[  325.482315] RBP: ffff88048633faf0 R08: 0000000000000000 R09: ffff8809046aab40
[  325.511260] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88048633fb00
[  325.526189] R13: 0000000000000000 R14: ffff880484850b00 R15: 0000000000000000
[  325.541224] FS:  00007f63cbc94740(0000) GS:ffff88048d200000(0000) knlGS:0000000000000000
[  325.549192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  325.552244] CR2: 0000000000000000 CR3: 0000000480dc2000 CR4: 00000000000006e0
[  325.558316] Stack:
[  325.562173]  ffffffffa080585d ffff880484920288 ffff8809046aab40 ffff880484850000
[  325.588273]  0000000000000001 ffff880484850b00 ffff880484850b00 0000000000000001
[  325.599342]  ffff88048633fb58 ffffffffa07b43cc ffff8804849201f8 0000000000000004
[  325.630260] Call Trace:
[  325.644181]  [<ffffffffa080585d>] ? bnx2x_config_vlan_mac+0x22d/0x3e0 [bnx2x]
[  325.684179]  [<ffffffffa07b43cc>] bnx2x_set_vlan_one+0x5c/0x120 [bnx2x]
[  325.689188]  [<ffffffffa07b4502>] __bnx2x_vlan_configure_vid+0x72/0x80 [bnx2x]
[  325.712357]  [<ffffffffa07bbe22>] bnx2x_vlan_rx_add_vid+0xa2/0x220 [bnx2x]
[  325.721187]  [<ffffffff817607c2>] vlan_vid_add+0x1f2/0x280
[  325.723195]  [<ffffffffa01ef799>] vlan_device_event+0x139/0x6b0 [8021q]
[  325.737254]  [<ffffffff81635b6e>] register_netdevice_notifier+0x1ae/0x1f0
[  325.746230]  [<ffffffffa01f9000>] ? 0xffffffffa01f9000
[  325.751234]  [<ffffffffa01f903d>] vlan_proto_init+0x3d/0xb3 [8021q]
[  325.769515]  [<ffffffff8100213d>] do_one_initcall+0xcd/0x200
[  325.773277]  [<ffffffff81104a23>] ? rcu_read_lock_sched_held+0x93/0xa0
[  325.778180]  [<ffffffff812245c9>] ? kmem_cache_alloc_trace+0x239/0x310
[  325.795300]  [<ffffffff811bcfcb>] do_init_module+0x60/0x1ea
[  325.799311]  [<ffffffff8112f05b>] load_module+0x133b/0x1aa0
[  325.806146]  [<ffffffff8112b0b0>] ? __symbol_put+0x70/0x70
[  325.808250]  [<ffffffff81026909>] ? sched_clock+0x9/0x10
[  325.818250]  [<ffffffff810c520c>] ? local_clock+0x1c/0x30
[  325.822253]  [<ffffffff8112f918>] SyS_init_module+0x158/0x1a0
[  325.829213]  [<ffffffff8177fb72>] entry_SYSCALL_64_fastpath+0x12/0x76
[  325.861256] Code:  Bad RIP value.
[  325.869322] RIP  [<          (null)>]           (null)
[  325.894311]  RSP <ffff88048633fab0>
[  325.895204] CR2: 0000000000000000
[  325.905681] ---[ end trace a0ce84d746687450 ]---
[  325.907264] Kernel panic - not syncing: Fatal exception in interrupt
[  325.918585] Kernel Offset: disabled
[  325.923273] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[  325.939365] ------------[ cut here ]------------
[  325.941211] WARNING: CPU: 4 PID: 15349 at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x5d/0x60()
[  325.966173] Modules linked in: 8021q(+) garp mrp stp llc bnx2x coretemp ipmi_ssif iTCO_wdt gpio_ich iTCO_vendor_support kvm_intel kvm ipmi_si crc32c_intel serio_raw hpilo ipmi_msghandler hpwdt shpchp i7core_edac lpc_ich edac_core acpi_cpufreq xfs radeon ixgbe dca i2c_algo_bit mdio drm_kms_helper syscopyarea vxlan sysfillrect ip6_udp_tunnel sysimgblt fb_sys_fops udp_tunnel ttm sd_mod ptp drm bnx2 hpsa libcrc32c pps_core fjes dm_mirror dm_region_hash dm_log dm_mod [last unloaded: bnx2x]
[  326.133211] CPU: 4 PID: 15349 Comm: modprobe Tainted: G      D   I     4.3.0-0.rc5.git0.1.el7.x86_64+debug #1
[  326.143188] Hardware name: Hewlett-Packard ProLiant DL360 G6, BIOS P64 06/02/2009
[  326.181236]  0000000000000000 0000000064852ed5 ffff88048d203d88 ffffffff81390173
[  326.218224]  0000000000000000 ffff88048d203dc0 ffffffff810898a6 0000000000000000
[  326.248229]  0000000000000004 000000000000e228 0000000100005ea4 ffff88048cfd7a80
[  326.282202] Call Trace:
[  326.296199]  <IRQ>  [<ffffffff81390173>] dump_stack+0x4b/0x68
[  326.329236]  [<ffffffff810898a6>] warn_slowpath_common+0x86/0xc0
[  326.345217]  [<ffffffff810899ea>] warn_slowpath_null+0x1a/0x20
[  326.353304]  [<ffffffff8105569d>] native_smp_send_reschedule+0x5d/0x60
[  326.367308]  [<ffffffff810d4500>] trigger_load_balance+0x2c0/0x490
[  326.374211]  [<ffffffff810d42b4>] ? trigger_load_balance+0x74/0x490
[  326.408209]  [<ffffffff810bf5a2>] scheduler_tick+0xa2/0xe0
[  326.433238]  [<ffffffff811222f0>] ? tick_sched_do_timer+0x50/0x50
[  326.445237]  [<ffffffff81110ed1>] update_process_times+0x51/0x60
[  326.461274]  [<ffffffff81122005>] tick_sched_handle.isra.18+0x25/0x60
[  326.482277]  [<ffffffff81122330>] tick_sched_timer+0x40/0x70
[  326.497378]  [<ffffffff81112020>] __hrtimer_run_queues+0x130/0x4b0
[  326.525283]  [<ffffffff8111a425>] ? ktime_get_update_offsets_now+0xb5/0x170
[  326.530204]  [<ffffffff811125c7>] hrtimer_interrupt+0xb7/0x1d0
[  326.546262]  [<ffffffff81058245>] local_apic_timer_interrupt+0x35/0x60
[  326.560178]  [<ffffffff81782e0d>] smp_apic_timer_interrupt+0x3d/0x60
[  326.583234]  [<ffffffff817809dc>] apic_timer_interrupt+0x8c/0xa0
[  326.617267]  <EOI>  [<ffffffff811bb7bd>] ? panic+0x1d7/0x21f
[  326.645245]  [<ffffffff811bb7c1>] ? panic+0x1db/0x21f
[  326.671269]  [<ffffffff811bb7bd>] ? panic+0x1d7/0x21f
[  326.695194]  [<ffffffff8101ff2e>] oops_end+0xce/0xd0
[  326.710180]  [<ffffffff8106ed89>] no_context+0x139/0x3a0
[  326.740203]  [<ffffffff8106f104>] __bad_area_nosemaphore+0x114/0x210
[  326.741294]  [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
[  326.753206]  [<ffffffff8106f213>] bad_area_nosemaphore+0x13/0x20
[  326.770236]  [<ffffffff8106f47e>] __do_page_fault+0x9e/0x480
[  326.794215]  [<ffffffff8106f890>] do_page_fault+0x30/0x80
[  326.825228]  [<ffffffff81780717>] ? native_iret+0x7/0x7
[  326.834199]  [<ffffffff81781fb8>] page_fault+0x28/0x30
[  326.841179]  [<ffffffffa080585d>] ? bnx2x_config_vlan_mac+0x22d/0x3e0 [bnx2x]
[  326.871179]  [<ffffffffa07b43cc>] bnx2x_set_vlan_one+0x5c/0x120 [bnx2x]
[  326.878328]  [<ffffffffa07b4502>] __bnx2x_vlan_configure_vid+0x72/0x80 [bnx2x]
[  326.884321]  [<ffffffffa07bbe22>] bnx2x_vlan_rx_add_vid+0xa2/0x220 [bnx2x]
[  326.885331]  [<ffffffff817607c2>] vlan_vid_add+0x1f2/0x280
[  326.910231]  [<ffffffffa01ef799>] vlan_device_event+0x139/0x6b0 [8021q]
[  326.914167]  [<ffffffff81635b6e>] register_netdevice_notifier+0x1ae/0x1f0
[  326.938268]  [<ffffffffa01f9000>] ? 0xffffffffa01f9000
[  326.959276]  [<ffffffffa01f903d>] vlan_proto_init+0x3d/0xb3 [8021q]
[  326.971275]  [<ffffffff8100213d>] do_one_initcall+0xcd/0x200
[  326.987288]  [<ffffffff81104a23>] ? rcu_read_lock_sched_held+0x93/0xa0
[  327.011282]  [<ffffffff812245c9>] ? kmem_cache_alloc_trace+0x239/0x310
[  327.024296]  [<ffffffff811bcfcb>] do_init_module+0x60/0x1ea
[  327.046256]  [<ffffffff8112f05b>] load_module+0x133b/0x1aa0
[  327.058266]  [<ffffffff8112b0b0>] ? __symbol_put+0x70/0x70
[  327.077246]  [<ffffffff81026909>] ? sched_clock+0x9/0x10
[  327.091264]  [<ffffffff810c520c>] ? local_clock+0x1c/0x30
[  327.109187]  [<ffffffff8112f918>] SyS_init_module+0x158/0x1a0
[  327.125203]  [<ffffffff8177fb72>] entry_SYSCALL_64_fastpath+0x12/0x76
[  327.143305] ---[ end trace a0ce84d746687451 ]---

Comment 6 Yuval Mintz 2015-10-27 10:30:51 UTC
Hi,

Could you please attach the `objdump -S' output of the bnx2x.ko module loaded?

Thanks,
Yuval

Comment 7 Michal Schmidt 2015-10-27 10:54:59 UTC
I moved this bug to Fedora to avoid confusing RHEL Program Management. And Rawhide is actually on 4.3.0-rc kernels, so it's almost certainly affected.

Comment 8 Otto Sabart 2015-10-29 15:10:19 UTC
Created attachment 1087598
vmcore-dmesg-kernel-4.3-rc7

Comment 9 Otto Sabart 2015-10-29 15:15:33 UTC
Created attachment 1087599
objdump -S bnx2x.ko

I retested this bug on kernel 4.3-rc7 (on RHEL 7.1); still the same issue.

The same problem occurs on Fedora Rawhide; I can't even install it:

[  OK  ] Started Anaconda NetworkManager configuration.
         Starting Service enabling compressing RAM with zRam...
         Starting Network Manager...
Starting installer, one moment...
[   45.324182] INFO: trying to register non-static key.
[   45.341755] the code is fine but needs lockdep annotation.
[   45.348726] turning off the locking correctness validator.
[   45.377742] CPU: 7 PID: 1654 Comm: NetworkManager Tainted: G          I     4.3.0-0.rc5.git2.1.fc24.x86_64 #1
.........
.........
.........
.........
[   46.599795] BUG: unable to handle kernel NULL pointer dereference at           (null)


@Yuval: Hi Yuval, I am attaching the `objdump -S` of bnx2x.ko. It is from kernel 4.3-rc7. I hope it's what you wanted.

Comment 10 Yuval Mintz 2015-11-01 11:43:13 UTC
Hi,

Something doesn't add up between the objdump and the trace.

Trace shows:
[  496.994451]  [<ffffffffa025496d>] ? bnx2x_config_vlan_mac+0x22d/0x3e0 [bnx2x]
[  497.017427]  [<ffffffffa0203eec>] bnx2x_set_vlan_one+0x5c/0x120 [bnx2x]
[  497.032378]  [<ffffffffa0204022>] __bnx2x_vlan_configure_vid+0x72/0x80 [bnx2x]

And objdump shows:
00000000000686d0 <bnx2x_config_vlan_mac>:

The instruction where this supposedly failed should have been at 688fd,
but according to your parsing no such instruction exists:

   688f1:	00 00 00             	mov    %eax,%r15d
   688f4:	4c 89 f7 ff          	mov    -0x30(%rbp),%r9
   688f8:	93 f8                	je     6892b <storm_memset_cmng+0x688fb>
   688fa:	00 00 00 85 c0 41 89 	testb  $0x10,0x536(%r14)
   68901:	c7 
   68902:	4c 8b 4d d0 0f 85    	jne    68a03 <storm_memset_cmng+0x689d3>

Can this be reproduced on a vanilla kernel / net-next?
If so, I'll try to locally reproduce it.

Comment 11 Otto Sabart 2015-11-06 15:25:02 UTC
Created attachment 1090696
objdump -S bnx2x.ko from latest net-next

Hi,

> Can this be reproduced on a vanilla kernel / net-next?
> If so, I'll try to locally reproduce it.

Yes, it is reproducible on latest net-next.


> Something doesn't add up between the objdump and the trace.

> Trace shows:
> [  496.994451]  [<ffffffffa025496d>] ? bnx2x_config_vlan_mac+0x22d/0x3e0 [bnx2x]
> [  497.017427]  [<ffffffffa0203eec>] bnx2x_set_vlan_one+0x5c/0x120 [bnx2x]
> [  497.032378]  [<ffffffffa0204022>] __bnx2x_vlan_configure_vid+0x72/0x80 [bnx2x]
> 
> And objdump shows:
> 00000000000686d0 <bnx2x_config_vlan_mac>:
> 
> The instructions where this supposedly has failed should have been 688fd,
> but according to you parsing such an instruction doesn't exist:
> 
>    688f1:	00 00 00             	mov    %eax,%r15d
>    688f4:	4c 89 f7 ff          	mov    -0x30(%rbp),%r9
>    688f8:	93 f8                	je     6892b <storm_memset_cmng+0x688fb>
>    688fa:	00 00 00 85 c0 41 89 	testb  $0x10,0x536(%r14)
>    68901:	c7 
>    68902:	4c 8b 4d d0 0f 85    	jne    68a03 <storm_memset_cmng+0x689d3>


I generated the objdump once again on net-next (1b1050). Now it should be OK:

$ objdump -S bnx2x.ko
0000000000067180 <bnx2x_config_vlan_mac>:

Trace:
[  168.438952]  [<ffffffffa037e3ad>] ? bnx2x_config_vlan_mac+0x22d/0x3e0 [bnx2x]


So, 0x67180 + 0x22d = 0x673ad:

            /* Try to cancel this element queue */
            rc = o->optimize(bp, o->owner, elem);
6739a:       4c 89 ca                mov    %r9,%rdx
6739d:       48 8b b3 98 00 00 00    mov    0x98(%rbx),%rsi
673a4:       4c 89 f7                mov    %r14,%rdi
673a7:       ff 93 b0 00 00 00       callq  *0xb0(%rbx)
            if (rc)
673ad:       85 c0                   test   %eax,%eax

    spin_lock_bh(&o->lock);

    if (!restore) {
            /* Try to cancel this element queue */
            rc = o->optimize(bp, o->owner, elem);
673af:       41 89 c7                mov    %eax,%r15d
            if (rc)
673b2:       4c 8b 4d d0             mov    -0x30(%rbp),%r9
673b6:       0f 85 8a 00 00 00       jne    67446 <bnx2x_config_vlan_mac+0x2c6>
                    goto free_and_exit;


Am I right? I am attaching the newer objdump output of bnx2x.

Comment 12 Yuval Mintz 2015-11-08 08:20:05 UTC
Otto,

One last question - could you share some information on the adapter you have attached?

`ethtool -i' and `lspci -vv' would prove useful.
If possible, please also supply system logs from when the machine is booted with bnx2x.debug=0x110032.

Thanks,
Yuval

Comment 13 Otto Sabart 2015-11-13 17:22:39 UTC
Created attachment 1093748
lspci -vv

Comment 14 Otto Sabart 2015-11-13 17:25:44 UTC
Created attachment 1093749
boot with bnx2x.debug=0x110032 parameter

Comment 15 Otto Sabart 2015-11-13 17:30:47 UTC
Hi,

`ethtool -i ens1f0`:
driver: bnx2x
version: 1.712.30-0
firmware-version: bc 5.0.11 phy aa0.406
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Outputs of lspci and dmesg are attached.

Thank you,
Ota

Comment 16 Yuval Mintz 2015-11-15 13:06:05 UTC
Hi Otto,

Thanks for taking the time and effort to pinpoint this one.
It's an actual regression for 57710 and 57711 adapters.

I've just sent a patch to linux-net that should fix it -
https://patchwork.ozlabs.org/patch/544838/

I'll update once it's accepted.

Thanks,
Yuval

Comment 17 Yuval Mintz 2015-11-19 16:18:03 UTC
Fix was applied to `linux-net' - commit ab6d7846cf80.

Comment 18 Otto Sabart 2015-11-22 00:36:53 UTC
Thank you! I'll test it with the new kernel RC and let you know whether your patch fixes the issue.

Comment 19 Otto Sabart 2015-11-24 13:09:40 UTC
Patch verified. VLANs on bnx2x are working again.

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)

$ uname -a
Linux hp-dl360g6-01.rhts.eng.brq.redhat.com 4.4.0-0.rc2.git0.1.el7.x86_64 #1 SMP Mon Nov 23 15:35:16 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

$ ip l
2: bnx2_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1280 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:26:55:1a:87:44 brd ff:ff:ff:ff:ff:ff

$ ip link add link bnx2x_0 name bnx2x_0.10 type vlan id 100

$ ip l
10: bnx2x_0.10@bnx2x_0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 00:10:18:98:c8:90 brd ff:ff:ff:ff:ff:ff

$ ethtool -i bnx2x_0
driver: bnx2x
version: 1.712.30-0
firmware-version: bc 5.0.11 phy aa0.406
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

