Bug 1994594 - bnxt_en nic: Kernel panic - not syncing: Fatal hardware error [NEEDINFO]
Summary: bnxt_en nic: Kernel panic - not syncing: Fatal hardware error
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.13
Version: FDP 21.G
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Mike Pattrick
QA Contact: liting
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-17 13:41 UTC by liting
Modified: 2023-07-13 07:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
fleitner: needinfo? (tli)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1490 0 None None None 2021-08-17 13:42:03 UTC

Description liting 2021-08-17 13:41:50 UTC
Description of problem:


Version-Release number of selected component (if applicable):
[root@netqe22 ~]# rpm -qa|grep openvswitch
python3-openvswitch2.13-2.13.0-120.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.13-2.13.0-120.el8fdp.x86_64
openvswitch2.13-test-2.13.0-120.el8fdp.noarch
kernel-kernel-networking-openvswitch-perf-1.0-139.noarch
openvswitch2.13-ipsec-2.13.0-120.el8fdp.x86_64

[root@netqe22 ~]# uname -r
4.18.0-193.19.1.el8_2.x86_64


How reproducible:


Steps to Reproduce:
1. Run the OVS PVP kernel test case on a NIC that uses the bnxt_en driver.


Actual results:
job link:
https://beaker.engineering.redhat.com/jobs/5712308

The kernel panics as follows:
[  315.307402] Kernel panic - not syncing: Fatal hardware error! 
[  315.307402] CPU: 0 PID: 44866 Comm: reboot Not tainted 4.18.0-193.19.1.el8_2.x86_64 #1 
[  315.307403] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.7.0 12/005/2017 
[  315.307403] Call Trace: 
[  315.307403]  <NMI> 
[  315.307403]  dump_stack+0x5c/0x80 
[  315.307404]  panic+0xe7/0x2a9 
[  315.307404]  __ghes_panic.cold.30+0x21/0x21 
[  315.307404]  ghes_notify_nmi+0x26b/0x310 
[  315.307404]  nmi_handle+0x63/0x110 
[  315.307405]  default_do_nmi+0x4e/0x100 
[  315.307405]  do_nmi+0x128/0x190 
[  315.307405]  end_repeat_nmi+0x16/0x6a 
[  315.307405] RIP: 0010:native_io_apic_read+0x32/0x40 
[  315.307406] Code: 81 c7 04 02 00 00 48 8d 04 c0 c1 e7 0c 8b 04 c5 34 18 bf b8 48 63 ff 25 ff 0f 00 00 48 2d 00 10 80 00 48 29 f8 89 30 8b 40 10 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 15 79 
[  315.307406] RSP: 0018:ffffb8a0097abd68 EFLAGS: 00000086 
[  315.307407] RAX: 0000000000050000 RBX: 0000000000000003 RCX: 0000000000000000 
[  315.307407] RDX: 0000000000000001 RSI: 0000000000000017 RDI: 0000000000204000 
[  315.307408] RBP: 0000000000000016 R08: 0000000000010000 R09: 0000000000000015 
[  315.307408] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 
[  315.307408] R13: 000000000000004b R14: 00000000fee1dead R15: 0000000000000000 
[  315.307408]  ? native_io_apic_read+0x32/0x40 
[  315.307409]  ? native_io_apic_read+0x32/0x40 
[  315.307409]  </NMI> 
[  315.307409]  __ioapic_read_entry+0x32/0x50 
[  315.307409]  ioapic_read_entry+0x27/0x50 
[  315.307410]  clear_IO_APIC_pin+0x15/0x110 
[  315.307410]  clear_IO_APIC+0x32/0x50 
[  315.307410]  native_machine_shutdown+0xa/0x40 
[  315.307410]  native_machine_restart+0x26/0x3c 
[  315.307411]  __do_sys_reboot+0x1d2/0x210 
[  315.307411]  ? syscall_trace_enter+0x1d3/0x2c0 
[  315.307411]  ? __audit_syscall_exit+0x249/0x2a0 
[  315.307411]  do_syscall_64+0x5b/0x1a0 
[  315.307412]  entry_SYSCALL_64_after_hwframe+0x65/0xca 
[  315.307412] RIP: 0033:0x7fcda423b847 
[  315.307412] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 11 86 2c 00 f7 d8 64 89 02 b8 
[  315.307413] RSP: 002b:00007ffebe54ea98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9 
[  315.307413] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcda423b847 
[  315.307414] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead 
[  315.307414] RBP: 00007ffebe54eae0 R08: 0000000000000002 R09: 0000000000000000 
[  315.307414] R10: 000000000000004b R11: 0000000000000246 R12: 0000000000000001 
[  315.307415] R13: 00000000fffffffe R14: 0000000000000006 R15: 0000000000000000 
[  315.571025] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 

Expected results:
No kernel panic

Additional info:
console log:
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/08/57123/5712308/10504624/console.log
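
For triage, the panic reason and the call-trace symbols can be pulled out of a saved copy of a console log like the one above. The following is a minimal sketch, not part of the original report: the function name `summarize_panic` and the regular expressions are illustrative assumptions, and the sample lines are copied from the trace in this description. Frames prefixed with "? " (unreliable frames) and the register dump lines are skipped on purpose.

```python
import re

# Matches dmesg-style frame lines such as "[  315.307404]  panic+0xe7/0x2a9".
# Lines starting with "? " (unreliable frames), "RIP:", or "Code:" do not match.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)
PANIC_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+Kernel panic - not syncing: (.*?)\s*$"
)


def summarize_panic(log_text):
    """Return (panic_reason, frames) for the first panic found in log_text.

    frames lists the symbol names that follow the panic line, in order.
    """
    reason = None
    frames = []
    for line in log_text.splitlines():
        if reason is None:
            match = PANIC_RE.match(line)
            if match:
                reason = match.group(1)
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    return reason, frames


# Sample lines copied from the trace in this report.
sample = """\
[  315.307402] Kernel panic - not syncing: Fatal hardware error!
[  315.307403] Call Trace:
[  315.307403]  <NMI>
[  315.307403]  dump_stack+0x5c/0x80
[  315.307404]  panic+0xe7/0x2a9
[  315.307404]  __ghes_panic.cold.30+0x21/0x21
[  315.307404]  ghes_notify_nmi+0x26b/0x310
[  315.307405] RIP: 0010:native_io_apic_read+0x32/0x40
[  315.307408]  ? native_io_apic_read+0x32/0x40
[  315.307409]  __ioapic_read_entry+0x32/0x50
"""

reason, frames = summarize_panic(sample)
print(reason)
print(frames)
```

The `ghes_*` frames come from the kernel's ACPI APEI Generic Hardware Error Source code, so their presence in the list suggests the panic was raised in response to a firmware-reported hardware error delivered by NMI during the reboot path, rather than directly by Open vSwitch code.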

Comment 1 Flavio Leitner 2023-06-14 15:25:23 UTC
Thanks for reporting the bug.
OVS 2.15 is EOL, so it only receives critical fixes at this point.
Can you confirm if this happens with newer versions?
Thanks,
fbl

