Bug 1994594

Summary: bnxt_en nic: Kernel panic - not syncing: Fatal hardware error
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: liting <tli>
Component: openvswitch2.13
Assignee: Mike Pattrick <mpattric>
Status: NEW
QA Contact: liting <tli>
Severity: unspecified
Priority: unspecified
Version: FDP 21.G
CC: ctrautma, fleitner, jhsiao, ralongi
Target Milestone: ---
Flags: fleitner: needinfo? (tli)
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description liting 2021-08-17 13:41:50 UTC
Description of problem:


Version-Release number of selected component (if applicable):
[root@netqe22 ~]# rpm -qa|grep openvswitch
python3-openvswitch2.13-2.13.0-120.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.13-2.13.0-120.el8fdp.x86_64
openvswitch2.13-test-2.13.0-120.el8fdp.noarch
kernel-kernel-networking-openvswitch-perf-1.0-139.noarch
openvswitch2.13-ipsec-2.13.0-120.el8fdp.x86_64

[root@netqe22 ~]# uname -r
4.18.0-193.19.1.el8_2.x86_64


How reproducible:


Steps to Reproduce:
1. Run the OVS PVP kernel test case on a NIC that uses the bnxt_en driver


Actual results:
job link:
https://beaker.engineering.redhat.com/jobs/5712308

The kernel panics with the following trace:
[  315.307402] Kernel panic - not syncing: Fatal hardware error! 
[  315.307402] CPU: 0 PID: 44866 Comm: reboot Not tainted 4.18.0-193.19.1.el8_2.x86_64 #1 
[  315.307403] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.7.0 12/005/2017 
[  315.307403] Call Trace: 
[  315.307403]  <NMI> 
[  315.307403]  dump_stack+0x5c/0x80 
[  315.307404]  panic+0xe7/0x2a9 
[  315.307404]  __ghes_panic.cold.30+0x21/0x21 
[  315.307404]  ghes_notify_nmi+0x26b/0x310 
[  315.307404]  nmi_handle+0x63/0x110 
[  315.307405]  default_do_nmi+0x4e/0x100 
[  315.307405]  do_nmi+0x128/0x190 
[  315.307405]  end_repeat_nmi+0x16/0x6a 
[  315.307405] RIP: 0010:native_io_apic_read+0x32/0x40 
[  315.307406] Code: 81 c7 04 02 00 00 48 8d 04 c0 c1 e7 0c 8b 04 c5 34 18 bf b8 48 63 ff 25 ff 0f 00 00 48 2d 00 10 80 00 48 29 f8 89 30 8b 40 10 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 15 79 
[  315.307406] RSP: 0018:ffffb8a0097abd68 EFLAGS: 00000086 
[  315.307407] RAX: 0000000000050000 RBX: 0000000000000003 RCX: 0000000000000000 
[  315.307407] RDX: 0000000000000001 RSI: 0000000000000017 RDI: 0000000000204000 
[  315.307408] RBP: 0000000000000016 R08: 0000000000010000 R09: 0000000000000015 
[  315.307408] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 
[  315.307408] R13: 000000000000004b R14: 00000000fee1dead R15: 0000000000000000 
[  315.307408]  ? native_io_apic_read+0x32/0x40 
[  315.307409]  ? native_io_apic_read+0x32/0x40 
[  315.307409]  </NMI> 
[  315.307409]  __ioapic_read_entry+0x32/0x50 
[  315.307409]  ioapic_read_entry+0x27/0x50 
[  315.307410]  clear_IO_APIC_pin+0x15/0x110 
[  315.307410]  clear_IO_APIC+0x32/0x50 
[  315.307410]  native_machine_shutdown+0xa/0x40 
[  315.307410]  native_machine_restart+0x26/0x3c 
[  315.307411]  __do_sys_reboot+0x1d2/0x210 
[  315.307411]  ? syscall_trace_enter+0x1d3/0x2c0 
[  315.307411]  ? __audit_syscall_exit+0x249/0x2a0 
[  315.307411]  do_syscall_64+0x5b/0x1a0 
[  315.307412]  entry_SYSCALL_64_after_hwframe+0x65/0xca 
[  315.307412] RIP: 0033:0x7fcda423b847 
[  315.307412] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 11 86 2c 00 f7 d8 64 89 02 b8 
[  315.307413] RSP: 002b:00007ffebe54ea98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9 
[  315.307413] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcda423b847 
[  315.307414] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead 
[  315.307414] RBP: 00007ffebe54eae0 R08: 0000000000000002 R09: 0000000000000000 
[  315.307414] R10: 000000000000004b R11: 0000000000000246 R12: 0000000000000001 
[  315.307415] R13: 00000000fffffffe R14: 0000000000000006 R15: 0000000000000000 
[  315.571025] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 
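
Note on reading the trace: the user-space registers in the second half of the trace suggest the panic was raised while the "reboot" process was inside the reboot syscall. The sketch below is only an illustration and is not part of the test case. It compares the register values copied from the trace against the reboot(2) magic numbers from <linux/reboot.h> and the x86_64 syscall number for reboot (169). The function name describe_reboot_call is hypothetical.

```python
# Constants from the Linux UAPI header <linux/reboot.h> and the x86_64 syscall table.
LINUX_REBOOT_MAGIC1 = 0xFEE1DEAD
LINUX_REBOOT_MAGIC2 = 0x28121969
LINUX_REBOOT_CMD_RESTART = 0x01234567
SYS_REBOOT_X86_64 = 169

# User-space register values copied from the trace in this report.
# On x86_64 the syscall number is in ORIG_RAX and the first three
# arguments are passed in RDI, RSI and RDX.
regs = {
    "ORIG_RAX": 0x00000000000000A9,
    "RDI": 0x00000000FEE1DEAD,
    "RSI": 0x0000000028121969,
    "RDX": 0x0000000001234567,
}


def describe_reboot_call(regs):
    """Hypothetical helper: describe the syscall implied by the registers."""
    if regs["ORIG_RAX"] != SYS_REBOOT_X86_64:
        return "not a reboot syscall"
    if (regs["RDI"], regs["RSI"]) != (LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2):
        return "reboot syscall with unexpected magic numbers"
    if regs["RDX"] == LINUX_REBOOT_CMD_RESTART:
        return "reboot(LINUX_REBOOT_CMD_RESTART)"
    return "reboot syscall with another command"


print(describe_reboot_call(regs))
```

This matches the kernel-side frames (__do_sys_reboot, native_machine_restart, clear_IO_APIC), so the fatal hardware error NMI reported through ghes_notify_nmi appears to have arrived during machine restart rather than while traffic was running. Whether the bnxt_en device is the source of the hardware error cannot be determined from these registers alone.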

Expected results:
No kernel panic

Additional info:
console log:
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/08/57123/5712308/10504624/console.log

Comment 1 Flavio Leitner 2023-06-14 15:25:23 UTC
Thanks for reporting the bug.
OVS 2.15 is EOL, so only critical fixes are accepted at this point.
Can you confirm if this happens with newer versions?
Thanks,
fbl