Bug 1994594

Summary: bnxt_en nic: Kernel panic - not syncing: Fatal hardware error
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: liting <tli>
Component: openvswitch2.13
Assignee: Mike Pattrick <mpattric>
Status: CLOSED CURRENTRELEASE
QA Contact: liting <tli>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: FDP 21.G
CC: ctrautma, fleitner, jhsiao, ralongi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-10-13 06:18:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description liting 2021-08-17 13:41:50 UTC
Description of problem:


Version-Release number of selected component (if applicable):
[root@netqe22 ~]# rpm -qa|grep openvswitch
python3-openvswitch2.13-2.13.0-120.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.13-2.13.0-120.el8fdp.x86_64
openvswitch2.13-test-2.13.0-120.el8fdp.noarch
kernel-kernel-networking-openvswitch-perf-1.0-139.noarch
openvswitch2.13-ipsec-2.13.0-120.el8fdp.x86_64

[root@netqe22 ~]# uname -r
4.18.0-193.19.1.el8_2.x86_64


How reproducible:


Steps to Reproduce:
1. Run the OVS PVP kernel test case on a NIC that uses the bnxt_en driver.


Actual results:
job link:
https://beaker.engineering.redhat.com/jobs/5712308

The kernel panics as follows:
[  315.307402] Kernel panic - not syncing: Fatal hardware error! 
[  315.307402] CPU: 0 PID: 44866 Comm: reboot Not tainted 4.18.0-193.19.1.el8_2.x86_64 #1 
[  315.307403] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.7.0 12/005/2017 
[  315.307403] Call Trace: 
[  315.307403]  <NMI> 
[  315.307403]  dump_stack+0x5c/0x80 
[  315.307404]  panic+0xe7/0x2a9 
[  315.307404]  __ghes_panic.cold.30+0x21/0x21 
[  315.307404]  ghes_notify_nmi+0x26b/0x310 
[  315.307404]  nmi_handle+0x63/0x110 
[  315.307405]  default_do_nmi+0x4e/0x100 
[  315.307405]  do_nmi+0x128/0x190 
[  315.307405]  end_repeat_nmi+0x16/0x6a 
[  315.307405] RIP: 0010:native_io_apic_read+0x32/0x40 
[  315.307406] Code: 81 c7 04 02 00 00 48 8d 04 c0 c1 e7 0c 8b 04 c5 34 18 bf b8 48 63 ff 25 ff 0f 00 00 48 2d 00 10 80 00 48 29 f8 89 30 8b 40 10 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 15 79 
[  315.307406] RSP: 0018:ffffb8a0097abd68 EFLAGS: 00000086 
[  315.307407] RAX: 0000000000050000 RBX: 0000000000000003 RCX: 0000000000000000 
[  315.307407] RDX: 0000000000000001 RSI: 0000000000000017 RDI: 0000000000204000 
[  315.307408] RBP: 0000000000000016 R08: 0000000000010000 R09: 0000000000000015 
[  315.307408] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 
[  315.307408] R13: 000000000000004b R14: 00000000fee1dead R15: 0000000000000000 
[  315.307408]  ? native_io_apic_read+0x32/0x40 
[  315.307409]  ? native_io_apic_read+0x32/0x40 
[  315.307409]  </NMI> 
[  315.307409]  __ioapic_read_entry+0x32/0x50 
[  315.307409]  ioapic_read_entry+0x27/0x50 
[  315.307410]  clear_IO_APIC_pin+0x15/0x110 
[  315.307410]  clear_IO_APIC+0x32/0x50 
[  315.307410]  native_machine_shutdown+0xa/0x40 
[  315.307410]  native_machine_restart+0x26/0x3c 
[  315.307411]  __do_sys_reboot+0x1d2/0x210 
[  315.307411]  ? syscall_trace_enter+0x1d3/0x2c0 
[  315.307411]  ? __audit_syscall_exit+0x249/0x2a0 
[  315.307411]  do_syscall_64+0x5b/0x1a0 
[  315.307412]  entry_SYSCALL_64_after_hwframe+0x65/0xca 
[  315.307412] RIP: 0033:0x7fcda423b847 
[  315.307412] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 11 86 2c 00 f7 d8 64 89 02 b8 
[  315.307413] RSP: 002b:00007ffebe54ea98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9 
[  315.307413] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcda423b847 
[  315.307414] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead 
[  315.307414] RBP: 00007ffebe54eae0 R08: 0000000000000002 R09: 0000000000000000 
[  315.307414] R10: 000000000000004b R11: 0000000000000246 R12: 0000000000000001 
[  315.307415] R13: 00000000fffffffe R14: 0000000000000006 R15: 0000000000000000 
[  315.571025] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 
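
As a reading aid for the trace above, here is a minimal Python sketch that splits the frames into NMI context and task context and decodes the user-space registers of the interrupted syscall. It works on a shortened copy of the trace. The helper names (`split_frames`, `registers`) are made up for illustration. The constants are the reboot magic numbers and the restart command from the kernel's `linux/reboot.h`, and 169 is the x86_64 `reboot` syscall number. What it suggests, as an interpretation rather than something the report confirms, is that the panic was raised by the GHES (firmware-reported hardware error) NMI handler while a `reboot` process was restarting the machine.

```python
import re

# Shortened copy of the trace above; only the lines this sketch needs.
TRACE = """\
[  315.307403]  <NMI>
[  315.307404]  panic+0xe7/0x2a9
[  315.307404]  __ghes_panic.cold.30+0x21/0x21
[  315.307404]  ghes_notify_nmi+0x26b/0x310
[  315.307408]  ? native_io_apic_read+0x32/0x40
[  315.307409]  </NMI>
[  315.307410]  native_machine_shutdown+0xa/0x40
[  315.307411]  __do_sys_reboot+0x1d2/0x210
[  315.307413] RSP: 002b:00007ffebe54ea98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
[  315.307414] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
"""

# "[ts]  [?] symbol+0xoff/0xsize"; a leading "?" marks an unreliable frame.
FRAME = re.compile(r"^\[\s*\d+\.\d+\]\s+(\??)\s*([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")
# "NAME: <16 hex digits>" pairs from the register dump lines.
REG = re.compile(r"\b([A-Z_0-9]+): ([0-9a-f]{16})\b")

# Values from linux/reboot.h and the x86_64 syscall table.
LINUX_REBOOT_MAGIC1 = 0xFEE1DEAD
LINUX_REBOOT_MAGIC2 = 0x28121969
LINUX_REBOOT_CMD_RESTART = 0x01234567
SYS_REBOOT_X86_64 = 169


def split_frames(trace):
    """Return (nmi_frames, task_frames), skipping unreliable '?' frames."""
    nmi, task, in_nmi = [], [], False
    for line in trace.splitlines():
        if "<NMI>" in line:
            in_nmi = True
        elif "</NMI>" in line:
            in_nmi = False
        else:
            m = FRAME.match(line)
            if m and not m.group(1):
                (nmi if in_nmi else task).append(m.group(2))
    return nmi, task


def registers(trace):
    """Map register name to value; on repeats, the last dump in the text wins."""
    return {name: int(value, 16) for name, value in REG.findall(trace)}


nmi_frames, task_frames = split_frames(TRACE)
regs = registers(TRACE)
is_restart = (
    regs["ORIG_RAX"] == SYS_REBOOT_X86_64
    and regs["RDI"] == LINUX_REBOOT_MAGIC1
    and regs["RSI"] == LINUX_REBOOT_MAGIC2
    and regs["RDX"] == LINUX_REBOOT_CMD_RESTART
)
print("NMI context: ", nmi_frames)
print("Task context:", task_frames)
print("reboot(RESTART) in progress:", is_restart)
```

Run on the full trace, the same functions would also pick up the kernel-side register dump. Because later values overwrite earlier ones, the user-space dump printed after the call trace is the one that ends up in the result.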

Expected results:
No kernel panic

Additional info:
console log:
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/08/57123/5712308/10504624/console.log

Comment 1 Flavio Leitner 2023-06-14 15:25:23 UTC
Thanks for reporting the bug.
OVS 2.15 is EOL, so only critical fixes are being accepted at this point.
Can you confirm if this happens with newer versions?
Thanks,
fbl

Comment 2 Mike Pattrick 2023-09-18 17:58:04 UTC
Is this still occurring? I checked some recent bnxt_10g beaker jobs and couldn't find a similar kernel panic. However, many of the jobs have failed for other reasons.

Comment 3 liting 2023-10-12 06:41:23 UTC
(In reply to Mike Pattrick from comment #2)
> Is this still occurring? I checked some recent bnxt_10g beaker jobs and
> couldn't find a similar kernel panic. However, many of the jobs have failed
> for other reasons.

The netqe22 system has some issues with Beaker. Once it is fixed, I will run more jobs to see whether a similar kernel panic still occurs. Thanks.

Comment 4 liting 2023-10-13 06:17:40 UTC
(In reply to Mike Pattrick from comment #2)
> Is this still occurring? I checked some recent bnxt_10g beaker jobs and
> couldn't find a similar kernel panic. However, many of the jobs have failed
> for other reasons.

I ran RHEL 8.6 with OVS 2.17 on netqe22/netqe32 and saw no panic, so I am closing this bug.
https://beaker.engineering.redhat.com/jobs/8422980
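
The check used in comments 2 and 4 (looking through job console logs for this panic) could be scripted roughly as below. This is a minimal Python sketch that assumes the console log text is already available as a string. The function name `find_panics` and the sample text are hypothetical, and the sketch does not talk to Beaker.

```python
import re

# Matches the kernel's panic banner and captures the reason text,
# without any trailing whitespace.
PANIC = re.compile(r"Kernel panic - not syncing: (.*\S)")


def find_panics(console_text):
    """Return the reason string of every kernel panic line in a console log."""
    return [m.group(1) for m in PANIC.finditer(console_text)]


# Hypothetical sample: two lines in the shape of the console log in this report.
sample = (
    "[  315.307402] Kernel panic - not syncing: Fatal hardware error! \n"
    "[  315.307403] Call Trace: \n"
)
reasons = find_panics(sample)
print(reasons)
```

An empty result for a log means only that no panic banner was printed to it. It does not show that the job passed.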