Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2183908

Summary: [RHCOS9.2] openshift 4.13 kernel panic
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Eran Ifrach <eifrach>
Component: openvswitch3.1
Assignee: Aaron Conole <aconole>
Status: CLOSED EOL
QA Contact: ovs-qe
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: RHEL 9.0
CC: ctrautma, fdupont, fleitner, jhsiao, mcornea, pablo.iranzo, ralongi, sasha
Target Milestone: ---
Target Release: ---
Hardware: aarch64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-10-08 17:49:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eran Ifrach 2023-04-03 04:51:39 UTC
Hi team,

I deployed OpenShift 4.13 on RHCOS 9.2 (arm) and deployed the SR-IOV operator without any configuration.

I am seeing random kernel panics with no workloads running (default OCP deployment).


error:
[ 2793.867982] Unable to handle kernel paging request at virtual address ffff45aab3683000
[ 2793.875913] Mem abort info:
[ 2793.878702]   ESR = 0x0000000096000004
[ 2793.882446]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 2793.887749]   SET = 0, FnV = 0
[ 2793.890796]   EA = 0, S1PTW = 0
[ 2793.893935]   FSC = 0x04: level 0 translation fault
[ 2793.898809] Data abort info:
[ 2793.901683]   ISV = 0, ISS = 0x00000004
[ 2793.905511]   CM = 0, WnR = 0
[ 2793.908474] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000081f8ca8c000
[ 2793.915170] [ffff45aab3683000] pgd=0000000000000000, p4d=0000000000000000
[ 2793.921957] Internal error: Oops: 96000004 [#1] SMP
[ 2793.926823] Modules linked in: vhost_net vhost vhost_iotlb tap tun veth nf_conntrack_netlink ipt_REJECT nf_reject_ipv4 xt_nat xt_CT xt_MASQUERADE nft_chain_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables rfkill nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 mlx5_ib ib_uverbs ast acpi_ipmi arm_spe_pmu drm_shmem_helper ipmi_ssif ib_core drm_kms_helper fb_sys_fops syscopyarea sysfillrect sysimgblt ipmi_devintf arm_cmn arm_dmc620_pmu ipmi_msghandler arm_dsu_pmu cppc_cpufreq sctp ip6_udp_tunnel udp_tunnel ip_tables drm xfs libcrc32c crct10dif_ce ghash_ce mlx5_core sha2_ce sha256_arm64 sha1_ce nvme_tcp nvme_fabrics sbsa_gwdt mlxfw psample nvme tls nvme_core pci_hyperv_intf nvme_common igb i2c_algo_bit xgene_hwmon i2c_designware_platform i2c_designware_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
[ 2794.007348] CPU: 0 PID: 92093 Comm: kworker/0:1 Kdump: loaded Not tainted 5.14.0-285.el9.aarch64 #1
[ 2794.016381] Hardware name: GIGABYTE G242-P34-00/MP32-AR2-00, BIOS F31L (SCP: 2.10.20220531) 09/29/2022
[ 2794.025672] Workqueue: ipv6_addrconf addrconf_dad_work
[ 2794.030803] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2794.037752] pc : ovs_dp_upcall+0x98/0x1d0 [openvswitch]
[ 2794.042973] lr : ovs_dp_upcall+0xc0/0x1d0 [openvswitch]
[ 2794.048190] sp : ffff80004bd1b3e0
[ 2794.051492] x29: ffff80004bd1b3e0 x28: 0000000000000000 x27: 0000000000000000
[ 2794.058616] x26: 000000000000a888 x25: ffff080e729ff300 x24: ffff080bad30aa00
[ 2794.065740] x23: ffff80004bd1b4c8 x22: 0000000000000000 x21: ffff07ff937a0b00
[ 2794.072862] x20: ffff80004bd1b470 x19: ffff080e729ff300 x18: 0000000000000000
[ 2794.079984] x17: 80fe00000000e922 x16: ffffc293ab1bd540 x15: 0000000000000000
[ 2794.087107] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
[ 2794.094229] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc29397393688
[ 2794.101351] x8 : 0000000000000040 x7 : 000000000000003f x6 : ffff80002c41ba10
[ 2794.108473] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 2794.115595] x2 : 0000000000000001 x1 : ffff45aab3683000 x0 : 0000000000000000
[ 2794.122717] Call trace:
[ 2794.125152]  ovs_dp_upcall+0x98/0x1d0 [openvswitch]
[ 2794.130022]  ovs_dp_process_packet+0x170/0x224 [openvswitch]
[ 2794.135672]  ovs_vport_receive+0x78/0xec [openvswitch]
[ 2794.140803]  netdev_port_receive+0xb8/0x170 [openvswitch]
[ 2794.146194]  netdev_frame_hook+0x28/0x3c [openvswitch]
[ 2794.151323]  __netif_receive_skb_core.constprop.0+0x2b0/0xd4c
[ 2794.157058]  __netif_receive_skb_one_core+0x40/0x84
[ 2794.161922]  __netif_receive_skb+0x1c/0x6c
[ 2794.166006]  process_backlog+0xe0/0x1b0
[ 2794.169829]  __napi_poll+0x3c/0x210
[ 2794.173305]  net_rx_action+0x308/0x3b0
[ 2794.177042]  __do_softirq+0x120/0x3d0
[ 2794.180693]  do_softirq+0xa8/0xbc
[ 2794.183997]  __local_bh_enable_ip+0xa0/0xb0
[ 2794.188168]  ip6_finish_output2+0x1c8/0x720
[ 2794.192339]  __ip6_finish_output+0x17c/0x2b0
[ 2794.196596]  ip6_finish_output+0x38/0xf0
[ 2794.200506]  ip6_output+0x78/0x1d0
[ 2794.203895]  NF_HOOK.constprop.0+0xcc/0xdc
[ 2794.207980]  ndisc_send_skb+0x2e8/0x430
[ 2794.211804]  ndisc_send_ns+0x68/0xb0
[ 2794.215367]  addrconf_dad_work+0x2a8/0x380
[ 2794.219451]  process_one_work+0x1e4/0x4a0
[ 2794.223449]  worker_thread+0x158/0x430
[ 2794.227185]  kthread+0xe8/0xf4
[ 2794.230228]  ret_from_fork+0x10/0x20
[ 2794.233792] Code: 8b020021 350008a0 d503201f d2800022 (f822003f) 
[ 2794.239874] SMP: stopping secondary CPUs
[ 2794.245020] Starting crashdump kernel...
[ 2794.248930] Bye!
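
For readers who want to cross-check the "Mem abort info" and "Data abort info" lines in the oops above, here is a minimal sketch that splits the reported ESR value into its fields. The bit positions are assumed from the Arm architecture's ESR_ELx layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). The function name `decode_esr` is hypothetical and is not part of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value into the fields the kernel prints for a data abort."""
    iss = esr & 0x1FFFFFF  # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,    # instruction length, 1 means a 32-bit instruction
        "ISS": iss,
        # Data-abort-specific subfields of ISS (assumed layout):
        "ISV": (iss >> 24) & 0x1,   # syndrome valid
        "SET": (iss >> 11) & 0x3,   # synchronous error type
        "FnV": (iss >> 10) & 0x1,   # FAR not valid
        "EA": (iss >> 9) & 0x1,     # external abort
        "CM": (iss >> 8) & 0x1,     # cache maintenance
        "S1PTW": (iss >> 7) & 0x1,  # stage-1 page table walk
        "WnR": (iss >> 6) & 0x1,    # write (1) or read (0)
        "FSC": iss & 0x3F,          # fault status code
    }


# ESR value taken from the oops in this report.
fields = decode_esr(0x96000004)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

With the value from this report the sketch yields EC = 0x25 and FSC = 0x04, which agrees with the "DABT (current EL)" and "level 0 translation fault" lines printed by the kernel.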



$ cat /etc/os-release 
NAME="CentOS Stream CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202303190222-0"
VERSION_ID="4.13"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream CoreOS 413.92.202303190222-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:9coreos"
HOME_URL="https://centos.org/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
OPENSHIFT_VERSION="4.13"
RHEL_VERSION="9"
OSTREE_VERSION="413.92.202303190222-0"
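
When comparing builds across nodes or between reports, it can help to read the os-release output programmatically. The following is a minimal sketch, not an official tool: `parse_os_release` is a hypothetical helper, and the sample text is a shortened copy of the output above.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in os-release format, removing shell-style quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # handles both quoted and unquoted values
        fields[key] = parts[0] if parts else ""
    return fields


# Shortened copy of the os-release output from this report.
SAMPLE = '''NAME="CentOS Stream CoreOS"
ID="rhcos"
VERSION="413.92.202303190222-0"
VARIANT_ID=coreos
RHEL_VERSION="9"
'''

info = parse_os_release(SAMPLE)
print(info["NAME"], info["VERSION"])
```

On a live node the same function could be fed the contents of /etc/os-release instead of the sample string.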


kdump link:
https://drive.google.com/file/d/1lxjRCFdmWZEPrsM6nrw6GrMo5OjpJptH/view?usp=share_link

must gather:
https://drive.google.com/file/d/19eInoBJRXyRsusxLaOOOH5AambI6arRG/view?usp=share_link

Comment 1 Eran Ifrach 2023-04-16 07:54:14 UTC
I deployed OCP 4.13 RC2.

The issue is still there, although with fewer reboots than before.


$ ll /var/crash/
total 0
drwxr-xr-x. 2 root root 67 Apr  7 07:30 127.0.0.1-2023-04-07-07:29:58
drwxr-xr-x. 2 root root 67 Apr  8 15:03 127.0.0.1-2023-04-08-15:03:11

$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202303281804-0"
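
To quantify how often the node crashes, the timestamps embedded in the /var/crash directory names listed above can be parsed. This is a rough sketch that assumes the naming pattern seen in this listing (address, then date and time separated by dashes); `crash_time` is a hypothetical helper.

```python
from datetime import datetime


def crash_time(dirname: str) -> datetime:
    """Extract the timestamp from a crash directory name such as
    '127.0.0.1-2023-04-07-07:29:58' (pattern assumed from the listing)."""
    _, _, stamp = dirname.partition("-")  # drop the leading address part
    return datetime.strptime(stamp, "%Y-%m-%d-%H:%M:%S")


# Directory names copied from the listing in this comment.
dirs = [
    "127.0.0.1-2023-04-07-07:29:58",
    "127.0.0.1-2023-04-08-15:03:11",
]

times = sorted(crash_time(d) for d in dirs)
for earlier, later in zip(times, times[1:]):
    print(f"{earlier} -> {later}: {later - earlier} between crashes")
```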

Comment 3 Aaron Conole 2023-09-25 16:20:19 UTC
Hi Eran,

Can you collect a kdump that we can use to look at this?  I'm not aware of any issues currently
in ovs module that would cause this kind of crash.  A kdump would help to at least see what is
happening in the aarch64 system.

-Aaron

Comment 4 Eran Ifrach 2023-09-26 05:43:06 UTC
Hey Aaron,
Thanks for the reply.

I have attached a link to a kdump:
https://drive.google.com/file/d/1lxjRCFdmWZEPrsM6nrw6GrMo5OjpJptH/view?usp=share_link

Do you need another one?

Comment 5 ovs-bot 2024-10-08 17:49:14 UTC
This bug did not meet the criteria for automatic migration and is being closed.
If the issue remains, please open a new ticket in https://issues.redhat.com/browse/FDP