Bug 2183908 - [RHCOS9.2] openshift 4.13 kernel panic
Summary: [RHCOS9.2] openshift 4.13 kernel panic
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch3.1
Version: RHEL 9.0
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Aaron Conole
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-03 04:51 UTC by Eran Ifrach
Modified: 2023-06-29 07:10 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2794 0 None None None 2023-04-03 04:52:26 UTC

Description Eran Ifrach 2023-04-03 04:51:39 UTC
Hi team,

I deployed OpenShift 4.13 on RHCOS 9.2 (aarch64) and installed the SR-IOV operator without any configuration.

I am seeing random kernel panics with no workloads running (default OCP deployment).


Console output at the time of the panic:
[ 2793.867982] Unable to handle kernel paging request at virtual address ffff45aab3683000
[ 2793.875913] Mem abort info:
[ 2793.878702]   ESR = 0x0000000096000004
[ 2793.882446]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 2793.887749]   SET = 0, FnV = 0
[ 2793.890796]   EA = 0, S1PTW = 0
[ 2793.893935]   FSC = 0x04: level 0 translation fault
[ 2793.898809] Data abort info:
[ 2793.901683]   ISV = 0, ISS = 0x00000004
[ 2793.905511]   CM = 0, WnR = 0
[ 2793.908474] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000081f8ca8c000
[ 2793.915170] [ffff45aab3683000] pgd=0000000000000000, p4d=0000000000000000
[ 2793.921957] Internal error: Oops: 96000004 [#1] SMP
[ 2793.926823] Modules linked in: vhost_net vhost vhost_iotlb tap tun veth nf_conntrack_netlink ipt_REJECT nf_reject_ipv4 xt_nat xt_CT xt_MASQUERADE nft_chain_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables rfkill nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 mlx5_ib ib_uverbs ast acpi_ipmi arm_spe_pmu drm_shmem_helper ipmi_ssif ib_core drm_kms_helper fb_sys_fops syscopyarea sysfillrect sysimgblt ipmi_devintf arm_cmn arm_dmc620_pmu ipmi_msghandler arm_dsu_pmu cppc_cpufreq sctp ip6_udp_tunnel udp_tunnel ip_tables drm xfs libcrc32c crct10dif_ce ghash_ce mlx5_core sha2_ce sha256_arm64 sha1_ce nvme_tcp nvme_fabrics sbsa_gwdt mlxfw psample nvme tls nvme_core pci_hyperv_intf nvme_common igb i2c_algo_bit xgene_hwmon i2c_designware_platform i2c_designware_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
[ 2794.007348] CPU: 0 PID: 92093 Comm: kworker/0:1 Kdump: loaded Not tainted 5.14.0-285.el9.aarch64 #1
[ 2794.016381] Hardware name: GIGABYTE G242-P34-00/MP32-AR2-00, BIOS F31L (SCP: 2.10.20220531) 09/29/2022
[ 2794.025672] Workqueue: ipv6_addrconf addrconf_dad_work
[ 2794.030803] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2794.037752] pc : ovs_dp_upcall+0x98/0x1d0 [openvswitch]
[ 2794.042973] lr : ovs_dp_upcall+0xc0/0x1d0 [openvswitch]
[ 2794.048190] sp : ffff80004bd1b3e0
[ 2794.051492] x29: ffff80004bd1b3e0 x28: 0000000000000000 x27: 0000000000000000
[ 2794.058616] x26: 000000000000a888 x25: ffff080e729ff300 x24: ffff080bad30aa00
[ 2794.065740] x23: ffff80004bd1b4c8 x22: 0000000000000000 x21: ffff07ff937a0b00
[ 2794.072862] x20: ffff80004bd1b470 x19: ffff080e729ff300 x18: 0000000000000000
[ 2794.079984] x17: 80fe00000000e922 x16: ffffc293ab1bd540 x15: 0000000000000000
[ 2794.087107] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
[ 2794.094229] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc29397393688
[ 2794.101351] x8 : 0000000000000040 x7 : 000000000000003f x6 : ffff80002c41ba10
[ 2794.108473] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 2794.115595] x2 : 0000000000000001 x1 : ffff45aab3683000 x0 : 0000000000000000
[ 2794.122717] Call trace:
[ 2794.125152]  ovs_dp_upcall+0x98/0x1d0 [openvswitch]
[ 2794.130022]  ovs_dp_process_packet+0x170/0x224 [openvswitch]
[ 2794.135672]  ovs_vport_receive+0x78/0xec [openvswitch]
[ 2794.140803]  netdev_port_receive+0xb8/0x170 [openvswitch]
[ 2794.146194]  netdev_frame_hook+0x28/0x3c [openvswitch]
[ 2794.151323]  __netif_receive_skb_core.constprop.0+0x2b0/0xd4c
[ 2794.157058]  __netif_receive_skb_one_core+0x40/0x84
[ 2794.161922]  __netif_receive_skb+0x1c/0x6c
[ 2794.166006]  process_backlog+0xe0/0x1b0
[ 2794.169829]  __napi_poll+0x3c/0x210
[ 2794.173305]  net_rx_action+0x308/0x3b0
[ 2794.177042]  __do_softirq+0x120/0x3d0
[ 2794.180693]  do_softirq+0xa8/0xbc
[ 2794.183997]  __local_bh_enable_ip+0xa0/0xb0
[ 2794.188168]  ip6_finish_output2+0x1c8/0x720
[ 2794.192339]  __ip6_finish_output+0x17c/0x2b0
[ 2794.196596]  ip6_finish_output+0x38/0xf0
[ 2794.200506]  ip6_output+0x78/0x1d0
[ 2794.203895]  NF_HOOK.constprop.0+0xcc/0xdc
[ 2794.207980]  ndisc_send_skb+0x2e8/0x430
[ 2794.211804]  ndisc_send_ns+0x68/0xb0
[ 2794.215367]  addrconf_dad_work+0x2a8/0x380
[ 2794.219451]  process_one_work+0x1e4/0x4a0
[ 2794.223449]  worker_thread+0x158/0x430
[ 2794.227185]  kthread+0xe8/0xf4
[ 2794.230228]  ret_from_fork+0x10/0x20
[ 2794.233792] Code: 8b020021 350008a0 d503201f d2800022 (f822003f) 
[ 2794.239874] SMP: stopping secondary CPUs
[ 2794.245020] Starting crashdump kernel...
[ 2794.248930] Bye!
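
For anyone triaging the oops above, the "Mem abort info" fields can be re-derived from the raw ESR value. The following is a minimal sketch, assuming the usual AArch64 ESR_EL1 layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). The function name `decode_esr` is made up for illustration and is not part of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value into the data-abort fields the kernel prints.

    Assumes the syndrome belongs to a data abort (EC 0x24 or 0x25);
    the ISS sub-fields mean something else for other exception classes.
    """
    iss = esr & 0x1FFFFFF  # bits 24:0, instruction specific syndrome
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class
        "IL": (esr >> 25) & 0x1,    # 1 means a 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,   # instruction syndrome valid
        "SET": (iss >> 11) & 0x3,   # synchronous error type
        "FnV": (iss >> 10) & 0x1,   # FAR not valid
        "EA": (iss >> 9) & 0x1,     # external abort type
        "CM": (iss >> 8) & 0x1,     # cache maintenance
        "S1PTW": (iss >> 7) & 0x1,  # stage 2 fault on stage 1 walk
        "WnR": (iss >> 6) & 0x1,    # write not read
        "FSC": iss & 0x3F,          # fault status code
    }


# ESR value taken from the log above.
fields = decode_esr(0x96000004)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

With the value from this report, the decoded EC is 0x25 and the FSC is 0x04, which agrees with the "DABT (current EL)" and "level 0 translation fault" lines in the log. The faulting address ffff45aab3683000 also matches register x1 in the dump.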



$ cat /etc/os-release 
NAME="CentOS Stream CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202303190222-0"
VERSION_ID="4.13"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream CoreOS 413.92.202303190222-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:9coreos"
HOME_URL="https://centos.org/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
OPENSHIFT_VERSION="4.13"
RHEL_VERSION="9"
OSTREE_VERSION="413.92.202303190222-0"


kdump link:
https://drive.google.com/file/d/1lxjRCFdmWZEPrsM6nrw6GrMo5OjpJptH/view?usp=share_link

must-gather:
https://drive.google.com/file/d/19eInoBJRXyRsusxLaOOOH5AambI6arRG/view?usp=share_link

Comment 1 Eran Ifrach 2023-04-16 07:54:14 UTC
I deployed OCP 4.13 RC2.

The issue is still present, although with fewer reboots than before.


$ ll /var/crash/
total 0
drwxr-xr-x. 2 root root 67 Apr  7 07:30 127.0.0.1-2023-04-07-07:29:58
drwxr-xr-x. 2 root root 67 Apr  8 15:03 127.0.0.1-2023-04-08-15:03:11

$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202303281804-0"

