Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under the FDP project in Jira. Thanks.

Bug 1950268

Summary: Removing a VM and its ports (VFs) produces a kernel crash when using an RT image on the compute nodes
Product: Red Hat Enterprise Linux Fast Datapath
Component: DPDK
DPDK sub component: ovs-dpdk
Version: FDP 21.B
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: ---
Reporter: Miguel Angel Nieto <mnietoji>
Assignee: Flavio Leitner <fleitner>
QA Contact: liting <tli>
Docs Contact:
CC: apevec, ctrautma, jlibosva, ktraynor, lhh, majopela, mlavalle, oblaut, qding, scohen, supadhya
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Last Closed: 2024-10-08 17:49:14 UTC
Embargoed:

Description Miguel Angel Nieto 2021-04-16 09:26:53 UTC
Description of problem:
Removing a VM and its ports (VFs) produces a kernel crash when using an RT image on the compute nodes.

[ 8184.214770] IPv4: martian source 10.35.74.8 from 10.35.74.126, on dev eno1
[ 8184.214773] ll header: 00000000: ff ff ff ff ff ff 9c cc 83 58 1c 60 08 06        .........X.`..
[ 8192.714949] i40e 0000:05:00.2: Setting MAC b6:e2:14:b6:d6:4e on VF 8
[ 8192.800714] i40e 0000:05:00.2: Bring down and up the VF interface to make this change effective.
[ 8192.811921] iavf 0000:05:0b.0: enabling device (0000 -> 0002)
[ 8192.874279] iavf 0000:05:0b.0: Multiqueue Enabled: Queue pair count = 4
[ 8192.878943] iavf 0000:05:0b.0: MAC address: b6:e2:14:b6:d6:4e
[ 8192.878945] iavf 0000:05:0b.0: GRO is enabled
[ 8192.893759] iavf 0000:05:0b.0 enp5s0f2v8: renamed from eth0
[ 8192.999646] iavf 0000:05:0b.0: Reset warning received from the PF
[ 8192.999649] iavf 0000:05:0b.0: Scheduling reset task
[ 8193.105429] i40e 0000:05:00.2: VF 8 is now untrusted
[ 8193.108240] IPv6: ADDRCONF(NETDEV_UP): enp5s0f2v8: link is not ready
[ 8193.121854] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 8193.121856] PGD 0 P4D 0
[ 8193.121860] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 8193.121863] CPU: 21 PID: 5689 Comm: NetworkManager Kdump: loaded Not tainted 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1
[ 8193.121864] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.8.0 05/17/2018
[ 8193.121872] RIP: 0010:iavf_alloc_rx_buffers+0x4f/0x250 [iavf]
[ 8193.121874] Code: 0f 85 df 00 00 00 0f b7 47 48 41 89 f7 48 89 fb 49 89 c4 48 8d 14 40 49 89 c5 48 8b 47 20 49 c1 e4 05 4c 03 67 08 48 8d 2c d0 <48> 83 7d 08 00 0f b7 4b 46 0f 84 c1 00 00 00 48 83 83 80 00 00 00
[ 8193.121875] RSP: 0018:ffffc16857923558 EFLAGS: 00010246
[ 8193.121877] RAX: 0000000000000000 RBX: ffff9b72e22e1000 RCX: 0000000000000200
[ 8193.121878] RDX: 0000000000000000 RSI: 00000000000001ff RDI: ffff9b72e22e1000
[ 8193.121879] RBP: 0000000000000000 R08: 0000000000000600 R09: ffff9b7b220a0ec0
[ 8193.121880] R10: 0000000092492480 R11: 0000000000000000 R12: 0000000000000000
[ 8193.121881] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000001ff
[ 8193.121882] FS:  00007f7fff96d200(0000) GS:ffff9b7b3f880000(0000) knlGS:0000000000000000
[ 8193.121883] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8193.121884] CR2: 0000000000000008 CR3: 0000003fe9bce001 CR4: 00000000003626e0
[ 8193.121886] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8193.121887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8193.121887] Call Trace:
[ 8193.121896]  iavf_configure+0x124/0x180 [iavf]
[ 8193.121901]  iavf_open+0x100/0x180 [iavf]
[ 8193.121905]  __dev_open+0xcd/0x160
[ 8193.121908]  __dev_change_flags+0x1ad/0x220
[ 8193.121912]  dev_change_flags+0x21/0x60
[ 8193.121916]  do_setlink+0x314/0xed0
[ 8193.121920]  ? preempt_count_add+0x79/0xb0
[ 8193.121922]  ? preempt_count_add+0x79/0xb0
[ 8193.121926]  ? __nla_validate_parse+0x51/0x840

It can be reproduced by running the following test case:
python -m testtools.run nfv_tempest_plugin.tests.scenario.test_nfv_sriov_usecases.TestSriovScenarios.test_sriov_free_resource
Using the following deployment templates:
https://gitlab.cee.redhat.com/mnietoji/deployment_templates/-/tree/460218cb433959a6b73597a437882966391b1417/tht/panther08/ospd-16.1-geneve-ovn-dpdk-sriov-ctlplane-dataplane-bonding-rt-hybrid-performance-panther08

The test case does something similar to the following:
#!/usr/bin/env bash

#networks
openstack network create --provider-network-type geneve mgmt
openstack subnet create --gateway 10.10.10.254  --network mgmt --subnet-range 10.10.10.0/24  --dhcp --dns-nameserver 10.46.0.31 --dns-nameserver 8.8.8.8 --allocation-pool start=10.10.10.100,end=10.10.10.200 mgmt_subnet
openstack network create --provider-physical-network sriov-1 --provider-network-type vlan sriov_vf
openstack subnet create --gateway 40.0.0.254  --network sriov_vf --subnet-range 40.0.0.0/24  --dhcp --dns-nameserver 10.46.0.31 --dns-nameserver 8.8.8.8 --allocation-pool start=40.0.0.100,end=40.0.0.200 sriov_vf_subnet

#ports
openstack port create --network mgmt --vnic-type normal mgmt_1
openstack port create --network mgmt --vnic-type normal mgmt_2
openstack port create --network mgmt --vnic-type normal mgmt_3
openstack port create --network mgmt --vnic-type normal mgmt_4
openstack port create --network sriov_vf --vnic-type direct sriov_vf_1
openstack port create --network sriov_vf --vnic-type direct sriov_vf_2
openstack port create --network sriov_vf --vnic-type direct sriov_vf_3
openstack port create --network sriov_vf --vnic-type direct sriov_vf_4
#flavor
openstack flavor create --ram 8192 --disk 20 --vcpus 6 nfv_qe_base_flavor
openstack flavor set nfv_qe_base_flavor --property hw:mem_page_size=large --property hw:cpu_policy=dedicated --property hw:cpu_realtime=yes --property hw:cpu_emulator_threads=isolate --property hw:cpu_realtime_mask=^0-1

#image
curl -o rhel-guest-image-7-6-210-x86-64-qcow2 http://rhos-qe-mirror-tlv.usersys.redhat.com/brewroot/packages/rhel-guest-image/7.6/210/images/rhel-guest-image-7.6-210.x86_64.qcow2
openstack image  create --disk-format qcow2 --container-format bare --public --file ./rhel-guest-image-7-6-210-x86-64-qcow2 rhel-guest-image-7-6-210-x86-64-qcow2

#keypair
openstack keypair create --public-key /home/stack/.ssh/id_rsa.pub mykeypair

#vms
openstack server create --key-name  mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_1 --port sriov_vf_1 myinstance1
openstack server create --key-name  mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_2 --port sriov_vf_2 myinstance2
openstack server create --key-name  mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_3 --port sriov_vf_3 myinstance3
openstack server create --key-name  mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_4 --port sriov_vf_4 myinstance4
#destroy ports and vms
ips=$(openstack server list --all-projects -c Networks -f value | sed 's/[=,;]/ /g' | awk '{print $2,$4}')
ips=$(echo $ips | sed 's/ /|/g')  # unquoted on purpose: joins all lines with '|'
ports=$(openstack port list -f value | egrep "$ips" | awk '{print $1}')
servers=$(openstack server list --all-projects -c ID -f value)
for server in $servers; do
    openstack server delete "$server"
done
for port in $ports; do
    openstack port delete "$port"
done
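As an aside, the Networks-column parsing used by the cleanup step can be exercised in isolation. The sample `openstack server list` output below is hypothetical (the real addresses come from the mgmt and sriov_vf subnets), but the sed/awk pipeline is the same one the script uses:

```shell
#!/usr/bin/env bash
# Hypothetical sample of 'openstack server list -c Networks -f value'
# output for two of the instances created above.
networks='mgmt=10.10.10.101; sriov_vf=40.0.0.101
mgmt=10.10.10.102; sriov_vf=40.0.0.102'

# Same transformation as the cleanup step: turn '=', ',' and ';' into spaces,
# keep the two IP fields per line, then join everything with '|' so the result
# can be used as an egrep pattern against 'openstack port list'.
ips=$(echo "$networks" | sed 's/[=,;]/ /g' | awk '{print $2,$4}')
ips=$(echo $ips | sed 's/ /|/g')   # unquoted on purpose: collapses newlines
echo "$ips"                        # 10.10.10.101|40.0.0.101|10.10.10.102|40.0.0.102
```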

The crash does not reproduce every time the test case is run, but I have reproduced it several times.
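Since the crash is intermittent, one way to hunt for it is to loop the test case and grep the compute's dmesg after each run. The loop below is only a sketch: `$COMPUTE` and the heat-admin SSH access are assumptions about the deployment, so that part is commented out; only the `saw_oops` helper runs as-is.

```shell
#!/usr/bin/env bash
# Helper: did a dmesg capture contain the NULL-pointer oops from this report?
saw_oops() {
    echo "$1" | grep -q 'BUG: unable to handle kernel NULL pointer dereference'
}

# Sketch of a reproduction loop (needs a deployed overcloud; $COMPUTE and the
# heat-admin user are assumptions about the environment):
# for i in $(seq 1 20); do
#     python -m testtools.run \
#         nfv_tempest_plugin.tests.scenario.test_nfv_sriov_usecases.TestSriovScenarios.test_sriov_free_resource
#     log=$(ssh heat-admin@"$COMPUTE" sudo dmesg)
#     if saw_oops "$log"; then
#         echo "reproduced on iteration $i"
#         break
#     fi
# done
```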


Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210323.n.0(venv) (overcloud) [stack@undercloud-0 ~]
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux computeovndpdksriovrt-1 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1 SMP PREEMPT RT Fri Oct 16 14:11:07 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Intermittent; see the test case above.


Actual results:
Kernel crash (NULL pointer dereference in iavf_alloc_rx_buffers) on the compute node.

Expected results:
No kernel crash should be generated


Additional info:
I will upload sos reports and kernel crash dumps.

Comment 5 ovs-bot 2024-10-08 17:49:14 UTC
This bug did not meet the criteria for automatic migration and is being closed.
If the issue remains, please open a new ticket in https://issues.redhat.com/browse/FDP