Bug 1962143 - [FDP 21.D Testing] 16.1.6 GA failing spawn vm with direct port
Summary: [FDP 21.D Testing] 16.1.6 GA failing spawn vm with direct port
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Open vSwitch development team
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-19 11:44 UTC by Yariv
Modified: 2022-08-18 16:38 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-24 07:41:20 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-3994 0 None None None 2022-08-18 16:38:04 UTC

Description Yariv 2021-05-19 11:44:44 UTC
Description of problem:

Running FDP 21.D, VMs fail to spawn with SRIOV [direct attached] ports attached.
 

Version-Release number of selected component (if applicable):

[root@computeovsdpdksriov-1 ~]# rpm -qa | grep openvsw
network-scripts-openvswitch2.13-2.13.0-105.el8fdp.x86_64
openvswitch2.13-2.13.0-105.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
rhosp-openvswitch-2.13-10.el8ost.noarch


How reproducible:

Permanent

Steps to Reproduce:
1. create network with sriov physnet
2. create direct port
3. launch vm

Actual results:

vm state in Error

Expected results:

vm state in Active

Additional info:

sos report 
rhosp-openvswitch-2.13-10.el8ost.noarch|http://rhos-release.virt.bos.redhat.com/log/bzovs-16-1-fdp-bz

THTs templates:
https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid;h=be8aefef80109e1e946ae516d0b0d6089abefa78;hb=HEAD

Comment 1 Yariv 2021-05-19 11:49:07 UTC
core_puddle_version 
RHOS-16.1-RHEL-8-20210506.n.1(overcloud)

openvswitch, vxlan deployment

Apply the following commands.

[fails with and w/o binding-profile]
openstack port create  --network sriov_net_nic0_138  --vnic-type direct --binding-profile trusted=true sriov_test_trust_port

openstack server create --flavor perf_numa_0_sriov_dut --image trex_testpmd --nic  port-id=b3f1b958-bf30-4d20-9418-f056622d10e0 sriov_test_vm

vm in Error state.
Jobs 
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-nfv-16.1-director-3cont-2comp-ipv4-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/117/

Comment 2 Yariv 2021-05-19 11:57:50 UTC
The neutron log on the compute contains errors such as:

Refusing to bind due to unsupported vnic_type: direct with no switchdev capability bind_port /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/mech_driver/mech_openvswitch.py:133
2021-05-19 08:17:04.721 27 ERROR neutron.plugins.ml2.managers [req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b 8d83c7325bea463f959e50e6d4cf3b19 - default default] Failed to bind port f88d56e5-d4bd-477e-a90b-b1f30f91f0c0 on host computeovsdpdksriov-1.localdomain for vnic_type direct using segments [{'id': '3f0ffb11-0dfc-4488-9abb-99fadc8b155c', 'network_type': 'vlan', 'physical_network': 'sriov-1', 'segmentation_id': 138, 'network_id': 'bf9b9543-e1b3-4447-8e70-d6529aa54b5d'}]
2021-05-19 08:17:04.721 27 DEBUG neutron_lib.callbacks.manager [req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b 8d83c7325bea463f959e50e6d4cf3b19 - default default] Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler--9223372036851598663'] for port, before_update _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
2021-05-19 08:17:04.737 26 DEBUG neutron.wsgi [-] (26) accepted ('10.0.130.31', 57426) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
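
When scanning a large neutron log for this kind of failure, it can help to pull the port, host, vnic_type and segment details out of the "Failed to bind port" message. The following is a minimal sketch, assuming the wrapped message has been joined back into a single line; the regular expression and the helper name `parse_bind_failure` are illustrative and not part of neutron.

```python
import ast
import re

# Matches the ML2 manager's "Failed to bind port" message on a single line.
BIND_FAILURE = re.compile(
    r"Failed to bind port (?P<port>[0-9a-f-]{36}) on host (?P<host>\S+) "
    r"for vnic_type (?P<vnic_type>\S+) using segments (?P<segments>\[.*\])"
)


def parse_bind_failure(line):
    """Return port, host, vnic_type and segments of a bind failure, or None."""
    match = BIND_FAILURE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # The segments are logged as a Python literal (a list of dicts).
    info["segments"] = ast.literal_eval(info["segments"])
    return info


# The ERROR line from this report, joined into one line.
sample = (
    "2021-05-19 08:17:04.721 27 ERROR neutron.plugins.ml2.managers "
    "[req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b "
    "8d83c7325bea463f959e50e6d4cf3b19 - default default] Failed to bind port "
    "f88d56e5-d4bd-477e-a90b-b1f30f91f0c0 on host computeovsdpdksriov-1.localdomain "
    "for vnic_type direct using segments [{'id': '3f0ffb11-0dfc-4488-9abb-99fadc8b155c', "
    "'network_type': 'vlan', 'physical_network': 'sriov-1', 'segmentation_id': 138, "
    "'network_id': 'bf9b9543-e1b3-4447-8e70-d6529aa54b5d'}]"
)

failure = parse_bind_failure(sample)
print(failure["port"], failure["host"], failure["vnic_type"])
print(failure["segments"][0]["physical_network"],
      failure["segments"][0]["segmentation_id"])
```

Lines that do not contain the message return None, so the helper can be applied to every line of a log file and the results filtered.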

Comment 3 Yariv 2021-05-20 12:47:52 UTC
Could be a THT issue, I have to verify

Comment 4 Maxime Coquelin 2021-05-20 14:14:34 UTC
Hi Yariv,

(In reply to Yariv from comment #2)
> neutron log, in compute contain errors such as 
> 
> Refusing to bind due to unsupported vnic_type: direct with no switchdev capability bind_port /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/mech_driver/mech_openvswitch.py:133
> 2021-05-19 08:17:04.721 27 ERROR neutron.plugins.ml2.managers [req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b 8d83c7325bea463f959e50e6d4cf3b19 - default default] Failed to bind port f88d56e5-d4bd-477e-a90b-b1f30f91f0c0 on host computeovsdpdksriov-1.localdomain for vnic_type direct using segments [{'id': '3f0ffb11-0dfc-4488-9abb-99fadc8b155c', 'network_type': 'vlan', 'physical_network': 'sriov-1', 'segmentation_id': 138, 'network_id': 'bf9b9543-e1b3-4447-8e70-d6529aa54b5d'}]
> 2021-05-19 08:17:04.721 27 DEBUG neutron_lib.callbacks.manager [req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b 8d83c7325bea463f959e50e6d4cf3b19 - default default] Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler--9223372036851598663'] for port, before_update _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
> 2021-05-19 08:17:04.737 26 DEBUG neutron.wsgi [-] (26) accepted ('10.0.130.31', 57426) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985

Looking at the error message:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/mech_driver/mech_openvswitch.py#L133

It seems the test is trying to use switchdev mode but the device does not support it.
Looking at sosreport-computeovsdpdksriov-0-2021-05-19-acerrss lspci, the compute is equipped with Intel X710 NIC,
which AFAICT does not support this mode.

Adding Sean, who should be able to confirm whether or not the X710 supports switchdev mode.

If this is the case, it seems the problem may be more related to the test.

Comment 5 smooney 2021-05-20 18:39:14 UTC
Unless it was added in a recent firmware update, which I doubt, no: the X710 does not support the switchdev API and cannot be used with hardware offloaded OVS.

Is the sriov nic agent deployed in that test environment?

The openvswitch mech driver should not bind the port, which it correctly rejected in the log trace above, but the sriov nic agent should be able to bind it.
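
The behaviour described above can be sketched as a toy model: the openvswitch driver refuses a direct port whose binding profile has no switchdev capability, and the SR-IOV driver binds it only when its agent runs on the host. This is a simplified illustration, not the real Neutron code; the function names and the port dictionary layout are hypothetical, and only the agent type strings are meant to follow Neutron's naming.

```python
# Simplified model of ML2 port binding, NOT the real Neutron code.
OVS_AGENT = "Open vSwitch agent"
SRIOV_AGENT = "NIC Switch agent"


def ovs_driver_binds(port, agents_on_host):
    """Stand-in for the openvswitch mechanism driver."""
    capabilities = port.get("binding_profile", {}).get("capabilities", [])
    if port["vnic_type"] == "direct" and "switchdev" not in capabilities:
        # Mirrors "Refusing to bind due to unsupported vnic_type: direct
        # with no switchdev capability" from the log.
        return False
    if port["vnic_type"] not in ("normal", "direct"):
        return False
    return OVS_AGENT in agents_on_host


def sriov_driver_binds(port, agents_on_host):
    """Stand-in for the SR-IOV mechanism driver (assumption: it needs its agent)."""
    sriov_vnic_types = ("direct", "macvtap", "direct-physical")
    return port["vnic_type"] in sriov_vnic_types and SRIOV_AGENT in agents_on_host


DRIVERS = [("openvswitch", ovs_driver_binds), ("sriovnicswitch", sriov_driver_binds)]


def bind_port(port, agents_on_host):
    """Return the name of the first driver that binds the port, else None."""
    for name, binds in DRIVERS:
        if binds(port, agents_on_host):
            return name
    return None  # corresponds to "Failed to bind port" in the log


# A direct port like the one in this report (no switchdev capability).
port = {"vnic_type": "direct", "binding_profile": {"trusted": "true"}}

print(bind_port(port, [OVS_AGENT]))               # no SR-IOV agent on the host
print(bind_port(port, [OVS_AGENT, SRIOV_AGENT]))  # SR-IOV agent present
```

In this model the first call fails to bind, which matches the reported error, and the second call is bound by the SR-IOV driver.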

Looking at https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=blob;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/network-environment-overrides.yaml;h=489b4a6cbd82e37714fb55769f9023c27d1dc349;hb=HEAD I don't see where you are enabling the sriov nic agent.

https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=blob;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/nic-configs/computeovsdpdksriov.yaml;h=9c0d365a2ac8799c9c21f5c6abce063bcb8f7f92;hb=HEAD has the nic configuration but no neutron configuration

In principle,
https://opendev.org/openstack/tripleo-heat-templates/src/commit/865c65b8f43a909c94b1b50712b5baf088af9566/environments/neutron-sriov.yaml
which is enabled in https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=blob;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/overcloud_deploy.sh;h=a3d1423d10003202a92cc3a1516d64db5d1d9db3;hb=HEAD
should enable the sriov nic agent ml2 driver.

But I would assume the sriov_nic_agent was just not deployed on the compute host in this env?
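
One way to check that assumption is to inspect the output of `openstack network agent list -f json` for an SR-IOV agent on the compute host. The sketch below assumes the JSON rows carry "Agent Type", "Host" and "Alive" keys and that the SR-IOV agent reports the type "NIC Switch agent"; the helper name and the sample rows are hypothetical, not output captured from this environment.

```python
import json


def sriov_agent_alive(agent_list_json, host):
    """Return True if an alive SR-IOV agent is listed for the given host."""
    agents = json.loads(agent_list_json)
    return any(
        agent.get("Agent Type") == "NIC Switch agent"
        and agent.get("Host") == host
        # Assumption: "Alive" is a boolean in JSON output; ":-)" is the
        # human-readable form used in table output.
        and agent.get("Alive") in (True, ":-)")
        for agent in agents
    )


# Hypothetical sample: only the OVS agent is running on the compute host.
sample = json.dumps([
    {"Agent Type": "Open vSwitch agent",
     "Host": "computeovsdpdksriov-1.localdomain",
     "Alive": True},
    {"Agent Type": "NIC Switch agent",
     "Host": "computeovsdpdksriov-0.localdomain",
     "Alive": True},
])

print(sriov_agent_alive(sample, "computeovsdpdksriov-1.localdomain"))
print(sriov_agent_alive(sample, "computeovsdpdksriov-0.localdomain"))
```

A False result for the host named in the bind failure would be consistent with the agent not having been deployed there.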

Comment 7 Yariv 2021-05-24 07:41:20 UTC
(In reply to Yariv from comment #3)
> Could be THT issue, has to verify

Closing this BZ. We had THT issues in CI. Thanks for taking a look.

