Description of problem:
Running FDP 21.D, we fail to spawn a VM with SR-IOV (direct attach) ports attached.

Version-Release number of selected component (if applicable):
[root@computeovsdpdksriov-1 ~]# rpm -qa | grep openvsw
network-scripts-openvswitch2.13-2.13.0-105.el8fdp.x86_64
openvswitch2.13-2.13.0-105.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
rhosp-openvswitch-2.13-10.el8ost.noarch

How reproducible:
Permanent

Steps to Reproduce:
1. Create a network with an SR-IOV physnet.
2. Create a direct port.
3. Launch a VM.

Actual results:
VM state is Error.

Expected results:
VM state is Active.

Additional info:
sos report: http://rhos-release.virt.bos.redhat.com/log/bzovs-16-1-fdp-bz
THT templates: https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid;h=be8aefef80109e1e946ae516d0b0d6089abefa78;hb=HEAD
core_puddle_version: RHOS-16.1-RHEL-8-20210506.n.1 (overcloud), openvswitch, VXLAN deployment.
Apply the following commands (fails both with and without --binding-profile):
openstack port create --network sriov_net_nic0_138 --vnic-type direct --binding-profile trusted=true sriov_test_trust_port
openstack server create --flavor perf_numa_0_sriov_dut --image trex_testpmd --nic port-id=b3f1b958-bf30-4d20-9418-f056622d10e0 sriov_test_vm
The VM ends up in Error state.
Job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-nfv-16.1-director-3cont-2comp-ipv4-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/117/
The neutron log on the compute contains errors such as:
Refusing to bind due to unsupported vnic_type: direct with no switchdev capability bind_port /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/mech_driver/mech_openvswitch.py:133
2021-05-19 08:17:04.721 27 ERROR neutron.plugins.ml2.managers [req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b 8d83c7325bea463f959e50e6d4cf3b19 - default default] Failed to bind port f88d56e5-d4bd-477e-a90b-b1f30f91f0c0 on host computeovsdpdksriov-1.localdomain for vnic_type direct using segments [{'id': '3f0ffb11-0dfc-4488-9abb-99fadc8b155c', 'network_type': 'vlan', 'physical_network': 'sriov-1', 'segmentation_id': 138, 'network_id': 'bf9b9543-e1b3-4447-8e70-d6529aa54b5d'}]
2021-05-19 08:17:04.721 27 DEBUG neutron_lib.callbacks.manager [req-4edccc4d-2198-47e2-98c3-34bd6dab0e09 dfd1ae85410047acad7a2dd126cbe34b 8d83c7325bea463f959e50e6d4cf3b19 - default default] Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler--9223372036851598663'] for port, before_update _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
2021-05-19 08:17:04.737 26 DEBUG neutron.wsgi [-] (26) accepted ('10.0.130.31', 57426) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
Could be a THT issue; needs to be verified.
Hi Yariv,

(In reply to Yariv from comment #2)
> neutron log, in compute contain errors such as
>
> Refusing to bind due to unsupported vnic_type: direct with no switchdev capability bind_port
> /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/mech_driver/mech_openvswitch.py:133
> [...]

Looking at the error message (https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/mech_driver/mech_openvswitch.py#L133), it seems the test is trying to use switchdev mode, but the device does not support it. According to the lspci output in sosreport-computeovsdpdksriov-0-2021-05-19-acerrss, the compute is equipped with an Intel X710 NIC, which as far as I can tell does not support this mode. Adding Sean, who should be able to confirm whether or not the X710 supports switchdev mode.
If this is the case, it seems the problem may be more related to the test.
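To check switchdev support on the compute node directly, rather than inferring it from the NIC model, one option is iproute2's `devlink dev eswitch show pci/<PCI address>`, which reports the eswitch mode (`legacy` or `switchdev`) on devices that expose it and returns an error such as "Operation not supported" on devices that don't. Below is a minimal POSIX sh sketch that extracts the mode from that output; the helper name `eswitch_mode` is hypothetical, and the parsed format (`pci/...: mode legacy inline-mode ...`) is assumed from typical iproute2 releases.

```shell
# eswitch_mode (hypothetical helper): read `devlink dev eswitch show pci/<addr>`
# output on stdin and print the value that follows the bare "mode" keyword,
# e.g. "legacy" or "switchdev". Tokens such as "inline-mode" are not matched.
# Returns 1 and prints nothing when no "mode" field is present, which is what
# an "Operation not supported" error from devlink would produce.
eswitch_mode() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "mode") { print $(i + 1); found = 1; exit }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example: `devlink dev eswitch show pci/0000:03:00.0 2>&1 | eswitch_mode || echo "no eswitch mode reported"`, where `0000:03:00.0` is a placeholder to be replaced with the X710 address taken from lspci.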
Unless it was added in a recent firmware update, which I doubt, no: the X710 does not support the switchdev API and cannot be used with hardware-offloaded OVS.

Is the SR-IOV NIC agent deployed in that test environment? The openvswitch mech driver should not bind the port (and it correctly rejected it in the log trace above), but the SR-IOV NIC agent should be able to bind it.

Looking at https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=blob;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/network-environment-overrides.yaml;h=489b4a6cbd82e37714fb55769f9023c27d1dc349;hb=HEAD I don't see where you are enabling the SR-IOV NIC agent. https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=blob;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/nic-configs/computeovsdpdksriov.yaml;h=9c0d365a2ac8799c9c21f5c6abce063bcb8f7f92;hb=HEAD has the NIC configuration but no neutron configuration.

In principle, https://opendev.org/openstack/tripleo-heat-templates/src/commit/865c65b8f43a909c94b1b50712b5baf088af9566/environments/neutron-sriov.yaml, which is enabled in https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=blob;f=tht/ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/overcloud_deploy.sh;h=a3d1423d10003202a92cc3a1516d64db5d1d9db3;hb=HEAD, should enable the SR-IOV NIC agent ML2 driver. But I would assume the sriov_nic_agent was just not deployed on the compute host in this env?
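One way to answer whether the SR-IOV NIC agent is registered for the compute host is to look for a "NIC Switch agent" row in `openstack network agent list`. A minimal POSIX sh sketch, assuming that `-f value -c "Agent Type" -c Host` prints one agent per line as the agent type followed by the host name; `has_nic_switch_agent` is a hypothetical helper, not part of any OpenStack tooling.

```shell
# has_nic_switch_agent HOST (hypothetical helper): read agent-list rows of the
# assumed form "<Agent Type> <Host>" on stdin and succeed only if a
# "NIC Switch agent" row is reported for exactly HOST.
has_nic_switch_agent() {
  awk -v host="$1" '
    # Agent type contains spaces, so match it as a prefix and the host as the last field.
    index($0, "NIC Switch agent ") == 1 && $NF == host { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example: `openstack network agent list -f value -c "Agent Type" -c Host | has_nic_switch_agent computeovsdpdksriov-1.localdomain && echo "SR-IOV NIC agent registered"`. A registered agent alone does not make the server bind the port; the matching ML2 mechanism driver must also be enabled on the neutron server.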
By the way, the ml2_conf.ini does not appear to be in the job logs:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.1-director-3cont-2comp-ipv4-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/117/controller-0/var/lib/config-data/neutron/etc/neutron/
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.1-director-3cont-2comp-ipv4-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/117/controller-1/var/lib/config-data/neutron/etc/neutron/
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.1-director-3cont-2comp-ipv4-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/117/controller-2/var/lib/config-data/neutron/etc/neutron/
although the SR-IOV NIC agent logs are there:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.1-director-3cont-2comp-ipv4-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/117/computeovsdpdksriov-1/var/log/containers/neutron/sriov-nic-agent.log.gz
So I'm guessing that the issue is with the neutron server config and that the SR-IOV NIC agent ML2 driver is not enabled.
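On the neutron server side, the ML2 mechanism driver that pairs with the SR-IOV NIC agent is `sriovnicswitch`; for this hybrid OVS-DPDK plus SR-IOV deployment it would need to appear in `mechanism_drivers` under `[ml2]` in ml2_conf.ini alongside `openvswitch`. A small sh sketch that checks a given ml2_conf.ini for it; the helper name `has_sriov_mech_driver` is hypothetical, and it assumes a simple `key = value` INI layout without continuation lines.

```shell
# has_sriov_mech_driver FILE (hypothetical helper): succeed if the [ml2]
# section of FILE lists "sriovnicswitch" in mechanism_drivers.
# Commented-out keys and keys in other sections are ignored.
has_sriov_mech_driver() {
  awk -F '=' '
    # Any "[...]" line starts a new section; track whether it is [ml2].
    substr($0, 1, 1) == "[" { in_ml2 = (index($0, "[ml2]") == 1); next }
    in_ml2 && $1 ~ /^[ \t]*mechanism_drivers[ \t]*$/ {
      gsub(/[ \t\r]/, "", $2)            # drop whitespace around driver names
      n = split($2, drivers, ",")
      for (i = 1; i <= n; i++) if (drivers[i] == "sriovnicswitch") found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

On a controller this could be pointed at, for example, `/var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/ml2_conf.ini`; that path is an assumption about the containerized layout and may differ between releases.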
(In reply to Yariv from comment #3)
> Could be THT issue, has to verify

Closing this BZ; we had THT issues in CI. Thanks for taking a look.