Bug 1900240 - OVS DPDK bond LACP fails during provisioning
Summary: OVS DPDK bond LACP fails during provisioning
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Open vSwitch development team
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-21 17:02 UTC by jpateteg
Modified: 2020-12-16 16:56 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-16 16:56:55 UTC
Target Upstream Version:
Embargoed:


Attachments
journalctl os-net-config (72.85 KB, text/plain)
2020-11-21 17:10 UTC, jpateteg

Description jpateteg 2020-11-21 17:02:15 UTC
Description of problem:
os-net-config fails to create an OVS bond with LACP; as a result, the connectivity tests fail and the overcloud deployment aborts.

Version-Release number of selected component (if applicable):
ovs-vsctl (Open vSwitch) 2.11.0
RHOSP 13 z12


How reproducible:
100% 

Steps to Reproduce:
1. Create an LACP bond in the compute-dpdk NIC template.
2. Run the overcloud deploy command.

Actual results:
The DPDK bond is not created, with the following ovs-vsctl show output:
 Port "dpdkbond0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:af:00.0", n_rxq="2"}
                error: "could not open network device dpdk0 (Address family not supported by protocol)"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:af:00.1", n_rxq="2"}
                error: "could not open network device dpdk1 (Address family not supported by protocol)"
    ovs_version: "2.11.0"


The bond is not created:
[root@mxtla01lab2com03 openvswitch]# ovs-appctl bond/list
bond    type    recircID        slaves
[root@mxtla01lab2com03 openvswitch]#
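
Before the bond can exist, DPDK itself must be enabled in ovs-vswitchd; the "unknown type dpdk" warnings in the logs below typically indicate it is not. A minimal diagnostic sketch, assuming the standard Open vSwitch 2.11 CLI on the compute node (the comments describe what a healthy DPDK-enabled host is expected to report; this is a suggestion, not output from the affected system):

```shell
# Does this ovs-vswitchd build include DPDK? (a "DPDK <version>" line should appear)
ovs-vswitchd --version
# Is the DPDK datapath enabled in the database? (expect "true")
ovs-vsctl get Open_vSwitch . other_config:dpdk-init
# Did DPDK/EAL initialization actually complete? (expect true)
ovs-vsctl get Open_vSwitch . dpdk_initialized
```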


Expected results:

The bond should be created

Additional info:
The Tenant IP is on this NIC, so the deployment does not succeed: there is no ping between compute nodes on this network.

The DPDK parameters I am using (in network-environment.yaml):
 ComputeOvsDpdkParameters:
    KernelArgs: default_hugepagesz=1GB hugepagesz=1G hugepages=400 iommu=pt intel_iommu=on isolcpus=1-23,25-47,49-71,73-95
    TunedProfileName: "cpu-partitioning"
    IsolCpusList: "1-23,25-47,49-71,73-95" # For the OVSDPDK role only NUMA0 has DPDK; it is not necessary to isolate all the CPUs.
    NovaVcpuPinSet: ['3-23,27-47,51-71,75-95']
    NovaReservedHostMemory: 8192
    OvsDpdkSocketMemory: "1024,4096"
    OvsDpdkMemoryChannels: "8"
    OvsDpdkCoreList: "0,24,48,72" # first thread of each core
    NovaComputeCpuSharedSet: ['0,24,48,72']
    OvsPmdCoreList: "1,49,2,50,25,73,26,74" # 2 CPUs with their sibling threads
    NeutronBridgeMappings:
    - datacentre:br-ex
    - tenant:br-dpdk0
    NovaLibvirtRxQueueSize: 1024
    NovaLibvirtTxQueueSize: 1024
    NeutronDatapathType: netdev

The NIC template piece:
- type: ovs_user_bridge
  name: br-dpdk0
  use_dhcp: false
  ovs_extra:
    - str_replace:
        template: set port br-dpdk0 tag=_VLAN_TAG_
        params:
          _VLAN_TAG_:
            get_param: TenantNetworkVlanID
  addresses:
    - ip_netmask:
        get_param: TenantIpSubnet
  mtu: 9000
  members:
    - type: ovs_dpdk_bond
      name: dpdkbond0
      ovs_options: "bond_mode=balance-slb lacp=active"
      mtu: 9000
      rx_queue: 2
      members:
        - type: ovs_dpdk_port
          name: dpdk0
          mtu: 9000
          members:
            - type: interface
              name: ens2f0
        - type: ovs_dpdk_port
          name: dpdk1
          mtu: 9000
          members:
            - type: interface
              name: ens2f1

Comment 1 jpateteg 2020-11-21 17:05:44 UTC
openvswitch processes:

[root@mxtla01lab2com03 openvswitch]# ps -ef | grep openvsw
openvsw+  4073     1  0 Nov20 ?        00:00:03 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+  4130     1  0 Nov20 ?        00:01:25 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root     27384 27371  0 12:05 pts/0    00:00:00 grep --color=auto openvsw

Comment 2 jpateteg 2020-11-21 17:05:48 UTC
openvswitch logs:

[root@mxtla01lab2com03 openvswitch]# cat ovs-vswitchd.log
2020-11-21T08:25:01.915Z|00198|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-11-21T16:28:33.467Z|00199|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.468Z|00200|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.468Z|00201|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.468Z|00202|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.587Z|00203|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.587Z|00204|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.587Z|00205|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.587Z|00206|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.685Z|00207|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.685Z|00208|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.685Z|00209|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.685Z|00210|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.799Z|00211|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.799Z|00212|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.799Z|00213|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.799Z|00214|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.845Z|00215|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.845Z|00216|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.845Z|00217|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.845Z|00218|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.859Z|00219|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.859Z|00220|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.859Z|00221|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.859Z|00222|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.965Z|00223|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.965Z|00224|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.965Z|00225|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.965Z|00226|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:33.978Z|00227|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:33.978Z|00228|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:33.978Z|00229|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:33.978Z|00230|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:34.589Z|00231|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:34.589Z|00232|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:34.589Z|00233|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:34.589Z|00234|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
2020-11-21T16:28:34.597Z|00235|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
2020-11-21T16:28:34.597Z|00236|bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
2020-11-21T16:28:34.597Z|00237|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
2020-11-21T16:28:34.597Z|00238|bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
[root@mxtla01lab2com03 openvswitch]#

Comment 3 jpateteg 2020-11-21 17:07:49 UTC
config.json (os-net-config)
{"network_config": [{"addresses": [{"ip_netmask": "14.195.13.148/25"}], "bonding_options": "mode=1 miimon=100", "members": [{"name": "ens1f0", "primary": true, "type": "interface", "use_dhcp": false}, {"name": "ens1f1", "type": "interface", "use_dhcp": false}], "name": "bond0", "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "14.195.13.135"}], "type": "linux_bond", "use_dhcp": false}, {"bonding_options": "mode=1 miimon=100", "members": [{"mtu": 9000, "name": "ens1f4", "primary": true, "type": "interface", "use_dhcp": false}, {"mtu": 9000, "name": "ens1f5", "type": "interface", "use_dhcp": false}], "mtu": 9000, "name": "bond1", "type": "linux_bond", "use_dhcp": false}, {"addresses": [{"ip_netmask": "10.3.0.10/25"}], "device": "bond1", "mtu": 9000, "type": "vlan", "vlan_id": 456}, {"bonding_options": "mode=1 miimon=100", "members": [{"name": "ens1f6", "primary": true, "type": "interface", "use_dhcp": false}, {"name": "ens1f7", "type": "interface", "use_dhcp": false}], "name": "bond2", "type": "linux_bond", "use_dhcp": false}, {"addresses": [{"ip_netmask": "14.195.11.142/25"}], "device": "bond2", "routes": [{"default": true, "next_hop": "14.195.11.129"}], "type": "vlan", "vlan_id": 440}, {"addresses": [{"ip_netmask": "10.1.0.10/25"}], "device": "bond2", "type": "vlan", "vlan_id": 454}, {"addresses": [{"ip_netmask": "10.2.0.10/25"}], "members": [{"members": [{"members": [{"name": "ens2f0", "type": "interface"}], "mtu": 9000, "name": "dpdk0", "type": "ovs_dpdk_port"}, {"members": [{"name": "ens2f1", "type": "interface"}], "mtu": 9000, "name": "dpdk1", "type": "ovs_dpdk_port"}], "mtu": 9000, "name": "dpdkbond0", "ovs_options": "bond_mode=balance-slb lacp=active", "rx_queue": 2, "type": "ovs_dpdk_bond"}], "mtu": 9000, "name": "br-dpdk0", "ovs_extra": ["set port br-dpdk0 tag=455"], "type": "ovs_user_bridge", "use_dhcp": false}, {"defroute": false, "mtu": 9000, "name": "ens3f0", "type": "interface", "use_dhcp": false}, {"defroute": false, "mtu": 9000, "name": 
"ens3f1", "type": "interface", "use_dhcp": false}]}

Comment 4 jpateteg 2020-11-21 17:10:24 UTC
Created attachment 1731788 [details]
journalctl os-net-config

This is the output of journalctl for os-net-config

Comment 5 jpateteg 2020-11-21 17:15:36 UTC
Deployment failure as a consequence of the connectivity issue caused by the missing bond:

2020-11-21 16:40:49Z [overcloud-ComputeOvsDpdkAllNodesValidationDeployment-oo763hmyjlvr.1]: SIGNAL_IN_PROGRESS  Signal: deployment fffbd117-75f1-4217-a01f-7c3b7d14e39b failed (1)
2020-11-21 16:40:50Z [overcloud-ComputeOvsDpdkAllNodesValidationDeployment-oo763hmyjlvr.1]: CREATE_FAILED  Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2020-11-21 16:40:50Z [overcloud-ComputeOvsDpdkAllNodesValidationDeployment-oo763hmyjlvr]: UPDATE_FAILED  Resource CREATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2020-11-21 16:40:50Z [ComputeOvsDpdkAllNodesValidationDeployment]: UPDATE_FAILED  resources.ComputeOvsDpdkAllNodesValidationDeployment: Resource CREATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2020-11-21 16:40:50Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.ComputeOvsDpdkAllNodesValidationDeployment: Resource CREATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1

 Stack overcloud UPDATE_FAILED

overcloud.ComputeOvsDpdkAllNodesValidationDeployment.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: fffbd117-75f1-4217-a01f-7c3b7d14e39b
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    Ping to 10.2.0.10 failed. Retrying...
    FAILURE
    (truncated, view all with --long)
  deploy_stderr: |
    10.2.0.10 is not pingable. Local Network: 10.2.0.0/25
Heat Stack update failed.
Heat Stack update failed.

real    21m32.117s
user    0m5.091s
sys     0m0.508s
[stack@mxtlal01lab2dir ~]$

Comment 6 jpateteg 2020-11-21 17:15:46 UTC
IP link show:
[heat-admin@mxtla01lab2com03 ~]$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 94:f1:28:a7:8f:32 brd ff:ff:ff:ff:ff:ff
3: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 94:f1:28:a7:8f:33 brd ff:ff:ff:ff:ff:ff
4: ens1f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6b brd ff:ff:ff:ff:ff:ff
5: ens1f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6b brd ff:ff:ff:ff:ff:ff
6: ens1f4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6d brd ff:ff:ff:ff:ff:ff
7: ens1f5: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6d brd ff:ff:ff:ff:ff:ff
8: ens1f6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond2 state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6f brd ff:ff:ff:ff:ff:ff
9: ens1f7: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond2 state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6f brd ff:ff:ff:ff:ff:ff
12: enp1s0f4u4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether f6:fd:5c:69:09:7f brd ff:ff:ff:ff:ff:ff
13: ovs-netdev: <BROADCAST,MULTICAST,PROMISC> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a6:e5:a7:fc:a5:80 brd ff:ff:ff:ff:ff:ff
14: br-dpdk0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ea:ee:26:37:76:4c brd ff:ff:ff:ff:ff:ff
15: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6b brd ff:ff:ff:ff:ff:ff
16: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6d brd ff:ff:ff:ff:ff:ff
17: bond2: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6f brd ff:ff:ff:ff:ff:ff
18: vlan454@bond2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6f brd ff:ff:ff:ff:ff:ff
19: vlan440@bond2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6f brd ff:ff:ff:ff:ff:ff
20: vlan456@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:74:56:d0:00:6d brd ff:ff:ff:ff:ff:ff

Comment 7 Chris Fields 2020-12-02 19:49:40 UTC
Hello Jair, I'm focusing on these warnings: |00225|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk

Have you been including neutron-ovs-dpdk.yaml in the container image prepare and the overcloud deployment?  

If that's not the issue, please open a support case and include an sosreport; also upload your overcloud templates and deploy command, and attach this bug to the case.

Thanks

CFields
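
Chris's question above points at the likely root cause: if neutron-ovs-dpdk.yaml is not passed to both the container image prepare and the overcloud deployment, the overcloud OVS runs without the DPDK datapath, which matches the "unknown type dpdk" warnings. A hypothetical sketch of the deploy invocation (the template path is illustrative and varies by release; on RHOSP 13 the containerized variant is typically found under environments/services-docker/ in tripleo-heat-templates — verify against your installation):

```shell
# Illustrative only — confirm the correct path for your tripleo-heat-templates.
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovs-dpdk.yaml \
  -e /home/stack/templates/network-environment.yaml \
  ...   # remaining -e files and deploy arguments unchanged
```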

