Description of problem:
In RHOS 10, during the deployment of OVS + DPDK with two ports, the compute node loses its IP addresses during the PostDeploySteps. This bug relates to the following BZ, where the proper heat template configuration for a single OVS-DPDK port is discussed: https://bugzilla.redhat.com/show_bug.cgi?id=1384562#c20

Version-Release number of selected component (if applicable):
RHOS 10
Product version: 10
Product core version: 10
Product core build: 2016-10-21.3

How reproducible:
Perform an overcloud deployment with two OVS-DPDK ports, using the following templates:
network-environment.yaml - http://pastebin.test.redhat.com/425321
controller.yaml - http://pastebin.test.redhat.com/425323
compute.yaml - http://pastebin.test.redhat.com/425324
first-boot.yaml - http://pastebin.test.redhat.com/425325
post-install.yaml - http://pastebin.test.redhat.com/425326
overcloud deploy command - http://pastebin.test.redhat.com/425327

Actual results:
During the deployment, the compute node loses its IP addresses during the PostDeploySteps.

Expected results:
The deployment with two OVS-DPDK ports should succeed.

Additional info:
Restarting the network service produces the following error: http://pastebin.test.redhat.com/425330
I have modified the overcloud deploy command to change the bridge mapping; following is the diff.

--- overcloud_deploy.orig.sh	2016-11-03 07:37:38.817539412 +0200
+++ overcloud_deploy.sh	2016-11-03 07:37:08.368675684 +0200
@@ -13,8 +13,8 @@ openstack overcloud deploy --debug \
     --ntp-server clock.redhat.com \
     --neutron-network-type vlan \
     --neutron-disable-tunneling \
-    --neutron-bridge-mappings datacentre:br-isolated,dpdk0:br-link0,dpdk1:br-link1 \
-    --neutron-network-vlan-ranges datacentre:399:399,dpdk0:423:423,dpdk1:424:424 \
+    --neutron-bridge-mappings dpdk0:br-link0,dpdk1:br-link1 \
+    --neutron-network-vlan-ranges dpdk0:423:423,dpdk1:424:424 \
     --control-scale 1 \
     --control-flavor baremetal \
     --compute-scale 1 \

Basically, I removed the bridge mapping for the tenant network, after which the deployment is successful. In the single-NIC case, datacentre is mapped to the external network (br-ex), whereas in the double-NIC case, because no NIC is available for it, br-ex is not present and datacentre is mapped to br-isolated. So the number of DPDK NICs makes no difference. But it looks like there is an issue if the tenant network bridge (br-isolated), which is a regular OVS bridge, is included in the bridge mapping. When the bridge mapping is applied and openvswitch is restarted, all the NICs lose their IPs. This has to be investigated further by OVS/Neutron SMEs.

Maxim, I will leave it to you to verify the change and to decide whether we should continue with this bug or open a new one for further investigation.
I just tried to deploy an environment without any DPDK configuration at all, where the datacentre physnet was mapped to the br-isolated bridge. The deployment succeeded.
As previously discussed with Saravanan, I checked the following.

I deployed the previously working scenario - a single DPDK port deployment. In the overcloud deploy command, I changed the neutron bridge mappings to the same values shown in Saravanan's comment (#2):

--neutron-bridge-mappings datacentre:br-isolated,dpdk:br-link

instead of the regular:

--neutron-bridge-mappings datacentre:br-ex,dpdk:br-link

The result was the same as in the two port deployment: during the PostDeploySteps, once openvswitch restarted, the compute node NICs lost their IPs.

Looks like the issue happens once we use the dpdk templates mapping.
(In reply to Maxim Babushkin from comment #4)
> Looks like the issue is happening once we are using dpdk templates mapping.

To be more specific:
Mapping contains DPDK bridges only - No issue
Mapping contains a DPDK bridge and a non-existent bridge - No issue
Mapping contains a DPDK bridge and a non-DPDK bridge - Issue
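The three cases above can be distinguished on the compute node by inspecting the datapath type of each bridge named in the mapping. This is a minimal sketch, not part of the original reproduction steps; the MAPPINGS value mirrors this deployment's failing configuration, and ovs-vsctl is assumed to be available:

```shell
# Report the OVS datapath type of every bridge in the Neutron bridge mapping.
# "netdev" means a DPDK (userspace) bridge; "" or "system" means a kernel
# OVS bridge. A mix of the two is the failing case described above.
MAPPINGS="datacentre:br-isolated,dpdk0:br-link0,dpdk1:br-link1"
for pair in ${MAPPINGS//,/ }; do
    bridge=${pair#*:}    # strip the "physnet:" prefix, keep the bridge name
    dtype=$(ovs-vsctl get Bridge "$bridge" datapath_type 2>/dev/null)
    echo "$bridge -> ${dtype:-not found}"
done
```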
See slide 34 of: https://docs.google.com/presentation/d/1FsR7dfydSfYE7l_01nDVnlOAuo7vB2XZYZF1y62M4Ck/edit#slide=id.g132a4086ba_69_0

The compute node management/infrastructure interfaces are not connected to OVS; they are regular Linux interfaces (kernel bridge/bond). By design, when using OVS-DPDK, all bridges have to be OVS-DPDK, meaning that any interface connected to OVS-DPDK cannot be shared with the kernel. Another way to understand this: when using OVS-DPDK, we can (should?) unload the OVS kernel module.
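This design constraint shows up directly in how the bridges are created. A DPDK bridge must use the userspace (netdev) datapath, while a kernel OVS bridge uses the default system datapath. The following is a hedged sketch using this deployment's bridge names and OVS 2.5-era syntax (where DPDK ports are named dpdkN with type=dpdk), not a verbatim excerpt from the templates:

```shell
# Kernel OVS bridge (default system datapath) - its interfaces remain
# visible to, and shared with, the kernel network stack:
ovs-vsctl add-br br-isolated

# OVS-DPDK bridge (userspace datapath) - its physical ports are owned by
# DPDK and disappear from the kernel network stack entirely:
ovs-vsctl add-br br-link0 -- set bridge br-link0 datapath_type=netdev
ovs-vsctl add-port br-link0 dpdk0 -- set Interface dpdk0 type=dpdk
```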
We tried in Maxim's environment to deploy with a Linux bridge for br-isolated, but the deployment itself failed. Maxim's environment is all on one interface (single-NIC network isolation).

As this was not validated with a Linux bridge, I tried the same in another environment. The new environment has 1 provisioning interface + 1 Linux bridge + 1 OVS-DPDK bridge. With this environment, the deployment is successful, but VM creation on a tenant network fails with a port binding error. We got several error messages like the one below in the neutron openvswitch agent log on the compute node.

2016-11-08 13:28:09.073 40119 ERROR neutron.agent.ovsdb.impl_idl TimeoutException: Commands [SetControllerCommand(bridge=br-int, targets=['tcp:127.0.0.1:6633'])] exceeded timeout 10 seconds post-commit

Two issues identified from this BZ:
1) Deployment fails in Maxim's env when DPDK and non-DPDK networks are added in the bridge mapping (all openvswitch bridges)
2) VM creation fails on a tenant network with a Linux bridge for network isolation and an OVS bridge for DPDK
Created attachment 1218575 [details] compute-node-openvswitch-agent-vm-create-fail
Comment on attachment 1218575 [details]
compute-node-openvswitch-agent-vm-create-fail
Saravanan, can you check that ovs-vswitchd is launched?

2016-11-08 13:28:09.073 40119 ERROR neutron.agent.ovsdb.impl_idl TimeoutException: Commands [SetControllerCommand(bridge=br-int, targets=['tcp:127.0.0.1:6633'])] exceeded timeout 10 seconds post-commit

=> to me it seems that it hangs/crashes
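One way to run this check on the compute node is the sketch below. It assumes standard RHEL service names and that ovs-appctl is installed; none of these commands come from the original report:

```shell
# Verify both OVS daemons are alive and actually responding, not just
# present as processes with stale pid files.
systemctl status ovs-vswitchd ovsdb-server --no-pager
ovs-vsctl --timeout=5 show          # fails fast if ovsdb-server is wedged
ovs-appctl -t ovs-vswitchd version  # fails if vswitchd's control socket is dead
```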
(In reply to Franck Baudin from comment #10)
> Saravanan, can you check that ovs-vswitchd is launched?
>
> 2016-11-08 13:28:09.073 40119 ERROR neutron.agent.ovsdb.impl_idl
> TimeoutException: Commands [SetControllerCommand(bridge=br-int,
> targets=['tcp:127.0.0.1:6633'])] exceeded timeout 10 seconds post-commit
>
> => to me it seems that it hangs/crash

The ovs-vswitchd process is running and all the pid files are present, but there is an error in the OVS log. I will attach the complete logs.

2016-11-09T12:10:20.995Z|00001|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 0
2016-11-09T12:10:20.996Z|00016|bridge|INFO|bridge br-isol: added interface br-isol on port 65534
2016-11-09T12:10:20.997Z|00017|bridge|INFO|bridge br-isol: using datapath ID 0000ea2186ad8b42
2016-11-09T12:10:20.997Z|00018|connmgr|INFO|br-isol: added service controller "punix:/var/run/openvswitch/br-isol.mgmt"
2016-11-09T12:10:21.076Z|00019|dpif|WARN|system@ovs-system: failed to add br-isol as port: File exists
2016-11-09T12:10:21.078Z|00020|bridge|INFO|bridge br-isol: added interface br-isol on port 65534
2016-11-09T12:10:21.076Z|00019|dpif|WARN|system@ovs-system: failed to add br-isol as port: File exists

So we try to create a port that is already there. When moving a node from a regular OVS role to an OVS-DPDK role, the OVSDB has to be reset first, as we re-create all ports/bridges.
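A sketch of such a reset, assuming the standard RHEL paths for the OVS database; this is destructive (it wipes all existing bridges and ports), which is acceptable here only because the OVS-DPDK role re-creates them afterwards:

```shell
# Stop OVS, delete the on-disk database so it is re-created empty on the
# next start, then restart. All previous bridge/port state is discarded.
systemctl stop openvswitch
rm -f /etc/openvswitch/conf.db
systemctl start openvswitch
ovs-vsctl show   # should now list no bridges
```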
Verified and closed. Note that mixing DPDK and non-DPDK ports in OVS causes performance degradation and is not recommended; see the NFV configuration guide.