| Summary: | DPDK-OVS-vlan bridge setup; when nic bind to dpdk driver there is no connectivity between hosts | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> | ||||||
| Component: | openvswitch | Assignee: | Flavio Leitner <fleitner> | ||||||
| Status: | CLOSED NOTABUG | QA Contact: | Ofer Blaut <oblaut> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 8.0 (Liberty) | CC: | aconole, apevec, chrisw, edannon, ekuris, fbaudin, jhsiao, mlopes, pmatilai, rhos-maint, rkhan, srevivo, tfreger | ||||||
| Target Milestone: | --- | Keywords: | Regression, ZStream | ||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-04-21 05:26:17 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
Can you post the results of:
* ovs-vsctl list Bridge
* ovs-vsctl list Interface
* ip l

Thanks

Created attachment 1146449
ovs

Attached is a file with all the output that you asked for.

It sounds like you have stale interfaces in the OVS DB. I wonder if the host crashed or didn't shut down properly, which might lead to that. Any chance to stop the OSP services, delete all bridges and then restart the host? That would make sure that everything is fresh.

thanks,
fbl

fbl, I saw the issue in a fresh setup. If you want, we can set up a debug session and I will show you the setup.

Eran, thanks for that command output; it is most helpful.
Please send the output of the following script:
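
    #!/bin/bash
    # Uses bash arrays, so it must run under bash rather than a POSIX sh.
    # Counts the per-bridge unix sockets (<bridge>.mgmt and <bridge>.snoop)
    # that ovs-vswitchd creates, looking in the usual OVS run directories.
    FILES=("br-int" "br-vlan")
    FILE_SUFFIXES=(".mgmt" ".snoop")
    FILE_PATHS=("/var/run/openvswitch" "/usr/var/run/openvswitch" "/usr/local/var/run/openvswitch")
    found=0
    for file in "${FILES[@]}"; do
        echo -n "Checking for files related to $file... "
        for suffix in "${FILE_SUFFIXES[@]}"; do
            for path in "${FILE_PATHS[@]}"; do
                # These entries are unix sockets, not regular files,
                # so test with -S (-f would never match them).
                if [ -S "${path}/${file}${suffix}" ]; then
                    echo -n "found ${path}/${file}${suffix} "
                    found=$(( found + 1 ))
                fi
            done
        done
        echo " ... done"
    done
    # One socket of each suffix is expected per bridge.
    expected=$(( ${#FILES[@]} * ${#FILE_SUFFIXES[@]} ))
    if [ "$found" -ne "$expected" ]; then
        echo "E: Suspicious mismatch of files (${found} vs ${expected})"
        exit 1
    fi
    echo "I: Files seem to be in order"
    exit 0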
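Besides the socket check, the stale-interface suspicion can be checked directly from `ovs-vsctl show`, where an interface that OVS cannot open carries an `error:` line (as enp5s0f1 does in the description). A minimal sketch: `ovs_interfaces_with_errors` is a hypothetical helper, and the parsing assumes the layout shown in the description, where each `Interface` line precedes its `error:` line.

```shell
#!/bin/sh
# Hypothetical helper: read `ovs-vsctl show` output on stdin and print
# the name of every interface that carries an "error:" line, which is
# how OVS reports a port it cannot open (e.g. a stale kernel name).
ovs_interfaces_with_errors() {
    awk '
        $1 == "Interface" { iface = $2; gsub(/"/, "", iface) }
        $1 == "error:"    { print iface }
    '
}

# Example (not run here):
#   ovs-vsctl show | ovs_interfaces_with_errors
#   ovs-vsctl --if-exists del-port br-vlan enp5s0f1
```

On this host the helper would list enp5s0f1, and `ovs-vsctl --if-exists del-port` removes such a port from the bridge without failing if it is already gone.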

Could you please provide a sosreport?

It seems like "enp5s0f1" was added to the bridge, but it is the one bound to DPDK (the dpdk0 interface), right? That would point to incorrect installation steps, though the document looks good. One improvement to that document would be to use driverctl: http://people.redhat.com/~pmatilai/dpdk-guide/setup/binding.html#vfio

Thanks!

    [root@puma48 ~]# ./bug.sh
    Checking for files related to br-int... ... done
    Checking for files related to br-vlan... ... done
    E: Suspicous mismatch of files (0 vs 4)

sosreport attached. In a few days I'm leaving for a long vacation; your contact from Neutron QE is edannon.

Created attachment 1148049
sosreport
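The driverctl approach suggested above can be sketched as a small wrapper. This is an illustration, not the guide's procedure: `is_pci_addr` and `bind_to_vfio` are hypothetical names, and the address 0000:05:00.1 is only inferred from the predictable name enp5s0f1 (PCI bus 5, slot 0, function 1, domain assumed 0000). Confirm it, for example from the `bus-info` field of `ethtool -i enp5s0f1`, while the kernel still owns the port. By default the wrapper only prints the command; setting the hypothetical variable `DPDK_BIND_RUN=1` executes it.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any package).

# Succeed if $1 looks like a full PCI address: domain:bus:slot.function,
# e.g. 0000:05:00.1. This is a shape check only, not a lookup in sysfs.
is_pci_addr() {
    case "$1" in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Bind PCI device $1 to vfio-pci with a persistent driverctl override.
# Prints the command unless DPDK_BIND_RUN=1, in which case it runs it.
bind_to_vfio() {
    if ! is_pci_addr "$1"; then
        echo "E: '$1' is not a PCI address (expected e.g. 0000:05:00.1)" >&2
        return 1
    fi
    if [ "${DPDK_BIND_RUN:-0}" = 1 ]; then
        driverctl set-override "$1" vfio-pci
    else
        echo "driverctl set-override $1 vfio-pci"
    fi
}
```

For example, `bind_to_vfio 0000:05:00.1` prints `driverctl set-override 0000:05:00.1 vfio-pci`. Printing by default keeps a mistyped address from silently rebinding the wrong NIC, and rejecting interface names catches the enp5s0f1/dpdk0 confusion discussed in this thread.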

The sosreport is missing the openvswitch module; I'm not sure whether it wasn't enabled or didn't work. Anyway, it seems we are missing the unix sockets in /var/run/openvswitch, or the tool is looking somewhere else. Could you check whether you have that directory and what is inside it? Do you have OVS tools installed anywhere besides the ones provided by the RPM package? A locally compiled OVS would most probably install to another path and search for the unix sockets in another place as well.

I said this by email, but for completeness: the stale interface "enp5s0f1" is likely the one bound to DPDK, so it shouldn't have been included in OVS at all; please remove it. Also, if it was used by the kernel, chances are that it might not work when moved to DPDK. Therefore, I'd recommend creating an ifcfg- file for it that disables the interface (ONBOOT=no, NM_CONTROLLED=no), then rebooting, double-checking that the interface is listed but in 'DOWN' state, then binding it to DPDK, starting OVS and so on.

Thanks!
Flavio

I don't have the directory /var/run/openvswitch, which is strange:

    [root@puma48 ~]# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |grep -i openvswitch
    # '/var/run/openvswitch' is the default value
    vhostuser_socket_dir = /var/run/openvswitch

About your comment that the stale interface "enp5s0f1" is likely the one bound to DPDK and so shouldn't have been included in OVS at all: I'm not sure why I need to remove it from the bridge. Otherwise, how will the nodes communicate?
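Flavio's recommendation to keep the kernel from bringing up enp5s0f1 amounts to an ifcfg file with `ONBOOT=no` and `NM_CONTROLLED=no`. A minimal sketch, assuming the RHEL 7 network-scripts directory; `write_disabled_ifcfg` is a hypothetical helper, and its optional second argument exists only so the output location can be changed.

```shell
#!/bin/sh
# Hypothetical helper: write ifcfg-<device> so that neither the network
# service nor NetworkManager brings the device up at boot.
#   $1 = device name (e.g. enp5s0f1)
#   $2 = target directory (defaults to the RHEL 7 network-scripts dir)
write_disabled_ifcfg() {
    dev=$1
    dir=${2:-/etc/sysconfig/network-scripts}
    [ -n "$dev" ] || { echo "E: device name required" >&2; return 1; }
    cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
ONBOOT=no
NM_CONTROLLED=no
EOF
}
```

For example, run `write_disabled_ifcfg enp5s0f1`, reboot, and check that `ip link` still lists enp5s0f1 but in DOWN state before binding it to DPDK.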

The directory exists after I restart the services:

    [root@puma48 ~]# cd /var/run/openvswitch
    [root@puma48 openvswitch]# ll
    total 8
    srwx------ 1 root qemu 0 Apr 19 08:05 br-int.mgmt
    srwx------ 1 root qemu 0 Apr 19 08:05 br-int.snoop
    srwx------ 1 root qemu 0 Apr 19 08:05 br-vlan.mgmt
    srwx------ 1 root qemu 0 Apr 19 08:05 br-vlan.snoop
    srwx------ 1 root qemu 0 Apr 19 08:05 db.sock
    srwx------ 1 root qemu 0 Apr 19 08:05 ovsdb-server.54708.ctl
    -rw-r--r-- 1 root qemu 6 Apr 19 08:05 ovsdb-server.pid
    srwx------ 1 root qemu 0 Apr 19 08:05 ovs-vswitchd.54724.ctl
    -rw-rw-r-- 1 root qemu 6 Apr 19 08:05 ovs-vswitchd.pid
    srwxrwxr-x 1 root qemu 0 Apr 19 08:05 vhu3cdf6fdc-9b
    srwxrwxr-x 1 root qemu 0 Apr 19 08:05 vhuec23fdc2-ad
    [root@puma48 ~]# grep -ir "/var/run/openvswitch/" /var/log/
    /var/log/messages:Apr 19 08:04:51 puma48 neutron-openvswitch-agent: 2016-04-19 08:04:51.881 2960 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor Interface name,ofport,external_ids --format=json] dies due to the error: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (No such file or directory)
    /var/log/messages:Apr 19 08:05:22 puma48 neutron-openvswitch-agent: 2016-04-19 08:05:22.022 2960 ERROR neutron.agent.linux.async_process [-] Error received from [ovsdb-client monitor Interface name,ofport,external_ids --format=json]: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (No such file or directory)
    /var/log/messages:Apr 19 08:05:22 puma48 neutron-openvswitch-agent: 2016-04-19 08:05:22.023 2960 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor Interface name,ofport,external_ids --format=json] dies due to the error: ovsdb-client: failed to connect to "unix:/var/run/openvswitch/db.sock" (No such file or directory)
    /var/log/messages:Apr 19 08:05:47 puma48 ovs-ctl: VHOST_CONFIG: bind to /var/run/openvswitch/vhuec23fdc2-ad
    /var/log/messages:Apr 19 08:05:47 puma48 ovs-ctl: VHOST_CONFIG: bind to /var/run/openvswitch/vhu3cdf6fdc-9b
    [root@puma48 openvswitch]# grep -ir "/var/run/openvswitch/" /var/log/openvswitch/
    /var/log/openvswitch/ovs-vswitchd.log:2016-04-19T05:05:46.627Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
    /var/log/openvswitch/ovs-vswitchd.log:2016-04-19T05:05:46.631Z|00007|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
    /var/log/openvswitch/ovs-vswitchd.log:2016-04-19T05:05:47.282Z|00018|dpdk|INFO|Socket /var/run/openvswitch/vhuec23fdc2-ad created for vhost-user port vhuec23fdc2-ad
    /var/log/openvswitch/ovs-vswitchd.log:2016-04-19T05:05:47.288Z|00021|dpdk|INFO|Socket /var/run/openvswitch/vhu3cdf6fdc-9b created for vhost-user port vhu3cdf6fdc-9b
    /var/log/openvswitch/ovs-vswitchd.log:2016-04-19T05:05:47.289Z|00024|connmgr|INFO|br-vlan: added service controller "punix:/var/run/openvswitch/br-vlan.mgmt"
    /var/log/openvswitch/ovs-vswitchd.log:2016-04-19T05:05:47.324Z|00026|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"

(In reply to Eran Kuris from comment #13)
> Flavio I dont have this directory /var/run/openvswitch .
> which is strange :
> [root@puma48 ~]# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini |grep -i openvswitch
> # '/var/run/openvswitch' is the default value
> vhostuser_socket_dir = /var/run/openvswitch

OVS keeps the sockets there, so if that directory disappears, OVS won't work well. That directory is managed by the openvswitch systemd service.

The summary says:

    2016-04-11 16:22:12.534 2960 ERROR neutron.agent.common.ovs_lib [req-8698ad69-e997-4b7d-a260-53ff226be25e - - - - -] Unable to execute ['ovs-ofctl', 'dump-flows', 'br-int', 'table=23']. Exception: Command: ['ovs-ofctl', 'dump-flows', 'br-int', 'table=23']

Looking at the journal I see this:

    Apr 11 16:22:05 puma48.scl.lab.tlv.redhat.com systemd[1]: Starting Open vSwitch Internal Unit...
    Apr 11 16:22:06 puma48.scl.lab.tlv.redhat.com ovs-ctl[4489]: Starting ovsdb-server [ OK ]
    Apr 11 16:22:06 puma48.scl.lab.tlv.redhat.com ovs-ctl[4489]: Starting ovs-vswitchd 2016-04-11T13:22:06Z|00001|dpdk|INFO|No -vhost_sock_dir provided - defaulting to /var/run/openvswitch
    [...]
    <the event reported happens here>
    [...]
    Apr 11 16:22:12 puma48.scl.lab.tlv.redhat.com ovs-ctl[4489]: Enabling remote OVSDB managers [ OK ]
    Apr 11 16:22:12 puma48.scl.lab.tlv.redhat.com systemd[1]: Started Open vSwitch Internal Unit.
    Apr 11 16:22:12 puma48.scl.lab.tlv.redhat.com systemd[1]: Starting Open vSwitch...
    Apr 11 16:22:12 puma48.scl.lab.tlv.redhat.com systemd[1]: Started Open vSwitch.

So the errors seem to be a consequence of running commands while OVS wasn't ready yet: the reported event falls inside the OVS startup sequence shown above.

> About your comment that the stale interface "enp5s0f1" likely is the one
> bound to DPDK. So, it shouldn't have been included in OVS at , I'm not
> sure why I need to remove it from the bridge , otherwise how the nodes will
> communicate ?

You can't use enp5s0f1 and dpdk0 if they are the same NIC. If you bound enp5s0f1 to DPDK, it is now called dpdk0 and you can't use enp5s0f1 anymore. The reverse is also true: you can't use the same interface with both DPDK and the kernel. That is most probably the reason for the communication issue on the host.

You need to remove enp5s0f1 from the OVSDB, use ifcfg-enp5s0f1 to keep the device from being brought up, then reboot. The interface should be listed by 'ip link', but it must be in DOWN state. After that, bind the interface to DPDK (which takes it out of the kernel's control) and only then start openvswitch-dpdk.

BTW, could you check whether you still see DMAR messages in the logs after a clean boot? If so, you might want to boot with 'iommu=pt' to work around the issue, which could also be related to the connectivity problem.

Eyal, the setup is yours now, so please provide Flavio with the details.
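The two checks Flavio asks for before binding (the port is listed but DOWN, and whether `iommu=pt` is on the kernel command line) can be scripted. A sketch under stated assumptions: both function names are hypothetical, and `link_is_admin_down` reads the one-line output of `ip -o link show dev IFACE`. It tests the UP flag rather than only the `state` field, because a port that is administratively up but has no carrier also reports `state DOWN` while the kernel still owns it.

```shell
#!/bin/sh
# Hypothetical pre-binding checks.

# Succeed if the `ip -o link show dev IFACE` output on stdin shows a link
# whose flag list (the part between < and >) does not contain UP.
link_is_admin_down() {
    flags=$(sed -n 's/^[^<]*<\([^>]*\)>.*/\1/p' | head -n 1)
    [ -n "$flags" ] || return 1          # no link line at all: fail
    case ",$flags," in
        *,UP,*) return 1 ;;              # administratively up
        *)      return 0 ;;
    esac
}

# Succeed if the kernel command line ($1, or the running kernel's
# /proc/cmdline when $1 is empty) contains iommu=pt as a separate word.
cmdline_has_iommu_pt() {
    cmdline=${1:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" iommu=pt "*) return 0 ;;
        *)              return 1 ;;
    esac
}
```

For example, `ip -o link show dev enp5s0f1 | link_is_admin_down && echo "enp5s0f1 is down"`. If `cmdline_has_iommu_pt` fails and the DMAR messages persist, `grubby --update-kernel=ALL --args=iommu=pt` followed by a reboot is one way to add the parameter on RHEL 7.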

I tried what you wrote, but I'm still having the same issue. As for the DMAR messages, they disappeared after a clean reboot with the configuration you suggested.

Flavio Leitner found that we are using the "uio_pci_generic" driver for the DPDK interface. We support the "vfio-pci" driver, and with that driver the issue is gone.

Eyal, can we close the issue then?

Flavio, others will make the same mistake in the future: uio vs. vfio. Could we put a big warning in dmesg or somewhere else when we see the unsupported uio driver? Just a thought. If you like the idea, please open an enhancement request BZ and we will assign it to someone on the team.

Note: the Google Doc guide mentioned in the description has been published here: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/network_functions_virtualization_configuration_guide/
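The root cause turned out to be the DPDK port being bound to uio_pci_generic instead of vfio-pci, and the thread suggests warning about exactly that. Until such a warning exists, the bound driver can be read from sysfs. A sketch, assuming the standard `/sys/bus/pci/devices/<address>/driver` symlink: `pci_driver` and `check_dpdk_driver` are hypothetical names, and the optional second argument replaces `/sys` only so the logic can be exercised against a fake tree.

```shell
#!/bin/sh
# Print the kernel driver bound to PCI device $1 (e.g. 0000:05:00.1), or
# nothing if it is unbound. $2 optionally replaces /sys as the sysfs root.
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    [ -L "$link" ] || return 0
    basename "$(readlink "$link")"
}

# Warn unless the device is bound to vfio-pci, the driver this setup
# works with. Returns non-zero whenever it prints a warning.
check_dpdk_driver() {
    drv=$(pci_driver "$1" "$2")
    case "$drv" in
        vfio-pci)
            echo "I: $1 is bound to vfio-pci" ;;
        uio_pci_generic)
            echo "W: $1 is bound to uio_pci_generic; use vfio-pci instead" >&2
            return 1 ;;
        "")
            echo "W: $1 is not bound to any driver" >&2
            return 1 ;;
        *)
            echo "W: $1 is bound to $drv, not to vfio-pci" >&2
            return 1 ;;
    esac
}
```

Assuming the right PCI address is passed, this check would have flagged the uio_pci_generic binding in the configuration originally reported here.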

Description of problem:
It looks like a regression, because this worked for me in the past. I installed a DPDK environment with a VLAN bridge and saw that when the physical port is bound to the DPDK driver, connectivity between the hosts is lost. When the port is bound and we launch an instance, it boots without an IP address because it cannot reach the DHCP server. When the port is unbound and "dhclient" is run from the instance, it gets an IP address.

ovs-vsctl also reports an error: "could not open network device enp5s0f1 (No such device)"

    [root@puma48 ~]# ovs-vsctl show
    48411fb5-3081-4a79-ba11-19f3a49d7ed1
        Bridge br-vlan
            Port br-vlan
                Interface br-vlan
                    type: internal
            Port "enp5s0f1"
                Interface "enp5s0f1"
                    error: "could not open network device enp5s0f1 (No such device)"
            Port phy-br-vlan
                Interface phy-br-vlan
                    type: patch
                    options: {peer=int-br-vlan}
            Port "dpdk0"
                Interface "dpdk0"
                    type: dpdk
        Bridge br-int
            fail_mode: secure
            Port "vhu3cdf6fdc-9b"
                tag: 1
                Interface "vhu3cdf6fdc-9b"
                    type: dpdkvhostuser
            Port br-int
                Interface br-int
                    type: internal
            Port int-br-vlan
                Interface int-br-vlan
                    type: patch
                    options: {peer=phy-br-vlan}
        ovs_version: "2.4.0"

openvswitch log:

    2016-04-11 16:22:12.533 2960 ERROR neutron.agent.linux.utils [req-8698ad69-e997-4b7d-a260-53ff226be25e - - - - -]
    Command: ['ovs-ofctl', 'dump-flows', 'br-int', 'table=23']
    Exit code: 1
    Stdin:
    Stdout:
    Stderr: ovs-ofctl: br-int is not a bridge or a socket
    2016-04-11 16:22:12.534 2960 ERROR neutron.agent.common.ovs_lib [req-8698ad69-e997-4b7d-a260-53ff226be25e - - - - -] Unable to execute ['ovs-ofctl', 'dump-flows', 'br-int', 'table=23']. Exception: Command: ['ovs-ofctl', 'dump-flows', 'br-int', 'table=23']
    Exit code: 1
    Stdin:
    Stdout:
    Stderr: ovs-ofctl: br-int is not a bridge or a socket

Version-Release number of selected component (if applicable):

    [root@puma48 ~]# rpm -qa |grep dpdk
    dpdk-2.2.0-3.el7.x86_64
    dpdk-tools-2.2.0-3.el7.x86_64
    openvswitch-dpdk-2.4.0-0.10346.git97bab959.3.el7_2.x86_64
    [root@puma48 ~]# rpm -qa |grep neutron
    openstack-neutron-common-7.0.1-15.el7ost.noarch
    openstack-neutron-7.0.1-15.el7ost.noarch
    python-neutronclient-3.1.0-1.el7ost.noarch
    python-neutron-7.0.1-15.el7ost.noarch
    openstack-neutron-openvswitch-7.0.1-15.el7ost.noarch
    [root@puma48 ~]# rpm -qa |grep packstack
    openstack-packstack-7.0.0-0.14.dev1702.g490e674.el7ost.noarch
    openstack-packstack-puppet-7.0.0-0.14.dev1702.g490e674.el7ost.noarch

How reproducible:
always

Steps to Reproduce:
1. Run the installation on a VLAN setup using this guide: https://docs.google.com/document/d/1K_ku6_08ooq46dFLiE7fAJ0ByFdPCb0W_q6kKqF3Y0o/edit#

Actual results:

Expected results:

Additional info:
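Flavio's reading of the journal is that `ovs-ofctl dump-flows br-int` ran before ovs-vswitchd had finished starting, which is why the log above reports `br-int is not a bridge or a socket`. A script that runs such commands right after starting OVS can wait first. A sketch: `retry` and `bridge_mgmt_ready` are hypothetical helpers, and treating the presence of the `<bridge>.mgmt` socket (listed earlier in /var/run/openvswitch) as "ready" is an assumption, not a documented OVS readiness signal.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most
# $1 times, sleeping $2 seconds between attempts.
#   usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Assumed readiness probe: the bridge's management socket exists in the
# OVS run directory ($2, default /var/run/openvswitch).
bridge_mgmt_ready() {
    [ -S "${2:-/var/run/openvswitch}/$1.mgmt" ]
}

# Example (not run here):
#   retry 30 1 bridge_mgmt_ready br-int && ovs-ofctl dump-flows br-int table=23
```

The attempt count and delay in the example are arbitrary; the point is only that the command runs after the bridge socket appears rather than racing the service start.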