Description of problem:
During deployment of OVS-DPDK, 'ovs-vsctl show' displays an error in the DPDK port binding:

    Port "dpdk0"
        Interface "dpdk0"
            type: dpdk
            error: "could not open network device dpdk0 (Address family not supported by protocol)"

If openvswitch is restarted, the error is gone and DPDK works as expected. To achieve this, a post-install script to restart openvswitch is required for the deployment - https://review.openstack.org/#/c/395431/1/doc/source/advanced_deployment/ovs_dpdk_config.rst@195

The same problem also happens if the DPDK compute node is restarted; again, restarting openvswitch makes the error go away. We need to identify why the restart of openvswitch is required.

Version-Release number of selected component (if applicable):
openvswitch.x86_64  2.5.0-14.git20160727.el7fdp  @rhos-10.0-rhel-7-fast-datapath

How reproducible:
100%

Steps to Reproduce:
Deploy with an environment as guided in https://mojo.redhat.com/docs/DOC-1100744

Additional info:
All the required kernel args are set with the help of first-boot templates, and the compute node has been restarted before the actual configuration starts.
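The bind failure described above can be detected mechanically by scanning the 'ovs-vsctl show' output for the error lines. A minimal sketch (the has_dpdk_bind_error helper name and the sample transcript are illustrative, not part of the deployment; on a live node you would pipe the real 'ovs-vsctl show' output into the helper):

```shell
#!/bin/sh
# has_dpdk_bind_error: reads "ovs-vsctl show" output on stdin and
# succeeds (exit 0) if any DPDK port failed to bind.
has_dpdk_bind_error() {
    grep -q 'error: "could not open network device dpdk'
}

# Sample transcript matching the output reported in this bug.
sample='    Port "dpdk0"
        Interface "dpdk0"
            type: dpdk
            error: "could not open network device dpdk0 (Address family not supported by protocol)"'

if printf '%s\n' "$sample" | has_dpdk_bind_error; then
    echo "dpdk bind error present"
else
    echo "dpdk ports ok"
fi
```

On a real system this would be run as: ovs-vsctl show | has_dpdk_bind_error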
Hi Saravanan,

I'm not facing this behavior. In my post-install script I restart the openvswitch service in addition to openvswitch-nonetwork. For me, the services restart section in post-install.yaml looks like the following:

    systemctl daemon-reload
    systemctl restart openvswitch-nonetwork
    systemctl restart openvswitch
(In reply to Maxim Babushkin from comment #1)
> Hi Saravanan,
>
> I'm not facing this behavior.
> In my post-install script I'm restarting openvswitch service additional to
> openvswitch-nonetwork.
> For me the section of services restart in post-install.yaml looks like the
> following:
>
> systemctl daemon-reload
> systemctl restart openvswitch-nonetwork
> systemctl restart openvswitch

The reason for this BZ is to understand why we need to restart open-vswitch in the post-install. As the compute node has already been rebooted after all the configuration changes, it should work when puppet enables openvswitch.

In my setup, when I restart the compute node (after the complete deployment), the output of "ovs-vsctl show" is as below:

    Bridge br-link
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port phy-br-link
            Interface phy-br-link
                type: patch
                options: {peer=int-br-link}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (No such device)"

The templates used for the deployment are at https://github.com/krsacme/tht-dpdk/tree/33651e4c10b28714727435e49c0da9665175d149
(In reply to Maxim Babushkin from comment #1)
> systemctl restart openvswitch-nonetwork

Please do not add dependencies on the openvswitch-nonetwork service, because it is for internal use of the OVS initialization.

> systemctl restart openvswitch

That should be enough if you need to restart OVS.

> error: "could not open network device dpdk0 (No such device)"

That means the DPDK port wasn't available when OVS started. You need to have all physical DPDK ports available when OVS initializes. That behavior is changing upstream, so OVS 2.7 most probably will be able to hotplug DPDK ports.

The questions then are: how are you binding the NIC, and when?
From the tests:

    systemctl restart openvswitch

is enough to restart OVS.

The restart of openvswitch should be located in the post-install.yaml script. If I try to restart the openvswitch service from the first-boot.yaml script, I get the following errors in /var/log/openvswitch/ovs-vswitchd.log:

    netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
    bridge|WARN|could not open network device dpdk0 (Address family not supported by protocol)
    netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
    bridge|WARN|could not open network device dpdk1 (Address family not supported by protocol)
The bind is done with the heat templates, by using 'ovs_dpdk_port' within the OVS DPDK bridge.

The first-boot.yaml file contains a workaround that changes the openvswitch service so the instance will be able to boot up. The actual bind happens after the first reboot of the compute node, when the post-install.yaml script restarts the openvswitch service.
(In reply to Maxim Babushkin from comment #5)
> The actual bind happen after the first reboot of the compute node, when
> post-install.yaml script restart the openvswitch service.

OK, so it reboots, then OVS is started by default, Neutron binds the NIC, and OVS is restarted. Is that correct?

I've found this NeutronDpdkDriverType: "vfio-pci", but I can't tell when and how the NIC is being configured.
BTW, the openvswitch-nonetwork.service used in [1] doesn't exist in 22.git20160727.el7fdp due to bz#1397049. We have one service per daemon now, so that needs to be updated.

[1] https://github.com/krsacme/tht-dpdk/blob/33651e4c10b28714727435e49c0da9665175d149/first-boot.yaml#L48
(In reply to Flavio Leitner from comment #6)
> (In reply to Maxim Babushkin from comment #5)
> > The actual bind happen after the first reboot of the compute node, when
> > post-install.yaml script restart the openvswitch service.
>
> OK, so it reboots, then OVS is started by default, Neutron binds the NIC and
> OVS is restarted. Is that correct?
>
> I've found this NeutronDpdkDriverType: "vfio-pci", but I can't tell when and
> how the NIC is being configured.

The sequence of steps, AFAIK:

a) The vfio-pci/igb_uio driver will be bound to the DPDK NIC by os-net-config [1]
b) The first-boot scripts [2] will run. This script will perform a reboot.
c) The DPDK_OPTIONS in /etc/sysconfig/openvswitch will be set by puppet [3] and the openvswitch service shall be enabled.

After step c) we still observe 'error: "could not open network device dpdk0 (No such device)"'. As a workaround we've restarted openvswitch.

[1] https://github.com/openstack/os-net-config/blob/35823f261506f9256c1a227dd4a2770a0508c62d/os_net_config/utils.py#L180
[2] https://github.com/krsacme/tht-dpdk/blob/33651e4c10b28714727435e49c0da9665175d149/first-boot.yaml#L48
[3] https://github.com/openstack/puppet-vswitch/blob/master/manifests/dpdk.pp#L67
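Steps a) and c) above can be sketched as shell commands. This is only an illustration of the ordering, not the actual os-net-config/puppet code: the PCI address and the DPDK_OPTIONS value are placeholders, and run is a dry-run wrapper that just prints each command instead of executing it:

```shell
#!/bin/sh
# Dry-run wrapper: print the command instead of executing it,
# so the ordering can be shown without touching a real system.
run() { echo "+ $*"; }

# a) bind the NIC to the DPDK-capable driver (PCI address is a placeholder)
run modprobe vfio-pci
run dpdk-devbind.py --bind=vfio-pci 0000:05:00.1

# c) set DPDK_OPTIONS (placeholder value), then (re)start OVS so it picks it up
run 'echo DPDK_OPTIONS=... >> /etc/sysconfig/openvswitch'
run systemctl restart openvswitch
```

The point of the sketch is that the restart in the last line only helps if it happens after both the driver bind and the DPDK_OPTIONS change.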
OK, most probably udev is racing with the openvswitch service.

Could you try patching ovs-vswitchd.service?

--- ovs-vswitchd.service.bk	2016-12-02 15:19:09.363393965 -0200
+++ ovs-vswitchd.service	2016-12-02 15:19:32.968918348 -0200
@@ -1,6 +1,7 @@
 [Unit]
 Description=Open vSwitch Forwarding Unit
-After=ovsdb-server.service
+Wants=systemd-udev-settle.service
+After=ovsdb-server.service systemd-udev-settle.service
 Requires=ovsdb-server.service
 ReloadPropagatedFrom=ovsdb-server.service
 AssertPathIsReadWrite=/var/run/openvswitch/db.sock

Thanks
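One way to try this without patching the packaged unit file in place is a systemd drop-in override, which also survives package updates. A sketch, assuming the unit is named ovs-vswitchd.service as in the diff above (the drop-in file name is arbitrary):

```ini
# /etc/systemd/system/ovs-vswitchd.service.d/udev-settle.conf
[Unit]
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
```

After creating the file, run 'systemctl daemon-reload' so systemd picks up the override; directives like Wants= and After= in a drop-in are additive to the packaged unit.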
Karthik,

As suggested by Flavio, can you check this scenario by applying the (udev) patch and removing the explicit restart of openvswitch? Ideally, if this works, the post-install script that goes to Deepthi should not need any restarts :).

Regards,
Vijay
(In reply to Flavio Leitner from comment #9)
> OK, most probably udev is racing with openvswitch service.
>
> Could you try patching the ovs-vswitchd.service?
>
> --- ovs-vswitchd.service.bk	2016-12-02 15:19:09.363393965 -0200
> +++ ovs-vswitchd.service	2016-12-02 15:19:32.968918348 -0200
> @@ -1,6 +1,7 @@
>  [Unit]
>  Description=Open vSwitch Forwarding Unit
> -After=ovsdb-server.service
> +Wants=systemd-udev-settle.service
> +After=ovsdb-server.service systemd-udev-settle.service
>  Requires=ovsdb-server.service
>  ReloadPropagatedFrom=ovsdb-server.service
>  AssertPathIsReadWrite=/var/run/openvswitch/db.sock
>
> Thanks

Has the Bengaluru team tested the patch?
Checking the patch right now.
The patch suggested by Flavio applies to the 2.5.0-22 version, but cannot be applied to 2.5.0-14. For the 2.5.0-22 version, the patch doesn't work. I tried to adapt the patch for the 2.5.0-14 version by implementing the changes in openvswitch-nonetwork.service, but without success.
> That means the DPDK port wasn't available when OVS started. You need to
> have all physical DPDK ports available when OVS initializes.

Thanks Flavio for the pointer. The issue is that openvswitch is not restarted after modifying the /etc/sysconfig/openvswitch file with DPDK_OPTIONS. After adding the puppet code, it is working. I have raised a review upstream: https://review.openstack.org/#/c/409779/
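The fix hinges on restarting OVS only after DPDK_OPTIONS has landed in /etc/sysconfig/openvswitch. A minimal guard along those lines could look like the following (the dpdk_options_set helper and the throwaway temp-file demo are illustrative only; the actual fix is the puppet change in the review above):

```shell
#!/bin/sh
# dpdk_options_set FILE: succeed if FILE defines a non-empty DPDK_OPTIONS,
# i.e. it would be safe to (re)start OVS with DPDK enabled.
dpdk_options_set() {
    grep -q '^DPDK_OPTIONS=..*' "$1"
}

# Demo against a throwaway sysconfig file instead of the real one.
cfg=$(mktemp)
echo 'DPDK_OPTIONS="-l 1,2 -n 4 --socket-mem 1024"' > "$cfg"

if dpdk_options_set "$cfg"; then
    echo "DPDK_OPTIONS present - restart openvswitch"
else
    echo "DPDK_OPTIONS missing - do not restart yet"
fi
rm -f "$cfg"
```

On a deployed node the check would instead run against /etc/sysconfig/openvswitch itself.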
This bug refers to 2 issues:

1) The DPDK port is not up during the deployment - the above review addresses this.
2) The DPDK port is not up if we restart the compute node after a successful deployment - this issue is still present. We need to investigate it.
The DPDK error shown in ovs-vsctl show after a reboot is:

    Port "dpdk0"
        Interface "dpdk0"
            type: dpdk
            error: "could not open network device dpdk0 (No such device)"
As Saravanan mentioned, the errors were seen in 2 cases. An update on the 2nd case:

[heat-admin@overcloud-compute-0 ~]$ cat /usr/lib/systemd/system/openvswitch-nonetwork.service
[Unit]
Description=Open vSwitch Internal Unit
After=syslog.target systemd-udev-settle.service
PartOf=openvswitch.service
Wants=openvswitch.service systemd-udev-settle.service

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-/etc/sysconfig/openvswitch
ExecStart=/usr/share/openvswitch/scripts/ovs-ctl start \
    --system-id=random $OPTIONS
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl stop
RuntimeDirectory=openvswitch
RuntimeDirectoryMode=0775
Group=qemu
UMask=0002

After using the above openvswitch-nonetwork.service file in OVS 2.5.0-14, with the changes suggested by Flavio, we are not able to reproduce this issue (attempted 100 times). I think this issue needs to be addressed in OVS.
Let's keep this bug for just the initial work. I have opened a new bug which will be used to track any backport effort to 2.6