Description of problem:
The ovs-vswitchd process fails when DPDK is enabled with the command below.

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

Version-Release number of selected component (if applicable):
openvswitch-2.7
osp12-puddle (2017-06-19.1)
RHEL7.4 nightly (http://download-node-02.eng.bos.redhat.com/composes/nightly/latest-RHEL-7/compose/Server/x86_64/os/Packages/)

We tried to deploy the OSP12 puddle with DPDK. The deployment failed and the compute nodes were not reachable. While trying to reproduce the issue on a controller node, we found that ovs-vswitchd fails once DPDK is enabled.

LOGS - http://pastebin.test.redhat.com/496216
Please attach an sosreport or at least the systemd logs since boot and the logs in /var/run/openvswitch/* Thanks, fbl
(In reply to Flavio Leitner from comment #1)
> Please attach an sosreport or at least the systemd logs since boot and the
> logs in /var/run/openvswitch/*
>
> Thanks,
> fbl

The sosreport is on Google Drive because it is more than 20 MB - https://drive.google.com/open?id=0B2NDG0wO_XsqcDRETkN2bXlQNTg
An observation: on the same RHEL7.4 image, I tried the OvS 2.6 package (from fdp) instead of OvS 2.7. It has the same issue. After setting dpdk-init=true, restarting openvswitch fails because the ovs-vswitchd service goes into the failed state.
We could reproduce it in a standalone RHEL7.4-based VM with OvS 2.7. The RHEL 7.4 VM image was obtained from [1] and the OvS rpm from [2].

[1] http://download-node-02.eng.bos.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/images/rhel-guest-image-7.4-176.x86_64.qcow2
[2] http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.7.0/8.git20170530.el7fdb/x86_64/openvswitch-2.7.0-8.git20170530.el7fdb.x86_64.rpm

After installing the openvswitch packages, I started the openvswitch service with "systemctl start openvswitch" and then ran "ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true". After DPDK is enabled, the openvswitch service fails.
I tried viewing the sosreport, but I found that the tarball was corrupted. Is it possible to get a complete version?

Just a guess - either the hugepage configuration could be wrong, or there could be some kind of hardware issue with the i40e that they have. NOTE - that is a complete guess based on very minimal information.
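To help check the hugepage guess above, here is a minimal sketch that summarizes the hugepage counters the kernel exposes in /proc/meminfo. The function name hugepage_summary and its optional file argument are hypothetical, introduced only for illustration; it makes no claim about what values this deployment should have.

```shell
# Hypothetical helper: print hugepage counters from a meminfo-style file.
# Defaults to /proc/meminfo; a different path can be passed for testing.
hugepage_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         /^Hugepagesize:/    { size = $2 }
         END { printf "total=%d free=%d size_kB=%d\n", total, free, size }' "$meminfo"
}

# Usage on the affected node:
#   hugepage_summary
```

A total of 0, or a free count of 0, would suggest that no hugepages are available to DPDK, which would be worth ruling out before looking at the hardware.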
It appears that hugepage allocation is very slow on those machines, which makes the system believe that ovs-vswitchd has become unresponsive. However, manually running ovs-vswitchd:

(cd /var/run/openvswitch && ovs-vswitchd \
    unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err \
    -vfile:info --mlockall --no-chdir \
    --log-file=/var/log/openvswitch/ovs-vswitchd.log \
    --pidfile=/var/run/openvswitch/ovs-vswitchd.pid)

allows the system to come up (notice the lack of --detach in the above command).

If this is an acceptable workaround for now, please go with that. If you need something like systemd integration and all (for instance because this is for a customer), let me know. In the meantime I will work with upstream on a solution that we can include in RHEL.
We can't manually start ovs-vswitchd during the deployment, and without it the deployment fails. We tried removing the "--detach" option in /usr/share/openvswitch/scripts/ovs-lib, but it didn't help. Is there any other alternative that works with the deployment (such as a file modification)?
Created attachment 1294704 [details] private ovs build
Attached a private OVS build with a possible remedy. Please try the attached and let me know.
Created attachment 1294973 [details] Sos report for the private build 2.7.0-9.bz1463627.el7fdb
We see a failure in the openvswitch service. The sos report is attached.
That SOS report is corrupted. But it's okay, enough logs are there. It looks like after reload, no systemctl daemon-reload was executed. Not sure if that was due to my changes to the specfile, but if so, apologies.

After running systemctl daemon-reload, and systemctl restart openvswitch, I see the following:

[heat-admin@overcloud-computeovsdpdk-0 ~]$ systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
   Active: active (exited) since Thu 2017-07-06 10:18:50 EDT; 21s ago
  Process: 35801 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 35801 (code=exited, status=0/SUCCESS)

Jul 06 10:18:50 overcloud-computeovsdpdk-0 systemd[1]: Starting Open vSwitch...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 systemd[1]: Started Open vSwitch.
Hint: Some lines were ellipsized, use -l to show in full.

[heat-admin@overcloud-computeovsdpdk-0 ~]$ systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2017-07-06 10:18:50 EDT; 1min 11s ago
  Process: 35591 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 35677 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random start $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 35703 (ovs-vswitchd)
   CGroup: /system.slice/ovs-vswitchd.service
           └─35703 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:e...

Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: PCI devi...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: PCI devi...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: PCI devi...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-ctl[35677]: [ OK ]
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-ctl[35677]: Enabling remote OV...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 systemd[1]: Started Open vSwitch F...
Hint: Some lines were ellipsized, use -l to show in full.

[heat-admin@overcloud-computeovsdpdk-0 ~]$ systemctl status ovsdb-server
● ovsdb-server.service - Open vSwitch Database Unit
   Loaded: loaded (/usr/lib/systemd/system/ovsdb-server.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2017-07-06 10:17:33 EDT; 2min 35s ago
  Process: 35615 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop (code=exited, status=0/SUCCESS)
  Process: 35638 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd --no-monitor --system-id=random start $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 35667 (ovsdb-server)
   CGroup: /system.slice/ovsdb-server.service
           └─35667 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsys...

Jul 06 10:17:33 overcloud-computeovsdpdk-0 systemd[1]: Starting Open vSwitch ...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-ctl[35638]: Starting ovsdb-ser...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-vsctl[35668]: ovs|00001|vsctl|...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-vsctl[35674]: ovs|00001|vsctl|...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-ctl[35638]: Configuring Open v...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 systemd[1]: Started Open vSwitch D...
Hint: Some lines were ellipsized, use -l to show in full.

Should be all set to go, now?
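For anyone repeating this after installing the private build, the sequence used here can be sketched as a small shell function. The function name refresh_ovs_units and the SYSTEMCTL override variable are hypothetical, added only so the steps can be dry-run; the systemctl subcommands themselves are the standard ones.

```shell
# SYSTEMCTL is a hypothetical override for dry runs (e.g. SYSTEMCTL=echo);
# by default the real systemctl is used.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Reload unit files, restart Open vSwitch, then report whether
# ovs-vswitchd is active. Stops at the first failing step.
refresh_ovs_units() {
    "$SYSTEMCTL" daemon-reload || return 1
    "$SYSTEMCTL" restart openvswitch || return 1
    "$SYSTEMCTL" is-active ovs-vswitchd
}

# Usage (as root on the compute node):
#   refresh_ovs_units
```

The daemon-reload step matters because systemd otherwise keeps using the unit definitions it loaded before the package update.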
We found that the patch to the ovs-ctl script for the permission workaround was applied incorrectly, because it did not account for OvS 2.7. After fixing it, we are able to enable DPDK successfully. The review for THT is posted at https://review.openstack.org/#/c/478163/
Verified with the following THT https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=ospd-12-vlan-dpdk-two-ports-ctlplane-dataplane-bonding;h=e8a7616a6bda3f79e89aa7eccaf9996766f8d366;hb=refs/heads/ci
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462