Bug 1463627 - ovs-vswitchd fails when dpdk is enabled in OSP12 puddle (RHEL7.4) with ovs 2.7
Status: ON_QA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assigned To: Emilien Macchi
QA Contact: Yariv
Keywords: Triaged
 
Reported: 2017-06-21 07:23 EDT by Saravanan KR
Modified: 2017-11-08 13:47 EST
CC List: 13 users

Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost


Attachments
private ovs build (4.70 MB, application/x-rpm)
2017-07-05 14:57 EDT, Aaron Conole
Sos report for the private build 2.7.0-9.bz1463627.el7fdb (9.61 MB, application/x-xz)
2017-07-06 09:57 EDT, Karthik Sundaravel


External Trackers
Tracker: OpenStack gerrit | ID: 478163 | Priority: None | Status: None | Summary: None | Last Updated: 2017-07-17 03:40 EDT

Description Saravanan KR 2017-06-21 07:23:18 EDT
Description of problem:
The ovs-vswitchd process fails when DPDK is enabled with the command below:
  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true


Version-Release number of selected component (if applicable):
openvswitch-2.7
osp12-puddle (2017-06-19.1)
RHEL7.4 nightly (http://download-node-02.eng.bos.redhat.com/composes/nightly/latest-RHEL-7/compose/Server/x86_64/os/Packages/)


We tried to deploy the OSP12 puddle with DPDK. The deployment failed and the compute nodes were not reachable. While trying to reproduce the issue on a controller node, we found that enabling DPDK makes ovs-vswitchd fail.
LOGS - http://pastebin.test.redhat.com/496216
Comment 1 Flavio Leitner 2017-06-21 10:07:42 EDT
Please attach an sosreport or at least the systemd logs since boot and the logs in /var/run/openvswitch/*

Thanks,
fbl
Comment 2 Saravanan KR 2017-06-22 03:25:30 EDT
(In reply to Flavio Leitner from comment #1)
> Please attach an sosreport or at least the systemd logs since boot and the
> logs in /var/run/openvswitch/*
> 
> Thanks,
> fbl

The sosreport is on Google Drive, as it is more than 20 MB:
https://drive.google.com/open?id=0B2NDG0wO_XsqcDRETkN2bXlQNTg
Comment 3 Saravanan KR 2017-06-23 02:01:31 EDT
An observation: on the same RHEL 7.4 image, I tried the OvS 2.6 package (from fdp) instead of OvS 2.7, and it has the same issue. After setting dpdk-init=true, restarting openvswitch fails because the ovs-vswitchd service goes into the failed state.
Comment 4 Karthik Sundaravel 2017-06-28 05:08:27 EDT
We could reproduce it on a standalone RHEL 7.4-based VM with OvS 2.7.

The RHEL 7.4 VM image was obtained from [1] and the OvS RPM from [2].

[1] http://download-node-02.eng.bos.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/images/rhel-guest-image-7.4-176.x86_64.qcow2

[2] http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.7.0/8.git20170530.el7fdb/x86_64/openvswitch-2.7.0-8.git20170530.el7fdb.x86_64.rpm

After installing the openvswitch packages, I started the openvswitch service with "systemctl start openvswitch" and then ran "ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true".
After DPDK is enabled, the openvswitch service fails.
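The reproduction steps above can be scripted roughly as follows. This is a sketch: the `run` helper and the `DRY_RUN` switch are hypothetical conveniences so the commands can be printed instead of executed, and the final `systemctl status` call is an added inspection step, not part of the original report.

```shell
#!/bin/sh
# Sketch of the standalone reproduction from comment 4.
# DRY_RUN=1 prints each command instead of running it (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce_dpdk_failure() {
    # Start OVS through the packaged systemd units.
    run systemctl start openvswitch
    # Enable DPDK; on the affected systems ovs-vswitchd fails after this.
    run ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    # Inspect the forwarding unit; --no-pager -l avoids ellipsized lines.
    run systemctl status ovs-vswitchd --no-pager -l
}
```

Running `DRY_RUN=1 reproduce_dpdk_failure` only prints the three commands, which is handy for pasting into a ticket; without `DRY_RUN` it must be run as root on the test node.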
Comment 5 Aaron Conole 2017-06-28 10:30:41 EDT
I tried viewing the sosreport, but I found that the tarball was corrupted.  Is it possible to get a complete version?

Just a guess - either hugepage configuration could be wrong, or there could be some kind of hardware issue with the i40e that they have. NOTE - that is a complete guess based on very minimal information.
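To rule the hugepage-configuration guess in or out, the kernel's hugepage counters in /proc/meminfo (`HugePages_Total`, `HugePages_Free`, `Hugepagesize`) can be checked before setting dpdk-init. A minimal sketch; the function name `check_hugepages` and its optional file argument (so it can read a saved copy of meminfo from the affected node) are hypothetical, and it only checks that some hugepages are reserved and free, not how they are spread across NUMA nodes.

```shell
#!/bin/sh
# Print hugepage counters and fail if none are reserved or none are free.
# $1: meminfo-format file (defaults to the live /proc/meminfo).
check_hugepages() {
    meminfo=${1:-/proc/meminfo}
    total=$(awk '/^HugePages_Total:/ {print $2}' "$meminfo")
    free=$(awk '/^HugePages_Free:/ {print $2}' "$meminfo")
    size=$(awk '/^Hugepagesize:/ {print $2 " " $3}' "$meminfo")
    echo "total=${total:-0} free=${free:-0} size=${size:-unknown}"
    # DPDK needs reserved hugepages; zero total or zero free is a red flag.
    [ "${total:-0}" -gt 0 ] && [ "${free:-0}" -gt 0 ]
}
```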
Comment 6 Aaron Conole 2017-06-28 14:49:08 EDT
It appears that hugepage allocation is
very slow on those machines, and makes the system believe that
ovs-vswitchd has become unresponsive.

However, manually running ovs-vswitchd:
 (cd /var/run/openvswitch && ovs-vswitchd \
    unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err \
    -vfile:info --mlockall --no-chdir \
    --log-file=/var/log/openvswitch/ovs-vswitchd.log \
    --pidfile=/var/run/openvswitch/ovs-vswitchd.pid)

allows the system to come up (notice the lack of --detach in the above
command).  If this is an acceptable workaround for now, please go with
that.  If you need something like systemd integration and all (for
instance because this is for a customer), let me know.  In the meantime
I will work with upstream on a solution that we can include in RHEL.
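If the failure is the service manager giving up while DPDK is still allocating hugepage memory, one possible mitigation (an assumption, not the fix eventually adopted in this bug) is to give ovs-vswitchd.service a longer start timeout through a systemd drop-in. `TimeoutStartSec` is a standard systemd [Service] option (systemd's default is 90 s); the 300-second value, the drop-in file name and the `write_timeout_dropin` helper are illustrative only.

```shell
#!/bin/sh
# Write a systemd drop-in that lengthens the start timeout of
# ovs-vswitchd.service (hypothetical helper; 300 s is an arbitrary example).
# $1: drop-in directory (defaults to the system override location).
write_timeout_dropin() {
    dir=${1:-/etc/systemd/system/ovs-vswitchd.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/timeout.conf" <<'EOF'
[Service]
TimeoutStartSec=300
EOF
}

# Usage (as root):
#   write_timeout_dropin && systemctl daemon-reload && systemctl restart openvswitch
```

The daemon-reload is required for systemd to pick up the new drop-in; a drop-in is used instead of editing the packaged unit so that a package update does not overwrite it.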
Comment 7 Saravanan KR 2017-06-30 05:05:49 EDT
We can't start ovs-vswitchd manually during the deployment, and without it the deployment fails. We tried removing the "--detach" option in /usr/share/openvswitch/scripts/ovs-lib, but it didn't help. Is there any other alternative that works with the deployment (like a file modification)?
Comment 8 Aaron Conole 2017-07-05 14:57 EDT
Created attachment 1294704 [details]
private ovs build
Comment 9 Aaron Conole 2017-07-05 14:58:26 EDT
Attached a private OVS build with a possible remedy.  Please try the attached and let me know.
Comment 10 Karthik Sundaravel 2017-07-06 09:57 EDT
Created attachment 1294973 [details]
Sos report for the private build 2.7.0-9.bz1463627.el7fdb
Comment 11 Karthik Sundaravel 2017-07-06 09:58:19 EDT
We see the openvswitch service failing with this build. The sos report is attached.
Comment 12 Aaron Conole 2017-07-06 10:22:18 EDT
That SOS report is corrupted.  But it's okay, enough logs are there.

It looks like after reload, no systemctl daemon-reload was executed.  Not sure if that was due to my changes to the specfile, but if so, apologies.  After running systemctl daemon-reload, and systemctl restart openvswitch, I see the following:

[heat-admin@overcloud-computeovsdpdk-0 ~]$ systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
   Active: active (exited) since Thu 2017-07-06 10:18:50 EDT; 21s ago
  Process: 35801 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 35801 (code=exited, status=0/SUCCESS)

Jul 06 10:18:50 overcloud-computeovsdpdk-0 systemd[1]: Starting Open vSwitch...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 systemd[1]: Started Open vSwitch.
Hint: Some lines were ellipsized, use -l to show in full.
[heat-admin@overcloud-computeovsdpdk-0 ~]$ systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2017-07-06 10:18:50 EDT; 1min 11s ago
  Process: 35591 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 35677 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random start $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 35703 (ovs-vswitchd)
   CGroup: /system.slice/ovs-vswitchd.service
           └─35703 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:e...

Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL:   probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: PCI devi...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL:   probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: PCI devi...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL:   probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL: PCI devi...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-vswitchd[35703]: EAL:   probe ...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-ctl[35677]: [  OK  ]
Jul 06 10:18:50 overcloud-computeovsdpdk-0 ovs-ctl[35677]: Enabling remote OV...
Jul 06 10:18:50 overcloud-computeovsdpdk-0 systemd[1]: Started Open vSwitch F...
Hint: Some lines were ellipsized, use -l to show in full.
[heat-admin@overcloud-computeovsdpdk-0 ~]$ systemctl status ovsdb-server
● ovsdb-server.service - Open vSwitch Database Unit
   Loaded: loaded (/usr/lib/systemd/system/ovsdb-server.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2017-07-06 10:17:33 EDT; 2min 35s ago
  Process: 35615 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop (code=exited, status=0/SUCCESS)
  Process: 35638 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd --no-monitor --system-id=random start $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 35667 (ovsdb-server)
   CGroup: /system.slice/ovsdb-server.service
           └─35667 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsys...

Jul 06 10:17:33 overcloud-computeovsdpdk-0 systemd[1]: Starting Open vSwitch ...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-ctl[35638]: Starting ovsdb-ser...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-vsctl[35668]: ovs|00001|vsctl|...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-vsctl[35674]: ovs|00001|vsctl|...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 ovs-ctl[35638]: Configuring Open v...
Jul 06 10:17:33 overcloud-computeovsdpdk-0 systemd[1]: Started Open vSwitch D...
Hint: Some lines were ellipsized, use -l to show in full.

Should be all set to go, now?
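The reload-and-restart sequence from this comment can be wrapped with a check of the units that do the real work. As the status output shows, openvswitch.service in this build only runs /bin/true and reports "active (exited)", so the sketch checks ovsdb-server and ovs-vswitchd with `systemctl is-active --quiet`; the function name `restart_and_verify_ovs` is hypothetical.

```shell
#!/bin/sh
# Reload unit files (needed after the package replaced them), restart the
# openvswitch wrapper unit, then verify the database and forwarding units.
restart_and_verify_ovs() {
    systemctl daemon-reload || return 1
    systemctl restart openvswitch || return 1
    rc=0
    # openvswitch.service itself only runs /bin/true here, so check the
    # units that actually host ovsdb-server and ovs-vswitchd.
    for unit in ovsdb-server ovs-vswitchd; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
            rc=1
        fi
    done
    return "$rc"
}
```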
Comment 13 Saravanan KR 2017-07-17 03:40:02 EDT
We found that the ovs-ctl script file used for the permission workaround was patched wrongly, as the patch did not account for OvS 2.7. After fixing it, we are able to enable DPDK successfully.

Posted the review for THT (tripleo-heat-templates):
https://review.openstack.org/#/c/478163/
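Since the root cause was a workaround patch that did not match the OvS 2.7 ovs-ctl, one general way to keep such a workaround from silently mismatching is to gate it on the installed version. The sketch below parses the first line of `ovs-vswitchd --version` (which prints e.g. "ovs-vswitchd (Open vSwitch) 2.7.0") and compares dotted versions numerically with awk. The function names and the "ovs-2.7"/"pre-2.7" labels are hypothetical; this is not the change posted for review.

```shell
#!/bin/sh
# ovs_version: print the version from `ovs-vswitchd --version` output on stdin.
ovs_version() {
    awk 'NR == 1 { print $NF; exit }'
}

# version_ge A B: succeed if dotted version A >= B (numeric fields only;
# missing fields count as 0, so 2.7 == 2.7.0).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0
    }'
}

# select_workaround VERSION: name the (hypothetical) patch variant to apply.
select_workaround() {
    if version_ge "$1" 2.7; then
        echo "ovs-2.7"
    else
        echo "pre-2.7"
    fi
}

# Usage: select_workaround "$(ovs-vswitchd --version | ovs_version)"
```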
