Bug 1540017 - ovs-vswitchd systemd process timesout in environments with large (200G) hugepages
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 10.0 (Newton)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: async
Target Release: 10.0 (Newton)
Assigned To: Aaron Conole
QA Contact: Yariv
Keywords: Triaged, ZStream
Depends On:
Blocks: 1540158
Reported: 2018-01-30 00:38 EST by Jaison Raju
Modified: 2018-06-27 19:35 EDT

See Also:
Fixed In Version: openvswitch-2.6.1-18.git20180130.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1559571
Environment:
Last Closed: 2018-06-27 19:33:21 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3358421 | None | None | None | 2018-02-20 07:43 EST
Red Hat Product Errata | RHSA-2018:2102 | None | None | None | 2018-06-27 19:35 EDT

Description Jaison Raju 2018-01-30 00:38:46 EST
Description of problem:
When deploying on environments with 200G or more of hugepages, ovs-vswitchd is known to take longer to start than the systemd timeout (1.5 minutes).
With a large hugepage allocation such as 400G, ovs-dpdk is usually known to take about 5 minutes.
https://mail.openvswitch.org/pipermail/ovs-git/2017-July/019944.html

Another concern is that the startup time is independent of how much hugepage memory the DPDK configuration actually uses.
For example, ovs-dpdk configured with 4G of hugepages takes the same time to start on a host with 200G of hugepages.
The problem is that map_all_hugepages() would map all free huge pages, and then select the proper ones. If I have 500 free huge pages (each 1G), and application only needs 1G per NUMA socket, it is unreasonable for such mapping.

http://dpdk.org/ml/archives/dev/2017-September/074621.html
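
To see how many pages the behaviour quoted above would touch on a given host, the free 1G hugepages can be summed across NUMA nodes. This is a minimal sketch: the function name sum_free_hugepages is hypothetical, and it assumes the standard Linux sysfs layout under /sys/devices/system/node with 1G (1048576kB) pages.

```shell
# sum_free_hugepages [NODE_ROOT]
# Prints the total number of free 1G hugepages across all NUMA nodes.
# NODE_ROOT defaults to the kernel's sysfs node directory; the parameter
# exists only so the function can be pointed at another tree.
sum_free_hugepages() {
    node_root=${1:-/sys/devices/system/node}
    total=0
    for f in "$node_root"/node*/hugepages/hugepages-1048576kB/free_hugepages; do
        # Skip the unexpanded glob or unreadable entries.
        [ -r "$f" ] || continue
        n=$(cat "$f")
        total=$((total + n))
    done
    echo "$total"
}
```

Calling sum_free_hugepages with no argument reads the live system. On the host described above it would be expected to report 500, regardless of how little memory the application requests.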

Version-Release number of selected component (if applicable):
RHOS

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Jaison Raju 2018-01-30 01:16:48 EST
I will raise a separate bug for map_all_hugepages() mapping all free huge pages.
Comment 3 Jaison Raju 2018-01-30 09:14:45 EST
# for i in openvswitch.service ovs-vswitchd.service ; do systemctl show $i | grep -i timeout ; done
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s
JobTimeoutUSec=0
JobTimeoutAction=none
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
JobTimeoutUSec=0
JobTimeoutAction=none
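
The output above shows ovs-vswitchd.service with a 1min 30s start timeout. One possible local workaround is a systemd drop-in that raises TimeoutStartSec for that unit. This is a sketch only, not necessarily what the errata ships: the function name write_timeout_dropin and the file name timeout.conf are hypothetical, and a value of 300 seconds is an assumption taken from the roughly 5 minute startup mentioned in the description.

```shell
# write_timeout_dropin UNIT_DIR TIMEOUT_SECONDS
# Writes a drop-in for ovs-vswitchd.service that overrides the start timeout.
# UNIT_DIR is the systemd unit directory, e.g. /etc/systemd/system.
write_timeout_dropin() {
    unit_dir=$1
    seconds=$2
    dropin_dir="$unit_dir/ovs-vswitchd.service.d"
    mkdir -p "$dropin_dir" || return 1
    # TimeoutStartSec is the unit-file setting reported as TimeoutStartUSec
    # by "systemctl show".
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$dropin_dir/timeout.conf"
}
```

As root, this would be used as `write_timeout_dropin /etc/systemd/system 300` followed by `systemctl daemon-reload`, after which the loop above should report the new TimeoutStartUSec for ovs-vswitchd.service.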
Comment 4 Jaison Raju 2018-01-31 04:23:05 EST
$ git remote -v
origin	https://github.com/openvswitch/ovs.git (fetch)
$ git tag --contains c1c69e8a45ead25f4309ec3d340c805a10bcae79
v2.8.0
v2.8.1
Comment 19 errata-xmlrpc 2018-06-27 19:33:21 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2102
