Red Hat Bugzilla – Bug 1540017
ovs-vswitchd systemd service times out in environments with large (200G) hugepage allocations
Last modified: 2018-06-27 19:35:37 EDT
Description of problem:
When deploying in environments with 200G+ of hugepages, ovs-vswitchd is known to take longer to start than the systemd timeout (1.5 min). With a large hugepage allocation such as 400G, ovs-dpdk startup is known to take around 5 minutes:
https://mail.openvswitch.org/pipermail/ovs-git/2017-July/019944.html

Another concern is that the startup time is independent of how much memory the DPDK configuration actually uses: an ovs-dpdk instance configured for only 4G of hugepages takes just as long when 200G of hugepages are allocated on the host. The problem is that map_all_hugepages() maps all free hugepages and only then selects the ones it needs. With 500 free 1G hugepages and an application that needs only 1G per NUMA socket, mapping everything is unreasonable:
http://dpdk.org/ml/archives/dev/2017-September/074621.html

Version-Release number of selected component (if applicable):
RHOS

How reproducible:
Always
I will raise a separate bug for map_all_hugepages() mapping all free hugepages.
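To make the reported behaviour concrete, below is a minimal standalone C sketch (not DPDK source; the page counts, 2M page size, and program structure are illustrative assumptions) of the map-everything-then-select pattern: every free hugepage is mapped and touched before the unneeded ones are released, so startup cost scales with the total free hugepages on the host rather than with the amount requested.

/* Sketch only: requires pre-allocated hugepages, e.g. via
 * /proc/sys/vm/nr_hugepages. Compile with: gcc -O2 sketch.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define HUGEPAGE_SZ (2UL * 1024 * 1024)  /* assume default 2M hugepages */

int main(int argc, char **argv)
{
    /* total = free hugepages on the host, needed = what the app wants */
    int total  = argc > 1 ? atoi(argv[1]) : 8;
    int needed = argc > 2 ? atoi(argv[2]) : 1;
    void **pages = calloc(total, sizeof(*pages));

    /* Phase 1: map and touch ALL free hugepages (the expensive part). */
    for (int i = 0; i < total; i++) {
        pages[i] = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (pages[i] == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        *(volatile char *)pages[i] = 0;  /* fault the page in */
    }

    /* Phase 2: keep only the pages actually needed, unmap the rest. */
    for (int i = needed; i < total; i++)
        munmap(pages[i], HUGEPAGE_SZ);

    printf("mapped %d hugepages to satisfy a request for %d\n",
           total, needed);
    free(pages);
    return 0;
}

Timing phase 1 (e.g. "time ./a.out 400 1") shows the cost growing with the first argument even though the second stays at 1, which matches the symptom above.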
# for i in openvswitch.service ovs-vswitchd.service ; do systemctl show $i | grep -i timeout ; done
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s
JobTimeoutUSec=0
JobTimeoutAction=none
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
JobTimeoutUSec=0
JobTimeoutAction=none
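So ovs-vswitchd.service carries the default 1min 30s start timeout. As an interim workaround until the fix lands, the timeout can be raised with a systemd drop-in; a minimal sketch, assuming a 5-minute ceiling is enough (the drop-in file name and the 300 s value are illustrative, chosen from the ~5 min figure cited in the description):

# mkdir -p /etc/systemd/system/ovs-vswitchd.service.d
# cat > /etc/systemd/system/ovs-vswitchd.service.d/timeout.conf <<'EOF'
[Service]
TimeoutStartSec=300
EOF
# systemctl daemon-reload

After the daemon-reload, "systemctl show ovs-vswitchd.service | grep -i timeout" should report TimeoutStartUSec=5min.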
$ git remote -v
origin  https://github.com/openvswitch/ovs.git (fetch)
$ git tag --contains c1c69e8a45ead25f4309ec3d340c805a10bcae79
v2.8.0
v2.8.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2102