Description of problem:
When deploying on environments with 200G+ of hugepages, ovs-vswitchd is known to take longer to start than the systemd timeout (1.5 min). With very large hugepage allocations, such as 400G, ovs-dpdk typically takes around 5 minutes to start.
https://mail.openvswitch.org/pipermail/ovs-git/2017-July/019944.html

Another concern is that this happens regardless of how much hugepage memory the DPDK configuration actually uses. For example, ovs-dpdk configured with only 4G of hugepage memory takes just as long to start on a host with 200G of hugepages.

As described on the DPDK dev mailing list:
"The problem is that map_all_hugepages() would map all free huge pages, and then select the proper ones. If I have 500 free huge pages (each 1G), and application only needs 1G per NUMA socket, it is unreasonable for such mapping."
http://dpdk.org/ml/archives/dev/2017-September/074621.html

Version-Release number of selected component (if applicable):
RHOS

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
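Because map_all_hugepages() maps every free hugepage rather than only what the application needs, the startup cost appears to track the size of the host's free hugepage pool. To see how large that pool is on a compute node, here is a minimal sketch that sums free_hugepages across all page sizes under /sys/kernel/mm/hugepages. The function name is hypothetical (not part of OVS or DPDK), and the optional argument exists only so it can be pointed at a test directory.

~~~shell
#!/bin/sh
# Hypothetical helper (not part of OVS or DPDK): print the total free hugepage
# memory, in whole GiB (rounded down), across all hugepage sizes.
# $1 optionally overrides the sysfs root, e.g. for testing.
free_hugepages_gib() {
    root="${1:-/sys/kernel/mm/hugepages}"
    total_kb=0
    for dir in "$root"/hugepages-*kB; do
        # Skip the literal pattern when nothing matched, or unreadable entries
        [ -r "$dir/free_hugepages" ] || continue
        name="${dir##*/}"              # e.g. hugepages-1048576kB
        size_kb="${name#hugepages-}"   # e.g. 1048576kB
        size_kb="${size_kb%kB}"        # e.g. 1048576
        free=$(cat "$dir/free_hugepages")
        total_kb=$((total_kb + $size_kb * $free))
    done
    echo $((total_kb / 1048576))       # 1 GiB = 1048576 kB
}
~~~

Running `free_hugepages_gib` with no argument on an affected node gives a rough figure to compare against the 200G+ range described above.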
I will raise a separate bug for map_all_hugepages() mapping all free huge pages, regardless of how many the application needs.
# for i in openvswitch.service ovs-vswitchd.service ; do systemctl show $i | grep -i timeout ; done
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s
JobTimeoutUSec=0
JobTimeoutAction=none
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
JobTimeoutUSec=0
JobTimeoutAction=none
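In the output above, ovs-vswitchd.service reports TimeoutStartUSec=1min 30s. To compare such a value numerically with an observed startup time (for example the roughly 5 minutes reported for 400G of hugepages), here is a minimal sketch that converts the timespan strings printed by `systemctl show` into seconds. The function name is hypothetical, and only the d, h, min, s, ms and us units plus the literal "infinity" are handled. Keep in mind that systemd has historically treated a start timeout of 0 as "no timeout", so a result of 0 does not mean zero seconds.

~~~shell
#!/bin/sh
# Hypothetical helper (not part of systemd): convert a timespan as printed by
# `systemctl show` (e.g. "1min 30s", "5min", "0") into whole seconds.
# Sub-second components (ms, us) are dropped.
timespan_to_seconds() {
    total=0
    for part in $1; do                # word splitting: "1min 30s" -> "1min" "30s"
        case "$part" in
            infinity) echo infinity; return 0 ;;
            *ms|*us) ;;               # ignore sub-second components
            *min) total=$((total + ${part%min} * 60)) ;;
            *h)   total=$((total + ${part%h} * 3600)) ;;
            *d)   total=$((total + ${part%d} * 86400)) ;;
            *s)   total=$((total + ${part%s})) ;;
            *)    total=$((total + $part)) ;;   # bare number such as "0"
        esac
    done
    echo "$total"
}
~~~

On a live node this could be fed from `systemctl show -p TimeoutStartUSec ovs-vswitchd | cut -d= -f2`, which yields the value after the `=`.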
$ git remote -v
origin https://github.com/openvswitch/ovs.git (fetch)
$ git tag --contains c1c69e8a45ead25f4309ec3d340c805a10bcae79
v2.8.0
v2.8.1
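The `git tag --contains` output above indicates that upstream commit c1c69e8a45ead25f4309ec3d340c805a10bcae79 first shipped in v2.8.0. Assuming the installed package follows upstream version numbering (a downstream build may backport the change, so this is only a heuristic), here is a sketch to check whether a given version predates that tag. The function name is hypothetical, and it relies on GNU `sort -V` for version ordering.

~~~shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if version $1 sorts strictly before
# 2.8.0, the first upstream tag containing commit c1c69e8a45ea.
# Downstream backports are NOT detected by this comparison.
predates_ovs_2_8() {
    [ "$1" != "2.8.0" ] &&
        [ "$(printf '%s\n%s\n' "$1" 2.8.0 | sort -V | head -n 1)" = "$1" ]
}
~~~

For example, assuming the first line of `ovs-vswitchd --version` ends with the version number: `predates_ovs_2_8 "$(ovs-vswitchd --version | awk 'NR==1 {print $NF}')" && echo "consider the timeout workaround"`.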
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2102
Workaround for older versions of OVS: add the following to firstboot.yaml (see https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/advanced_overcloud_customization/index#sect-Customizing_Configuration_on_First_Boot). The drop-in raises both the start and stop timeouts of ovs-vswitchd to 300 seconds:
~~~
mkdir -p /etc/systemd/system/ovs-vswitchd.service.d
cat <<'EOF' > /etc/systemd/system/ovs-vswitchd.service.d/timeout.conf
[Service]
TimeoutSec=300
EOF
systemctl daemon-reload
~~~

Verification:
~~~
[root@overcloud-compute-0 ~]# systemctl show ovs-vswitchd | grep -i timeout
TimeoutStartUSec=5min
TimeoutStopUSec=5min
DropInPaths=/etc/systemd/system/ovs-vswitchd.service.d/timeout.conf
JobTimeoutUSec=0
JobTimeoutAction=none
~~~
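To script the verification step across many compute nodes, here is a minimal sketch that reads `systemctl show ovs-vswitchd` output on stdin and succeeds only if the timeout.conf drop-in is listed in DropInPaths and the start timeout matches the expected value ("5min" by default, matching the verification output). The function name is hypothetical.

~~~shell
#!/bin/sh
# Hypothetical check: read `systemctl show ovs-vswitchd` output on stdin and
# succeed only if the timeout.conf drop-in is loaded and TimeoutStartUSec
# equals the expected value ($1, default "5min").
dropin_applied() {
    expected="${1:-5min}"
    out=$(cat)
    printf '%s\n' "$out" | grep -qx "TimeoutStartUSec=$expected" &&
        printf '%s\n' "$out" |
            grep -q '^DropInPaths=.*/ovs-vswitchd\.service\.d/timeout\.conf'
}
~~~

Usage on a node: `systemctl show ovs-vswitchd | dropin_applied && echo "timeout drop-in active"`.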