+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1709628 +++
======================================================================

Description of problem:

This warning is seen very frequently in vdsm logs during host monitoring:

vdsm[121675]: WARN Worker blocked: <Worker name=periodic/82 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7f94e443fe50> at 0x7f94e443fe90> timeout=15,
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in __call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 576, in __call__
  sample = HostSample(self._pid)
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 240, in __init__
  self.interfaces = _get_interfaces_and_samples()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 200, in _get_interfaces_and_samples
  for link in ipwrapper.getLinks():
File: "/usr/lib/python2.7/site-packages/vdsm/network/ipwrapper.py", line 267, in getLinks
  in six.viewitems(dpdk.get_dpdk_devices()))
File: "/usr/lib/python2.7/site-packages/vdsm/network/link/dpdk.py", line 47, in get_dpdk_devices
  dpdk_devices = _get_dpdk_devices()
File: "/usr/lib/python2.7/site-packages/vdsm/common/cache.py", line 41, in __call__
  value = self.func(*args)
File:
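The bottom frames of the traceback pass through vdsm's memoizing cache (vdsm/common/cache.py), so the slow lshw probe runs only on a cache miss; the first call, however, still executes synchronously inside the periodic worker and blocks it past the 15s timeout. A minimal sketch of that memoization pattern (illustrative names only, not vdsm's actual code):

```python
import functools

def memoized(func):
    """Cache results per argument tuple. On a cache miss the wrapped
    call still runs synchronously, blocking the calling thread."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # slow path: runs once per args
        return cache[args]
    return wrapper

calls = []

@memoized
def get_devices(kind):
    calls.append(kind)   # stands in for an expensive 'lshw -json' run
    return {kind: []}

get_devices('network')
get_devices('network')   # served from the cache; no second slow call
```

Caching explains why the warning fires "very frequently" only on busy hosts: it is the first, uncached call in each vdsm run that blocks, and it blocks for as long as lshw takes to probe the hardware.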
"/usr/lib/python2.7/site-packages/vdsm/network/link/dpdk.py", line 111, in _get_dpdk_devices
  devices = _lshw_command()
File: "/usr/lib/python2.7/site-packages/vdsm/network/link/dpdk.py", line 123, in _lshw_command
  rc, out, err = cmd.exec_sync(['lshw', '-json'] + filterout_cmd)
File: "/usr/lib/python2.7/site-packages/vdsm/network/cmd.py", line 38, in exec_sync
  retcode, out, err = exec_sync_bytes(cmds)
File: "/usr/lib/python2.7/site-packages/vdsm/common/cmdutils.py", line 156, in exec_cmd
  out, err = p.communicate()
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 924, in communicate
  stdout, stderr = self._communicate(input, endtime, timeout)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1706, in _communicate
  orig_timeout)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1779, in _communicate_with_poll
  ready = poller.poll(self._remaining_time(endtime))

This happens because the lshw command used to discover network devices can take a long time, apparently depending on the number of devices present on the system. Even with the improvements from "dpdk: update lshw command", which filters out several classes of devices, it can still take more than 15s on some systems:

$ time lshw -disable usb -disable pcmcia -disable isapnp -disable ide -disable scsi -disable dmi -disable memory -disable cpuinfo

real    0m23.723s
user    0m23.521s
sys     0m0.176s

Version-Release number of selected component (if applicable):
vdsm-4.20.43-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
# ip link add name dummy type dummy
# for i in {1..300} ; do ip link add link dummy type macvtap; done
# time lshw -disable usb -disable pcmcia -disable isapnp -disable ide -disable scsi -disable dmi -disable memory -disable cpuinfo

Actual results:
The lshw run time grows with the number of devices (such as VM vnets) and can exceed 15s.

Expected results:
Perhaps _get_dpdk_devices could discover the devices more efficiently, or the worker timeout could be increased.
(Originally by Germano Veit Michel)
Verified on vdsm-4.30.19-1.el7ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2432