Bug 1719125

Summary: [downstream clone - 4.3.5] lshw can take more than 15 seconds to execute depending on the system
Product: Red Hat Enterprise Virtualization Manager Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: vdsmAssignee: Edward Haas <edwardh>
Status: CLOSED ERRATA QA Contact: Michael Burman <mburman>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.8CC: dholler, lleistne, lsurette, mburman, mtessun, srevivo, ycui
Target Milestone: ovirt-4.3.5Keywords: ZStream
Target Release: 4.3.5   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: vdsm-4.30.19 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1709628 Environment:
Last Closed: 2019-08-12 11:53:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1709628    
Bug Blocks:    

Description RHV bug bot 2019-06-11 06:23:15 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1709628 +++
======================================================================

Description of problem:

This is seen very frequently in vdsm logs during host monitoring:

vdsm[121675]: WARN Worker blocked: <Worker name=periodic/82 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7f94e443fe50> at 0x7f94e443fe90> timeout=15,
                                         File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
                                           self.__bootstrap_inner()
                                         File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
                                           self.run()
                                         File: "/usr/lib64/python2.7/threading.py", line 765, in run
                                           self.__target(*self.__args, **self.__kwargs)
                                         File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
                                           ret = func(*args, **kwargs)
                                         File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
                                           self._execute_task()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
                                           task()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
                                           self._callable()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in __call__
                                           self._func()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 576, in __call__
                                           sample = HostSample(self._pid)
                                         File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 240, in __init__
                                           self.interfaces = _get_interfaces_and_samples()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 200, in _get_interfaces_and_samples
                                           for link in ipwrapper.getLinks():
                                         File: "/usr/lib/python2.7/site-packages/vdsm/network/ipwrapper.py", line 267, in getLinks
                                           in six.viewitems(dpdk.get_dpdk_devices()))
                                         File: "/usr/lib/python2.7/site-packages/vdsm/network/link/dpdk.py", line 47, in get_dpdk_devices
                                           dpdk_devices = _get_dpdk_devices()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/common/cache.py", line 41, in __call__
                                           value = self.func(*args)
                                         File: "/usr/lib/python2.7/site-packages/vdsm/network/link/dpdk.py", line 111, in _get_dpdk_devices
                                           devices = _lshw_command()
                                         File: "/usr/lib/python2.7/site-packages/vdsm/network/link/dpdk.py", line 123, in _lshw_command
                                           rc, out, err = cmd.exec_sync(['lshw', '-json'] + filterout_cmd)
                                         File: "/usr/lib/python2.7/site-packages/vdsm/network/cmd.py", line 38, in exec_sync
                                           retcode, out, err = exec_sync_bytes(cmds)
                                         File: "/usr/lib/python2.7/site-packages/vdsm/common/cmdutils.py", line 156, in exec_cmd
                                           out, err = p.communicate()
                                         File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 924, in communicate
                                           stdout, stderr = self._communicate(input, endtime, timeout)
                                         File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1706, in _communicate
                                           orig_timeout)
                                         File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1779, in _communicate_with_poll
                                           ready = poller.poll(self._remaining_time(endtime))

It is because the lshw command used to get the network devices can taker a longer time apparently depending on the number of devices present on the system.
Even with the improvements from "dpdk: update lshw command", which filters out several classes of devices, it can still take more than 15s on some systems.

$ time lshw -disable usb -disable pcmcia -disable isapnp -disable ide -disable scsi -disable dmi -disable memory -disable cpuinfo
real    0m23.723s    
user    0m23.521s    
sys     0m0.176s  

Version-Release number of selected component (if applicable):
vdsm-4.20.43-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
# ip link add name dummy type dummy
# for i in {1..300} ; do ip link add link dummy type macvtap; done
# time lshw -disable usb -disable pcmcia -disable isapnp -disable ide -disable scsi -disable dmi -disable memory -disable cpuinfo

Actual results:
lshw time goes up with the number of devices (such as VM vnets), it can take more than 15s

Expected results:
Perhaps something can be done more efficiently by  _get_dpdk_devices? Or allow for a greater time...

(Originally by Germano Veit Michel)

Comment 4 Michael Burman 2019-06-17 05:27:17 UTC
Verified on - vdsm-4.30.19-1.el7ev.x86_64

Comment 6 errata-xmlrpc 2019-08-12 11:53:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2432