Bug 1398572 - hostdev listing takes large amount of time
Summary: hostdev listing takes large amount of time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.0.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.1.0-beta
: ---
Assignee: Francesco Romani
QA Contact: guy chen
URL:
Whiteboard:
Depends On:
Blocks: 1405802
Reported: 2016-11-25 10:30 UTC by Martin Polednik
Modified: 2020-04-15 14:59 UTC

Fixed In Version: v4.19.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1405802 (view as bug list)
Environment:
Last Closed: 2017-04-25 00:52:54 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:0998 | 0 | normal | SHIPPED_LIVE | VDSM bug fix and enhancement update 4.1 GA | 2017-04-18 20:11:39 UTC
oVirt gerrit | 67572 | 0 | None | MERGED | hostdev: cache parsed device tree | 2020-11-12 18:13:46 UTC

Description Martin Polednik 2016-11-25 10:30:56 UTC
Description of problem:
The hostdev list_by_caps method can run for a very long time if the number of storage devices on a host is high (~1000).

Version-Release number of selected component (if applicable):
VDSM tag v4.18.15.3

How reproducible:
100%

Steps to Reproduce:
1. Acquire a host with ~1000+ storage devices (or mock such an environment).
2. Try to run vdsm-restore-net-config or any action that requires a refresh of host devices.

Actual results:
The call takes hours to finish.

Expected results:
The call is executed within a reasonable timeframe.

Additional info:
Caused by an inefficient algorithm for storage device construction.

Comment 2 Dan Kenigsberg 2016-11-25 11:19:08 UTC
Can you attach the logs showing how much time is eaten by _restore_sriov_numvfs()? Are you suggesting that the problem is in libvirt's listAllDevices()?

It was introduced in 3.6. At worst, we can add a configuration option to disable it for people who do not care about SR-IOV.

Comment 3 Martin Polednik 2016-11-29 14:39:28 UTC
Sorry, no exact figure for the specific function; that being said, the problem is in the list_by_caps logic and was introduced by the addition of proper SCSI parsing. Currently, the whole tree is parsed in O(n) libvirt calls with O(n^2) passes over the tree. It wasn't really designed for such a large number of disks.

To back up my statement, I've added a VDSM test that parses ~3000 devices, and tested ~30000 devices off-line (most of them are storage, leading to worst-case performance). Without any code improvements, I interrupted the parsing after about 8 minutes:

Ran 1 test in 506.604s

^C

real    8m26.956s
user    8m26.572s
sys     0m0.385s

as it seemed to easily reproduce what was happening in this bug. After a few performance optimizations that bring the complexity down to O(1) libvirt calls and O(n) tree passes, the same number of devices can be parsed in ~0.35 seconds:

Ran 1 test in 0.337s

OK

real    0m0.696s
user    0m0.640s
sys     0m0.056s

and 30000 devices can be parsed in ~3.2 seconds:

Ran 1 test in 3.210s

OK

real    0m3.586s
user    0m3.484s
sys     0m0.100s
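
The complexity change described in this comment can be illustrated with a minimal sketch. This is not the VDSM code: FakeHost, its methods, and the mock topology are hypothetical stand-ins invented for the example, and the real patches may be structured differently. It only contrasts a per-device lookup plus a full scan per device with a single bulk fetch followed by one pass that indexes children by parent.

```python
class FakeHost:
    """Hypothetical stand-in for a libvirt connection; counts calls made to it."""

    def __init__(self, parents):
        self._parents = dict(parents)  # device name -> parent name (or None)
        self.calls = 0

    def list_devices(self):
        self.calls += 1
        return list(self._parents)

    def lookup_parent(self, name):
        self.calls += 1
        return self._parents[name]

    def list_devices_with_parents(self):
        self.calls += 1
        return dict(self._parents)


def children_naive(host):
    """One lookup call per device and one full scan per device."""
    names = host.list_devices()
    parent_of = {name: host.lookup_parent(name) for name in names}
    tree = {}
    for name in names:
        # Full pass over every device to find the children of this one.
        tree[name] = [other for other in names if parent_of[other] == name]
    return tree


def children_indexed(host):
    """One bulk call, then a single pass that files each device under its parent."""
    parent_of = host.list_devices_with_parents()
    tree = {name: [] for name in parent_of}
    for name, parent in parent_of.items():
        if parent is not None:
            tree[parent].append(name)
    return tree


def make_parents(n_disks):
    """Mock topology: one SCSI host with n_disks storage devices below it."""
    parents = {"computer": None, "scsi_host0": "computer"}
    for i in range(n_disks):
        parents["block_sd%d" % i] = "scsi_host0"
    return parents


if __name__ == "__main__":
    slow, fast = FakeHost(make_parents(1000)), FakeHost(make_parents(1000))
    assert children_naive(slow) == children_indexed(fast)
    print("naive calls:", slow.calls, "indexed calls:", fast.calls)
```

Both functions return the same parent-to-children mapping; the difference is that the naive version's call count and scan work grow with the number of devices, while the indexed version makes a single call regardless of host size.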

Comment 4 Michal Skrivanek 2016-12-15 15:59:01 UTC
more patches under https://gerrit.ovirt.org/#/q/topic:hostdev-caching
all in

Comment 6 Michal Skrivanek 2016-12-15 16:07:46 UTC
reassigning for backport to 4.0.

Comment 7 Francesco Romani 2016-12-16 14:30:59 UTC
this optimization doesn't require doc_text: same behaviour as above, just faster

Comment 8 Francesco Romani 2016-12-16 14:52:18 UTC
if we need this in 4.0.7, then this is not MODIFIED

Comment 10 Michal Skrivanek 2016-12-19 09:59:27 UTC
(In reply to Francesco Romani from comment #8)
> if we need this in 4.0.7, then this is not MODIFIED

it is, this is a 4.1 bug

Comment 11 guy chen 2017-01-19 13:57:05 UTC
I have tested this on the latest 4.1 build 4 with 1000 storage devices; it is fixed:


[root@ucs1-b420-2 ~]# time python /usr/share/vdsm/vdsm-restore-net-config          

real        0m0.228s
user        0m0.151s
sys        0m0.077s
[root@ucs1-b420-2 ~]# time python /usr/share/vdsm/vdsm-restore-net-config

real        0m0.237s
user        0m0.161s
sys        0m0.076s
[root@ucs1-b420-2 ~]# time python /usr/share/vdsm/vdsm-restore-net-config

real        0m0.256s
user        0m0.180s
sys        0m0.077s

