Bug 1398572
Summary: | hostdev listing takes large amount of time | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Martin Polednik <mpoledni> | |
Component: | vdsm | Assignee: | Francesco Romani <fromani> | |
Status: | CLOSED ERRATA | QA Contact: | guy chen <guchen> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 4.0.5 | CC: | bazulay, danken, fdeutsch, gklein, lsurette, mgoldboi, michal.skrivanek, mpoledni, srevivo, ycui, ykaul | |
Target Milestone: | ovirt-4.1.0-beta | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | v4.19.1 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1405802 | Environment: | |
Last Closed: | 2017-04-25 00:52:54 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1405802 |
Description
Martin Polednik
2016-11-25 10:30:56 UTC
Can you attach the logs showing how much time is eaten by _restore_sriov_numvfs()? Are you suggesting that the problem is in libvirt's listAllDevices()? It was introduced in 3.6. At worst, we can add a configurable to disable it for people who do not care about SR-IOV.

Sorry, I have no exact figure for that specific function. That said, the problem is in the list_by_caps logic, and it was introduced by the addition of proper SCSI parsing. Currently, the whole tree is parsed with O(n) libvirt calls and O(n^2) passes over the tree. It wasn't designed for such a large number of disks.

To back up my statement, I've added a VDSM test that parses ~3000 devices, and I tested ~30000 devices offline (most of them storage, which is the worst case for performance). Without any code improvements, I interrupted the parsing at about 8 minutes, since that seemed to reproduce what was happening in this bug:

    Ran 1 test in 506.604s
    ^C
    real    8m26.956s
    user    8m26.572s
    sys     0m0.385s

After a few performance optimizations that bring the complexity down to O(1) libvirt calls and O(n) tree passes, the same number of devices can be parsed in ~0.35 seconds:

    Ran 1 test in 0.337s
    OK
    real    0m0.696s
    user    0m0.640s
    sys     0m0.056s

and 30000 devices can be parsed in ~3.2 seconds:

    Ran 1 test in 3.210s
    OK
    real    0m3.586s
    user    0m3.484s
    sys     0m0.100s

More patches are under https://gerrit.ovirt.org/#/q/topic:hostdev-caching; all are in.

Reassigning for backport to 4.0.
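The O(n^2) versus O(n) tree-pass difference described above can be illustrated with a small Python sketch. This is hypothetical and is not vdsm's list_by_caps code. The device model (plain (name, parent) pairs, loosely mirroring the `<name>` and `<parent>` elements of libvirt node-device XML), the synthetic naming scheme, and the function names `synthetic_tree`, `children_quadratic` and `children_linear` are all assumptions made for illustration. The sketch does not model the libvirt-call side of the fix.

```python
"""Hypothetical sketch, not vdsm code: linking each device to its children.

Assumption: every device is a (name, parent) pair, loosely mirroring the
<name> and <parent> elements of libvirt node-device XML.
"""
import time


def synthetic_tree(hosts, disks_per_host):
    """Build a hypothetical storage-heavy tree: root -> SCSI hosts -> disks."""
    devices = [("computer", None)]
    for h in range(hosts):
        host = f"scsi_host{h}"
        devices.append((host, "computer"))
        for d in range(disks_per_host):
            devices.append((f"scsi_{h}_0_0_{d}", host))
    return devices


def children_quadratic(devices):
    """Rescan the whole list for every device: O(n^2) comparisons."""
    return {
        name: [child for child, parent in devices if parent == name]
        for name, _ in devices
    }


def children_linear(devices):
    """Index every device once, then attach each one to its parent: O(n)."""
    children = {name: [] for name, _ in devices}
    for name, parent in devices:
        if parent in children:  # the root's parent (None) is not a device
            children[parent].append(name)
    return children


if __name__ == "__main__":
    devices = synthetic_tree(hosts=30, disks_per_host=100)  # ~3000 devices
    for func in (children_quadratic, children_linear):
        start = time.perf_counter()
        tree = func(devices)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}: {len(tree)} devices in {elapsed:.3f}s")
```

Both functions return the same mapping; only the amount of work differs. With ~3000 devices, the quadratic version performs roughly 9 million parent comparisons, while the linear one touches each device twice.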
This optimization doesn't require doc_text: same behaviour as above, just faster.

If we need this in 4.0.7, then this is not MODIFIED.

(In reply to Francesco Romani from comment #8)
> if we need this in 4.0.7, then this is not MODIFIED

It is; this is a 4.1 bug.

I have tested this on the latest 4.1 build 4 with 1000 storage devices. Fixed:

    [root@ucs1-b420-2 ~]# time python /usr/share/vdsm/vdsm-restore-net-config
    real    0m0.228s
    user    0m0.151s
    sys     0m0.077s
    [root@ucs1-b420-2 ~]# time python /usr/share/vdsm/vdsm-restore-net-config
    real    0m0.237s
    user    0m0.161s
    sys     0m0.076s
    [root@ucs1-b420-2 ~]# time python /usr/share/vdsm/vdsm-restore-net-config
    real    0m0.256s
    user    0m0.180s
    sys     0m0.077s