Bug 1405802
| Summary: | [z-stream clone - 4.0.7] hostdev listing takes large amount of time | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | rhev-integ |
| Component: | vdsm | Assignee: | Francesco Romani <fromani> |
| Status: | CLOSED ERRATA | QA Contact: | eberman |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.0.5 | CC: | bazulay, danken, eberman, emarcian, fdeutsch, fromani, gklein, lsurette, mavital, mgoldboi, michal.skrivanek, mpoledni, nashok, nsoffer, rgolan, srevivo, ycui, ykaul |
| Target Milestone: | ovirt-4.0.7 | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1398572 | Environment: | |
| Last Closed: | 2017-03-16 15:35:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1398572 | | |
| Bug Blocks: | | | |
Description
rhev-integ
2016-12-18 14:41:33 UTC
can you attach the logs showing how much time is eaten by _restore_sriov_numvfs()? Do you suggest that the problem is in libvirt's listAllDevices()? It was introduced in 3.6. At worst, we can add a configurable to disable it for people who do not care about SR-IOV.

(Originally by danken)

Sorry, no exact figure for the specific function; that being said, the problem is in the list_by_caps logic and it was introduced by the addition of proper SCSI parsing. Currently, the whole tree is parsed with O(n) libvirt calls and O(n^2) passes over the tree. It wasn't really designed for such a large number of disks.

To back up my statement, I've added a VDSM test that parses ~3000 devices and off-line tested ~30000 devices (most of them storage, leading to worst-case performance). Without any code improvements, I interrupted the parsing at 8 minutes:

    Ran 1 test in 506.604s
    ^C
    real    8m26.956s
    user    8m26.572s
    sys     0m0.385s

as it seemed to easily reproduce what was happening in this bug. After a few perf optimizations, bringing the complexity down to O(1) libvirt calls and O(n) tree passes, the same number of devices can be parsed in ~0.35 seconds:

    Ran 1 test in 0.337s
    OK
    real    0m0.696s
    user    0m0.640s
    sys     0m0.056s

and 30000 devices can be parsed in ~3.2 seconds:

    Ran 1 test in 3.210s
    OK
    real    0m3.586s
    user    0m3.484s
    sys     0m0.100s

(Originally by Martin Polednik)

more patches under https://gerrit.ovirt.org/#/q/topic:hostdev-caching, all in

(Originally by michal.skrivanek)

reassigning for backport to 4.0.

(Originally by michal.skrivanek)

this optimization doesn't require doc_text: same behaviour as above, just faster

(Originally by Francesco Romani)

if we need this in 4.0.7, then this is not MODIFIED

(Originally by Francesco Romani)

all patches merged, hence moving to MODIFIED. The backport was quite large (~20 patches) and affected the general hostdev code. However, most if not all of the patches are either trivial or simple backports, so the risk is still low(ish). The code is very similar or identical to the master branch, where it has already received a fair amount of testing. Backporting the code helps keep the trees as similar as possible, facilitating further backports. Without the backport, fixing the old code would have required rewriting the patch heavily, increasing the risk. For all of the above reasons we took the large backport.

No need for doc_text: the code should behave as before, just faster.

Hi, when trying to add disks to a VM: POST to /api/vms/4c9dfc81-b088-4a28-a4e0-0003558f6a53/diskattachments with the body

    <disk_attachment>
      <bootable>false</bootable>
      <interface>virtio</interface>
      <active>true</active>
      <disk>
        <description>My disk</description>
        <format>cow</format>
        <name>mydisk_${disk_counter}</name>
        <provisioned_size>0</provisioned_size>
        <storage_domains>
          <storage_domain>
            <name>eberman_SD_NFS_1</name>
          </storage_domain>
        </storage_domains>
      </disk>
    </disk_attachment>

it failed:

    <fault>
      <detail>[Cannot add Virtual Disk. Maximum PCI devices exceeded.]</detail>
      <reason>Operation Failed</reason>
    </fault>

(same from the UI)

How can I change the limit on the VM's PCI devices? Or is there another way to test this scenario?

You can't change number of PCI devices the VM has, PCI standard doesn't allow more than 32 devices. You can, however, use virtio-scsi to get over the limitation (virtio-scsi disk isn't PCI device).

Guess

    <interface>virtio-scsi</interface>

is all you need if you have enabled virtio-scsi for the VM.
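[Editorial note] The gerrit patch series referenced above is not reproduced here; as a rough illustration of the caching idea described in the performance comments (one libvirt call plus a single pass over the device list, instead of one call per device and repeated tree walks), here is a minimal sketch. It assumes libvirt-python is available and a local qemu connection; `build_device_index` is a hypothetical helper name, not VDSM code.

```python
# Minimal sketch, not the actual VDSM patches: fetch all node devices with a
# single libvirt call and parse each device's XML exactly once into a dict,
# so later parent/child resolution is a lookup rather than another tree walk.
import xml.etree.ElementTree as ET

import libvirt


def build_device_index(conn):
    # One listAllDevices() call instead of one libvirt lookup per device.
    devices = conn.listAllDevices(0)

    index = {}
    for dev in devices:
        root = ET.fromstring(dev.XMLDesc(0))
        name = root.findtext('name')
        cap = root.find('capability')
        index[name] = {
            'parent': root.findtext('parent'),
            'capability': cap.get('type') if cap is not None else None,
        }
    return index


if __name__ == '__main__':
    conn = libvirt.open('qemu:///system')
    index = build_device_index(conn)
    print('parsed %d node devices' % len(index))
```

The point of the single snapshot is that the cost stays linear in the number of devices even on hosts with tens of thousands of SCSI devices, which matches the ~0.35 s / ~3.2 s figures quoted above.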
(In reply to Martin Polednik from comment #21)
> You can't change number of PCI devices the VM has, PCI standard doesn't
> allow more than 32 devices. You can, however, use virtio-scsi to get over
> the limitation (virtio-scsi disk isn't PCI device).
>
> Guess
> <interface>virtio-scsi</interface>
> is all you need if you have enabled virtio-scsi for the VM.

I kinda remember OUR limit is 28 - as we use some slots already? (NIC, virtio-serial, ...). But yes, virtio-scsi is your friend. Do we allow multiple virtio-SCSI controllers already?

That's the QEMU limit considering default devices; we actually use ~6 devices by default AFAIK. You don't even need multiple virtio-scsi controllers in the first place, but more can be created by setting iothreads.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0544.html
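[Editorial note] For anyone reproducing the verification scenario above, a hedged sketch of the disk-attachment request with a virtio-scsi interface, using the Python `requests` library. The engine URL, credentials, disk name and size are placeholders; the comments above spell the interface "virtio-scsi", while the REST API enum is commonly spelled "virtio_scsi", so check the API documentation for the installed version.

```python
# Hypothetical reproduction helper, not taken from the bug: POST a
# virtio-scsi disk attachment so the disk does not consume a PCI slot.
import requests

ENGINE = 'https://engine.example.com/ovirt-engine/api'   # placeholder
VM_ID = '4c9dfc81-b088-4a28-a4e0-0003558f6a53'

body = """
<disk_attachment>
  <bootable>false</bootable>
  <interface>virtio_scsi</interface>
  <active>true</active>
  <disk>
    <format>cow</format>
    <name>mydisk</name>
    <provisioned_size>1073741824</provisioned_size>
    <storage_domains>
      <storage_domain>
        <name>eberman_SD_NFS_1</name>
      </storage_domain>
    </storage_domains>
  </disk>
</disk_attachment>
"""

resp = requests.post(
    '%s/vms/%s/diskattachments' % (ENGINE, VM_ID),
    data=body,
    headers={'Content-Type': 'application/xml'},
    auth=('admin@internal', 'password'),   # placeholder credentials
    verify=False,  # test engines often use self-signed certificates
)
print(resp.status_code, resp.text)
```

Because a virtio-scsi disk hangs off the SCSI controller rather than the PCI bus, this avoids the "Maximum PCI devices exceeded" failure when attaching many disks, provided VirtIO-SCSI is enabled for the VM.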