Bug 1468439 - VM startup takes 2m and 50 seconds with 1200 devices (use case: multiple direct LUNs)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.19.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: 4.20.9.1
Assignee: Nir Soffer
QA Contact: guy chen
URL:
Whiteboard:
Depends On: 1488892
Blocks: 1545229
Reported: 2017-07-07 06:23 UTC by guy chen
Modified: 2018-04-18 12:32 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-04-18 12:32:47 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: testing_ack+




Links
System: oVirt gerrit
ID: 81418
Private: 0
Priority: master
Status: MERGED
Summary: bll: Optimization - don't connect same targets in RunVM
Last Updated: 2017-09-28 12:08:43 UTC

Description guy chen 2017-07-07 06:23:07 UTC
Description of problem:
With 1200 devices on the host (300 direct LUNs with 4 paths each) and 150 VMs, each attached to 2 direct LUNs, VM startup takes 2m and 50 seconds.


Version-Release number of selected component (if applicable):
vdsm version 4.19.17

How reproducible:
Always

Steps to Reproduce:
1. Attach 300 LUNs with 4 paths to the storage
2. Add 150 VMs
3. Attach 2 direct LUNs to each VM
4. Start 150 VMs
5. Start a single VM

Actual results:
VM startup takes a very long time (2m and 50 seconds)

Expected results:
VM startup should take a shorter, reasonable time

Additional info:
Logs and additional info will be attached

Comment 4 Yaniv Kaul 2017-07-09 07:09:50 UTC
Where's supervdsm log? It probably has some clues.

Comment 12 Roy Golan 2017-09-10 05:48:29 UTC
Adding a patch here to address the redundant call to connectStorageServer during the RunVm flow. A VM whose LUNs have different IQNs but the SAME server address will issue a single request instead of 2.

https://gerrit.ovirt.org/c/81418/

Allon, please double check my logic here to see I'm not missing anything.

Comment 14 Roy Golan 2017-09-10 08:05:02 UTC
(In reply to Roy Golan from comment #12)
> Adding a patch here to address redundant call to connectStorageServer during
> a RunVm flow. VM with different luns IQN, but the SAME server address, will

ignore IQN

Comment 15 Allon Mureinik 2017-09-12 12:08:36 UTC
Roy, thanks for contributing the patch you linked to. However, it uses an API (connectStorageToLunByVdsId) that should be used per LUN. We need to double-check why exactly this API needs the LUN object.

If it really needs the lun, we need a more robust API that can receive a collection of luns, extract their connections and call connectStorageServer once with all the connections. If it doesn't, we should just connectStorageServer directly.

Comment 16 Roy Golan 2017-09-12 12:21:10 UTC
An initial check by derez shows that there is no side effect on the LUN that is passed in.

If you both are okay with that I have no problem with invoking the connectStorageServer VDS command directly.

Comment 17 Allon Mureinik 2017-09-13 08:47:40 UTC
(In reply to Roy Golan from comment #16)
> Initial check by derez show that there is no side-effect on the
> lun the is passed inside.
> 
> If you both are okay with that I have no problem with invoking the
> connectStorageServer VDS command directly.

I went over it myself too, and there seems to be a lot of spam there that's only relevant to domains.
In pseudocode, we should have something like this:

vm.getDisks()
  .stream()
  .flatMap(disk -> getDiskConnections(disk).stream())
  .distinct()
  .collect(groupingBy(conn -> conn.getType(), toList()))
  .forEach((type, conns) -> connectStorageServer(type, conns))
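The pseudocode above can be sketched as a small self-contained Java program. This is only an illustration under stated assumptions, not the engine's actual code: the StorageType enum, the Connection class, and the connectAll helper are hypothetical stand-ins, and the connectStorageServer call is simulated by a callback so the number of calls can be observed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;

public class ConnectOncePerType {
    // Hypothetical storage types; not the engine's real enum.
    enum StorageType { ISCSI, FCP }

    // Hypothetical connection value object (assumption, for illustration only).
    // Value-based equals/hashCode are what make distinct() drop duplicates.
    static final class Connection {
        final StorageType type;
        final String address;
        final String iqn;

        Connection(StorageType type, String address, String iqn) {
            this.type = type;
            this.address = address;
            this.iqn = iqn;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Connection)) {
                return false;
            }
            Connection c = (Connection) o;
            return type == c.type && address.equals(c.address) && iqn.equals(c.iqn);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, address, iqn);
        }
    }

    // Flatten the per-disk connection lists, drop duplicates, group by type,
    // then invoke the (simulated) connectStorageServer callback once per type.
    static void connectAll(List<List<Connection>> diskConnections,
                           BiConsumer<StorageType, List<Connection>> connectStorageServer) {
        Map<StorageType, List<Connection>> byType = diskConnections.stream()
                .flatMap(List::stream)
                .distinct()
                .collect(Collectors.groupingBy(
                        c -> c.type,
                        () -> new EnumMap<StorageType, List<Connection>>(StorageType.class),
                        Collectors.toList()));
        byType.forEach(connectStorageServer);
    }

    public static void main(String[] args) {
        // Two disks that share one connection; the second disk adds another target.
        Connection a = new Connection(StorageType.ISCSI, "10.0.0.1", "iqn.example:a");
        Connection aCopy = new Connection(StorageType.ISCSI, "10.0.0.1", "iqn.example:a");
        Connection b = new Connection(StorageType.ISCSI, "10.0.0.1", "iqn.example:b");

        List<String> calls = new ArrayList<>();
        connectAll(Arrays.asList(Arrays.asList(a), Arrays.asList(aCopy, b)),
                (type, conns) -> calls.add(type + ":" + conns.size()));

        // One simulated call per storage type, each carrying the distinct connections.
        System.out.println(calls);
    }
}
```

Passing the connect call in as a callback keeps the grouping logic testable without a host. In this sketch, duplicates are detected on the full (type, address, IQN) tuple; whether the real flow should compare on fewer fields is left open here.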

Comment 18 guy chen 2018-04-08 11:33:16 UTC
Tested on:

RHEVM: org.ovirt.engine-root-4.2.2.6-1
VDSM: vdsm-4.20.23-1.el7ev

With the same configuration, startup time was reduced to 1m and 15s. This time is reasonable, so I am verifying the bug.

Comment 19 Sandro Bonazzola 2018-04-18 12:32:47 UTC
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

