Bug 1468439 - VM startup takes 2m and 50 seconds with 1200 devices (use case: multiple direct LUNs)
Status: ON_QA
Product: vdsm
Classification: oVirt
Component: General
Version: 4.19.17
Hardware: Unspecified Unspecified
Priority: unspecified Severity: high
Target Milestone: ovirt-4.2.0
Target Release: 4.20.9.1
Assigned To: Nir Soffer
QA Contact: guy chen
Keywords: Performance
Depends On: 1488892
Blocks:
Reported: 2017-07-07 02:23 EDT by guy chen
Modified: 2017-12-13 02:07 EST (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.2?
eberman: testing_ack?


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 81418 master MERGED bll: Optimization - don't connect same targets in RunVM 2017-09-28 08:08 EDT

Description guy chen 2017-07-07 02:23:07 EDT
Description of problem:
With 1200 devices on the host (300 direct LUNs with 4 paths each) and 150 VMs, each with 2 direct LUNs attached, VM startup takes 2m and 50 seconds.


Version-Release number of selected component (if applicable):
vdsm version 4.19.17

How reproducible:
Always

Steps to Reproduce:
1. Attach 300 LUNs with 4 paths each to the storage
2. Add 150 VMs
3. Attach 2 direct LUNs to each VM
4. Start 150 VMs
5. Start a single VM

Actual results:
Starting the VM takes a very long time (2m and 50 seconds)

Expected results:
VM startup should complete in a shorter, reasonable time

Additional info:
Logs and additional info will be attached
Comment 4 Yaniv Kaul 2017-07-09 03:09:50 EDT
Where's the supervdsm log? It probably has some clues.
Comment 12 Roy Golan 2017-09-10 01:48:29 EDT
Adding a patch here to address a redundant call to connectStorageServer during a RunVm flow. A VM whose LUNs have different IQNs but the SAME server address will issue a single request instead of 2.

https://gerrit.ovirt.org/c/81418/

Allon, please double check my logic here to see I'm not missing anything.
Comment 14 Roy Golan 2017-09-10 04:05:02 EDT
(In reply to Roy Golan from comment #12)
> Adding a patch here to address redundant call to connectStorageServer during
> a RunVm flow. VM with different luns IQN, but the SAME server address, will

ignore IQN
Comment 15 Allon Mureinik 2017-09-12 08:08:36 EDT
Roy, thanks for contributing the patch you linked to. However, it used an API (connectStorageToLunByVdsId) that should be used per LUN. We need to double-check why exactly this API needs the LUN object.

If it really needs the LUN, we need a more robust API that can receive a collection of LUNs, extract their connections and call connectStorageServer once with all the connections. If it doesn't, we should just call connectStorageServer directly.
Comment 16 Roy Golan 2017-09-12 08:21:10 EDT
An initial check by derez@redhat.com shows that there is no side effect on the LUN that is passed in.

If you are both okay with that, I have no problem with invoking the connectStorageServer VDS command directly.
Comment 17 Allon Mureinik 2017-09-13 04:47:40 EDT
(In reply to Roy Golan from comment #16)
> Initial check by derez@redhat.com show that there is no side-effect on the
> lun the is passed inside.
> 
> If you both are okay with that I have no problem with invoking the
> connectStorageServer VDS command directly.

I went over it myself too, and there seems to be a lot of spam there that's only relevant to domains.
In pseudocode, we should have something like this:

vm.getDisks()
  .stream()
  .flatMap(disk -> getDiskConnections(disk).stream())
  .distinct()
  .collect(groupingBy(conn -> conn.getType(), toList()))
  .forEach((type, conns) -> connectStorageServer(type, conns));
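
For reference, the pseudocode above can be turned into a small self-contained Java sketch. This is only an illustration, not the engine code: StorageType, Connection, LunDisk and the connectStorageServer stub are hypothetical stand-ins, and treating two connections as equal when their type and address match is an assumption made here so that distinct() has something to work with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class ConnectOncePerType {
    // Hypothetical storage types; the real engine has its own enum.
    enum StorageType { ISCSI, FCP }

    // Hypothetical connection; equality on (type, address) is an assumption.
    static final class Connection {
        final StorageType type;
        final String address;

        Connection(StorageType type, String address) {
            this.type = type;
            this.address = address;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Connection)) {
                return false;
            }
            Connection c = (Connection) o;
            return type == c.type && address.equals(c.address);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, address);
        }
    }

    // Hypothetical direct LUN disk that knows its storage connections.
    static final class LunDisk {
        final List<Connection> connections;

        LunDisk(List<Connection> connections) {
            this.connections = connections;
        }
    }

    // Records each simulated call so the batching can be inspected.
    static final List<String> calls = new ArrayList<>();

    // Stub standing in for the real VDS command; it only records the request.
    static void connectStorageServer(StorageType type, List<Connection> conns) {
        calls.add(type + ":" + conns.size());
    }

    // Flatten all disk connections, drop duplicates, group them by type.
    static Map<StorageType, List<Connection>> groupConnections(List<LunDisk> disks) {
        return disks.stream()
                .flatMap(disk -> disk.connections.stream())
                .distinct()
                .collect(Collectors.groupingBy(conn -> conn.type, Collectors.toList()));
    }

    public static void main(String[] args) {
        Connection a = new Connection(StorageType.ISCSI, "10.0.0.1");
        Connection b = new Connection(StorageType.ISCSI, "10.0.0.2");
        // Two direct LUNs reachable through the same two targets.
        List<LunDisk> disks = List.of(
                new LunDisk(List.of(a, b)),
                new LunDisk(List.of(a, b)));

        // One connectStorageServer call per storage type.
        groupConnections(disks).forEach(ConnectOncePerType::connectStorageServer);
        System.out.println(calls);
    }
}
```

With the two sample disks sharing the same two iSCSI targets, the stub is invoked once with two connections, rather than once per LUN.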
