Bug 1468439

Summary: VM startup takes 2m and 50 seconds with 1200 devices (use case: multiple direct LUNs)
Product: [oVirt] vdsm
Reporter: guy chen <guchen>
Component: General
Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE
QA Contact: guy chen <guchen>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.19.17
CC: amureini, bugs, guchen, rgolan, tnisan
Target Milestone: ovirt-4.2.0
Keywords: Performance
Target Release: 4.20.9.1
Flags: rule-engine: ovirt-4.2+, rule-engine: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-18 12:32:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1488892    
Bug Blocks: 1545229    

Description guy chen 2017-07-07 06:23:07 UTC
Description of problem:
With 1200 devices on the host (300 direct LUNs with 4 paths each) and 150 VMs, each with 2 direct LUNs attached, VM startup takes 2m 50s.


Version-Release number of selected component (if applicable):
vdsm version 4.19.17

How reproducible:
Always

Steps to Reproduce:
1. Attach 300 LUNs with 4 paths each to the storage
2. Add 150 VMs
3. Attach 2 direct LUNs to each VM
4. Start the 150 VMs
5. Start a single VM

Actual results:
Starting the VM takes a very long time (2m 50s).

Expected results:
VM startup should take a shorter, reasonable time.

Additional info:
Logs and additional info will be attached

Comment 4 Yaniv Kaul 2017-07-09 07:09:50 UTC
Where's the supervdsm log? It probably has some clues.

Comment 12 Roy Golan 2017-09-10 05:48:29 UTC
Adding a patch here to address the redundant call to connectStorageServer during the RunVm flow. A VM whose LUNs have different IQNs but the SAME server address will issue a single request instead of 2.

https://gerrit.ovirt.org/c/81418/

Allon, please double check my logic here to see I'm not missing anything.

Comment 14 Roy Golan 2017-09-10 08:05:02 UTC
(In reply to Roy Golan from comment #12)
> Adding a patch here to address redundant call to connectStorageServer during
> a RunVm flow. VM with different luns IQN, but the SAME server address, will

ignore IQN

Comment 15 Allon Mureinik 2017-09-12 12:08:36 UTC
Roy, thanks for contributing the patch you linked to. However, it uses an API (connectStorageToLunByVdsId) that should be used per LUN. We need to double-check why exactly this API needs the LUN object.

If it really needs the LUN, we need a more robust API that can receive a collection of LUNs, extract their connections and call connectStorageServer once with all the connections. If it doesn't, we should just call connectStorageServer directly.

Comment 16 Roy Golan 2017-09-12 12:21:10 UTC
An initial check by derez shows that there is no side effect on the LUN that is passed in.

If you both are okay with that I have no problem with invoking the connectStorageServer VDS command directly.

Comment 17 Allon Mureinik 2017-09-13 08:47:40 UTC
(In reply to Roy Golan from comment #16)
> Initial check by derez show that there is no side-effect on the
> lun the is passed inside.
> 
> If you both are okay with that I have no problem with invoking the
> connectStorageServer VDS command directly.

I went over it myself too, and there seems to be a lot of spam there that's only relevant to domains.
In pseudocode, we should have something like this:

vm.getDisks()
  .stream()
  .flatMap(getDiskConnections)
  .distinct()
  .collect(groupingBy(conn -> conn.getType(), toList()))
  .forEach((type, conns) -> connectStorageServer(type, conns))
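
For illustration only, the pseudocode above could be fleshed out roughly as follows. This is a minimal sketch, not the engine's code: StorageType, Connection, Disk and GroupedConnectDemo are hypothetical stand-ins, the connect call is passed in as a callback rather than being the real connectStorageServer VDS command, and the addresses and IQNs are made up. It assumes Java 16 or later for records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the engine's types; names and fields are assumptions.
enum StorageType { ISCSI, FCP }

record Connection(StorageType type, String address, String iqn) {}

record Disk(String name, List<Connection> connections) {}

final class GroupedConnectDemo {

    // Collects the distinct connections of all disks, groups them by storage
    // type, and invokes the connect callback once per type.
    static Map<StorageType, List<Connection>> connectOncePerType(
            List<Disk> disks,
            BiConsumer<StorageType, List<Connection>> connectStorageServer) {
        Map<StorageType, List<Connection>> byType = disks.stream()
                .flatMap(disk -> disk.connections().stream())
                .distinct() // records compare by value, so duplicates collapse
                .collect(Collectors.groupingBy(
                        Connection::type, LinkedHashMap::new, Collectors.toList()));
        byType.forEach(connectStorageServer);
        return byType;
    }

    public static void main(String[] args) {
        // Two disks that share one connection; the shared one is sent only once.
        Connection shared = new Connection(StorageType.ISCSI, "10.0.0.1", "iqn.example:a");
        List<Disk> disks = List.of(
                new Disk("lun1", List.of(shared)),
                new Disk("lun2", List.of(shared,
                        new Connection(StorageType.ISCSI, "10.0.0.2", "iqn.example:b"))));
        connectOncePerType(disks, (type, conns) ->
                System.out.println("connectStorageServer(" + type + ", "
                        + conns.size() + " connections)"));
    }
}
```

The point of the grouping step is that the callback runs once per storage type with all distinct connections of that type, rather than once per LUN.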

Comment 18 guy chen 2018-04-08 11:33:16 UTC
Tested on:

RHEVM: org.ovirt.engine-root-4.2.2.6-1
VDSM: vdsm-4.20.23-1.el7ev

With the same configuration, startup time was reduced to 1m 15s. This is reasonable, so I am verifying the bug.

Comment 19 Sandro Bonazzola 2018-04-18 12:32:47 UTC
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.