Description of problem:
With 1200 devices on the host (300 direct LUNs with 4 paths each) and 150 VMs, each with 2 direct LUNs attached, VM startup takes 2 minutes and 50 seconds.

Version-Release number of selected component (if applicable):
vdsm version 4.19.17

How reproducible:
Always

Steps to Reproduce:
1. Attach 300 LUNs with 4 paths to storage
2. Add 150 VMs
3. Attach 2 direct LUNs to each VM
4. Start 150 VMs
5. Start a single VM

Actual results:
Startup takes a very long time

Expected results:
Startup should take a shorter, reasonable time

Additional info:
Logs and additional info will be attached
Where's the supervdsm log? It probably has some clues.
Adding a patch here to address a redundant call to connectStorageServer during the RunVm flow. A VM whose LUNs have different IQNs but the SAME server address will send a single request instead of 2. https://gerrit.ovirt.org/c/81418/ Allon, please double-check my logic here to make sure I'm not missing anything.
(In reply to Roy Golan from comment #12)
> Adding a patch here to address redundant call to connectStorageServer during
> a RunVm flow. VM with different luns IQN, but the SAME server address, will ignore IQN
Roy, thanks for contributing the patch you linked to. However, it used an API (connectStorageToLunByVdsId) that should be used per LUN. We need to double-check why exactly this API needs the lun object. If it really needs the lun, we need a more robust API that can receive a collection of luns, extract their connections and call connectStorageServer once with all the connections. If it doesn't, we should just connectStorageServer directly.
Initial check by derez shows that there is no side effect on the lun that is passed in. If you are both okay with that, I have no problem with invoking the connectStorageServer VDS command directly.
(In reply to Roy Golan from comment #16)
> Initial check by derez show that there is no side-effect on the
> lun the is passed inside.
>
> If you both are okay with that I have no problem with invoking the
> connectStorageServer VDS command directly.

I went over it myself too, and there seems to be a lot of spam there that's only relevant to domains. In pseudocode, we should have something like this:

vm.getDisks()
    .stream()
    .flatMap(getDiskConnections)
    .distinct()
    .collect(groupingBy(conn -> conn.getType(), toList()))
    .forEach((type, conns) -> connectStorageServer(type, conns))
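For illustration only, the pseudocode above could be fleshed out roughly as follows. This is a minimal sketch, not the engine's actual code: StorageType, Connection, Disk, ConnectBatcher and the recording connectStorageServer method are hypothetical stand-ins invented for the example, and the real VDS command call is replaced by a list that logs each invocation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the engine's types; names are illustrative only.
enum StorageType { ISCSI, FCP }

record Connection(StorageType type, String address, String iqn) {}

record Disk(String name, List<Connection> connections) {}

class ConnectBatcher {
    // Records every simulated connectStorageServer call so it can be inspected.
    final List<String> calls = new ArrayList<>();

    // Placeholder for the real VDS command (assumption): one call per storage type.
    void connectStorageServer(StorageType type, List<Connection> conns) {
        calls.add(type + ":" + conns.size());
    }

    // Gather the connections of all disks, drop duplicates, group them by
    // storage type, and issue a single call per type.
    void connectAll(List<Disk> disks) {
        Map<StorageType, List<Connection>> byType = disks.stream()
                .flatMap(d -> d.connections().stream())
                .distinct()
                .collect(Collectors.groupingBy(
                        Connection::type, LinkedHashMap::new, Collectors.toList()));
        byType.forEach(this::connectStorageServer);
    }

    public static void main(String[] args) {
        Connection a = new Connection(StorageType.ISCSI, "10.0.0.1", "iqn.a");
        Connection b = new Connection(StorageType.ISCSI, "10.0.0.1", "iqn.b");
        ConnectBatcher batcher = new ConnectBatcher();
        // Two disks share connection "a"; it is sent only once.
        batcher.connectAll(List.of(
                new Disk("lun1", List.of(a, b)),
                new Disk("lun2", List.of(a))));
        System.out.println(batcher.calls);
    }
}
```

In this sketch a VM with two direct LUNs on the same iSCSI server produces one call carrying the distinct connections, rather than one call per LUN. Whether connections that differ only by IQN should be collapsed further is a separate question that the sketch does not decide.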
Tested on:
RHEVM: org.ovirt.engine-root-4.2.2.6-1
VDSM: vdsm-4.20.23-1.el7ev

With the same configuration, startup time was reduced to 1m and 15s. That is a reasonable time, so I am verifying the bug.
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.