Bug 1467818
| Summary: | [SCALE] Add new iscsi storage domain takes 4 minutes when the host is logged in to 1200 devices | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | guy chen <guchen> |
| Component: | General | Assignee: | Dan Kenigsberg <danken> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | guy chen <guchen> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.19.17 | CC: | alitke, bugs, dagur, ebenahar, guchen, nsoffer, tnisan |
| Target Milestone: | ovirt-4.3.0 | Keywords: | Performance, Reopened |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.3+ |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-02-13 07:46:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1488892 | | |
| Bug Blocks: | | | |
Description
guy chen 2017-07-05 08:45:31 UTC
I think the supervdsm log would be useful here. Do we have any insight into why any of these operations take so long? (I believe each connectStorageServer clears the lvs cache, so a complete lvs refresh is required.) Guy, please add some of the analysis you've done.

First, I found that the iSCSI rescan is executed multiple times in connectStorageServer, which affects this bug as well; I added bug 1488892 as a dependency. Also, connectStorageServer in this flow is executed twice, when attaching the SD and when activating the SD, which makes the problem worse.

Guy, please test again with vdsm > v4.20.10, which fixes the multiple rescans per connection.

Tal and Idan, how can this bug be MODIFIED if one of the patches attached here isn't merged? Is the patch not required, or is the status wrong?

It shouldn't be on MODIFIED, thanks Allon.

I created 250 LUNs of 2G that are exposed via 4 targets in targetcli, and logged my host in to the targets. I also logged in to two external targets (Kaminario) that exposed 25 LUNs of 50G, so in total I had 1050 devices. Then I created an iSCSI domain by selecting 2 targets with 50 LUNs, and it took almost 2 minutes (113 seconds). I used the master versions of both vdsm and engine. Are we OK with that?

What is the cause of the slowness? How does having 1050 devices impact SD creation?

I didn't investigate why or by how much, but I do know (from Nir's comment 10) that the multiple-rescans-per-connection bug was fixed, so it should be faster now. We need the reporter to recheck in his environment and post the results.

Guy, can you please run this test again with the newest vdsm and post the results?

I have retested the scenario on version 4.3.0-0.5.alpha1.el7 with vdsm version vdsm-4.30.3-1.el7ev. The results improved: the duration was reduced to 100 seconds. If this is an acceptable time, the bug can be verified.

100 seconds is a 2.4x improvement compared with 240 seconds, but it is still slow.
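The device count and the reported improvement in the comments above can be checked with a little arithmetic. A minimal sketch; the breakdown of the 1050 devices (each of the 4 local targets exposing the 250 LUNs, plus 25 LUNs through each of the 2 external targets) is my reading of the setup, chosen because it matches the stated total:

```python
# Device count in the reproduction: 250 LUNs visible through each of
# 4 local targetcli targets, plus 25 LUNs of 50G through each of
# 2 external (Kaminario) targets -- one reading consistent with the
# stated total of 1050 devices.
local_devices = 250 * 4
external_devices = 25 * 2
print(local_devices + external_devices)  # -> 1050

# Improvement factor of the retest: 240 s before the
# multiple-rescan fix, 100 s after.
print(round(240 / 100, 1))  # -> 2.4
```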
Did you test on the same hardware/storage/network as the first test? We need more data, at least what we have in comment 3:

- How much time does the same operation take with a few LUNs?
- How does the number of LUNs affect creation time (e.g. 600, 300, 150, ... LUNs)?
- How is the time split between the engine and vdsm?
- How is the time spent in each component?

This bugzilla is included in the oVirt 4.3.0 release, published on February 4th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.0, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
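The LUN-scaling question raised above (how creation time varies across runs with, say, 150/300/600 LUNs) could be summarized with a small helper once the timings are collected. A hypothetical sketch with made-up timings, not measurements from this bug:

```python
def per_lun_seconds(samples):
    """Map each (lun_count, duration_s) sample to seconds per LUN,
    which makes linear-vs-superlinear scaling easy to spot."""
    return {n: round(t / n, 3) for n, t in samples}

# Hypothetical timings for the suggested runs (illustrative only).
# A flat per-LUN cost would indicate linear scaling; a growing
# per-LUN cost would indicate superlinear scaling.
samples = [(150, 30.0), (300, 60.0), (600, 120.0)]
print(per_lun_seconds(samples))  # -> {150: 0.2, 300: 0.2, 600: 0.2}
```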