Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1467818

Summary: [SCALE] Add new iscsi storage domain takes 4 minutes when the host is logged in to 1200 devices
Product: [oVirt] vdsm
Reporter: guy chen <guchen>
Component: General
Assignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE
QA Contact: guy chen <guchen>
Severity: high
Docs Contact:
Priority: medium
Version: 4.19.17
CC: alitke, bugs, dagur, ebenahar, guchen, nsoffer, tnisan
Target Milestone: ovirt-4.3.0
Keywords: Performance, Reopened
Target Release: ---
Flags: rule-engine: ovirt-4.3+
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-13 07:46:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1488892
Bug Blocks:

Description guy chen 2017-07-05 08:45:31 UTC
Description of problem:
With 1200 devices on the host, adding a new iSCSI storage domain takes 4 minutes.

Version-Release number of selected component (if applicable):
vdsm version 4.19.17

How reproducible:
Always

Steps to Reproduce:
1. Attach 300 LUNs with 4 paths each to storage (300 x 4 = 1200 devices)
2. Add a new iSCSI storage domain

Actual results:
Takes a very long time: 4 minutes.

Expected results:
Should complete in a shorter, more reasonable time.

Additional info:
Logs and additional info will be attached

Comment 5 Yaniv Kaul 2017-08-06 11:39:28 UTC
I think the supervdsm log would be useful here.
Do we have any insight into why any of these take so long? Do we have anything? (I believe each connectStorageServer call clears the lvs cache, so a complete lvs refresh is required.)

Comment 6 Yaniv Kaul 2017-09-04 19:45:09 UTC
Guy, please add some of the analysis you've done.

Comment 7 guy chen 2017-09-06 13:13:45 UTC
First, I found that the iSCSI rescan is executed multiple times in connectStorageServer, which affects this bug as well; I added bug 1488892 as a dependency.
Also, connectStorageServer is executed twice in this flow, once when attaching the SD and once when activating the SD, which makes the problem worse.

Comment 10 Nir Soffer 2018-01-04 14:49:09 UTC
Guy, please test again with vdsm > v4.20.10, which fixes the multiple rescans per
connection.

Comment 11 Allon Mureinik 2018-01-08 18:32:09 UTC
Tal and Idan - how can this bug be MODIFIED if one of the patches attached here isn't merged?

Is the patch not required, or is the status wrong?

Comment 12 Idan Shaby 2018-01-09 06:21:43 UTC
It shouldn't be on MODIFIED, thanks Allon.

Comment 13 Idan Shaby 2018-01-14 10:30:04 UTC
I created 250 LUNs of 2G each, exposed via 4 targets in targetcli, and logged my host in to the targets.
I also logged in to another two external targets (Kaminario) that exposed 25 LUNs of 50G each.
So in total, I had 1050 devices.
Then I created an iSCSI domain by selecting 2 targets of 50 LUNs, and it took almost 2 minutes (113 seconds).
I used the master versions of both vdsm and engine.

Are we ok with that?
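
The device total above follows from each LUN appearing once per target (path) the host is logged in to. A minimal Python sketch of that arithmetic; the function name `device_count` is illustrative and not part of vdsm, and it assumes each of the 25 external LUNs is visible through both external targets, which is what makes the stated total of 1050 work out:

```python
def device_count(groups):
    """Return the number of SCSI devices seen by the host.

    groups is a list of (luns, paths) pairs; each LUN shows up as
    one device per path (target) it is exposed through.
    """
    return sum(luns * paths for luns, paths in groups)


# Setup from this comment: 250 LUNs via 4 targetcli targets, plus
# 25 LUNs assumed to be visible via both external targets.
comment_13 = device_count([(250, 4), (25, 2)])

# Setup from the original report: 300 LUNs with 4 paths each.
original = device_count([(300, 4)])

print(comment_13, original)  # 1050 1200
```

The two environments are close in device count (1050 vs. 1200), so the 113 second result is roughly comparable to the original 4 minute measurement, though the hardware differs.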

Comment 14 Yaniv Lavi 2018-01-17 12:46:07 UTC
What is the cause of the slowness? How does having 1050 devices impact SD creation?

Comment 15 Idan Shaby 2018-01-21 12:17:57 UTC
I didn't investigate why and how much, but I do know (from Nir's comment 10) that the multiple rescans per connection bug was fixed so it should be faster now.
We need the reproducer to recheck in his environment and post the results.

Comment 16 Idan Shaby 2018-06-04 08:36:10 UTC
Guy, can you please run this test again with the newest vdsm and post the results?

Comment 20 guy chen 2018-11-26 09:59:14 UTC
I have retested the scenario on version 4.3.0-0.5.alpha1.el7 with vdsm version vdsm-4.30.3-1.el7ev.
Results improved and the duration was reduced to 100 seconds.
If this time is acceptable, the bug can be verified.

Comment 21 Nir Soffer 2018-11-26 10:31:09 UTC
100 seconds is a 2.4x improvement over 240 seconds, but it is still slow.

Did you test on the same hardware/storage/network as the first test?

We need more data, at least what we have in comment 3:
- How much time does the same operation take with only a few LUNs?
- How does the number of LUNs affect creation time (e.g. 600, 300, 150, ... LUNs)?
- How is the time split between engine and vdsm?
- How is the time spent within each component?
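
The measurements requested above could be collected with a small timing harness. This is only a sketch under stated assumptions: `create_domain` is a hypothetical placeholder for whatever step triggers storage domain creation in the test environment (it is not a vdsm or engine API), and the LUN counts are the examples from the list above.

```python
import time


def measure(operation, lun_counts):
    """Time operation(n) for each LUN count and return {n: seconds}."""
    results = {}
    for n in lun_counts:
        start = time.monotonic()
        operation(n)
        results[n] = time.monotonic() - start
    return results


def speedup(before_seconds, after_seconds):
    """Return how many times faster the 'after' run is."""
    return before_seconds / after_seconds


def create_domain(lun_count):
    """Hypothetical placeholder: replace with the real test step."""
    pass


if __name__ == "__main__":
    timings = measure(create_domain, [600, 300, 150])
    for count, seconds in sorted(timings.items()):
        print(f"{count} LUNs: {seconds:.3f} s")
    # The figures quoted in this comment: 240 s before, 100 s after.
    print(f"speedup: {speedup(240, 100):.1f}x")
```

Splitting the time between engine and vdsm would need timestamps from each component's logs, which this sketch does not attempt.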

Comment 24 Sandro Bonazzola 2019-02-13 07:46:42 UTC
This bugzilla is included in the oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.