Bug 1271737

Summary: [vdsClient] getStoragedomainsList query hangs for a long time
Product: [oVirt] vdsm
Component: Services
Version: 4.17.7
Status: CLOSED INSUFFICIENT_DATA
Severity: low
Priority: unspecified
Reporter: Ori Gofen <ogofen>
Assignee: Nir Soffer <nsoffer>
QA Contact: Aharon Canan <acanan>
CC: acanan, amureini, bugs, ogofen, ybronhei
Target Milestone: ovirt-3.6.1
Target Release: 4.17.14
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
oVirt Team: Storage
Doc Type: Bug Fix
Type: Bug
Flags: amureini: ovirt-3.6.z?
       amureini: ovirt-4.0.0?
       ogofen: planning_ack?
       ogofen: devel_ack?
       ogofen: testing_ack?
Last Closed: 2015-10-28 00:55:44 UTC

Description Ori Gofen 2015-10-14 14:51:27 UTC
Description of problem:
When executing "vdsClient -s 0 getStorageDomainsList" on a 3.6 hypervisor, the operation seems to hang before returning an answer. This has nothing to do with the number of domains in the pool.
To illustrate the problem, I executed this command 10 times each on a 3.5 and a 3.6 hypervisor and compared the average execution times.

3.5 hypervisor:
[root@green-vdsc ~]# time vdsClient -s 0 getStorageDomainsList
9c95978b-0f3a-4887-9c15-61b36490ae28
b9feec08-20d8-45d6-bee5-f58a7357798c
294efd08-19bf-4bda-8c78-79d720e634d1


real    0m0.388s
user    0m0.110s
sys     0m0.015s

3.6 hypervisor:
[root@purple-vds1 ~]# time vdsClient -s 0 getStorageDomainsList
f8a1ae58-e886-4d3a-b0f4-06d32e61a339


real    2m0.655s
user    0m0.165s
sys     0m0.027s

The difference in average 'real' execution time is large: almost 3 minutes on the 3.6 hypervisor compared to less than half a SECOND on the 3.5 hypervisor.


Version-Release number of selected component (if applicable):
vdsm-infra-4.17.7-1.el7ev.noarch
vdsm-4.17.7-1.el7ev.noarch
vdsm-xmlrpc-4.17.7-1.el7ev.noarch
vdsm-cli-4.17.7-1.el7ev.noarch
vdsm-yajsonrpc-4.17.7-1.el7ev.noarch
vdsm-jsonrpc-4.17.7-1.el7ev.noarch
vdsm-python-4.17.7-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. As described above (see the sketch below).
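
A minimal sketch of the timing loop, assuming GNU time is available at /usr/bin/time on the host; the awk averaging step is illustrative and not part of the original report:

for i in $(seq 1 10); do
    # -f "%e" prints only the elapsed wall-clock seconds (GNU time)
    /usr/bin/time -f "%e" vdsClient -s 0 getStorageDomainsList >/dev/null
done 2>&1 | awk '{ sum += $1 } END { print "average:", sum / NR, "s" }'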

Comment 1 Allon Mureinik 2015-10-15 08:31:05 UTC
Does this happen only in vdsClient, or in engine flows as well?

Comment 2 Ori Gofen 2015-10-18 11:59:22 UTC
I guess every action that calls "HSMGetStorageDomainsListVDSCommand.java" will suffer from a longer delay than it used to.

Comment 3 Yaniv Bronhaim 2015-10-18 14:24:34 UTC
Flagging this as a regression, and I would call it a blocker. This is probably also the reason vdsm hangs for so long during startup.

Comment 4 Nir Soffer 2015-10-18 15:09:07 UTC
Are you testing the same storage from the same hypervisor in both cases?

On the 3.5 host I see 3 domains; on the 3.6 host, 1 domain (with a different id).

Please repeat the test using the same storage and the same hypervisor:

1. Install 3.5 vdsm
2. Create a couple of domains
3. time the vdsClient operation
4. Upgrade to 3.6 vdsm
5. time the vdsClient operation

If you get the same results, try the opposite order:

1. Install 3.6 vdsm
2. Create a couple of domains
3. time the vdsClient operation
4. Downgrade to 3.5 vdsm
5. time the vdsClient operation

Removing the regression flag until we have results from this test.

Please keep the vdsm logs for each test: remove the vdsm log before installing
the new vdsm version, and save the logs from each test in a separate location
so they can be attached to the bug later. A sketch of one leg of this
procedure follows.
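
A rough sketch of the first leg (3.5 -> 3.6), under these assumptions: vdsm logs live in /var/log/vdsm/, the service is vdsmd, /root/bug1271737/ is a hypothetical scratch directory, and the exact upgrade command depends on the configured repos:

# Baseline on 3.5, with a couple of domains already created
time vdsClient -s 0 getStorageDomainsList
mkdir -p /root/bug1271737/3.5 /root/bug1271737/3.6
cp /var/log/vdsm/vdsm.log* /root/bug1271737/3.5/   # keep the 3.5 logs

# Clean the log, upgrade, and repeat
rm -f /var/log/vdsm/vdsm.log*
yum -y upgrade vdsm vdsm-cli
systemctl restart vdsmd
time vdsClient -s 0 getStorageDomainsList
cp /var/log/vdsm/vdsm.log* /root/bug1271737/3.6/   # keep the 3.6 logs

The downgrade leg is the same with "yum -y downgrade vdsm vdsm-cli" and the directories swapped.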

Comment 5 Allon Mureinik 2015-10-18 15:14:02 UTC
(In reply to Ori Gofen from comment #2)
> I guess every action that calls "HSMGetStorageDomainsListVDSCommand.java"
> will suffer from a longer delay than it used to.
The word "guess" has no place here. I was asking whether you actually observed this while using the engine (which would mean it's an issue worth examining), or whether it's a problem in vdsClient's code (which isn't supported anyway).

Comment 6 Red Hat Bugzilla Rules Engine 2015-10-19 10:57:35 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 7 Nir Soffer 2015-10-28 00:55:44 UTC
Closing as we did not get any response. Please reopen when you have more
data.