Bug 1155146

Summary: [scale] REST API performance degradation 3.4 vs 3.5 on get VM details and VM shutdown API requests
Product: Red Hat Enterprise Virtualization Manager
Reporter: Yuri Obshansky <yobshans>
Component: ovirt-engine
Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED CURRENTRELEASE
QA Contact: Eldad Marciano <emarcian>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.5.0
CC: amureini, bazulay, fromani, gklein, lpeer, lsurette, michal.skrivanek, nsoffer, oourfali, pstehlik, rbalakri, Rhev-m-bugs, scohen, sherold, srevivo, ybronhei, ykaul, yobshans
Target Milestone: ovirt-3.6.0-rc
Keywords: Regression
Target Release: 3.6.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: vdsm-4.16.7.6-1.el6ev
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-20 01:26:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: VDSM profiles

Description Yuri Obshansky 2014-10-21 12:45:14 UTC
Description of problem:
During REST API performance sanity test execution on RHEV-M 3.5 beta, we detected a critical performance degradation compared to RHEV-M 3.4. An additional test without the JSON-RPC protocol showed better results, close to the RHEV-M 3.4 baseline.
Summary comparison table (response times in ms; "JSON" = 3.5 with JSON-RPC, "no JSON" = 3.5 without):

Transaction        Average                   90% Line                   Max
                   3.4    JSON    no JSON    3.4    JSON     no JSON    3.4    JSON     no JSON
Login to VM        312    1058    570        401    2318     807        2163   4456     2043
VM Application     24     32      19         42     28       24         294    656      265
VM Disks           27     26      24         38     33       31         163    150      99
VM Nics            35     27      25         54     35       32         151    234      97
VM Cdroms          23     19      18         41     24       23         58     43       71
VM Snapshots       8      10      9          10     14       12         199    152      43
VM Permissions     8      11      10         11     15       13         221    40       42
VM Statistics      24     21      20         42     27       24         136    44       55
VM Devices         8      10      9          11     14       12         71     42       81
VM Shutdown        743    46221   1556       2249   147839   5145       11969  364358   15654
VM Status          28     28      26         48     32       31         192    3571     1080
VM Start           2206   106513  3617       5749   241912   8732       18389  599149   25040

Detailed test reports can be found at
RHEV-M 3.5 beta User Portal REST API’s Performance Test Results - 50 VMs
https://mojo.redhat.com/docs/DOC-994846
and
RHEV-M 3.5 beta User Portal REST API’s Performance Test Results - 50 VMs - without JSON protocol
https://mojo.redhat.com/docs/DOC-995332

Version-Release number of selected component (if applicable):
RHEV-M: 3.5.0-0.14.beta
RHEL: 6Server - 6.5.0.1.el6
Kernel Version: 2.6.32 - 431.29.2.el6.x86_64
KVM Version: 0.12.1.2 - 2.415.el6_5.10
LIBVIRT Version: libvirt-0.10.2-29.el6_5.11
VDSM Version: vdsm-4.16.6-1.el6ev

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Piotr Kliczewski 2014-10-22 07:00:41 UTC
Can you please provide jprofiler snapshots after each operation that you performed? I would like to identify bottlenecks.

From the REST API perspective, JSON-RPC emulates synchronous behavior, so there should be little or no difference.
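A minimal sketch of what "emulating synchronous behavior" over an asynchronous JSON-RPC transport can look like. The class and names here (`SyncOverAsyncClient`, `fake_send`, the method string) are illustrative assumptions, not VDSM's actual API: requests and responses are correlated by `id`, and the calling thread blocks on an event until its reply arrives.

```python
import itertools
import json
import threading

class SyncOverAsyncClient:
    """Illustrative client: a blocking call() on top of an asynchronous
    JSON-RPC transport. Requests are matched to responses by id; the
    calling thread blocks on an event until its reply arrives."""

    def __init__(self, send):
        self._send = send               # ships an encoded request to the server
        self._ids = itertools.count(1)
        self._pending = {}              # request id -> (event, response slot)
        self._lock = threading.Lock()

    def call(self, method, params, timeout=30.0):
        req_id = next(self._ids)
        event, slot = threading.Event(), {}
        with self._lock:
            self._pending[req_id] = (event, slot)
        self._send(json.dumps({"jsonrpc": "2.0", "id": req_id,
                               "method": method, "params": params}))
        if not event.wait(timeout):     # block: this is the synchronous emulation
            raise TimeoutError(method)
        return slot["response"]

    def on_message(self, raw):
        """Invoked by the transport's reactor thread for every incoming frame."""
        msg = json.loads(raw)
        with self._lock:
            entry = self._pending.pop(msg["id"], None)
        if entry:
            event, slot = entry
            slot["response"] = msg
            event.set()

# Fake transport: replies "ok" to every request from another thread.
holder = {}
def fake_send(raw):
    reply = json.dumps({"id": json.loads(raw)["id"], "result": "ok"})
    threading.Thread(target=holder["client"].on_message, args=(reply,)).start()

holder["client"] = client = SyncOverAsyncClient(fake_send)
resp = client.call("VM.getStats", {"vmID": "0"})   # blocks until the reply lands
```

The `stompReactor`/`protocoldetector` entries in the JSONRPC profile later in this bug are the reactor side of a loop of this general shape.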

Comment 2 Yuri Obshansky 2014-10-22 07:04:35 UTC
This test is a 50-concurrent-thread load.
Please take a look at the detailed test reports on Mojo.
What do you mean by each operation? How do you suggest I do it?
Yuri

Comment 3 Piotr Kliczewski 2014-10-22 07:12:28 UTC
I already looked at your detailed reports, but I would like to understand how different parts of the code behave during/after the test to identify bottlenecks.

Can you please provide a few jprofiler snapshots?

Comment 4 Yuri Obshansky 2014-10-22 07:17:38 UTC
I can create a jprofiler snapshot while the full test is running.
I can decrease the number of threads, let's say to 1 or 5.
Please confirm whether that is OK.

Comment 5 Piotr Kliczewski 2014-10-22 07:19:46 UTC
I confirm.

Comment 6 Oved Ourfali 2014-10-30 06:43:36 UTC
Status update: It seems like a DB issue. Still being examined by the scale team and infra.

Comment 7 Barak 2014-11-06 12:25:13 UTC
Francesco,

Please take a look at the 2 operations (vdsm profiling) from the table above:
VM Shutdown & VM Start.

The above results are the totals from the engine side.
I'm wondering whether there is a problem (a big difference in vdsm) between using XML-RPC & JSON-RPC.

It could be one call for each operation + profiling (in the same env), just to get a clearer picture.

Comment 8 Francesco Romani 2014-11-06 13:25:41 UTC
This is very interesting information.
From virt's perspective there is no difference between JSONRPC and XMLRPC, but surely there are bugs.

The timeouts/errors can cause spikes in the response time, so we need these patches in before repeating the tests to get meaningful information:

http://gerrit.ovirt.org/#/c/34147/
http://gerrit.ovirt.org/#/c/34813/
most likely also
http://gerrit.ovirt.org/#/c/34879/

Meanwhile, I'll set up some profiling to see where the time is spent when everything goes well, and whether there are some actual differences I'm not aware of.

Comment 9 Barak 2014-11-09 11:59:19 UTC
I see only http://gerrit.ovirt.org/#/c/34879/ hasn't been accepted yet (and all should be backported to 3.5).
I'm leaving the needinfo to make sure we get the profiling info once it's ready.

Comment 10 Francesco Romani 2014-11-17 09:28:25 UTC
(In reply to Francesco Romani from comment #8)
> This is very interesting information.
> From virt's perspective there is no difference between JSONRPC and XMLRPC,
> but surely there are bugs.
> 
> The timeouts/errors can cause spikes in the response time, so we need these
> patches in before to repeat the tests to have meaningful information:
> 
> http://gerrit.ovirt.org/#/c/34147/
> http://gerrit.ovirt.org/#/c/34813/
> most likely also
> http://gerrit.ovirt.org/#/c/34879/
> 
> Meanwhile, I'll setup some profiling to see where the time is spent when
> everything goes well and if there are some actual differences I'm not aware
> of.

The above are fixes for correctness. Here I'm adding performance improvement patches:

http://gerrit.ovirt.org/#/c/35212/
http://gerrit.ovirt.org/#/c/34989/
http://gerrit.ovirt.org/#/c/35024/

all of the above are simple and quite safe.

profiling is ongoing.

Comment 11 Francesco Romani 2014-11-18 16:08:34 UTC
I managed to get a decent profile of the create/shutdown flow.
The problem is to minimize the noise introduced by sampling without actually changing (too much of) the VDSM code.

So, I used a script like:
#!/bin/bash

# Remove any stale profile from a previous run.
[ -f /var/run/vdsm/vdsmd.prof ] && rm -f /var/run/vdsm/vdsmd.prof
vdsClient -s 0 startProfile &> /dev/null
# Run 100 create/destroy cycles under the profiler.
for i in `seq 1 100`; do
	<create VM> && <destroy VM>
done
vdsClient -s 0 stopProfile &> /dev/null
# Keep the profile under the name passed as the first argument.
mv /var/run/vdsm/vdsmd.prof $1.prof


and I checked VDSM master (at f40345d8e4) - relevant patches being backported

patches needed to run profiles:
http://gerrit.ovirt.org/#/c/35016/
http://gerrit.ovirt.org/#/c/35181/
http://gerrit.ovirt.org/#/c/35292

patches used in measurement in addition to master at SHA1 above:
http://gerrit.ovirt.org/#/c/34989/

results for XMLRPC
         2192340 function calls (2229132 primitive calls) in 51.597 CPU seconds

   Ordered by: internal time
   List reduced from 1126 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3   36.125   12.042   36.204   12.068 PolicyEngine.py:108(PolicyEngine.do_controls)
    24436    2.304    0.000    0.114    0.000 pthread.py:98(Lock.unlock)
    11650    2.042    0.000    2.156    0.000 __init__.py:423(Formatter.format)
       17    0.668    0.039    0.672    0.040 storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
     7812    0.366    0.000    0.580    0.000 __init__.py:226(LogRecord.__init__)
    88446    0.309    0.000    0.420    0.000 StringIO.py:208(StringIO.write)
    21800    0.259    0.000    0.436    0.000 lvm.py:217(makeLV)
      100    0.248    0.002    1.650    0.016 lvm.py:411(LVMCache._reloadlvs)
      100    0.247    0.002    2.008    0.020 blockSD.py:146(_getVolsTree)
      100    0.239    0.002    0.243    0.002 lvm.py:446(<genexpr>)
    69912    0.230    0.000    0.280    0.000 string.py:511(replace)
 201/2502    0.211    0.001    0.827    0.000 xmlrpclib.py:765(Marshaller.dump_struct)
   107032    0.199    0.000    0.255    0.000 <string>:8(__new__)
802/10024    0.193    0.000    1.048    0.000 vmxml.py:473(Element.__hacked_writexml)
      100    0.191    0.002    2.360    0.024 blockSD.py:1012(BlockStorageDomain.getAllVolumesImages)
    24436    0.178    0.000    0.292    0.000 pthreading.py:77(Lock.release)
    87200    0.163    0.000    0.638    0.000 codecs.py:347(StreamWriter.write)
     7812    0.162    0.000    1.632    0.000 __init__.py:1204(Logger.callHandlers)
     5586    0.134    0.000    0.262    0.000 expatbuilder.py:743(ExpatBuilderNS.start_element_handler)
22362/22363    0.131    0.000    1.231    0.000 protocoldetector.py:108(MultiProtocolAcceptor._process_events)

results for JSONRPC
         1788411 function calls (1813944 primitive calls) in 13.510 CPU seconds

   Ordered by: internal time
   List reduced from 1161 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8768    1.831    0.000    0.712    0.000 __init__.py:1142(Logger.makeRecord)
    10074    0.585    0.000    0.009    0.000 minidom.py:473(NamedNodeMap.__init__)
       15    0.521    0.035    0.525    0.035 storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
21829/22018    0.403    0.000    0.414    0.000 lvm.py:217(makeLV)
    89079    0.267    0.000    0.379    0.000 StringIO.py:208(StringIO.write)
      101    0.234    0.002    0.238    0.002 lvm.py:446(<genexpr>)
      101    0.225    0.002    1.614    0.016 lvm.py:411(LVMCache._reloadlvs)
     8312    0.201    0.000    0.159    0.000 __init__.py:1121(Logger.findCaller)
  322/326    0.195    0.001    9.402    0.029 threading.py:481(Thread.run)
     8768    0.190    0.000    1.956    0.000 __init__.py:1204(Logger.callHandlers)
    13802    0.185    0.000    0.569    0.000 __init__.py:423(Formatter.format)
23951/23952    0.177    0.000    1.574    0.000 protocoldetector.py:108(MultiProtocolAcceptor._process_events)
    86587    0.172    0.000    0.595    0.000 codecs.py:347(StreamWriter.write)
      423    0.171    0.000    0.036    0.000 stompReactor.py:266(_AsyncoreEvent.set)
    13323    0.169    0.000    0.932    0.000 __init__.py:764(UserGroupEnforcingHandler.emit)
    13802    0.159    0.000    0.175    0.000 __init__.py:289(LogRecord.getMessage)
     8090    0.152    0.000    0.259    0.000 minidom.py:111(Document.appendChild)
    13800    0.151    0.000    1.767    0.000 __init__.py:666(UserGroupEnforcingHandler.handle)
    27175    0.150    0.000    0.150    0.000 pthread.py:95(RLock.lock)
    22795    0.148    0.000    0.291    0.000 minidom.py:299(_write_data)

there is still a bit too much noise in the profiles, but nothing suggests JSONRPC is slower than XMLRPC in VDSM.
There are instead suggestions that the contrary is true.
Of course, this is true with these patches applied:
http://gerrit.ovirt.org/#/c/35024/ (-> backport to 3.5: http://gerrit.ovirt.org/#/c/35290/)
http://gerrit.ovirt.org/#/c/35212/

Comment 12 Francesco Romani 2014-11-18 16:12:26 UTC
Created attachment 958655 [details]
VDSM profiles

content of the archive:

NS_C000_vdscli.json - VM definition to be used with vdsClient
NS_C000_jsonrpc.json - VM definition to be used with jsonrpc
NS_C000_xmlrpc.prof - profile gathered with 'startxml.sh'
NS_C000_jsonrpc.prof - profile gathered with 'startjson.sh'
startjson.sh - JSONRPC test runner
startxml.sh - XMLRPC test runner
jsonrpc - JSONRPC simple client
profile-stats - python script to inspect .prof files (like in vdsm/contrib)
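The listings in comment 11 match the output format of Python's cProfile/pstats. A sketch of reproducing the "Ordered by: internal time" view, under the assumption that the attached `.prof` files are standard cProfile dumps; it profiles a dummy function locally so it is self-contained, whereas with the attached archive you would load one of the listed files (e.g. `NS_C000_xmlrpc.prof`) instead of `example.prof`.

```python
import cProfile
import pstats

# A dummy workload to profile, so the example runs without the attachment.
def busy():
    return sum(i * i for i in range(50000))

prof = cProfile.Profile()
prof.runcall(busy)
prof.dump_stats("example.prof")

# Reproduce the style of the listings above: sort by internal time
# (tottime) and restrict the output to the top 20 entries.
stats = pstats.Stats("example.prof")
stats.sort_stats("tottime").print_stats(20)
```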

Comment 13 Yuri Obshansky 2014-12-09 15:11:43 UTC
The bug was verified on
RHEV-M 3.5.22 and vdsm-4.16.7.6-1.el6ev.

The performance problem with the JSON-RPC protocol, compared to RHEV-M 3.5.17, has been resolved. The response times of the major REST APIs decreased dramatically (90% Line response time):
- GET VM details: from 2318 to 615 ms
- POST VM Shutdown: from 147839 to 3040 ms
- POST VM Start: from 241912 to 3014 ms

But those values are still larger than what we measured in RHEV-M 3.4
(except for the POST VM Start REST API):
- GET VM details: 615 vs 401 ms (~30%)
- POST VM Shutdown: 3040 vs 2249 ms (~25%)
All processes run evenly and are stable.
No critical exceptions were detected.
We cannot close this bug until we find a solution for the remaining performance degradation in the 2 REST APIs: GET VM Details and POST VM Shutdown.

Detailed report here:
https://mojo.redhat.com/docs/DOC-1004044

Comment 14 Gil Klein 2014-12-10 11:04:27 UTC
Moving back to ASSIGNED based on comment #13.

From QE perspective, if 3.5 does not show any improvement in performance, we would at least expect to see the same results we had with 3.4.

Comment 15 Oved Ourfali 2014-12-10 11:09:57 UTC
Updated the bug title accordingly.
In addition, I don't think this gap on two requests compared to 3.4 should be considered a blocker for the version.

Scott - please give your input on that.

Barak - should this be moved to virt, as JSONRPC isn't the root cause here and both requests are virt-related?

Comment 16 Oved Ourfali 2014-12-10 11:53:41 UTC
(In reply to Oved Ourfali from comment #15)
> Updated the bug title accordingly.
> In addition, I don't think this gap on two requests compared for 3.4 should
> be considered as a blocker for the version.
> 
> Scott - please input on that.
> 
> Barak - should this be moved to virt as JSONRPC isn't the root cause here,
> and both requests are virt-related?

After discussing offline with Barak, I'm removing the blocker flag, and targeting to 3.5.Z.

Scott - leaving needinfo on you, in case you think otherwise.

Comment 17 Michal Skrivanek 2014-12-10 11:56:24 UTC
I'd be happy to review specific items, but IMHO we need to replay the same scenarios at the same speed. I'm not sure this is the case here.
I.e., when your test creates 50 VMs... does it try to do it "as fast as it can"? If so, then we can't look at it from the response time perspective only...
Can you point me to the specific cases and what they were doing - how fast were the parallel requests sent?

I suppose we can talk only about the problematic VM create, VM shutdown and VM login cases...

Comment 18 Yuri Obshansky 2014-12-10 12:01:12 UTC
Please see the details in the report
https://mojo.redhat.com/docs/DOC-1004044
The test didn't create VMs.
It is a load test.
The same scenarios, the same hardware.

Comment 20 Michal Skrivanek 2014-12-10 15:27:59 UTC
(In reply to Yuri Obshansky from comment #18)
> Please, see details in report
> https://mojo.redhat.com/docs/DOC-1004044
> Test didn't create VMs.

I don't understand then what the Create VM and Shutdown VM calls were/are supposed to do. Are they not creating and shutting down VMs?

> It is a load test.

you can't consistently measure responses of synchronous actions in a load test, as the load directly affects the responses. The faster you get in one call, the more load you create, slowing down other calls.

> The same scenarios, the same hardware.

Comment 23 Yuri Obshansky 2014-12-10 16:04:51 UTC
(In reply to Michal Skrivanek from comment #20)
> (In reply to Yuri Obshansky from comment #18)
> > Please, see details in report
> > https://mojo.redhat.com/docs/DOC-1004044
> > Test didn't create VMs.
> 
> I don't understand then what the Create VM and Shutdown VM calls were/are
> supposed to do. Are they not creating and shutting down VMs?
They start and shut down VMs in a loop while the test is running:
50 concurrent threads vs. 50 VMs, using user security credentials.
> 
> > It is a load test.
> 
> you can't consistently measure responses of synchronous actions in a load
> test. As the load directly affects the responses.So faster you get in one
> call the more load you create, slowing down other calls.
> 
We are not talking about synchronous actions; we don't need that. JMeter measured each API call's time (from request to response) and calculated the min, average, 90% line and max response times. This is a performance regression test, which performs the same scenario on the same hardware to detect degradation or improvement in RHEV-M REST API performance. Nothing needs to be synchronized in this scenario. Let me know if you need an additional explanation of the regression REST API performance test scenario and structure.
 
> > The same scenarios, the same hardware.
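The aggregation described above (and used in the comparison table) can be sketched in a few lines; `jmeter_summary` and the sample values are illustrative, but the metrics match JMeter's aggregate report, where the "90% Line" is the 90th percentile of the response times.

```python
import math

def jmeter_summary(samples_ms):
    """Aggregate per-transaction response times the way the comparison
    table in the bug description does: min, average, '90% Line'
    (90th percentile), and max, in milliseconds."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    # '90% Line': the value below which 90% of the samples fall.
    idx = min(n - 1, math.ceil(0.90 * n) - 1)
    return {
        "min": ordered[0],
        "average": sum(ordered) / n,
        "90% line": ordered[idx],
        "max": ordered[-1],
    }

# Hypothetical response times for one transaction, in ms; note how a
# few slow outliers inflate max far more than the 90% line.
summary = jmeter_summary([22, 25, 27, 30, 31, 33, 35, 40, 120, 300])
print(summary)
```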

Comment 24 Michal Skrivanek 2014-12-15 09:45:37 UTC
(In reply to Yuri Obshansky from comment #23)
> (In reply to Michal Skrivanek from comment #20)
> > (In reply to Yuri Obshansky from comment #18)
> > > Please, see details in report
> > > https://mojo.redhat.com/docs/DOC-1004044
> > > Test didn't create VMs.
> > 
> > I don't understand then what the Create VM and Shutdown VM calls were/are
> > supposed to do. Are they not creating and shutting down VMs?
> They start and shutdown VMs in loop during test running
> 50 concurrent threads vs. 50 Vms using User security credentials
> > 
> > > It is a load test.
> > 
> > you can't consistently measure responses of synchronous actions in a load
> > test. As the load directly affects the responses.So faster you get in one
> > call the more load you create, slowing down other calls.
> > 
> We are not talking about synchronous actions. We don't need it. JMeter
> measured each API calls time (from request till response), calculate min.
> average, 90% line and max response time. This test is performance regression
> test which perform the same scenario on the same hardware to detect
> degradation or improving of rhev-m REST API performance. Nothing should be
> synchronized in this scenario. Let me know, if you need additional an
> explanation about regression REST API performance test scenario and
> structure. 
>  
> > > The same scenarios, the same hardware.

that doesn't matter. As I said, you are measuring synchronous actions. Shutdown returns only once the VM properly shuts down, and that depends on the overall load on the system. If you happen to stuff more parallel requests onto the host (due to other improvements) and load the host more, this particular call will take longer. The same applies to the other two calls (albeit only partially for start VM).

What would be interesting to see is the duration of the test:
if you're running for a fixed time, the number of API calls - or vice versa.

Comment 25 Yuri Obshansky 2014-12-15 11:14:57 UTC
(In reply to Michal Skrivanek from comment #24)
> (In reply to Yuri Obshansky from comment #23)
> > (In reply to Michal Skrivanek from comment #20)
> > > (In reply to Yuri Obshansky from comment #18)
> > > > Please, see details in report
> > > > https://mojo.redhat.com/docs/DOC-1004044
> > > > Test didn't create VMs.
> > > 
> > > I don't understand then what the Create VM and Shutdown VM calls were/are
> > > supposed to do. Are they not creating and shutting down VMs?
> > They start and shutdown VMs in loop during test running
> > 50 concurrent threads vs. 50 Vms using User security credentials
> > > 
> > > > It is a load test.
> > > 
> > > you can't consistently measure responses of synchronous actions in a load
> > > test. As the load directly affects the responses.So faster you get in one
> > > call the more load you create, slowing down other calls.
> > > 
> > We are not talking about synchronous actions. We don't need it. JMeter
> > measured each API calls time (from request till response), calculate min.
> > average, 90% line and max response time. This test is performance regression
> > test which perform the same scenario on the same hardware to detect
> > degradation or improving of rhev-m REST API performance. Nothing should be
> > synchronized in this scenario. Let me know, if you need additional an
> > explanation about regression REST API performance test scenario and
> > structure. 
> >  
> > > > The same scenarios, the same hardware.
> 
> that doesn't matter. As I said, you are measuring synchronous actions.
> Shutdown returns only once the VM properly shuts down. That depends on the
> overall load on the system. If you happen to stuff more parallel requests on
> the host (due to other improvements) and load the host more this particular
> call will take longer. Same applies to the other two calls (albeit only
> partially for start VM)
> 
It is not exactly as you think. The VM state changes to "powering_down" when I execute the REST API /api/vms/${vm_id}/shutdown. The REST API didn't wait until the VM state changed to "down" before sending me the response. I measure this time, which doesn't depend on other resources/overload/etc.
My script has synchronization, i.e. it didn't start a VM until its state was "down",
and vice versa for the Shutdown API.
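The synchronization described above can be sketched as a wait loop kept outside the measured request. The `wait_for_state` name and the injected `get_vm_state` callable are illustrative assumptions standing in for a GET /api/vms/{vm_id} poll, so the sketch runs without an engine.

```python
import itertools
import time

def wait_for_state(get_vm_state, target, timeout_s=60.0, poll_s=0.01):
    """Poll a state-reporting callable until it returns `target`.
    `get_vm_state` is a stand-in for a GET /api/vms/{vm_id} request;
    only the shutdown POST itself is measured, never this wait loop."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_vm_state()
        if state == target:
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"state never reached {target!r}")

# Simulated engine: after the shutdown POST returns, the VM reports
# "powering_down" a few times before settling on "down".
states = itertools.chain(["powering_down"] * 3, itertools.repeat("down"))
final = wait_for_state(lambda: next(states), "down")
```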

> What would be interesting to see is the duration of the test.
> if you're running for fixed time - the number of API calls. Or vice versa.

Yes, I have that information:
the count of API calls performed during the 2-hour test:
3.4 - 750
3.5.17 - 450
3.5.22 - 714

Comment 26 Michal Skrivanek 2014-12-15 14:44:30 UTC
ok, so start VM is clear - I just looked at the wrong column. There's a 50% improvement.
Then we're left with shutdown and login.
Both invoke a VM action synchronously as part of your call, and both invoke the guest agent.
Since the 3.4 vs 3.5.22 numbers are comparable, it should more or less match, indeed. Can you please check whether the guests are also exactly the same? Can you reproduce the difference when you run just one single API call... and if so, get the logs from both versions. Only one call is enough.

Comment 27 Yuri Obshansky 2014-12-16 12:49:42 UTC
I think it was the same version:
rhevm-guest-agent-common-1.0.9-1.el6ev

Comment 28 Michal Skrivanek 2014-12-19 06:52:32 UTC
The current thinking is that the problem is related to the guest agent communication. The difference doesn't seem to be related to any code change, but rather to the differences between the various runs.

Comment 29 Oved Ourfali 2014-12-25 07:22:31 UTC
Moving this one to virt, per the latest comments.

Comment 30 Michal Skrivanek 2015-01-14 10:43:21 UTC
Yuri, what exactly is the "Login to VM" API call in your tests?

Comment 31 Yuri Obshansky 2015-01-14 11:47:34 UTC
Actually, "Login to VM" is not a login.
It is the logical name of the transaction.
I send an HTTP request to {engine_url}/api/vms
using Authorization in the HTTP header
with the encoded string ${user}@nott04.eng.lab.tlv.redhat.com:{password}
related to a specific user defined in RHEV-M.
This call returns the specific VM's details (xml).
That's all.
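The "encoded string" described above is standard HTTP Basic authentication: base64 of `user:password`, sent as `Authorization: Basic <token>`. A small sketch; the helper name and credentials are illustrative, not part of the test scripts.

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header value described above:
    HTTP Basic auth is 'Basic ' + base64('<user>:<password>')."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Hypothetical credentials in the user@domain form the test uses;
# a GET to {engine_url}/api/vms with this header returns the VM details (xml).
hdr = basic_auth_header("user01@nott04.eng.lab.tlv.redhat.com", "secret")
print(hdr)
```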

Comment 32 Michal Skrivanek 2015-01-14 12:21:33 UTC
ok, interesting.
Can you reproduce the difference when you run just one single API call... and if so, get the logs from both versions. Only one call is enough.

Anyway, the difference in the fixed version is not that big (~20%) and only affects 2 calls, shutdown being the slowest but also probably the least interesting - it is a long operation and it is the end of that particular VM, so you don't do it often.

Comment 33 Yuri Obshansky 2015-01-14 13:41:21 UTC
It is a little bit complicated right now.
I'm in the middle of another test execution.
It requires installing 3.4, running the single call, and reinstalling 3.5,
which requires downgrading the vdsm version on the host as well.
It could take several days of work.
If you only need a single API call, why don't you run it in any development environment?

Comment 34 Michal Skrivanek 2015-01-15 15:17:29 UTC
sure, but that's not high prio, so I can't tell when we'll get to that

there were no changes in this area between 3.4 and 3.5 which wouldn't show up in other calls. There's nothing specific to "shutdown" other than it being a synchronous call to the host, so unless you have the complete picture of the performance of the host, the different results do not really mean anything. As I said, probably not interesting to do that.

For now I don't see anything to fix, and I would suggest closing this.

Comment 35 Michal Skrivanek 2015-01-21 09:08:16 UTC
(In reply to Yuri Obshansky from comment #31)
> Actually, "Login to VM" is not Login. 
> This is logical name of transaction.
> I send HTTP request to  {engine_url}/api/vms
> using Authorization in HTTP Header 
> with encoded string ${user}@nott04.eng.lab.tlv.redhat.com:{password}
> related to specific user defined in RHEV-M.
> This cal will return back specific VM details (xml).
> That's all.

so, is it 50 concurrent queries to all VMs or to a specific random VM? There were a few more features added in 3.5, so the response is larger than in 3.4

Comment 36 Yuri Obshansky 2015-01-21 09:30:59 UTC
(In reply to Michal Skrivanek from comment #35)
> (In reply to Yuri Obshansky from comment #31)
> > Actually, "Login to VM" is not Login. 
> > This is logical name of transaction.
> > I send HTTP request to  {engine_url}/api/vms
> > using Authorization in HTTP Header 
> > with encoded string ${user}@nott04.eng.lab.tlv.redhat.com:{password}
> > related to specific user defined in RHEV-M.
> > This cal will return back specific VM details (xml).
> > That's all.
> 
> so, is it 50 concurrent queries to all vms or a specific random vm? 

The RHEV-M setup has only 50 VMs, so each thread works with a specific VM (user).
The threads run concurrently, but the calls are performed at random times.
That means each thread started with a 0.5 sec delay
and sent its query to its VM at a different time.
This is a real-life simulation method.

> There were a few more features added in 3.5 so the response is larger than 3.4
I'm sure.

Comment 37 Michal Skrivanek 2015-01-23 08:17:00 UTC
alright, let's summarize the remaining issues...

shutdown - well, unless this is reproducible on another setup and/or with a single VM, my theory is that the difference is due to a different load during the two runs. It's a sync call which can block on the vdsm side for the time the qemu process needs to exit.

vm get - you're authorizing/logging in with each request. When you care about performance, you're supposed to update the scripts to use the jsessionid to avoid this. So if an individual authentication is a bit slower, it's not a big deal, IMHO. It also might depend on the performance of your LDAP server.
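The jsessionid advice can be sketched as follows: authenticate once with Basic auth, capture the session id from the response, and reuse it on later requests. The class name, `transport` callable, and cookie value here are illustrative stand-ins for the real HTTP layer, not RHEV-M's actual client API.

```python
class SessionReusingClient:
    """Sketch: pay for authentication once, then reuse the JSESSIONID
    cookie on subsequent requests instead of re-authenticating."""

    def __init__(self, transport, auth_header):
        self._transport = transport      # stand-in for the real HTTP layer
        self._auth_header = auth_header
        self._session_id = None

    def get(self, path):
        if self._session_id is None:
            headers = {"Authorization": self._auth_header}   # full (slow) auth
        else:
            headers = {"Cookie": f"JSESSIONID={self._session_id}"}
        session_id, body = self._transport(path, headers)
        if session_id:
            self._session_id = session_id                    # remember the cookie
        return body

# Fake engine that records how often the expensive auth path was taken.
auth_calls = []
def fake_transport(path, headers):
    if "Authorization" in headers:
        auth_calls.append(path)          # authentication happened
        return "abc123", "<vms/>"        # i.e. Set-Cookie: JSESSIONID=abc123
    assert headers["Cookie"] == "JSESSIONID=abc123"
    return None, "<vms/>"

client = SessionReusingClient(fake_transport, "Basic <encoded credentials>")
for _ in range(5):
    client.get("/api/vms")
print(f"authenticated {len(auth_calls)} time(s) for 5 requests")
```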

Overall, I don't see much to fix anymore, so I propose closing as CURRENTRELEASE (due to the previous real fix in jsonrpc).

Comment 38 Yuri Obshansky 2015-01-29 07:36:49 UTC
(In reply to Michal Skrivanek from comment #37)
> alright, let's move summarize the remaining issues...
> 
> shutdown - well, unless this is reproducible on other setup and/or with a
> single VM my theory is that the difference is due to a different load during
> both runs. It's a sync call which can block on vdsm side for the time the
> qemu process needs to exit.
> 
> vm get - you're authorizing/logging in with each request. When you care for
> performance you're supposed to update the scripts to use jsessionid to avoid
> this. So if an individual authentication is a bit slower it's not a big
> deal, IMHO. It also might depend on performance of your LDAP server.
> 
> Overall, I don't see much to fix anymore, so proposing closing
> currentrelease (due to the previous real fix in jsonrpc)

I agree with you.
It's a minor difference (3 sec vs 2 sec) under the current load schema.
I suggest closing it and moving ahead to run new performance tests.