Bug 1401926

Summary: Calamari performance benchmarks
Product: Red Hat Ceph Storage
Reporter: Nishanth Thomas <nthomas>
Component: Calamari
Assignee: Christina Meno <gmeno>
Calamari sub component: Back-end
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Status: CLOSED WONTFIX
Docs Contact:
Severity: medium
Priority: unspecified
CC: ceph-eng-bugs, hnallurv, nthomas, vsarmila
Version: 2.1
Target Milestone: rc
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-12 17:38:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1406357

Description Nishanth Thomas 2016-12-06 12:22:51 UTC
Description of problem:

Calamari should be stress tested in a pre-defined environment with a refresh
interval of 2-3 seconds per API call.
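
A minimal sketch of the kind of test meant here, assuming a hypothetical
Calamari host, placeholder credentials, and an illustrative endpoint path
(none of these come from this bug): poll one endpoint at a 2-3 second refresh
interval and record per-call latency.

# Minimal sketch only: poll a single Calamari endpoint at a fixed refresh
# interval and record per-call latency. Host, credentials, and endpoint path
# are placeholders, not values taken from this bug.
import time
import requests

BASE = "https://calamari.example.com/api/v2"   # hypothetical host
session = requests.Session()
session.auth = ("admin", "password")           # placeholder credentials
session.verify = False                         # e.g. lab setup with a self-signed cert

def poll(path, interval=2.5, count=100):
    # Hit one endpoint every `interval` seconds and record how long each call takes.
    latencies = []
    for _ in range(count):
        start = time.time()
        resp = session.get(BASE + path)
        latencies.append(time.time() - start)
        print("%s %s %.3fs" % (resp.status_code, path, latencies[-1]))
        time.sleep(interval)
    return latencies

if __name__ == "__main__":
    poll("/cluster")   # replace with the endpoint under test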

Comment 2 Christina Meno 2016-12-07 06:24:43 UTC
Do you have a list of specific calls that are exceeding this?
In my experience the only endpoint that won't be able to meet this is /cli. Are you seeing slowness elsewhere?

Comment 3 Nishanth Thomas 2016-12-08 14:11:32 UTC
Yeah, /cli is a good candidate, since we fetch a good amount of monitoring data through that endpoint. Another place I experienced slowness is the crushnode/crushrule related endpoints when operating on a reasonably big cluster (say 20 OSDs). We also need to take a closer look at the APIs we have not examined yet but will be using on a frequent basis (syncobject, pool, cluster, etc.).

Performance of individual endpoints is important, but I am more worried about how calamari handles a burst of requests over a period of time, say one request per second (probably more than that). Do we have any data on this?
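
As an illustration of the burst scenario described above (a sketch only; the
base URL, credentials, and endpoint paths are placeholders, not part of this
bug): fire roughly one or more requests per second at a few endpoints for a
fixed duration and record how response times hold up.

# Rough sketch of a burst test: send requests at a fixed rate for a fixed
# duration and record per-request status and latency. All URLs, credentials,
# and paths below are illustrative placeholders.
import time
import threading
import requests

BASE = "https://calamari.example.com/api/v2"   # placeholder host
AUTH = ("admin", "password")                   # placeholder credentials
PATHS = ["/cluster", "/server"]                # replace with the endpoints under test
results = []
lock = threading.Lock()

def one_request(path):
    # Time a single GET and record (path, status, elapsed seconds).
    start = time.time()
    try:
        status = requests.get(BASE + path, auth=AUTH, verify=False, timeout=30).status_code
    except requests.RequestException:
        status = "error"
    with lock:
        results.append((path, status, time.time() - start))

def burst(rate_per_sec=1, duration_sec=60):
    # Fire requests at roughly rate_per_sec for duration_sec seconds, then wait for them.
    threads = []
    deadline = time.time() + duration_sec
    i = 0
    while time.time() < deadline:
        t = threading.Thread(target=one_request, args=(PATHS[i % len(PATHS)],))
        t.start()
        threads.append(t)
        i += 1
        time.sleep(1.0 / rate_per_sec)
    for t in threads:
        t.join()

if __name__ == "__main__":
    burst(rate_per_sec=2, duration_sec=30)
    for path, status, elapsed in sorted(results, key=lambda r: r[2], reverse=True):
        print("%s %s %.3fs" % (status, path, elapsed))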

Comment 4 Christina Meno 2016-12-09 19:12:28 UTC
There is no real load-test data right now. I'm getting calamari_setup going in our lab and will write some tests that stress it.

Yeah, the crush endpoint is a bit of a concern now that you mention it. Its implementation writes out a new map on each change. Perhaps it would be better to try to group the modifications somehow.

Comment 5 Christina Meno 2017-01-04 19:35:26 UTC
Sankarshan and I discussed this before break. What we agreed on was that I would share a procedure to gather the measurements and that the console team would gather and report what they found.

Here is the current procedure. On a monitor node:

source /opt/calamari/venv/bin/activate
supervisorctl stop calamari-lite
# run in the foreground and collect access logs in the format below
calamari-lite 2>&1 | tee /var/log/calamari/request_timing.log

172.21.0.100 - - [2017-01-03 19:05:39] "POST /api/v2/cluster/27246bd8-969a-4c1d-ac5f-8e7b477ad901/pool HTTP/1.1" 202 8264 2.466778

Send me your results and we can investigate the slowest endpoints.
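
One possible way to summarize that log once it is collected (a sketch,
assuming the last whitespace-separated field on each access-log line is the
request duration in seconds, as in the sample line above): group lines by
method and path and print the endpoints with the highest average response
time.

# Sketch: summarize /var/log/calamari/request_timing.log by endpoint.
# Assumes the last field on each access-log line is the request duration in
# seconds, matching the sample line above.
import re
from collections import defaultdict

LOG = "/var/log/calamari/request_timing.log"
LINE = re.compile(r'"(\S+) (\S+) HTTP/[\d.]+" \d+ \S+ ([\d.]+)\s*$')

timings = defaultdict(list)
with open(LOG) as f:
    for line in f:
        m = LINE.search(line)
        if m:
            method, path, seconds = m.group(1), m.group(2), float(m.group(3))
            timings[(method, path)].append(seconds)

# Print the 20 endpoints with the highest average response time.
slowest = sorted(timings.items(), key=lambda kv: sum(kv[1]) / len(kv[1]), reverse=True)
for (method, path), vals in slowest[:20]:
    print("%-6s %-60s n=%-4d avg=%.3fs max=%.3fs"
          % (method, path, len(vals), sum(vals) / len(vals), max(vals)))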

Comment 6 Nishanth Thomas 2017-01-06 06:13:00 UTC
Setting up the test bench to run this sequence of steps would require us to free up hardware, and all of that would take around 10 days. I will start on this and update the results here.

Comment 7 Christina Meno 2017-01-11 16:04:27 UTC
Looks like we won't be able to do anything until the next release, since we'll have the measurements too late to act on them in the 2.2 cycle.