Description of problem: In a pre-defined environment, calamari should be stress tested against a refresh interval of 2-3 seconds per API call, i.e. each endpoint should be able to respond within that polling window.
Do you have a list of specific calls that are exceeding this? In my experience the only endpoint that won't be able to meet this is /cli. Are you seeing slowness elsewhere?
Yeah, /cli is a good candidate; we pull a good amount of monitoring data through that endpoint. Another place I experienced slowness is the crushnode/crushrule related endpoints when operating on a reasonably big cluster (say 20 OSDs). We also need to take a closer look at the APIs we will be using on a frequent basis (syncobject, pool, cluster etc.), which we have not looked at yet. Performance of individual endpoints is important, but I am more worried about how calamari handles a burst of requests over a period of time, say one request per second (probably more than that). Does any data exist on this?
There is no real load-test data right now. I'm getting calamari_setup running in our lab and will write some tests that stress it. Yeah, the crush endpoint is a bit of a concern now that you mention it: its implementation writes out a new map on each change. Perhaps it would be better to try to group the modifications somehow.
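For the burst testing, something along these lines is what I have in mind. This is just a rough sketch: the base URL, credentials and endpoint list are placeholders (not real values from our setup), and it uses the third-party requests library.

# Rough load-generation sketch: hit a rotating list of endpoints once per
# second for a fixed duration and record per-request latency.
# BASE_URL, credentials and ENDPOINTS below are placeholders, not real values.
import time
import requests

BASE_URL = "http://calamari.example.com/api/v2"              # hypothetical host
ENDPOINTS = ["/cluster", "/server", "/cluster/<fsid>/pool"]  # placeholders

def run(duration_s=300, interval_s=1.0):
    session = requests.Session()
    # session.auth = ("admin", "secret")  # supply real credentials here
    timings = []
    start = time.time()
    i = 0
    while time.time() - start < duration_s:
        path = ENDPOINTS[i % len(ENDPOINTS)]
        i += 1
        t0 = time.time()
        try:
            resp = session.get(BASE_URL + path, timeout=10)
            timings.append((time.time() - t0, resp.status_code, path))
        except requests.RequestException as exc:
            timings.append((time.time() - t0, exc.__class__.__name__, path))
        time.sleep(interval_s)
    # print the slowest calls first
    for elapsed, status, path in sorted(timings, key=lambda t: t[0], reverse=True)[:10]:
        print("%8.3fs  %-20s %s" % (elapsed, status, path))

if __name__ == "__main__":
    run()

The interval and duration can be tightened up once we know which endpoints we actually poll every 2-3 seconds.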
Sankarshan and I discussed this before break. What we agreed on was that I would share a procedure to gather the measurements and that the console team would gather and report what they found. Here is the current procedure:

On a monitor node:
  source /opt/calamari/venv/bin/activate
  supervisorctl stop calamari-lite
  calamari-lite 2>&1 | tee /var/log/calamari/request_timing.log   # run in the foreground

Collect access logs that have a format like this:
  172.21.0.100 - - [2017-01-03 19:05:39] "POST /api/v2/cluster/27246bd8-969a-4c1d-ac5f-8e7b477ad901/pool HTTP/1.1" 202 8264 2.466778

Send me your results and we can investigate the slowest endpoints.
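Once you have a log, a quick way to rank the slowest calls is to sort on the elapsed-time field at the end of each line. A rough sketch of that (it assumes the log path above and that the response time in seconds is the last field, as in the sample line):

# Rough sketch: parse access-log lines like the sample above and print the
# slowest requests. Assumes the elapsed time (seconds) is the last field.
import re
import sys

LINE_RE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \S+ \S+ (?P<secs>[\d.]+)\s*$')

def main(log="/var/log/calamari/request_timing.log", top=20):
    rows = []
    with open(log) as fh:
        for line in fh:
            m = LINE_RE.search(line)
            if m:
                rows.append((float(m.group("secs")), m.group("method"), m.group("path")))
    for secs, method, path in sorted(rows, reverse=True)[:int(top)]:
        print("%8.3fs  %-6s %s" % (secs, method, path))

if __name__ == "__main__":
    main(*sys.argv[1:3])

If you send the raw logs instead, I can run the same analysis on my side.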
Setting up the test bench to run this sequence of steps requires freeing up hardware, and all of that will take around 10 days. I will start on this and update the results here.
Looks like we won't be able to do anything till the next release, since the measurements will arrive too late to act on in the 2.2 cycle.