Bug 1129997
Summary: | qpid c++ client AMQP 1.0 throughput performance regression | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> |
Component: | qpid-cpp | Assignee: | Gordon Sim <gsim> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Eric Sammons <esammons> |
Severity: | unspecified | Docs Contact: | |
Priority: | high | ||
Version: | Development | CC: | esammons, gsim, iboverma, jross, mlesko, rrajasek |
Target Milestone: | 3.0 | Keywords: | Performance, Regression |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
It was discovered that when the capacity (a property of the qpid::messaging::Receiver class) and the ack-frequency (a setting on the qpid-recv utility) were both set to 100, there was a sudden drop in throughput. Reducing the ack-frequency or increasing the capacity, even by a very small amount, was found to make a considerable difference in throughput.
In general, a capacity higher than 100 is recommended when testing throughput. A suggested value for the qpid::messaging::Receiver capacity parameter would be between 500 and 1000. A lower value is suitable for the qpid-recv ack-frequency parameter (which sets the frequency at which qpid::messaging::Session::acknowledge() is called); for example, acknowledging every 10 messages is unlikely to negatively impact performance.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2015-01-21 12:52:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Frantisek Reznicek
2014-08-14 06:48:05 UTC
Created attachment 926680 [details]
Performance details (Calc)
Questions:

What does the 'client count' mean? E.g. 'tiny' implies 1 client; is that one connection with one sender and receiver on it, or one process with sender and receiver on separate connections? How do the client count and queue count relate? E.g. if the queue count is 10 and the client count is 1, does that mean a sender/receiver per queue?

Observations:

The durable results are (almost) always higher than the non-durable, which is not what I would expect. Is it possible these are the wrong way round? Or perhaps the store wasn't loaded?

(In reply to Frantisek Reznicek from comment #7)
> Both sender and receiver clients (qc2_spout/drain) do not set capacity atm.

What are the units of the throughput values in the spreadsheet?

(In reply to Gordon Sim from comment #8)
> What are the units of the throughput values in the spreadsheet?

Throughput should be in bytes per second (i.e. number of messages multiplied by simple message content size).

(In reply to Frantisek Reznicek from comment #9)
> throughput should be in bytes per second (i.e. number of messages multiplied
> by simple message content size)

Thanks, that makes sense (should have thought of that before asking)! One more question: is that aggregate, per queue, or per sender/receiver pair?

One other minor observation: the 'test_perftest_tsZ_qS_cliS_msgsL_df_mfd' and 'test_perftest_tsZ_qT_cliS_msgsL_df_mfd' results are included twice (also for _dt_ cases), with no msgsT in those categories.

(In reply to Frantisek Reznicek from comment #9)
> throughput should be in bytes per second (i.e. number of messages multiplied
> by simple message content size)

Hmm... on second glance... for tiny queue, tiny client and tiny messages (i.e. 10 byte messages), the throughput in bytes/sec is reported as ~800,000 for 0-10 (~500,000 for 1.0), i.e. 80,000 *messages*/sec. That seems unlikely if capacity is 0 (it's surprisingly high even if it is not).

(In reply to Gordon Sim from comment #14)
> (In reply to Frantisek Reznicek from comment #13)
> > I agree it is slightly suspicious to see 80k messages per sec with capacity
> > of 0, but I'm pretty sure capacity was not set on both sides (so uses
> > default which to my surprise is 0 not unlimited as python has).
>
> Python also defaults to 0 receiver capacity. (Originally c++ had non-zero,
> and we changed to align with python.)

OK, my fault, I was not checking both Sender and Receiver. What I can see is:

[root@dhcp-x-y ~]# rpm -q python-qpid
python-qpid-0.22-17.el6.noarch
[root@dhcp-x-y ~]# grep -E 'capacity|class |def ' /usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py
...
class Sender(Endpoint):
  def __init__(self, session, id, target, options):
    self.capacity = options.get("capacity", UNLIMITED)
class Receiver(Endpoint, object):
  def __init__(self, session, id, source, options):
    self._set_capacity(options.get("capacity", 0), False)

In c++ I can see:

SenderImpl::SenderImpl(...) :
  parent(&_parent), name(_name), address(_address), state(UNRESOLVED),
  capacity(50), window(0), flushed(false),
  unreliable(AddressResolution::is_unreliable(address)) {}

ReceiverImpl::ReceiverImpl(...) :
  parent(&p), destination(name), address(a), byteCredit(0xFFFFFFFF),
  autoDecode(autoDecode_), state(UNRESOLVED), capacity(0), window(0) {}

So it looks like (Vienna capacity defaults):

              Sender      Receiver
c++/c#/perl   50          0
python        UNLIMITED   0

Not sure whether the difference in the Sender's capacity makes a huge difference, but in my view there is still room for improvement. Do you think it's worth creating a separate defect to unify them, and if so, tune python to 50, or c++ (and all swigged bindings) to UNLIMITED?
Bug 772029 clarifies (thanks to Petr) why we would like to have a finite sender capacity (flow-control behavior).

Created attachment 928349 [details]
Performance tests sheet (measurement 2014-08-18)
The performance was retested with requested capacity set to 100 (both senders and receivers).
Throughput drop on AMQP 1.0 is smaller (-38.88% worst case).
Surprisingly, latency is now better on AMQP 0-10; this needs to be confirmed by further re-measurement, which we are going to run on more powerful machines, as the current ones are under extreme load for all test_perftest_tsZ_qS_cliS_* tests.
I hope we will have one more measurement sheet from the better machines by the end of the week.
I've committed a couple of improvements upstream: https://svn.apache.org/r1618913 and https://svn.apache.org/r1618914

More substantial improvements have been checked in upstream: https://svn.apache.org/r1619252 and https://svn.apache.org/r1619318 (the latter fixes a compilation error on Windows introduced by the first). Sending of large messages is still slow.

Created attachment 929229 [details]
Performance tests sheet (measurement 2014-08-20)
The performance was retested with requested capacity set to 100 (both senders and receivers) on another machine set (AMD quad core).
As the selected machines are AMD and their per-core performance is lower, most of the light tests show lower performance than in the previous measurement, but for the heavy tests we are seeing an improvement in average performance.
The overall throughput drop is slightly higher than in the previous measurement (-66% worst case); latency-wise, results are slightly better than in the previous experiment.
This calc sheet lists the last two performance experiments (as sheets Main-v3 and Main-v4).
Sending of large messages is improved significantly by the following change upstream: https://svn.apache.org/r1619951

Created attachment 931342 [details]
Performance tests sheet (measurement 2014-08-25, qpid-cpp-0.22-47, patches comment 19+20 in)

The performance retest went OK, no failures. The same machines were used and the same tests were executed (capacity=100). Surprisingly, the tests show worse results on both the throughput and latency fronts. See the attached details. Measurement 2014-08-23 using qpid-cpp-0.22-46 is on sheet Main-v5; measurement 2014-08-25 using qpid-cpp-0.22-47 is on sheet Main-v6. Sheet '46 vs 47' shows the actual percentage differences. Differences are stated for AMQP 0-10 as well as AMQP 1.0, to see how settled the tests are.

Test stability of the performance measurement: let's assume for a moment that the committed changes touch only AMQP 1.0 and do not affect AMQP 0-10 code paths. The AMQP 0-10 throughput -46 / -47 diffs show a maximum absolute difference of about 4% (coloring was set at 3%). The AMQP 1.0 throughput unfortunately shows higher performance drops for small messages and a low count of clients per queue, some lower than -10%. The situation is similar for latencies.

There is a detected difference between manual execution of qpid-send/qpid-receive (proving there is no performance drop) and the performance suite (using the qc2_spout/drain clients). An effort to understand this difference between the approaches has been started (and will not block this defect's state transition).

I executed the following manual tests on the same hardware:
* The qpidd broker is running on one machine using the same configuration file as in our automated performance tests.
* On the client machine, qpid-send sends the messages to one queue on this broker; qpid-receive reads these messages from the same broker/queue.
* Performance statistics are measured on the broker by the qpid-queue-stats utility.
I iterated over three message sizes (10, 100 and 1000 bytes) and two client capacities (100 and 150); I'll attach the complete results. These numbers show that AMQP 1.0 performs better than AMQP 0-10 in almost all cases. The only exception was the combination of 1000-byte messages and capacity 100: here the enqueue numbers for 1.0 are lower by ~7%, while on the other hand message reading (dequeue) shows the biggest improvement over 0-10 for this case (40%). It seems that, generally speaking, there is no performance degradation when switching from AMQP 0-10 to AMQP 1.0. The results QE reported previously are still worth exploring, but this particular issue should not block the release.

Created attachment 934076 [details]
Results of manual performance testing using qpid-send/qpid-receive and qpid-queue-stats.
Taking into consideration the fact that the manual tests do not show performance degradation when switching from AMQP 0-10 to AMQP 1.0 in the general case, QE no longer sees this issue as a blocker for the release. Should we discover the conditions/settings causing the performance degradation, a new bug will be reported. -> VERIFIED |