Bug 1129997

Summary: qpid c++ client AMQP 1.0 throughput performance regression
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cpp    Assignee: Gordon Sim <gsim>
Status: CLOSED CURRENTRELEASE QA Contact: Eric Sammons <esammons>
Severity: unspecified Docs Contact:
Priority: high    
Version: Development    CC: esammons, gsim, iboverma, jross, mlesko, rrajasek
Target Milestone: 3.0    Keywords: Performance, Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
It was discovered that if the capacity (a property of the qpid::messaging::Receiver class) and the ack-frequency (a setting on the qpid-recv utility) were both set to 100, there was a sudden drop in throughput. Reducing the ack-frequency or increasing the capacity, even by a very small amount, was found to make a considerable difference in throughput. In general, a value higher than 100 is recommended when testing throughput. A suggested value for the qpid::messaging::Receiver capacity parameter would be between 500 and 1000. A lower value is suitable for the qpid-recv utility's ack-frequency parameter (which sets the frequency at which qpid::messaging::Session::acknowledge() is called); for example, acknowledging every 10 messages is unlikely to negatively impact performance.
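For illustration only (this sketch is not part of the original report; the broker address 'localhost:5672' and the queue name 'perf.q' are assumptions), a receiver configured along the recommended lines, with a capacity well above 100 and an acknowledgement every 10 messages:

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

using namespace qpid::messaging;

int main() {
    Connection connection("localhost:5672", "{protocol: amqp1.0}");  // assumed broker address
    connection.open();
    Session session = connection.createSession();
    Receiver receiver = session.createReceiver("perf.q");            // assumed queue name
    receiver.setCapacity(1000);          // prefetch window; 500-1000 suggested for throughput tests
    const unsigned ackFrequency = 10;    // acknowledge every 10 messages rather than every 100
    unsigned received = 0;
    Message message;
    while (receiver.fetch(message, Duration::SECOND)) {
        if (++received % ackFrequency == 0) session.acknowledge();
    }
    session.acknowledge();               // acknowledge any remainder before closing
    connection.close();
    return 0;
}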
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-01-21 12:52:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Performance details (Calc)
none
Performance tests sheet (measurement 2014-08-18)
none
Performance tests sheet (measurement 2014-08-20)
none
Performance tests sheet (measurement 2014-08-25, qpid-cpp-0.22-47, patches comment 19+20 in)
none
Results of manual performance testing using qpid-send/qpid-receive and qpid-queue-stats.
none

Description Frantisek Reznicek 2014-08-14 06:48:05 UTC
Description of problem:

qpid c++ client AMQP 1.0 throughput performance regression

A 3% to 50% performance degradation is seen when comparing the AMQP 0-10 and AMQP 1.0 protocols with the c++ (Vienna) client against a standalone qpidd broker.

The testing scenario was executed on two bare-metal machines, the broker on one and the clients on the other (one i686, the other x86_64; the machine specs are otherwise close to each other).


See the attached calc sheet for details. There are combinations where AMQP 1.0 performance is almost as good as the AMQP 0-10 one, but generally (over the 24 tests) the performance drop is slightly more than 20%, and there are alarming cases where performance is about 50% of the AMQP 0-10 one (test_perftest_tsZ_qT_cliT_msgsL_dt_mfd).


Version-Release number of selected component (if applicable):
qpid-cpp-*-0.22-46

How reproducible:
100%

Steps to Reproduce:
1. use any c++ client and execute a few-minute performance transfer (using the default protocol, AMQP 0-10)
2. use the same c++ client and execute the same few-minute performance transfer with the protocol set to AMQP 1.0 (see the sketch below)
3. sweep message size / durability / number of clients / number of queues
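A minimal sketch of steps 1 and 2 (assuming the qpid::messaging C++ API; the broker address 'localhost:5672', the queue name 'perf.q', the message count and the 10-byte payload are illustrative, not taken from the test suite). The only difference between the two steps is the protocol connection option:

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Message.h>

using namespace qpid::messaging;

int main() {
    // Step 1 uses the default protocol (AMQP 0-10): Connection connection("localhost:5672");
    // Step 2 selects AMQP 1.0 via the connection options:
    Connection connection("localhost:5672", "{protocol: amqp1.0}");
    connection.open();
    Session session = connection.createSession();
    Sender sender = session.createSender("perf.q");
    for (int i = 0; i < 100000; ++i) {
        sender.send(Message("0123456789"));  // 10-byte payload, matching the 'tiny' message size
    }
    session.sync();                          // wait until everything has reached the broker
    connection.close();
    return 0;
}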

Actual results:
  Although AMQP 1.0 latency is lower (better) than AMQP 0-10, AMQP 1.0 throughput is also lower (worse) when compared to AMQP 0-10.

Expected results:
  Improve AMQP 1.0 throughput to be comparable to the AMQP 0-10 results.

Additional info:

Comment 1 Frantisek Reznicek 2014-08-14 06:48:59 UTC
Created attachment 926680 [details]
Performance details (Calc)

Comment 4 Gordon Sim 2014-08-14 08:48:08 UTC
Questions:

What does the 'client count' mean? E.g. 'tiny' implies 1 client, is that one connection with one sender and receiver on it? or one process with sender and receiver on separate connections? 

How do the client count and queue count relate? E.g. if the queue count is 10 and the client count is 1, does that mean a sender/receiver per queue?

Observations:

The durable results are (almost) always higher than the non-durable, which is not what I would expect. Is it possible these are the wrong way round? Or perhaps the store wasn't loaded?

Comment 8 Gordon Sim 2014-08-14 11:34:52 UTC
(In reply to Frantisek Reznicek from comment #7)
> Both sender and receiver clients (qc2_spout/drain) do not set capacity atm.

What are the units of the throughput values in the spreadsheet?

Comment 9 Frantisek Reznicek 2014-08-14 11:58:02 UTC
(In reply to Gordon Sim from comment #8)
> (In reply to Frantisek Reznicek from comment #7)
> > Both sender and receiver clients (qc2_spout/drain) do not set capacity atm.
> 
> What are the units of the throughput values in the spreadsheet?

throughput should be in bytes per second (i.e. number of messages multiplied by simple message content size)

Comment 10 Gordon Sim 2014-08-14 15:51:29 UTC
(In reply to Frantisek Reznicek from comment #9)
> (In reply to Gordon Sim from comment #8)
> > What are the units of the throughput values in the spreadsheet?
> 
> throughput should be in bytes per second (i.e. number of messages multiplied
> by simple message content size)

Thanks, that makes sense (should have thought of that before asking)!

One more question: is that aggregate? or per queue? or per sender/receiver pair?

Comment 11 Gordon Sim 2014-08-14 15:51:52 UTC
One other minor observation: the 'test_perftest_tsZ_qS_cliS_msgsL_df_mfd' and 'test_perftest_tsZ_qT_cliS_msgsL_df_mfd' results are included twice (also for the _dt_ cases), with no msgsT in those categories.

Comment 12 Gordon Sim 2014-08-14 16:59:00 UTC
(In reply to Frantisek Reznicek from comment #9)
> throughput should be in bytes per second (i.e. number of messages multiplied
> by simple message content size)

Hmm... on second glance... for the tiny queue, tiny client and tiny messages (i.e. 10 byte messages), the throughput in bytes/sec is reported as ~800,000 for 0-10 (~500,000 for 1.0), i.e. 80,000 *messages*/sec. That seems unlikely if capacity is 0 (it's surprisingly high even if it is not).

Comment 15 Frantisek Reznicek 2014-08-15 09:24:04 UTC
(In reply to Gordon Sim from comment #14)
> (In reply to Frantisek Reznicek from comment #13)

> > I agree it is slightly suspicious to see 80k messages per sec with capacity
> > of 0, but I'm pretty sure capacity was not set on both sides (so uses
> > default which to my surprise is 0 not unlimited as python has).
> 
> Python also defaults to 0 receiver capacity. (Originally c++ had non-zero,
> and we changed to align with python).

OK, my fault, I was not checking both Sender and Receiver. What I can see is:
[root@dhcp-x-y ~]# rpm -q python-qpid
python-qpid-0.22-17.el6.noarch
[root@dhcp-x-y ~]# grep -E 'capacity|class |def ' /usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py
  ...
class Sender(Endpoint):
  def __init__(self, session, id, target, options):
    self.capacity = options.get("capacity", UNLIMITED)
class Receiver(Endpoint, object):
  def __init__(self, session, id, source, options):
    self._set_capacity(options.get("capacity", 0), False)

In c++ I can see:
SenderImpl::SenderImpl(...) :
    parent(&_parent), name(_name), address(_address), state(UNRESOLVED),
    capacity(50), window(0), flushed(false), unreliable(AddressResolution::is_unreliable(address)) {}

ReceiverImpl::ReceiverImpl(...) :
    parent(&p), destination(name), address(a), byteCredit(0xFFFFFFFF), autoDecode(autoDecode_),
    state(UNRESOLVED), capacity(0), window(0) {}

So it looks like (Vienna capacity defaults):
              Sender    Receiver
c++/c#/perl       50           0
python     UNLIMITED           0

Not sure whether the difference in the Sender's capacity makes a huge difference, but in my view there is still room for improvement. Do you think it's worth creating a separate defect to unify these defaults, and if so, should python be tuned to 50, or c++ (and all the swigged bindings) to UNLIMITED?
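For illustration (not part of the original comment): whatever the defaults end up being, a test can pin the capacity explicitly on both endpoints so the results do not depend on the per-binding defaults listed above. A minimal sketch; the queue name is a placeholder:

#include <string>
#include <utility>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Receiver.h>

// Create a sender/receiver pair on the given queue with an explicit capacity of 100
// on both ends, instead of relying on the defaults listed above (c++ sender 50,
// python sender UNLIMITED, receivers 0 everywhere).
std::pair<qpid::messaging::Sender, qpid::messaging::Receiver>
createWithExplicitCapacity(qpid::messaging::Session& session, const std::string& queue) {
    qpid::messaging::Sender sender = session.createSender(queue);
    qpid::messaging::Receiver receiver = session.createReceiver(queue);
    sender.setCapacity(100);    // replenished send window
    receiver.setCapacity(100);  // prefetch credit
    return std::make_pair(sender, receiver);
}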

Comment 17 Frantisek Reznicek 2014-08-15 12:45:39 UTC
Bug 772029 clarifies (thanks to Petr) why we would like to have finite sender's capacity (flow-control behavior).

Comment 18 Frantisek Reznicek 2014-08-19 12:42:17 UTC
Created attachment 928349 [details]
Performance tests sheet (measurement 2014-08-18)

The performance was retested with requested capacity set to 100 (both senders and receivers).

The throughput drop on AMQP 1.0 is smaller (-38.88% worst case).
Surprisingly, latency is now better on AMQP 0-10; this needs to be confirmed by further re-measurements, which we are going to do on more powerful machines, as the current ones are under extreme load for all the test_perftest_tsZ_qS_cliS_* tests.

I hope we will have one more measurement sheet from the better machines by the end of the week.

Comment 19 Gordon Sim 2014-08-19 18:01:59 UTC
I've committed a couple of improvements upstream:

    https://svn.apache.org/r1618913

and

    https://svn.apache.org/r1618914

Comment 20 Gordon Sim 2014-08-21 09:20:46 UTC
More substantial improvements have been checked in upstream:

   https://svn.apache.org/r1619252 and https://svn.apache.org/r1619318

(the latter fixes a compilation error on Windows introduced by the first).

Sending of large messages is still slow.

Comment 22 Frantisek Reznicek 2014-08-21 15:01:34 UTC
Created attachment 929229 [details]
Performance tests sheet (measurement 2014-08-20)

The performance was retested with the requested capacity set to 100 (both senders and receivers) on another machine set (AMD quad core).

As the selected machines are AMD and the per-core processor performance is lower, most of the light tests show lower performance than in the previous measurement, but for the heavy tests we are seeing an improvement in average performance.

The overall throughput drop is slightly higher than in the previous measurement (-66% worst case); latency-wise, the results are slightly better than in the previous experiment.


This calc sheet lists the last two performance experiments (as sheets Main-v3 and Main-v4).

Comment 26 Gordon Sim 2014-08-26 15:51:22 UTC
Sending of large messages is improved significantly by the following change upstream:

  https://svn.apache.org/r1619951

Comment 27 Frantisek Reznicek 2014-08-27 08:52:35 UTC
Created attachment 931342 [details]
Performance tests sheet (measurement 2014-08-25, qpid-cpp-0.22-47, patches comment 19+20 in)

The performance retest went OK, with no failures.
The same machines were used and the same tests were executed (capacity=100).

Surprisingly, the tests show worse results on both the throughput and latency fronts.

See details attached.

Measurement 2014-08-23 using qpid-cpp-0.22-46 on sheet Main-v5.
Measurement 2014-08-25 using qpid-cpp-0.22-47 on sheet Main-v6.

Sheet '46 vs 47' shows the actual percentage differences.


Differences are stated for AMQP 0-10 as well as AMQP 1.0, to show how stable the tests are.

Stability / repeatability of the performance measurements:

Let's assume for a moment that the committed changes touch just AMQP 1.0 and do not affect the AMQP 0-10 code paths.

The AMQP 0-10 throughput -46 / -47 diffs show a maximum absolute difference of about 4% (coloring was set at 3%). The AMQP 1.0 throughput diffs unfortunately show larger performance drops for small messages and a low count of clients per queue, some below -10%.

A similar situation is seen for the latencies.

Comment 30 Frantisek Reznicek 2014-09-03 08:42:05 UTC
A difference has been detected between manual execution of qpid-send/qpid-receive (proving there is no performance drop) and the performance suite (using the qc2_spout/drain clients).

An effort to understand this difference between the approaches has been started (and it will not block this defect's state transition).

Comment 31 Leonid Zhaldybin 2014-09-03 12:34:17 UTC
I executed the following manual tests on the same hardware:
 * The qpidd broker is running on one machine using the same configuration file as in our automated performance tests.
 * On the client machine, qpid-send is sending the messages to one queue on this broker, qpid-receive reads these messages from the same broker/queue.
 * Performance statistics are measured on the broker by qpid-queue-stats utility.
I iterated over three message sizes (10, 100 and 1000 bytes) and two client capacities (100 and 150); I'll attach the complete results. These numbers show that AMQP 1.0 performs better than AMQP 0-10 in almost all cases. The only exception was the combination of 1000-byte messages and capacity 100. Here, the enqueue numbers for 1.0 are lower by ~7%; on the other hand, message reading (dequeue) shows the biggest improvement (40%) over 0-10 for this case.
It seems that, generally speaking, there is no performance degradation when switching from AMQP 0-10 to AMQP 1.0. The results QE reported previously are still worth exploring, but this particular issue should not block the release.
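For reference, a rough C++ equivalent of the manual check described above (illustrative only; it assumes a C++11 compiler, a broker at localhost:5672 and a queue named 'perf.q', whereas the actual measurement used the qpid-send/qpid-receive tools with qpid-queue-stats on the broker): send a batch of fixed-size messages, drain them, and compute a messages-per-second rate.

#include <chrono>
#include <iostream>
#include <string>
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

using namespace qpid::messaging;

int main() {
    const int count = 100000;
    const std::string payload(100, 'x');    // 100 bytes, one of the three tested message sizes
    Connection connection("localhost:5672", "{protocol: amqp1.0}");
    connection.open();
    Session session = connection.createSession();
    Sender sender = session.createSender("perf.q");
    Receiver receiver = session.createReceiver("perf.q");
    sender.setCapacity(100);
    receiver.setCapacity(100);              // one of the two capacities iterated over above

    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) sender.send(Message(payload));
    session.sync();                         // ensure everything has reached the broker

    Message message;
    int received = 0;
    while (received < count && receiver.fetch(message, Duration::SECOND * 10)) {
        if (++received % 10 == 0) session.acknowledge();
    }
    session.acknowledge();
    double seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    std::cout << received << " messages in " << seconds << "s ("
              << received / seconds << " msg/s)" << std::endl;
    connection.close();
    return 0;
}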

Comment 32 Leonid Zhaldybin 2014-09-03 12:35:38 UTC
Created attachment 934076 [details]
Results of manual performance testing using qpid-send/qpid-receive and qpid-queue-stats.

Comment 37 Leonid Zhaldybin 2014-09-09 13:33:57 UTC
Taking into consideration the fact that the manual tests do not show performance degradation when switching from AMQP 0-10 to AMQP 1.0 in the general case, QE no longer sees this issue as a blocker for the release. If we discover the conditions/settings that cause performance degradation, a new bug will be reported.

-> VERIFIED