Created attachment 493897 [details]
modified python spout example

Description of problem:
The modified python spout example, which sends durable messages, reveals a performance issue compared to the MRG 1.3 version. On version 2.0 the send rate is approximately 1 durable message per second, which is more than 100 times slower than on 1.3 with the very same spout client.

Version-Release number of selected component (if applicable):
# rpm -qa | grep -P '(qpid|qmf)' | sort -u
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-2.el5
qpidc-debuginfo-0.5.752581-3.el5
qpid-cpp-client-0.10-3.el5
qpid-cpp-client-devel-0.10-3.el5
qpid-cpp-client-devel-docs-0.10-3.el5
qpid-cpp-client-ssl-0.10-3.el5
qpid-cpp-server-0.10-3.el5
qpid-cpp-server-cluster-0.10-3.el5
qpid-cpp-server-devel-0.10-3.el5
qpid-cpp-server-ssl-0.10-3.el5
qpid-cpp-server-store-0.10-3.el5
qpid-cpp-server-xml-0.10-3.el5
qpid-java-client-0.10-2.el5
qpid-java-common-0.10-2.el5
qpid-java-example-0.10-2.el5
qpid-qmf-0.10-2.el5
qpid-qmf-devel-0.10-2.el5
qpid-tools-0.10-3.el5

How reproducible:
100%

Steps to Reproduce:
1. MRG version 1.3 (10000 messages in ~49 s):
time ./spout_durable -c 10000 'my-queue; {create: always, delete:receiver, node:{durable:True}}'

real    0m48.786s
user    0m13.378s
sys     0m3.164s

2. MRG version 2.0 (only 100 messages, in ~100 s):
time ./spout_durable -c 100 'my-queue; {create: always, delete:receiver, node:{durable:True}}'

real    1m40.004s
user    0m0.290s
sys     0m0.030s
I have confirmed by testing against trunk that this slow-down can indeed be observed and I have tracked the checkin to r.1072356 (the store will use r.4440). As this checkin has to do with producer flow control, I am guessing that the test will need to be modified in some way to accommodate a change in client behaviour as a result of this change. Assigning to Ken Giusti for a closer look. I doubt this would rise to the level of a blocker; Ken can confirm.
Further testing shows that if the store is loaded and the message is marked durable, then transmission is slow. It appears that checkin r.1072356 enforces synchronous transmission: the store, not seeing further incoming messages, relies on the 1-second flush timer to flush the message, and only then is the message acknowledged and the next message sent. Changing the send call to be asynchronous, i.e. snd.send(msg, sync=False), solves the problem: all messages are sent without waiting for the store to write each message to disk.
Created attachment 494900 [details]
Patch adding durable and sync options to python spout

This patch makes the durability and synchronous-send changes needed. Running the test with the -d option and without the -s option avoids this problem.
Kim is correct - this behaviour is "by design". The bugfix that makes this change is:

https://issues.apache.org/jira/browse/QPID-2921

This bugfix is in 2.0. Prior to 2.0, the broker would not guarantee that the messages were flushed to store before it sent the ack. This bugfix forces the broker to delay the ack until the message has been flushed to store.

As Kim pointed out, the modified spout code uses the default "sync" behavior when it calls send() - this behavior requires the send to block until the message has been acked. Since the example is only sending one message at a time, the "write cache" used by the journal does not fill up, so the broker waits for the 1-second timer to expire before the message is written (and the ack is then sent). This results in one message transfer per second.

The example code can be sped up, as Kim pointed out, by setting the sync=False flag in the send() call. Once all messages are sent, the example can call snd.sync() to wait for all messages to be written to disk. This improves performance greatly (see the sketch below).

In the end, this is a tradeoff between performance and certainty of getting the messages to disk. The application has to choose how many messages it will allow to be outstanding (and possibly lost should the broker crash) versus how fast it needs to transmit messages.
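A minimal runnable sketch of this fast pattern, using the python qpid.messaging API; the broker address, queue name and message count are illustrative assumptions, not taken from the attached example:

    from qpid.messaging import Connection, Message

    conn = Connection("localhost:5672")   # assumed broker address
    conn.open()
    try:
        ssn = conn.session()
        snd = ssn.sender("my-queue; {create: always, node: {durable: True}}")
        for i in range(10000):
            # sync=False: return as soon as the message is buffered;
            # do not wait for the broker's ack (and hence the store flush)
            snd.send(Message("message-%d" % i, durable=True), sync=False)
        # Block once, at the end, until every outstanding send has been
        # acked - i.e. until the store has written all the messages.
        snd.sync()
    finally:
        conn.close()

With this pattern the journal's write cache fills and is flushed in large units, rather than one message per timer expiry.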
"Prior to 2.0, the broker would not guarantee that the messages were flushed to store before it sent the ack." - I don't believe that is correct... The completions sent to the sending client were held up previously pending completions on all enqueues. Is it possible that we have introduced a regression here where the broker no longer flushes the store directly (which it used to do in this case)?
It's possible, but I don't recall specifically removing a flush from the broker's receive path. Did the broker previously flush to disk each time a durable message was enqueued? My understanding of the store is that this would introduce poor disk write performance - as opposed to letting a disk buffer's worth of messages get scheduled for writing to store as a single unit. I'll have to take a closer look at a 1.3 broker's behavior against the testcase.
The broker used to flush when a sync came in and there were messages pending enqueue completion (i.e. not for every durable message, but when needed). This was done in the old IncompleteMessageList class when handling a sync request.

I'm re-opening for now until we are sure there is no change here. The JMS client will often use the sync-per-durable-message pattern (sketched below), which is slow enough as it is without introducing further delays waiting for the autoflush timeout.
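For contrast with the earlier sketch, a minimal sketch of that sync-per-durable-message pattern (same assumed connection and sender setup as above); with the regression, each iteration stalls on the store's autoflush timer:

    for i in range(100):
        # sync=True (the default): block until the broker acks the
        # transfer, which for a durable message means it has been
        # flushed to the store - at worst one message per timer expiry.
        snd.send(Message("message-%d" % i, durable=True), sync=True)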
Understood and I agree - the sync handling has changed quite a bit, most likely a regression. I'm on it now.
Regression - see upstream jira: https://issues.apache.org/jira/browse/QPID-3252
This means that users of the Python API in synchronous mode (the default) would have no way of issuing explicit sync calls to avoid waiting on the store's timer. If that's correct, this is definitely 2.0 blocker material.
Fix provided upstream:

http://svn.apache.org/viewvc?view=revision&revision=1101864

Pending blocker approval....
Merged to mrg_2.0.x branch:

http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=692bf60596cd78ad5409857ae307c170c389853d

Testing notes for QA:

To exercise this fix's codepath, I suggest running a long-duration test that:

1) enables persistence
2) enables flow control, and occasionally triggers it
3) runs several producer clients that send durable messages with the sync parameter to the send() method explicitly set to False, and that occasionally call session.sync() at random intervals

Pseudocode (a runnable sketch follows below):

    While (some long duration) do
        For (some random batch size) do
            msg = create durable msg
            sender.send(msg, sync=False)
        End-For
        session.sync()
    End-While

Also, verify no appreciable performance degradation with respect to MRG 1.3 when sending transient (non-durable) message traffic (original problem report).

thanks,

-K
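A runnable version of the above pseudocode. This is a sketch only: the broker address, queue name, duration and batch-size bounds are illustrative assumptions, and the flow-control and persistence settings from points 1 and 2 are assumed to be configured on the broker/queue beforehand:

    import random
    import time

    from qpid.messaging import Connection, Message

    conn = Connection("localhost:5672")   # assumed broker address
    conn.open()
    try:
        ssn = conn.session()
        snd = ssn.sender("test-queue; {create: always, node: {durable: True}}")
        deadline = time.time() + 3600     # "some long duration"
        while time.time() < deadline:
            for i in range(random.randint(1, 500)):   # random batch size
                snd.send(Message("payload-%d" % i, durable=True), sync=False)
            # Explicit sync after each batch - this is the codepath
            # the fix changes.
            ssn.sync()
    finally:
        conn.close()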
Created attachment 501309 [details]
Modified spout client for measuring performance
Overview of the measured duration of sending messages using the attached client "spout_durable_mod", according to the requirements:

Durable messages
----------------
                     100 msgs    1000 msgs    10000 msgs
Rhel5-i (2.0)         1.413 s      3.296 s      20.456 s
Rhel5-i (1.3.2)       1.540 s      3.818 s      22.418 s
Rhel5-i (1.3)         1.880 s      3.887 s      23.821 s
Rhel5-x (2.0)         1.520 s      3.279 s      21.003 s
Rhel5-x (1.3.2)       2.404 s      3.946 s      20.832 s
Rhel6-i (2.0)         0.934 s      2.679 s      20.324 s
Rhel6-x (2.0)         0.855 s      2.489 s      20.448 s

Non-durable messages
--------------------
                     100 msgs    1000 msgs    10000 msgs
Rhel5-i (2.0)         0.239 s      1.639 s      16.268 s
Rhel5-i (1.3.2)       0.239 s      1.818 s      16.203 s
Rhel5-i (1.3)         0.239 s      1.833 s      16.053 s
Rhel5-x (2.0)         0.285 s      1.806 s      16.582 s
Rhel5-x (1.3.2)       0.281 s      1.745 s      16.403 s
Rhel6-i (2.0)         0.428 s      1.504 s      16.350 s
Rhel6-x (2.0)         0.407 s      1.693 s      15.000 s

There is no significant performance degradation (using the modified client behaviour) in the MRG 2.0 version.

--> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html