Bug 446213 - optimise transactions for transient-only messaging
Summary: optimise transactions for transient-only messaging
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
urgent
medium
Target Milestone: 1.1
: ---
Assignee: Gordon Sim
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-05-13 15:21 UTC by Gordon Sim
Modified: 2009-02-04 15:35 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 15:35:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:0035 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 1.1 Release 2009-02-04 15:33:44 UTC

Description Gordon Sim 2008-05-13 15:21:17 UTC
Currently transactions will always result in a write to disk even if all the
published (or consumed) messages are transient. Would like to optimise this
special case.

Comment 1 Gordon Sim 2008-06-20 08:26:57 UTC

*** This bug has been marked as a duplicate of 452119 ***

Comment 2 Gordon Sim 2008-06-20 13:33:41 UTC
Mistakenly marked this a duplicate.

Comment 3 Carl Trieloff 2008-10-09 18:06:31 UTC
The broker is fine for publish side transactions. It matches sync for tx=1 ans sync and is within 20% of non-tx for async.

It does however have a slowdown on consume txns.

The issue is in the TxAccept, in that is looks at all messages and not just the txn range. All CPU time is attributed to evaluating a range that is to wide.


89153    42.7759  libqpidcommon.so.0.1.0   qpid::RangeSet<qpid::framing::SequenceNumber>::contains(qpid::framing::SequenceNumber const&) const
41731    20.0226  libqpidbroker.so.0.1.0   void std::list<qpid::broker::DeliveryRecord, std::allocator<qpid::broker::DeliveryRecord> >::remove_if<std::const_mem_fun_ref_t<bool, qpid::broker::DeliveryRecord> >(std::const_mem_fun_ref_t<bool, qpid::broker::DeliveryRecord>)
37460    17.9734  libqpidbroker.so.0.1.0   qpid::broker::DeliveryRecord::isRedundant() const
7896      3.7885  libqpidcommon.so.0.1.0   __gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > > std::lower_bound<__gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, qpid::Range<qpid::framing::SequenceNumber> >(__gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, __gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, qpid::Range<qpid::framing::SequenceNumber> const&)
7713      3.7007  libqpidbroker.so.0.1.0   qpid::broker::TxAccept::commit()
7627      3.6595  libqpidbroker.so.0.1.0   qpid::broker::TxAccept::prepare(qpid::broker::TransactionContext*)

Comment 4 Gordon Sim 2008-10-28 13:38:39 UTC
I've revised the handling of the tranasactional accepts to remove the redundant work mentioned above. With sync commit this is still slow; with async commit its much faster.

Comment 5 Frantisek Reznicek 2008-10-31 15:08:41 UTC
No clear test info, expectations are clear. Moving to NEEDINFO.

Comment 6 Frantisek Reznicek 2008-10-31 15:37:56 UTC
Mistakenly moved to ASSIGNED, returning back to MODIFIED. NEEDINFO flag set.

Comment 7 Gordon Sim 2008-10-31 16:23:59 UTC
There isn't really a clearly defined failure/success criteria for this on. In general a test such as  ./src/tests/perftest --tx 1 --async-commit true should not be 'too much slower' than a simple ./src/tests/perftest with no args. Very hard to specify a hard and fast rule for what that means in practice. You certainly should not see an order of magnitude difference. Is that sufficient?

Comment 9 Frantisek Reznicek 2008-12-10 13:09:40 UTC
The data summary says:
rev.724474
time { perftest --tx 1 --async-commit true; }
real    10m6.475s
time { perftest; }
real    1m15.135s

mrg1.0.1 rev.687156
time { perftest --tx 1; }  // more than 2 hours, stopped after approx. 2 hours
real    117m5.733s
time { perftest; }
real    0m29.285s

Detail transcript is pasted below.

After discussion with gsim (also pasted below) we come to conclusion that improvement is noticable, so bug is treated as validated (on RHEL5).

->VERIFIED


Detailed transcript:

qpidd --auth no --worker-threads 8 --mgmt-enable no

MRG-candidate
=============

==================================================================
dhcp-0-199:~# rpm -q python-qpid qpidc qpidc-perftest qpidd
python-qpid-0.3.722953-1.el5
qpidc-0.3.724474-2.el5
qpidc-perftest-0.3.724474-2.el5
qpidd-0.3.724474-2.el5

==================================================================
dhcp-0-199:~# time { perftest --tx 1 --async-commit true; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done . done.

Total 1000000 transfers of 1024 bytes in 606.416 seconds.

Publish transfers/sec:    
824.764
Average: 824.764

Subscribe transfers/sec:  
824.565
Average: 824.565

Total transfers/sec:      1649.03
Total Mbytes/sec: 1.61038

real    10m6.475s
user    0m10.332s
sys     6m3.207s

==================================================================
dhcp-0-199:~# time { perftest; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done . done.

Total 1000000 transfers of 1024 bytes in 75.0588 seconds.

Publish transfers/sec:    
6664.2
Average: 6664.2

Subscribe transfers/sec:  
6660.88
Average: 6660.88

Total transfers/sec:      13322.9
Total Mbytes/sec: 13.0106

real    1m15.135s
user    0m1.205s
sys     0m55.134s

MRG
===

==================================================================
dhcp-0-199:~# rpm -q python-qpid qpidc qpidc-perftest qpidd
python-qpid-0.2.668378-1.el5
qpidc-0.2.687156-1.el5
qpidc-perftest-0.2.687156-1.el5
qpidd-0.2.687156-1.el5

==================================================================
dhcp-0-199:~# time { perftest --tx 1; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done .
^C

real    117m5.733s
user    0m13.084s
sys     12m59.402s

==================================================================
dhcp-0-199:~# time { perftest; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done . done.

Total 1000000 transfers of 1024 bytes in 23.1444 seconds.

Publish transfers/sec:    
21693.7
Average: 21693.7

Subscribe transfers/sec:  
21621.4
Average: 21621.4

Total transfers/sec:      43207.1
Total Mbytes/sec: 42.1944

real    0m29.285s
user    0m1.651s
sys     0m10.969s


Discussion about BZ:
 
14:05:26
freznice  
gsim:Hi Gordon, I just got data on https://bugzilla.redhat.com/show_bug.cgi?id=446213
Would you have the time for review?
gsim: rev.724474
time { perftest --tx 1 --async-commit true; }
real    10m6.475s
time { perftest; }
real    1m15.135s

mrg1.0.1 rev.687156
time { perftest --tx 1; }  // more than 2 hours, stopped after approx. 2 hours
real    117m5.733s
time { perftest; }
real    0m29.285s
14:07:35
gsim  
freznice: I think we should just close it
freznice: there may well be more that can be done, but its an improvement on what was there and we didn't have any explicit targets
freznice: my view is there are more important defects to verify
freznice: paste in the perftest output by all means (i.e. throughputs)
14:08:54
freznice  
gsim: ok, I'll will
gsim: ok, thanks

Comment 11 errata-xmlrpc 2009-02-04 15:35:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html


Note You need to log in before you can comment on or make changes to this bug.