Bug 446213

Summary: optimise transactions for transient-only messaging
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Reporter: Gordon Sim <gsim>
Assignee: Gordon Sim <gsim>
QA Contact: Kim van der Riet <kim.vdriet>
Status: CLOSED ERRATA
Severity: medium
Priority: urgent
Version: 1.0
Target Milestone: 1.1
CC: cctrieloff, freznice
Keywords: Reopened
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-02-04 15:35:07 UTC

Description Gordon Sim 2008-05-13 15:21:17 UTC
Currently transactions always result in a write to disk, even if all the
published (or consumed) messages are transient. We would like to optimise this
special case.

Comment 1 Gordon Sim 2008-06-20 08:26:57 UTC

*** This bug has been marked as a duplicate of 452119 ***

Comment 2 Gordon Sim 2008-06-20 13:33:41 UTC
Mistakenly marked this a duplicate.

Comment 3 Carl Trieloff 2008-10-09 18:06:31 UTC
The broker is fine for publish-side transactions: tx=1 with sync matches the plain sync rate, and it is within 20% of non-tx for async.

It does however have a slowdown on consume txns.

The issue is in TxAccept, in that it looks at all messages and not just the txn range. All the CPU time is attributed to evaluating a range that is too wide.


89153    42.7759  libqpidcommon.so.0.1.0   qpid::RangeSet<qpid::framing::SequenceNumber>::contains(qpid::framing::SequenceNumber const&) const
41731    20.0226  libqpidbroker.so.0.1.0   void std::list<qpid::broker::DeliveryRecord, std::allocator<qpid::broker::DeliveryRecord> >::remove_if<std::const_mem_fun_ref_t<bool, qpid::broker::DeliveryRecord> >(std::const_mem_fun_ref_t<bool, qpid::broker::DeliveryRecord>)
37460    17.9734  libqpidbroker.so.0.1.0   qpid::broker::DeliveryRecord::isRedundant() const
7896      3.7885  libqpidcommon.so.0.1.0   __gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > > std::lower_bound<__gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, qpid::Range<qpid::framing::SequenceNumber> >(__gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, __gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, qpid::Range<qpid::framing::SequenceNumber> const&)
7713      3.7007  libqpidbroker.so.0.1.0   qpid::broker::TxAccept::commit()
7627      3.6595  libqpidbroker.so.0.1.0   qpid::broker::TxAccept::prepare(qpid::broker::TransactionContext*)

Comment 4 Gordon Sim 2008-10-28 13:38:39 UTC
I've revised the handling of the transactional accepts to remove the redundant work mentioned above. With sync commit this is still slow; with async commit it's much faster.

Comment 5 Frantisek Reznicek 2008-10-31 15:08:41 UTC
No clear test info; expectations are unclear. Moving to NEEDINFO.

Comment 6 Frantisek Reznicek 2008-10-31 15:37:56 UTC
Mistakenly moved to ASSIGNED, returning back to MODIFIED. NEEDINFO flag set.

Comment 7 Gordon Sim 2008-10-31 16:23:59 UTC
There isn't really a clearly defined failure/success criterion for this one. In general, a test such as ./src/tests/perftest --tx 1 --async-commit true should not be 'too much slower' than a plain ./src/tests/perftest with no args. It is very hard to specify a hard-and-fast rule for what that means in practice, but you certainly should not see an order-of-magnitude difference. Is that sufficient?

Comment 9 Frantisek Reznicek 2008-12-10 13:09:40 UTC
The data summary says:
rev.724474
time { perftest --tx 1 --async-commit true; }
real    10m6.475s
time { perftest; }
real    1m15.135s

mrg1.0.1 rev.687156
time { perftest --tx 1; }  // more than 2 hours, stopped after approx. 2 hours
real    117m5.733s
time { perftest; }
real    0m29.285s

Detail transcript is pasted below.

After discussion with gsim (also pasted below) we came to the conclusion that the improvement is noticeable, so the bug is treated as validated (on RHEL5).

->VERIFIED


Detailed transcript:

qpidd --auth no --worker-threads 8 --mgmt-enable no

MRG-candidate
=============

==================================================================
dhcp-0-199:~# rpm -q python-qpid qpidc qpidc-perftest qpidd
python-qpid-0.3.722953-1.el5
qpidc-0.3.724474-2.el5
qpidc-perftest-0.3.724474-2.el5
qpidd-0.3.724474-2.el5

==================================================================
dhcp-0-199:~# time { perftest --tx 1 --async-commit true; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done . done.

Total 1000000 transfers of 1024 bytes in 606.416 seconds.

Publish transfers/sec:    824.764
Average: 824.764

Subscribe transfers/sec:  824.565
Average: 824.565

Total transfers/sec:      1649.03
Total Mbytes/sec: 1.61038

real    10m6.475s
user    0m10.332s
sys     6m3.207s

==================================================================
dhcp-0-199:~# time { perftest; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done . done.

Total 1000000 transfers of 1024 bytes in 75.0588 seconds.

Publish transfers/sec:    6664.2
Average: 6664.2

Subscribe transfers/sec:  6660.88
Average: 6660.88

Total transfers/sec:      13322.9
Total Mbytes/sec: 13.0106

real    1m15.135s
user    0m1.205s
sys     0m55.134s

MRG
===

==================================================================
dhcp-0-199:~# rpm -q python-qpid qpidc qpidc-perftest qpidd
python-qpid-0.2.668378-1.el5
qpidc-0.2.687156-1.el5
qpidc-perftest-0.2.687156-1.el5
qpidd-0.2.687156-1.el5

==================================================================
dhcp-0-199:~# time { perftest --tx 1; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done .
^C

real    117m5.733s
user    0m13.084s
sys     12m59.402s

==================================================================
dhcp-0-199:~# time { perftest; }
Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done . done.
Processing 1 messages from sub_done . done.

Total 1000000 transfers of 1024 bytes in 23.1444 seconds.

Publish transfers/sec:    21693.7
Average: 21693.7

Subscribe transfers/sec:  21621.4
Average: 21621.4

Total transfers/sec:      43207.1
Total Mbytes/sec: 42.1944

real    0m29.285s
user    0m1.651s
sys     0m10.969s


Discussion about BZ:
 
14:05:26
freznice  
gsim:Hi Gordon, I just got data on https://bugzilla.redhat.com/show_bug.cgi?id=446213
Would you have the time for review?
gsim: rev.724474
time { perftest --tx 1 --async-commit true; }
real    10m6.475s
time { perftest; }
real    1m15.135s

mrg1.0.1 rev.687156
time { perftest --tx 1; }  // more than 2 hours, stopped after approx. 2 hours
real    117m5.733s
time { perftest; }
real    0m29.285s
14:07:35
gsim  
freznice: I think we should just close it
freznice: there may well be more that can be done, but its an improvement on what was there and we didn't have any explicit targets
freznice: my view is there are more important defects to verify
freznice: paste in the perftest output by all means (i.e. throughputs)
14:08:54
freznice  
gsim: ok, I will
gsim: ok, thanks

Comment 11 errata-xmlrpc 2009-02-04 15:35:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html