Bug 446213
Summary: | optimise transactions for transient-only messaging | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Gordon Sim <gsim> |
Component: | qpid-cpp | Assignee: | Gordon Sim <gsim> |
Status: | CLOSED ERRATA | QA Contact: | Kim van der Riet <kim.vdriet> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 1.0 | CC: | cctrieloff, freznice |
Target Milestone: | 1.1 | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-02-04 15:35:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Gordon Sim
2008-05-13 15:21:17 UTC
*** This bug has been marked as a duplicate of 452119 *** Mistakenly marked this a duplicate. The broker is fine for publish side transactions. It matches sync for tx=1 ans sync and is within 20% of non-tx for async. It does however have a slowdown on consume txns. The issue is in the TxAccept, in that is looks at all messages and not just the txn range. All CPU time is attributed to evaluating a range that is to wide. 89153 42.7759 libqpidcommon.so.0.1.0 qpid::RangeSet<qpid::framing::SequenceNumber>::contains(qpid::framing::SequenceNumber const&) const 41731 20.0226 libqpidbroker.so.0.1.0 void std::list<qpid::broker::DeliveryRecord, std::allocator<qpid::broker::DeliveryRecord> >::remove_if<std::const_mem_fun_ref_t<bool, qpid::broker::DeliveryRecord> >(std::const_mem_fun_ref_t<bool, qpid::broker::DeliveryRecord>) 37460 17.9734 libqpidbroker.so.0.1.0 qpid::broker::DeliveryRecord::isRedundant() const 7896 3.7885 libqpidcommon.so.0.1.0 __gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > > std::lower_bound<__gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, qpid::Range<qpid::framing::SequenceNumber> >(__gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, __gnu_cxx::__normal_iterator<qpid::Range<qpid::framing::SequenceNumber> const*, std::vector<qpid::Range<qpid::framing::SequenceNumber>, qpid::InlineAllocator<std::allocator<qpid::Range<qpid::framing::SequenceNumber> >, 3ul> > >, qpid::Range<qpid::framing::SequenceNumber> const&) 7713 3.7007 libqpidbroker.so.0.1.0 qpid::broker::TxAccept::commit() 7627 3.6595 libqpidbroker.so.0.1.0 qpid::broker::TxAccept::prepare(qpid::broker::TransactionContext*) I've revised the handling of the tranasactional accepts to remove the redundant work mentioned above. With sync commit this is still slow; with async commit its much faster. No clear test info, expectations are clear. Moving to NEEDINFO. Mistakenly moved to ASSIGNED, returning back to MODIFIED. NEEDINFO flag set. There isn't really a clearly defined failure/success criteria for this on. In general a test such as ./src/tests/perftest --tx 1 --async-commit true should not be 'too much slower' than a simple ./src/tests/perftest with no args. Very hard to specify a hard and fast rule for what that means in practice. You certainly should not see an order of magnitude difference. Is that sufficient? The data summary says: rev.724474 time { perftest --tx 1 --async-commit true; } real 10m6.475s time { perftest; } real 1m15.135s mrg1.0.1 rev.687156 time { perftest --tx 1; } // more than 2 hours, stopped after approx. 2 hours real 117m5.733s time { perftest; } real 0m29.285s Detail transcript is pasted below. After discussion with gsim (also pasted below) we come to conclusion that improvement is noticable, so bug is treated as validated (on RHEL5). ->VERIFIED Detailed transcript: qpidd --auth no --worker-threads 8 --mgmt-enable no MRG-candidate ============= ================================================================== dhcp-0-199:~# rpm -q python-qpid qpidc qpidc-perftest qpidd python-qpid-0.3.722953-1.el5 qpidc-0.3.724474-2.el5 qpidc-perftest-0.3.724474-2.el5 qpidd-0.3.724474-2.el5 ================================================================== dhcp-0-199:~# time { perftest --tx 1 --async-commit true; } Processing 1 messages from sub_ready . done. Sending start 1 times to pub_start Processing 1 messages from pub_done . done. Processing 1 messages from sub_done . done. Total 1000000 transfers of 1024 bytes in 606.416 seconds. Publish transfers/sec: 824.764 Average: 824.764 Subscribe transfers/sec: 824.565 Average: 824.565 Total transfers/sec: 1649.03 Total Mbytes/sec: 1.61038 real 10m6.475s user 0m10.332s sys 6m3.207s ================================================================== dhcp-0-199:~# time { perftest; } Processing 1 messages from sub_ready . done. Sending start 1 times to pub_start Processing 1 messages from pub_done . done. Processing 1 messages from sub_done . done. Total 1000000 transfers of 1024 bytes in 75.0588 seconds. Publish transfers/sec: 6664.2 Average: 6664.2 Subscribe transfers/sec: 6660.88 Average: 6660.88 Total transfers/sec: 13322.9 Total Mbytes/sec: 13.0106 real 1m15.135s user 0m1.205s sys 0m55.134s MRG === ================================================================== dhcp-0-199:~# rpm -q python-qpid qpidc qpidc-perftest qpidd python-qpid-0.2.668378-1.el5 qpidc-0.2.687156-1.el5 qpidc-perftest-0.2.687156-1.el5 qpidd-0.2.687156-1.el5 ================================================================== dhcp-0-199:~# time { perftest --tx 1; } Processing 1 messages from sub_ready . done. Sending start 1 times to pub_start Processing 1 messages from pub_done . done. Processing 1 messages from sub_done . ^C real 117m5.733s user 0m13.084s sys 12m59.402s ================================================================== dhcp-0-199:~# time { perftest; } Processing 1 messages from sub_ready . done. Sending start 1 times to pub_start Processing 1 messages from pub_done . done. Processing 1 messages from sub_done . done. Total 1000000 transfers of 1024 bytes in 23.1444 seconds. Publish transfers/sec: 21693.7 Average: 21693.7 Subscribe transfers/sec: 21621.4 Average: 21621.4 Total transfers/sec: 43207.1 Total Mbytes/sec: 42.1944 real 0m29.285s user 0m1.651s sys 0m10.969s Discussion about BZ: 14:05:26 freznice gsim:Hi Gordon, I just got data on https://bugzilla.redhat.com/show_bug.cgi?id=446213 Would you have the time for review? gsim: rev.724474 time { perftest --tx 1 --async-commit true; } real 10m6.475s time { perftest; } real 1m15.135s mrg1.0.1 rev.687156 time { perftest --tx 1; } // more than 2 hours, stopped after approx. 2 hours real 117m5.733s time { perftest; } real 0m29.285s 14:07:35 gsim freznice: I think we should just close it freznice: there may well be more that can be done, but its an improvement on what was there and we didn't have any explicit targets freznice: my view is there are more important defects to verify freznice: paste in the perftest output by all means (i.e. throughputs) 14:08:54 freznice gsim: ok, I'll will gsim: ok, thanks An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0035.html |