Bug 948489
| Summary: | C++ Broker Replicating Event Listener needs a limit on messages enqueued for replication | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Chuck Rolke <crolke> |
| Component: | qpid-cpp | Assignee: | Chuck Rolke <crolke> |
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Reznicek <freznice> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.3 | CC: | crolke, esammons, freznice, jross, lzhaldyb, mcressma |
| Target Milestone: | 2.3.3 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | qpid-cpp-0.18-17 | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
A broker may be configured to replicate queue events (see Messaging User Guide: Replicated Queues). On a large, busy system the replicated queue may generate many messages. If the replication queue consumers stop consuming messages, the broker is at risk of running out of memory as it fills with messages waiting for replication.
Consequence:
If the broker runs out of memory then it may crash or run so slowly that performance is unacceptable.
Fix:
An upper limit may be set on the number of messages in the replication event queue. If that limit is exceeded then replication is stopped and the event queue is flushed to recover the memory and message resources that the queue was using.
This fix is for exceptional conditions only. It gives the customer the option of continuing without replication instead of having the broker bog down with messages that are not being consumed.
Result:
The broker loses the ability to replicate messages as replication is stopped. However, the broker continues to run and avoids an out-of-memory condition.
|
Story Points: | --- |
| Clone Of: | Environment: |
C++ Broker
|
|
| Last Closed: | 2013-07-11 13:37:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
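The behavior described in the Fix text above can be illustrated with a minimal sketch. This is not the qpid-cpp implementation: the class name `ReplicationEventQueue`, the `max_depth` parameter, and the convention that 0 means "no limit" are all hypothetical, chosen only to show the stop-and-flush idea.

```python
from collections import deque


class ReplicationEventQueue:
    """Hypothetical model of a replication event queue with a depth limit."""

    def __init__(self, max_depth):
        # Assumption for this sketch: max_depth == 0 means "no limit".
        self.max_depth = max_depth
        self.events = deque()
        self.replicating = True

    def enqueue(self, event):
        """Queue an event; return False once replication has been stopped."""
        if not self.replicating:
            return False  # replication already stopped, event is not queued
        self.events.append(event)
        if self.max_depth and len(self.events) > self.max_depth:
            self._stop_and_flush()
            return False
        return True

    def consume(self):
        """Remove and return the oldest event, or None if the queue is empty."""
        return self.events.popleft() if self.events else None

    def _stop_and_flush(self):
        # Limit exceeded: stop replicating and release everything queued so far.
        self.replicating = False
        self.events.clear()


if __name__ == "__main__":
    q = ReplicationEventQueue(max_depth=3)
    for i in range(5):
        print(i, q.enqueue(i), len(q.events), q.replicating)
```

In this sketch the broker side keeps running after the limit trips; only the replication events are discarded, which mirrors the trade-off stated in the Result text.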
Description
Chuck Rolke
2013-04-04 17:59:30 UTC
Correction for Steps to Reproduce:

1. Start a broker with the command:
   QPIDD_CMD="../qpidd -p 5672 -d --no-data-dir --no-module-dir --default-queue-limit 0 --auth no --log-enable=info+ --log-enable=debug+:Bridge --log-enable=trace+:Model --load-module replicating_listener.so --replication-queue RQueue --create-replication-queue --replication-listener-name goodbye --log-to-file replication.log"
2. Create a queue that generates events:
   qpid-config add queue a --generate-queue-events 2
3. Generate events:
   qpid-send -m N -b broker:port -a a # generates N events

There are two problems with this feature:

a] On RHEL 5, boost 1.33 does not correctly handle two options that share a common prefix. This is tracked as bug 971295; a fix is in progress in git, see bug 971295 comment 4.

b] Unfortunately, this feature patch is not correct from a clustering point of view. When the replication panic is triggered, all cluster members except the one that does replication leave the cluster with the message:

   critical cluster(192.168.6.7:28880 READY/error) local error 3163 did not occur on member 192.168.6.7:28856: invalid-argument: anonymous.53b76d94-6971-449b-b607-eac77d9e2e82:0: confirmed < (3+0) but only sent < (2+0) (qpid/SessionState.cpp:154)

Testing on a standalone broker, however, shows that the feature is functional.

Summary:
a] The added option has to be renamed.
b] The patch has to be extended to work nicely in a clustering environment.

-> ASSIGNED

The feature is functional; the cases mentioned in comment 14 are coded in and passing. About two more days are needed to prove additional cases.

The feature is reliably functional, tested on RHEL 5.9 / 6.4 i[36]86 / x86_64 on packages:

python-qpid-0.18-5.el5_9
python-qpid-qmf-0.18-18.el5_9
qpid-cpp-client-0.18-17.el5_9
qpid-cpp-client-devel-0.18-17.el5_9
qpid-cpp-client-devel-docs-0.18-17.el5_9
qpid-cpp-client-rdma-0.18-17.el5_9
qpid-cpp-client-ssl-0.18-17.el5_9
qpid-cpp-mrg-debuginfo-0.18-17.el5_9
qpid-cpp-server-0.18-17.el5_9
qpid-cpp-server-cluster-0.18-17.el5_9
qpid-cpp-server-devel-0.18-17.el5_9
qpid-cpp-server-ha-0.18-17.el5_9
qpid-cpp-server-rdma-0.18-17.el5_9
qpid-cpp-server-ssl-0.18-17.el5_9
qpid-cpp-server-store-0.18-17.el5_9
qpid-cpp-server-xml-0.18-17.el5_9
qpid-java-client-0.18-8.el5_9
qpid-java-common-0.18-8.el5_9
qpid-java-example-0.18-8.el5_9
qpid-jca-0.18-8.el5
qpid-jca-xarecovery-0.18-8.el5
qpid-jca-zip-0.18-8.el5
qpid-qmf-0.18-18.el5_9
qpid-qmf-debuginfo-0.18-18.el5_9
qpid-qmf-devel-0.18-18.el5_9
qpid-tests-0.18-2.el5
qpid-tools-0.18-10.el5_9
rh-qpid-cpp-tests-0.18-17.el5_9
ruby-qpid-qmf-0.18-18.el5_9

Initial test cases are passing, test cases from comment 14 are passing, and additional test cases are also conditionally passing (bug 980524, bug 980531). A few of the additional test cases showed that qpid-cpp-client-0.18-17 behaves better than qpid-cpp-client-0.18-16.

The feature still has at least two minor issues, which are addressed separately:
* the broker does not log 'Replication stopped on queue...' when replication is stopped via QMF - tracked as bug 980524
* a clustered broker may lose information about messages in / out (count/bytes) when, after replication, all cluster nodes are refreshed (all newbies) - tracked as bug 980531

-> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1023.html
|