Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 948489

Summary: C++ Broker Replicating Event Listener needs a limit on messages enqueued for replication
Product: Red Hat Enterprise MRG
Reporter: Chuck Rolke <crolke>
Component: qpid-cpp
Assignee: Chuck Rolke <crolke>
Status: CLOSED ERRATA
QA Contact: Frantisek Reznicek <freznice>
Severity: unspecified
Docs Contact:
Priority: high
Version: 2.3
CC: crolke, esammons, freznice, jross, lzhaldyb, mcressma
Target Milestone: 2.3.3
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qpid-cpp-0.18-17
Doc Type: Bug Fix
Doc Text:
Cause: A broker may be configured to replicate queue events (see Messaging User Guide: Replicated Queues). On a large, busy system the replicated queue may generate many messages. If the replication queue consumers stop consuming messages, the broker risks running out of memory as it fills with messages waiting for replication.
Consequence: If the broker runs out of memory, it may crash or run so slowly that performance is unacceptable.
Fix: An upper limit may now be set on the number of messages in the replication event queue. If that limit is exceeded, replication is stopped and the event queue is flushed to recover the memory and message resources that the queue was using. This fix is intended for exceptional conditions only. It gives the customer the option of continuing without replication instead of having the broker bog down with messages that are not being consumed.
Result: When the limit is exceeded, the broker loses the ability to replicate messages because replication is stopped. However, the broker continues to run and avoids an out-of-memory condition.
Story Points: ---
Clone Of:
Environment: C++ Broker
Last Closed: 2013-07-11 13:37:38 UTC
Type: Bug

Description Chuck Rolke 2013-04-04 17:59:30 UTC
Description of problem:

A system has replication turned on and events are being queued for replication. When the peer system stops receiving messages, the replication queue grows without bound and the broker runs out of memory.

A customer is requesting an optional upper limit on the replication queue size. When that limit is exceeded, replication is stopped and the replication queue is purged to reclaim its memory. The rationale is that abandoning replication is preferable to driving the broker out of memory and crashing it.

Version-Release number of selected component (if applicable):

2.3

How reproducible:

100%

Steps to Reproduce:
1. Start a broker with the command
   QPIDD_CMD="../qpidd -p 5672 -d --no-data-dir --no-module-dir --default-queue-limit 0 --auth no --log-enable=info+ --log-enable=debug+:Bridge --log-enable=trace+:Model --load-module replicating_listener.so --replication-queue RQueue --create-replication-queue --replication-listener-name goodbye --log-to-file replication.log"

2. Create and delete queues in a loop
   qpid-config add queue a
   qpid-config del queue a

3. Replicated events fill the queue RQueue.
  
Actual results:

If enough events fill the replication queue then the broker crashes.

Expected results:

When the specified message limit is reached, the broker discards all messages in the replication queue and stops replication.
The event is reported by a warning in the event log and by the issuance of a QMF Event.
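
One way to check for the expected warning after a test run is to search the broker log. The following is a minimal sketch, not a confirmed procedure: it assumes the log file is the replication.log named in the start command and that the warning contains the text 'Replication stopped on queue', which is the wording quoted in comment 21. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: report whether the broker log records that replication
# was stopped. The message text is taken from comment 21 of this bug;
# the exact wording in the log is an assumption.

replication_stopped() {
    # $1: path to the broker log (replication.log in the steps above)
    grep -F -q 'Replication stopped on queue' "$1"
}

# Example:
#   if replication_stopped replication.log; then
#       echo "replication was stopped"
#   fi
```

The function returns success only when the text is found, so it can be used directly in a test script's condition.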

Additional info:

Comment 1 Chuck Rolke 2013-04-05 14:13:49 UTC
Correction for Steps to Reproduce:

1. Start a broker with the command
   QPIDD_CMD="../qpidd -p 5672 -d --no-data-dir --no-module-dir --default-queue-limit 0 --auth no --log-enable=info+ --log-enable=debug+:Bridge --log-enable=trace+:Model --load-module replicating_listener.so --replication-queue RQueue --create-replication-queue --replication-listener-name goodbye --log-to-file replication.log"

2. Create a queue that generates events
   qpid-config add queue a --generate-queue-events 2

3. Generate events
   qpid-send -m N -b broker:port -a a   # generates N events
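
The corrected steps 2 and 3 can be chained in a small script. This is a sketch under stated assumptions: the QPID_CONFIG and QPID_SEND variables, the function name, and the example address and count are hypothetical and exist only for illustration. The two command lines themselves are the ones quoted in the steps above.

```shell
#!/bin/sh
# Sketch only: chains the corrected steps 2 and 3 from this comment.
# QPID_CONFIG, QPID_SEND and generate_events are assumed names; the
# command lines are the ones quoted in the steps above.
QPID_CONFIG="${QPID_CONFIG:-qpid-config}"
QPID_SEND="${QPID_SEND:-qpid-send}"

generate_events() {
    broker="$1"   # broker address in host:port form
    count="$2"    # number of messages to send (N in step 3)

    # Step 2: create a queue that generates events.
    $QPID_CONFIG add queue a --generate-queue-events 2 || return 1

    # Step 3: send N messages to queue "a", which generates N events.
    $QPID_SEND -m "$count" -b "$broker" -a a
}

# Example (address and count are placeholders):
#   generate_events localhost:5672 100000
```

Keeping the tool names in variables lets the same function be pointed at a different installation path without editing the body.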

Comment 7 Frantisek Reznicek 2013-06-09 17:08:50 UTC
There are two problems with this feature:
a] On RHEL 5, Boost 1.33 does not correctly handle two options that share a common prefix.
   Tracked as bug 971295.
   A fix is in progress in git; see bug 971295 comment 4.
b] Unfortunately, the patch for this feature is not correct from a clustering point of view. When the replication panic is triggered, all cluster members except the one that performs replication leave the cluster with the message:

   critical cluster(192.168.6.7:28880 READY/error) local error 3163 did not occur on member 192.168.6.7:28856: invalid-argument: anonymous.53b76d94-6971-449b-b607-eac77d9e2e82:0: confirmed < (3+0) but only sent < (2+0) (qpid/SessionState.cpp:154)


Testing on a standalone broker, however, shows that the feature is functional.


Summary:
a] The added option has to be renamed.
b] The patch has to be extended to work correctly in a clustered environment.

-> ASSIGNED

Comment 20 Frantisek Reznicek 2013-07-01 14:13:27 UTC
The feature is functional; the cases mentioned in comment 14 are coded and passing.
About two more days are needed to verify additional cases.

Comment 21 Frantisek Reznicek 2013-07-02 16:01:49 UTC
The feature is reliably functional, tested on RHEL 5.9 / 6.4 i[36]86 / x86_64 on packages:

python-qpid-0.18-5.el5_9
python-qpid-qmf-0.18-18.el5_9
qpid-cpp-client-0.18-17.el5_9
qpid-cpp-client-devel-0.18-17.el5_9
qpid-cpp-client-devel-docs-0.18-17.el5_9
qpid-cpp-client-rdma-0.18-17.el5_9
qpid-cpp-client-ssl-0.18-17.el5_9
qpid-cpp-mrg-debuginfo-0.18-17.el5_9
qpid-cpp-server-0.18-17.el5_9
qpid-cpp-server-cluster-0.18-17.el5_9
qpid-cpp-server-devel-0.18-17.el5_9
qpid-cpp-server-ha-0.18-17.el5_9
qpid-cpp-server-rdma-0.18-17.el5_9
qpid-cpp-server-ssl-0.18-17.el5_9
qpid-cpp-server-store-0.18-17.el5_9
qpid-cpp-server-xml-0.18-17.el5_9
qpid-java-client-0.18-8.el5_9
qpid-java-common-0.18-8.el5_9
qpid-java-example-0.18-8.el5_9
qpid-jca-0.18-8.el5
qpid-jca-xarecovery-0.18-8.el5
qpid-jca-zip-0.18-8.el5
qpid-qmf-0.18-18.el5_9
qpid-qmf-debuginfo-0.18-18.el5_9
qpid-qmf-devel-0.18-18.el5_9
qpid-tests-0.18-2.el5
qpid-tools-0.18-10.el5_9
rh-qpid-cpp-tests-0.18-17.el5_9
ruby-qpid-qmf-0.18-18.el5_9


The initial test cases are passing, the test cases from comment 14 are passing, and the additional test cases are also conditionally passing (bug 980524, bug 980531).
A few of the additional test cases showed that qpid-cpp-client-0.18-17 behaves better than qpid-cpp-client-0.18-16.


The feature still has at least two minor issues, which are addressed separately:
 * The broker does not log 'Replication stopped on queue...' when replication is stopped via QMF - tracked as bug 980524
 * A clustered broker may lose information about messages in / out (count/bytes) when all cluster nodes are refreshed after replication (all members are newly joined) - tracked as bug 980531


-> VERIFIED

Comment 24 errata-xmlrpc 2013-07-11 13:37:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1023.html