Bug 502810

Summary: Policy on joining node is inaccurate
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cppAssignee: Gordon Sim <gsim>
Status: CLOSED ERRATA QA Contact: Martin Kudlej <mkudlej>
Severity: high Docs Contact:
Priority: urgent    
Version: 1.1.1CC: mkudlej
Target Milestone: 1.1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-06-12 17:39:11 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gordon Sim 2009-05-27 09:27:06 UTC
Description of problem:

On joining a cluster the policies record of the current count and size of a given queue may be inaccurate due to enqueued messages effectively being counted twice.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-7.el5

How reproducible:

100%

Steps to Reproduce:
1. start first cluster node
2. create queue with specific policy (e.g. qpid-config -a localhost:5673 add queue test-queue --max-queue-count 2 --limit-policy reject [1])
3. send a message to the queue (e.g. echo one | ./qpid/cpp/src/tests/sender)
4. start second node for cluster
5. send second message (e.g. echo two | ./qpid/cpp/src/tests/sender)

Actual results:

New member fails to add second message and exits with:

2009-may-27 05:18:50 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 2 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-27 05:18:50 critical 100.0.50.15:1404(READY/error) Error 236 outcome agrees with 100.0.50.15:1404
2009-may-27 05:18:50 critical 100.0.50.15:1404(READY/error) Error 236 did not occur on 100.0.50.15:1350
2009-may-27 05:18:50 error Error delivering frames: Aborted by local failure that did not occur on all replicas

Expected results:

Both nodes allow message to be added (both would then fail if a third message were enqueued). 

Additional info:

[1] using the reject policy as opposed to the default flow-to-disk policy avoids seeing a slightly different issue related to inconsitency as to durability of queue (see https://bugzilla.redhat.com/show_bug.cgi?id=498221)

[2] An alternative sequence to see the problem is to fill the queue to the limit before adding the new node; the new node will then fail to join. (e.g. if step 3. above sends two messages - for m in one two; do echo $m; done | ./qpid/cpp/src/tests/sender - then in step 4 the new node will fail to join as the policy will be erroneously exceeded as part of the udpate).

Comment 1 Gordon Sim 2009-05-27 14:45:08 UTC
Fixed on trunk by r779183 (attached to BZ498221 as a patch against r752581).

Comment 2 Gordon Sim 2009-05-28 10:25:18 UTC
Fixed in qpidd-0.5.752581-8

Comment 3 Martin Kudlej 2009-05-29 13:07:04 UTC
Tested on RHEL 5.3 qpidc-0.5.752581-9.el5, openais-0.80.3-22.el5_3.7 and on i386 it works.
On x86_64 I see error messages described above.

Step to reproduce:
1.port1=5555;host1=`hostname`:$port1;
2.rm -rf ./data1 qpidd1.log
3.qpidd --cluster-name testcluster --auth no --replication-queue replication-queue --create-replication-queue true --data-dir data1 --log-enable info+ -p $port1 >qpidd1.log 2>qpidd1.log
4.qpid-config -a $host1 add queue test-queue --max-queue-count 2 --limit-policy reject
5.echo one | ./sender -p $port1 
After this command raised error:
2009-may-29 08:50:54 warning Exception received from broker: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue by message 5 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90) [caused by 0 \x00:\x00]
Failed: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue by message 5 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)

From qpidd1.log:
2009-may-29 08:46:38 notice SASL disabled: No Authentication Performed
2009-may-29 08:46:38 notice Listening on TCP port 5555
2009-may-29 08:46:38 info Policy file not specified. ACL Disabled, no ACL checking being done!
libibverbs: Fatal: couldn't read uverbs ABI version.
2009-may-29 08:46:38 info Rdma: Disabled: no rdma devices found
2009-may-29 08:46:38 info Registered replicating queue event listener
2009-may-29 08:46:38 info SSL plugin not enabled, you must set --qpid-ssl-cert-db to enable it.
2009-may-29 08:46:38 info Registered replication exchange
2009-may-29 08:46:38 notice 10.16.64.133:5653(INIT) joining cluster testcluster with url=amqp:tcp:10.16.64.133:5555
2009-may-29 08:46:38 info Registered xml exchange
2009-may-29 08:46:38 notice Broker running
2009-may-29 08:46:38 info 10.16.64.133:5653(JOINER) joining cluster: 10.16.64.81:4948(unknown) 10.16.64.81:5130(unknown) 10.16.64.133:5653(unknown)
2009-may-29 08:46:38 info 10.16.64.133:5653(UPDATEE) receiving update from 10.16.64.81:5130
2009-may-29 08:46:38 info 10.16.64.133:5653(CATCHUP) received update, starting catch-up
2009-may-29 08:46:38 info 10.16.64.133:5653(CATCHUP) member update: 10.16.64.81:4948(member) 10.16.64.81:5130(member) 10.16.64.133:5653(member)
2009-may-29 08:46:38 notice 10.16.64.133:5653(READY) caught up, active cluster member
2009-may-29 08:46:39 info Passivating links
2009-may-29 08:48:41 info Queue size exceeded policy for test-queue
2009-may-29 08:48:41 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 3 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-29 08:48:41 critical 10.16.64.133:5653(READY/error) Error 271 outcome agrees with 10.16.64.133:5653
2009-may-29 08:48:41 critical 10.16.64.133:5653(READY/error) Error 271 outcome agrees with 10.16.64.81:4948
2009-may-29 08:48:41 critical 10.16.64.133:5653(READY/error) Error 271 outcome agrees with 10.16.64.81:5130
2009-may-29 08:49:31 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 4 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-29 08:49:31 critical 10.16.64.133:5653(READY/error) Error 618 outcome agrees with 10.16.64.133:5653
2009-may-29 08:49:31 critical 10.16.64.133:5653(READY/error) Error 618 outcome agrees with 10.16.64.81:4948
2009-may-29 08:49:31 critical 10.16.64.133:5653(READY/error) Error 618 outcome agrees with 10.16.64.81:5130
2009-may-29 08:50:54 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 5 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-29 08:50:54 critical 10.16.64.133:5653(READY/error) Error 647 outcome agrees with 10.16.64.133:5653
2009-may-29 08:50:54 critical 10.16.64.133:5653(READY/error) Error 647 outcome agrees with 10.16.64.81:4948
2009-may-29 08:50:54 critical 10.16.64.133:5653(READY/error) Error 647 outcome agrees with 10.16.64.81:5130

Comment 4 Martin Kudlej 2009-05-29 14:34:32 UTC
Comment 3 was mistake, sorry. I works on both architectures so --> VERIFIED

Comment 6 errata-xmlrpc 2009-06-12 17:39:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html