Description of problem:
On joining a cluster, the policy's record of the current count and size of a given queue may be inaccurate because enqueued messages are effectively counted twice.

Version-Release number of selected component (if applicable):
qpidd-0.5.752581-7.el5

How reproducible:
100%

Steps to Reproduce:
1. Start the first cluster node.
2. Create a queue with a specific policy, e.g.:
   qpid-config -a localhost:5673 add queue test-queue --max-queue-count 2 --limit-policy reject [1]
3. Send a message to the queue, e.g.:
   echo one | ./qpid/cpp/src/tests/sender
4. Start a second node for the cluster.
5. Send a second message, e.g.:
   echo two | ./qpid/cpp/src/tests/sender

Actual results:
The new member fails to add the second message and exits with:

2009-may-27 05:18:50 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 2 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-27 05:18:50 critical 100.0.50.15:1404(READY/error) Error 236 outcome agrees with 100.0.50.15:1404
2009-may-27 05:18:50 critical 100.0.50.15:1404(READY/error) Error 236 did not occur on 100.0.50.15:1350
2009-may-27 05:18:50 error Error delivering frames: Aborted by local failure that did not occur on all replicas

Expected results:
Both nodes allow the message to be added (both would then fail if a third message were enqueued).

Additional info:
[1] Using the reject policy, as opposed to the default flow-to-disk policy, avoids a slightly different issue related to inconsistency in the durability of the queue (see https://bugzilla.redhat.com/show_bug.cgi?id=498221).
[2] An alternative sequence that shows the problem is to fill the queue to the limit before adding the new node; the new node will then fail to join. E.g. if step 3 above sends two messages - for m in one two; do echo $m; done | ./qpid/cpp/src/tests/sender - then in step 4 the new node will fail to join, as the policy will be erroneously exceeded as part of the update.
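The double-counting can be sketched with a small model (this is not qpid's actual code; the class and function names below are hypothetical). If the joiner applies the policy counters it receives in the update and then also runs the replayed queue contents through the normal enqueue path, each existing message is counted twice, so the limit is hit one message early, and a queue already at its limit blows the policy during the join itself, as in [2]:

```python
class QueuePolicy:
    """Minimal model of a reject-type queue policy: tracks the
    message count against a maximum and rejects enqueues over it."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0

    def enqueued(self):
        # Reject the message if accepting it would exceed the limit.
        if self.count + 1 > self.max_count:
            raise RuntimeError(
                "Policy exceeded: count: max=%d, current=%d"
                % (self.max_count, self.count))
        self.count += 1


def join_cluster_buggy(policy, snapshot_msgs):
    # Buggy update: the joiner applies the counters from the update
    # AND runs each replayed message through enqueued(), so every
    # pre-existing message is counted twice.
    policy.count = len(snapshot_msgs)      # counters from the update
    for _ in snapshot_msgs:
        policy.enqueued()                  # replayed enqueues


def join_cluster_fixed(policy, snapshot_msgs):
    # Fixed behaviour: the replayed enqueues alone update the counters.
    for _ in snapshot_msgs:
        policy.enqueued()
```

With max_count=2 and one message queued before the join, the buggy joiner ends up with count=2, so the next send is rejected with "count: max=2, current=2", matching the log above; with two messages queued beforehand, join_cluster_buggy raises during the replay itself, matching [2].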
Fixed on trunk by r779183 (attached to BZ498221 as a patch against r752581).
Fixed in qpidd-0.5.752581-8
Tested on RHEL 5.3 with qpidc-0.5.752581-9.el5 and openais-0.80.3-22.el5_3.7. On i386 it works; on x86_64 I see the error messages described above.

Steps to reproduce:
1. port1=5555; host1=`hostname`:$port1
2. rm -rf ./data1 qpidd1.log
3. qpidd --cluster-name testcluster --auth no --replication-queue replication-queue --create-replication-queue true --data-dir data1 --log-enable info+ -p $port1 >qpidd1.log 2>qpidd1.log
4. qpid-config -a $host1 add queue test-queue --max-queue-count 2 --limit-policy reject
5. echo one | ./sender -p $port1

After this, the command raised an error:

2009-may-29 08:50:54 warning Exception received from broker: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue by message 5 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90) [caused by 0 \x00:\x00]
Failed: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue by message 5 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)

From qpidd1.log:

2009-may-29 08:46:38 notice SASL disabled: No Authentication Performed
2009-may-29 08:46:38 notice Listening on TCP port 5555
2009-may-29 08:46:38 info Policy file not specified. ACL Disabled, no ACL checking being done!
libibverbs: Fatal: couldn't read uverbs ABI version.
2009-may-29 08:46:38 info Rdma: Disabled: no rdma devices found
2009-may-29 08:46:38 info Registered replicating queue event listener
2009-may-29 08:46:38 info SSL plugin not enabled, you must set --qpid-ssl-cert-db to enable it.
2009-may-29 08:46:38 info Registered replication exchange
2009-may-29 08:46:38 notice 10.16.64.133:5653(INIT) joining cluster testcluster with url=amqp:tcp:10.16.64.133:5555
2009-may-29 08:46:38 info Registered xml exchange
2009-may-29 08:46:38 notice Broker running
2009-may-29 08:46:38 info 10.16.64.133:5653(JOINER) joining cluster: 10.16.64.81:4948(unknown) 10.16.64.81:5130(unknown) 10.16.64.133:5653(unknown)
2009-may-29 08:46:38 info 10.16.64.133:5653(UPDATEE) receiving update from 10.16.64.81:5130
2009-may-29 08:46:38 info 10.16.64.133:5653(CATCHUP) received update, starting catch-up
2009-may-29 08:46:38 info 10.16.64.133:5653(CATCHUP) member update: 10.16.64.81:4948(member) 10.16.64.81:5130(member) 10.16.64.133:5653(member)
2009-may-29 08:46:38 notice 10.16.64.133:5653(READY) caught up, active cluster member
2009-may-29 08:46:39 info Passivating links
2009-may-29 08:48:41 info Queue size exceeded policy for test-queue
2009-may-29 08:48:41 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 3 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-29 08:48:41 critical 10.16.64.133:5653(READY/error) Error 271 outcome agrees with 10.16.64.133:5653
2009-may-29 08:48:41 critical 10.16.64.133:5653(READY/error) Error 271 outcome agrees with 10.16.64.81:4948
2009-may-29 08:48:41 critical 10.16.64.133:5653(READY/error) Error 271 outcome agrees with 10.16.64.81:5130
2009-may-29 08:49:31 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 4 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-29 08:49:31 critical 10.16.64.133:5653(READY/error) Error 618 outcome agrees with 10.16.64.133:5653
2009-may-29 08:49:31 critical 10.16.64.133:5653(READY/error) Error 618 outcome agrees with 10.16.64.81:4948
2009-may-29 08:49:31 critical 10.16.64.133:5653(READY/error) Error 618 outcome agrees with 10.16.64.81:5130
2009-may-29 08:50:54 error Execution exception: resource-limit-exceeded: Policy exceeded on test-queue by message 5 of size 3 , policy: size: max=104857600, current=6; count: max=2, current=2; type=reject (qpid/broker/QueuePolicy.cpp:90)
2009-may-29 08:50:54 critical 10.16.64.133:5653(READY/error) Error 647 outcome agrees with 10.16.64.133:5653
2009-may-29 08:50:54 critical 10.16.64.133:5653(READY/error) Error 647 outcome agrees with 10.16.64.81:4948
2009-may-29 08:50:54 critical 10.16.64.133:5653(READY/error) Error 647 outcome agrees with 10.16.64.81:5130
Comment 3 was a mistake, sorry. It works on both architectures, so --> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2009-1097.html