Bug 614857 - SessionState.cpp confirmed < (94+0) but only sent < (93+0)
SessionState.cpp confirmed < (94+0) but only sent < (93+0)
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
Development
All Linux
urgent Severity urgent
: 1.3
: ---
Assigned To: Alan Conway
ppecka
:
: 620906 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-07-15 08:20 EDT by ppecka
Modified: 2010-10-20 07:30 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-20 07:30:39 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
config files (2.35 KB, application/x-gzip)
2010-07-15 08:20 EDT, ppecka
no flags Details
log files (370.00 KB, application/x-gzip)
2010-07-19 07:57 EDT, ppecka
no flags Details
logs with debug+:cluster (370.00 KB, application/x-tar)
2010-07-20 04:18 EDT, ppecka
no flags Details

  None (edit)
Description ppecka 2010-07-15 08:20:57 EDT
Created attachment 432067 [details]
config files

Description of problem:
Cluster node left cluster unexpectedly

2010-07-12 13:49:12 error Execution exception: invalid-argument: guest@QPID.ibm-ls21-03.rhts.eng.bos.redhat.com.17827.1: confirmed < (94+0) but only sent < (93+0) (qpid/SessionState.cpp:151)
2010-07-12 13:49:12 critical cluster(10.16.64.78:2543 READY/error) local error 624 did not occur on member 10.16.67.13:17773: invalid-argument: guest@QPID.ibm-ls21-03.rhts.eng.bos.redhat.com.17827.1: confirmed < (94+0) but only sent < (93+0) (qpid/SessionState.cpp:151)
2010-07-12 13:49:12 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: guest@QPID.ibm-ls21-03.rhts.eng.bos.redhat.com.17827.1: confirmed < (94+0) but only sent < (93+0) (qpid/SessionState.cpp:151) (qpid/cluster/ErrorCheck.cpp:89)
2010-07-12 13:49:12 notice cluster(10.16.64.78:2543 LEFT/error) leaving cluster ppecka-cluster
2010-07-12 13:49:12 notice Shut down


Version-Release number of selected component (if applicable):
# rpm -qa | grep -P '(ais|qpid)' | sort -u
openais-0.80.6-16.el5
openais-devel-0.80.6-16.el5
python-qpid-0.7.946106-4.el5
qpid-cpp-client-0.7.946106-6.el5
qpid-cpp-client-devel-0.7.946106-6.el5
qpid-cpp-client-devel-docs-0.7.946106-6.el5
qpid-cpp-client-ssl-0.7.946106-6.el5
qpid-cpp-mrg-debuginfo-0.7.946106-6.el5
qpid-cpp-server-0.7.946106-6.el5
qpid-cpp-server-cluster-0.7.946106-6.el5
qpid-cpp-server-devel-0.7.946106-6.el5
qpid-cpp-server-ssl-0.7.946106-6.el5
qpid-cpp-server-store-0.7.946106-6.el5
qpid-cpp-server-xml-0.7.946106-6.el5
qpid-java-client-0.7.946106-5.el5
qpid-java-common-0.7.946106-5.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-6.el5




How reproducible:
happened only ONCE

Steps to Reproduce:
after several runs of: 

service qpidd stop
kill -9 $(pidof qpidd)
service openais stop
service openais start
service qpidd start
sleep 2
qpid-cluster-store -d /var/lib/qpidd/
qpid-cluster guest/guest@localhost:5672
qpid-cluster-store -d /var/lib/qpidd/
qpid-cluster -k -f guest/guest@localhost:5672
qpid-cluster-store -d /var/lib/qpidd/
Comment 1 Alan Conway 2010-07-16 11:35:48 EDT
Was this several runs on a single node cluster or was this being done in parallel on multiple nodes?
Comment 2 Alan Conway 2010-07-16 13:22:57 EDT
were there any other clients being run against the cluster or was qpid-cluster the only one?
Comment 3 ppecka 2010-07-19 04:45:53 EDT
it was done in parallel on two nodes.
other clients were qpid-tool || qpid-config - but this were used occasionally and not in this one particular run.
i'm gonna retest now
Comment 4 ppecka 2010-07-19 07:57:30 EDT
Created attachment 432839 [details]
log files

contains log files from two nodes
Comment 5 ppecka 2010-07-19 08:41:55 EDT
seem's that necessary step to reproduce is "rm" commad there:


service qpidd stop
kill -9 $(pidof qpidd)

rm -rf /var/lib/qpidd/cluster/ /var/lib/qpidd/_cluster* /var/lib/qpidd/rhm

service openais stop
service openais start
service qpidd start

qpid-cluster-store -d /var/lib/qpidd/
qpid-cluster guest/guest@localhost:5672
qpid-cluster-store -d /var/lib/qpidd/
qpid-cluster -k -f guest/guest@localhost:5672
qpid-cluster-store -d /var/lib/qpidd/ 

(In reply to comment #4)
> Created an attachment (id=432839) [details]
> log files
> 
> contains log files from two nodes    

(In reply to comment #4)
> Created an attachment (id=432839) [details]
> log files
> 
> contains log files from two nodes
Comment 6 Alan Conway 2010-07-19 09:55:49 EDT
Can you try to reproduce this with 

log-enable=info+
log-enable=debug+:cluster

That will give us some more information as to what is happening.
Comment 7 ppecka 2010-07-20 04:18:56 EDT
Created attachment 433097 [details]
logs with debug+:cluster

logs with debug+:cluster
Comment 8 Alan Conway 2010-07-20 17:44:00 EDT
The logs show some confusion about the membership of the cluster, where one broker believes the other has left but yet still receives messages from it.

I think this is probably caused by constantly stopping and restarting openais in parallel on the two nodes. I think this is an unrealistic scenario and it is probably safe to ignore it. 

Please try the test without restarting openais. If the bug can't be reproduced without restarting openais then I think it is safe to ignore it.
Comment 9 Alan Conway 2010-07-21 15:57:16 EDT
I've been able to reproduce what looks like the same bug by running the cluster_tests.py test_management in a loop for 3 hours. In this case there was no restarting of openais so comment #8 can probably be ignored
Comment 10 Alan Conway 2010-07-22 21:54:47 EDT
Fixed on trunk r966933 and mrg_1.3.x branch:
http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=0a6a7df8b13d7fc0c08aea778598dc15a1db0a77
Comment 11 Alan Conway 2010-08-03 14:58:25 EDT
*** Bug 620906 has been marked as a duplicate of this bug. ***
Comment 12 ppecka 2010-08-16 10:35:00 EDT
Verified on RHEL 5.5 i386 / x86_64:

# rpm -qa | grep -E '(qpid|ais)' | sort -u
openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-12.el5
qpid-cpp-client-0.7.946106-12.el5
qpid-cpp-client-devel-0.7.946106-12.el5
qpid-cpp-client-devel-docs-0.7.946106-12.el5
qpid-cpp-client-ssl-0.7.946106-12.el5
qpid-cpp-mrg-debuginfo-0.7.946106-11.el5
qpid-cpp-server-0.7.946106-12.el5
qpid-cpp-server-cluster-0.7.946106-12.el5
qpid-cpp-server-devel-0.7.946106-12.el5
qpid-cpp-server-ssl-0.7.946106-12.el5
qpid-cpp-server-store-0.7.946106-12.el5
qpid-cpp-server-xml-0.7.946106-12.el5
qpid-java-client-0.7.946106-7.el5
qpid-java-common-0.7.946106-7.el5
qpid-tools-0.7.946106-8.el5

--> VERIFIED

Note You need to log in before you can comment on or make changes to this bug.