Description of problem: There are multiple observations that reconnecting sender (csharp.example.spout, with bug 657523 c#2 and bug 629926 c#9 fixes) fails to reconnect with following exception: Exception Org.Apache.Qpid.Messaging.QpidException: Failed to connect within reconnect timeout at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp, Boolean sync) at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp) at Org.Apache.Qpid.Messaging.Examples.Spout.Main(String[] args) in c:\qpid\do tnet_examples\examples\csharp.example.spout\csharp.example.spout.cs:line 96. Moreover at the bad reconnect point I can see that messages in the queue are purged, which seems to be another broker related problem. Seen on Win XP / Server 2003 x64 vs. RHEL4.8 i386. Version-Release number of selected component (if applicable): qpid-cpp-winsdk-1.3.0.24-*.zip How reproducible: 40% Steps to Reproduce: 1. extract qpid winsdk 2. patch csharp.example.spout .NET example (bug 657523 c#2 and bug 629926 c#9 fixes) 3. compile the examples 4. execute sender with following config ...\csharp.example.spout.exe --broker <host>:<port> --connection-options "{reconnect : True, reconnect-timeout : 2, reconnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --content "<random-string>" --count <random-number-gt-1000> "<rand-str-addr>; { create:sender, delete: receiver, node: { type:queue, durable:True } }" 5. while spout is running restart the broker on <host> 'service qpidd restart' 6. monitor queue depth by qc_list_objects.py or qpid-tool to see depth check csharp.example.spout.exe messages 7. Repeat point 4. - 6. until you see a] Reconnection failed as above b] queue depth dropped See more details in last section... Actual results: .NET client reconnection function is not reliable, it sometimes fails and messages might be lost Expected results: .NET client reconnection function should be reliable, no message loss should happen Additional info: Reconnection issue description: C:\qpid>c:\qpid\bin\csharp.example.spout.exe --broker mrg-qe-07.lab.eng.brq.redh at.com:5672 --connection-options "{reconnect : True, reconnect-timeout : 2, reco nnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --co ntent 1w11N67MJspuIFVzuIk82BCMI5v8gXCNZZ3q4DUn --count 1004 "VPO3E25FnRz2FBx3Qk8 V4czEah44aTu3OZe2; { create:sender, delete: receiver, node: { type:queue, durabl e:True } }" 1> service qpidd restart # reconnection triggered 2010-12-02 15:22:51 warning Connection [2298 10.34.33.60:5672] closed 2010-12-02 15:22:52 warning Connection [2299 10.34.33.60:5672] closed 2010-12-02 15:22:53 warning Connect failed: No connection could be made because the target machine actively refused it. : (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr c\qpid\sys\windows\Socket.cpp:219) 2010-12-02 15:22:53 warning Connection closed >> reconnection succeeded >> queue contains: >> ./qc_list_objects.py -b mrg-qe-07 -qv | grep VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe >> VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe2 durable msg|byteEnq/Deq:1004/0|40160/0 unacked:0 consumer_cnt:0 binding_cnt:1 C:\qpid>c:\qpid\bin\csharp.example.spout.exe --broker mrg-qe-07.lab.eng.brq.redh at.com:5672 --connection-options "{reconnect : True, reconnect-timeout : 2, reco nnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --co ntent 1w11N67MJspuIFVzuIk82BCMI5v8gXCNZZ3q4DUn --count 1004 "VPO3E25FnRz2FBx3Qk8 V4czEah44aTu3OZe2; { create:sender, delete: receiver, node: { type:queue, durabl e:True } }" 1> service qpidd restart # but late, reconnection code was not triggered >> no reconnection happened >> queue contains: >> ./qc_list_objects.py -b mrg-qe-07 -qv | grep VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe >> VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe2 durable msg|byteEnq/Deq:2008/0|80320/0 unacked:0 consumer_cnt:0 binding_cnt:1 C:\qpid>c:\qpid\bin\csharp.example.spout.exe --broker mrg-qe-07.lab.eng.brq.redh at.com:5672 --connection-options "{reconnect : True, reconnect-timeout : 2, reco nnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --co ntent 1w11N67MJspuIFVzuIk82BCMI5v8gXCNZZ3q4DUn --count 1004 "VPO3E25FnRz2FBx3Qk8 V4czEah44aTu3OZe2; { create:sender, delete: receiver, node: { type:queue, durabl e:True } }" 1> service qpidd restart # reconnection triggered 2010-12-02 15:23:34 warning Connection [2303 10.34.33.60:5672] closed 2010-12-02 15:23:34 warning Connection [2304 10.34.33.60:5672] closed 2010-12-02 15:23:36 warning Connect failed: No connection could be made because the target machine actively refused it. : (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr c\qpid\sys\windows\Socket.cpp:219) 2010-12-02 15:23:36 warning Connection closed Exception Org.Apache.Qpid.Messaging.QpidException: Failed to connect within reco nnect timeout at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp, Boolean sync) at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp) at Org.Apache.Qpid.Messaging.Examples.Spout.Main(String[] args) in c:\qpid\do tnet_examples\examples\csharp.example.spout\csharp.example.spout.cs:line 96. >> csharp.example.spout.exe ends with qpid exception and does not try again to reconnect -> ISSUE alpha] >> queue contains: >> ./qc_list_objects.py -b mrg-qe-07 -qv | grep VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe >> VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe2 durable msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 >> confirmed by qpid-tool, no message in the queue means -> ISSUE beta] MESSAGE LOSS Legend: C:\qpid> is on Win XP / Server 2003 x64 1> is on RHEL 4.8 i386 >> comments
An interesting data point would be to substitute the c++ drain and spout executables into the test. This would help differentiate if the problem was .NET-specific or if it is in the common messaging DLLs.
The messages were not marked durable, so the message loss problem is explained. Changing BZ title. I'm going to retry, feedback tomorrow.
The reconnection on Linux (RHEL4) using c++ spout/drain works as expected. During retest of C++ and C# client in Win environment I found following behavior: a] C++ client: Running c++ spout with reconnection, like this (spout is modified to allow durable messages): C:\qpid>c:\qpid\bin\spout --broker mrg-qe-07.lab.eng.brq.redhat.com:5672 --conne ction-options "{reconnect: True, reconnect_timeout:2, reconnect-limit: 10, recon nect_interval:3, username:guest, password:guest}" --content _content_ --count 20 000 --durable "A3; {create:sender, node: {type: queue, durable:True}}" 2010-12-06 10:15:24 warning Connection [3600 10.34.33.60:5672] closed 2010-12-06 10:15:25 warning Connection [3601 10.34.33.60:5672] closed This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. 2010-12-06 10:15:26 warning Connect failed: No connection During the run I restarted distant broker few times with variable non-operation gap. Second or third broker interruption led to: This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. 2010-12-06 10:15:26 warning Connect failed: No connection After that, no messages were put to broker, proved by QMF: [freznice@dhcp-26-251 ~]$ ~/prjs/rh/rhts_tests/tests/distribution/MRG/Messaging//qpid_common/clients/python/qc_list_objects.py -b mrg-qe-07.lab.eng.brq.redhat.com -qv | grep ^A3 A3 durable msg|byteEnq/Deq:5100/0|45900/0 unacked:0 consumer_cnt:0 binding_cnt:1 The client ended operation after a while and in total 5100 of of expected 20000 were sent. b] C# client: Apart of issue from comment #0, I triggered another issue relevant to above case a] Running modified csharp.example.spout (added durability): C:\qpid>c:\qpid\bin\csharp.example.spout --broker mrg-qe-07.lab.eng.brq.redhat.c om:5672 --connection-options "{reconnect: True, reconnect_timeout:2, reconnect-l imit: 10, reconnect_interval:3, username:guest, password:guest}" --content _cont ent_ --count 10000 --durable "A4; {create:sender, node: {type: queue, durable:Tr ue}}" 2010-12-06 10:18:53 warning Connect failed: No connection could be made because the target machine actively refused it. : (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr c\qpid\sys\windows\Socket.cpp:219) 2010-12-06 10:18:53 warning Connection closed 2010-12-06 10:19:00 warning Connection [3747 10.34.33.60:5672] closed 2010-12-06 10:19:01 warning Connection [3748 10.34.33.60:5672] closed 2010-12-06 10:19:02 warning Connect failed: No connection could be made because the target machine actively refused it. : (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr c\qpid\sys\windows\Socket.cpp:219) 2010-12-06 10:19:02 warning Connection closed While the csharp.example.spout runs I restarted broker few times and triggered hang of the client, checking the message count in the queue and connection I found that connection was closed and queue does not contain all required messages (5939 exp. 20000). [freznice@dhcp-26-251 ~]$ qpid-stat -c mrg-qe-07.lab.eng.brq.redhat.com:5672 Connections client-addr cproc cpid auth connected idle msgIn msgOut ====================================================================================== 127.0.0.1:36492 sesame 20043 guest@QPID 15m 36s 0s 626 0 10.34.26.251:60209 qpid-stat 19903 anonymous@QPID 0s 0s 238 0 [freznice@dhcp-26-251 ~]$ ~/prjs/rh/rhts_tests/tests/distribution/MRG/Messaging//qpid_common/clients/python/qc_list_objects.py -b mrg-qe-07.lab.eng.brq.redhat.com -qv | grep ^A4 A4 durable msg|byteEnq/Deq:5939/0|53451/0 unacked:0 consumer_cnt:0 binding_cnt:1 I confirm that no message loss happens as mentioned above as messages were not marked durable. With durable messages broker find and restore them from the store well.
I've closed this bug as a dup of Bug 661500. The real issue here is confusion around the syntax for the connection options and the fact that the documentation is misleading. *** This bug has been marked as a duplicate of bug 661500 ***