Bug 659322 - .NET / C++ win client Reconnect feature does not work reliably - reconnection sometimes fails / hangs
Summary: .NET / C++ win client Reconnect feature does not work reliably - reconnection...
Keywords:
Status: CLOSED DUPLICATE of bug 661500
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-sdk
Version: Development
Hardware: Unspecified
OS: Windows
high
high
Target Milestone: 1.3.1
: ---
Assignee: Chuck Rolke
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-12-02 14:48 UTC by Frantisek Reznicek
Modified: 2015-11-16 01:13 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-08 20:37:10 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Frantisek Reznicek 2010-12-02 14:48:36 UTC
Description of problem:

There are multiple observations that reconnecting sender (csharp.example.spout, with bug 657523 c#2 and bug 629926 c#9 fixes)

fails to reconnect with following exception:
Exception Org.Apache.Qpid.Messaging.QpidException: Failed to connect within reconnect timeout
   at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp, Boolean sync)
   at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp)
   at Org.Apache.Qpid.Messaging.Examples.Spout.Main(String[] args) in c:\qpid\do
tnet_examples\examples\csharp.example.spout\csharp.example.spout.cs:line 96.

Moreover at the bad reconnect point I can see that messages in the queue are purged, which seems to be another broker related problem.

Seen on Win XP / Server 2003 x64 vs. RHEL4.8 i386.


Version-Release number of selected component (if applicable):
qpid-cpp-winsdk-1.3.0.24-*.zip

How reproducible:
40%

Steps to Reproduce:
1. extract qpid winsdk
2. patch csharp.example.spout .NET example (bug 657523 c#2 and bug 629926 c#9 fixes)
3. compile the examples
4. execute sender with following config
...\csharp.example.spout.exe --broker <host>:<port> 
  --connection-options "{reconnect : True, reconnect-timeout : 2, reconnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}"
  --content "<random-string>" --count <random-number-gt-1000> 
  "<rand-str-addr>; { create:sender, delete: receiver, node: { type:queue, durable:True } }"
5. while spout is running restart the broker on <host> 'service qpidd restart'
6. monitor queue depth by qc_list_objects.py or qpid-tool to see depth
   check csharp.example.spout.exe messages
7. Repeat point 4. - 6. until you see 
   a] Reconnection failed as above
   b] queue depth dropped

See more details in last section...

Actual results:
.NET client reconnection function is not reliable, it sometimes fails and messages might be lost

Expected results:
.NET client reconnection function should be reliable, no message loss should happen

Additional info:

  
  Reconnection issue description:
  
  C:\qpid>c:\qpid\bin\csharp.example.spout.exe --broker mrg-qe-07.lab.eng.brq.redh
  at.com:5672 --connection-options "{reconnect : True, reconnect-timeout : 2, reco
  nnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --co
  ntent 1w11N67MJspuIFVzuIk82BCMI5v8gXCNZZ3q4DUn --count 1004 "VPO3E25FnRz2FBx3Qk8
  V4czEah44aTu3OZe2; { create:sender, delete: receiver, node: { type:queue, durabl
  e:True } }"
  
  1> service qpidd restart  # reconnection triggered
  
  2010-12-02 15:22:51 warning Connection [2298 10.34.33.60:5672] closed
  2010-12-02 15:22:52 warning Connection [2299 10.34.33.60:5672] closed
  2010-12-02 15:22:53 warning Connect failed: No connection could be made because
  the target machine actively refused it. :  (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr
  c\qpid\sys\windows\Socket.cpp:219)
  2010-12-02 15:22:53 warning Connection  closed
  
  >> reconnection succeeded
  >> queue contains:
  >> ./qc_list_objects.py -b mrg-qe-07 -qv | grep VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe
  >> VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe2  durable msg|byteEnq/Deq:1004/0|40160/0 unacked:0 consumer_cnt:0 binding_cnt:1
  
  
  C:\qpid>c:\qpid\bin\csharp.example.spout.exe --broker mrg-qe-07.lab.eng.brq.redh
  at.com:5672 --connection-options "{reconnect : True, reconnect-timeout : 2, reco
  nnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --co
  ntent 1w11N67MJspuIFVzuIk82BCMI5v8gXCNZZ3q4DUn --count 1004 "VPO3E25FnRz2FBx3Qk8
  V4czEah44aTu3OZe2; { create:sender, delete: receiver, node: { type:queue, durabl
  e:True } }"
  
  1> service qpidd restart  # but late, reconnection code was not triggered
  
  >> no reconnection happened
  >> queue contains:
  >> ./qc_list_objects.py -b mrg-qe-07 -qv | grep VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe
  >> VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe2  durable msg|byteEnq/Deq:2008/0|80320/0 unacked:0 consumer_cnt:0 binding_cnt:1
  
  
  C:\qpid>c:\qpid\bin\csharp.example.spout.exe --broker mrg-qe-07.lab.eng.brq.redh
  at.com:5672 --connection-options "{reconnect : True, reconnect-timeout : 2, reco
  nnect_limit : 10, reconnect_interval : 3, username: guest, password:guest}" --co
  ntent 1w11N67MJspuIFVzuIk82BCMI5v8gXCNZZ3q4DUn --count 1004 "VPO3E25FnRz2FBx3Qk8
  V4czEah44aTu3OZe2; { create:sender, delete: receiver, node: { type:queue, durabl
  e:True } }"
  
  1> service qpidd restart  # reconnection triggered
  
  2010-12-02 15:23:34 warning Connection [2303 10.34.33.60:5672] closed
  2010-12-02 15:23:34 warning Connection [2304 10.34.33.60:5672] closed
  2010-12-02 15:23:36 warning Connect failed: No connection could be made because
  the target machine actively refused it. :  (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr
  c\qpid\sys\windows\Socket.cpp:219)
  2010-12-02 15:23:36 warning Connection  closed
  Exception Org.Apache.Qpid.Messaging.QpidException: Failed to connect within reco
  nnect timeout
    at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp, Boolean sync)
    at Org.Apache.Qpid.Messaging.Sender.Send(Message mmsgp)
    at Org.Apache.Qpid.Messaging.Examples.Spout.Main(String[] args) in c:\qpid\do
  tnet_examples\examples\csharp.example.spout\csharp.example.spout.cs:line 96.
  
  
  >> csharp.example.spout.exe ends with qpid exception and does not try again to reconnect -> ISSUE alpha]
  >> queue contains:
  >> ./qc_list_objects.py -b mrg-qe-07 -qv | grep VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe
  >> VPO3E25FnRz2FBx3Qk8V4czEah44aTu3OZe2  durable msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
  >> confirmed by qpid-tool, no message in the queue means -> ISSUE beta] MESSAGE LOSS
  
  
  Legend:
  C:\qpid> is on Win XP / Server 2003 x64
  1> is on RHEL 4.8 i386
  >> comments

Comment 1 Chuck Rolke 2010-12-02 14:58:36 UTC
An interesting data point would be to substitute the c++ drain and spout executables into the test.

This would help differentiate if the problem was .NET-specific or if it is in the common messaging DLLs.

Comment 3 Frantisek Reznicek 2010-12-02 16:08:25 UTC
The messages were not marked durable, so the message loss problem is explained. Changing BZ title. I'm going to retry, feedback tomorrow.

Comment 4 Frantisek Reznicek 2010-12-06 09:39:59 UTC
The reconnection on Linux (RHEL4) using c++ spout/drain works as expected.

During retest of C++ and C# client in Win environment I found following behavior:

a] C++ client:

Running c++ spout with reconnection, like this (spout is modified to allow durable messages):

C:\qpid>c:\qpid\bin\spout --broker mrg-qe-07.lab.eng.brq.redhat.com:5672 --conne
ction-options "{reconnect: True, reconnect_timeout:2, reconnect-limit: 10, recon
nect_interval:3, username:guest, password:guest}" --content _content_ --count 20
000 --durable "A3; {create:sender, node: {type: queue, durable:True}}"
2010-12-06 10:15:24 warning Connection [3600 10.34.33.60:5672] closed
2010-12-06 10:15:25 warning Connection [3601 10.34.33.60:5672] closed

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2010-12-06 10:15:26 warning Connect failed: No connection


During the run I restarted distant broker few times with variable non-operation gap.


Second or third broker interruption led to:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2010-12-06 10:15:26 warning Connect failed: No connection


After that, no messages were put to broker, proved by QMF:

[freznice@dhcp-26-251 ~]$ ~/prjs/rh/rhts_tests/tests/distribution/MRG/Messaging//qpid_common/clients/python/qc_list_objects.py   -b mrg-qe-07.lab.eng.brq.redhat.com -qv | grep ^A3
A3  durable msg|byteEnq/Deq:5100/0|45900/0 unacked:0 consumer_cnt:0 binding_cnt:1

The client ended operation after a while and in total 5100 of of expected 20000 were sent.


b] C# client:

Apart of issue from comment #0, I triggered another issue relevant to above case a]


Running modified csharp.example.spout (added durability):
C:\qpid>c:\qpid\bin\csharp.example.spout --broker mrg-qe-07.lab.eng.brq.redhat.c
om:5672 --connection-options "{reconnect: True, reconnect_timeout:2, reconnect-l
imit: 10, reconnect_interval:3, username:guest, password:guest}" --content _cont
ent_ --count 10000 --durable "A4; {create:sender, node: {type: queue, durable:Tr
ue}}"
2010-12-06 10:18:53 warning Connect failed: No connection could be made because
the target machine actively refused it. :  (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr
c\qpid\sys\windows\Socket.cpp:219)
2010-12-06 10:18:53 warning Connection  closed
2010-12-06 10:19:00 warning Connection [3747 10.34.33.60:5672] closed
2010-12-06 10:19:01 warning Connection [3748 10.34.33.60:5672] closed
2010-12-06 10:19:02 warning Connect failed: No connection could be made because
the target machine actively refused it. :  (..\..\qpid-cpp-mrg-0.7.946106\cpp\sr
c\qpid\sys\windows\Socket.cpp:219)
2010-12-06 10:19:02 warning Connection  closed

While the csharp.example.spout runs I restarted broker few times and  triggered hang of the client, checking the message count in the queue and connection I found that connection was closed and queue does not contain all required messages (5939 exp. 20000).

[freznice@dhcp-26-251 ~]$ qpid-stat -c mrg-qe-07.lab.eng.brq.redhat.com:5672
Connections
  client-addr         cproc      cpid   auth            connected  idle  msgIn  msgOut
  ======================================================================================
  127.0.0.1:36492     sesame     20043  guest@QPID      15m 36s    0s     626      0
  10.34.26.251:60209  qpid-stat  19903  anonymous@QPID  0s         0s     238      0
[freznice@dhcp-26-251 ~]$ ~/prjs/rh/rhts_tests/tests/distribution/MRG/Messaging//qpid_common/clients/python/qc_list_objects.py   -b mrg-qe-07.lab.eng.brq.redhat.com -qv | grep ^A4
A4  durable msg|byteEnq/Deq:5939/0|53451/0 unacked:0 consumer_cnt:0 binding_cnt:1



I confirm that no message loss happens as mentioned above as messages were not marked durable. With durable messages broker find and restore them from the store well.

Comment 5 Ted Ross 2010-12-08 20:37:10 UTC
I've closed this bug as a dup of Bug 661500.  The real issue here is confusion around the syntax for the connection options and the fact that the documentation is misleading.

*** This bug has been marked as a duplicate of bug 661500 ***


Note You need to log in before you can comment on or make changes to this bug.