Bug 680938

Summary: JMS client fails with Connection reset with a large no. (100) of durable topic subscriptions
Product: Red Hat Enterprise MRG
Component: qpid-java
Version: Development
Status: NEW
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Reporter: Kim van der Riet <kim.vdriet>
Assignee: messaging-bugs <messaging-bugs>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: iboverma, jross
Doc Type: Bug Fix
Attachments:
Test script run from qpid/java/tools/bin directory (attachment 481391; flags: none)
Diff to produce modified test setup allowing multiple consumers (attachment 489213; flags: none)

Description Kim van der Riet 2011-02-28 15:27:07 UTC
Created attachment 481391 [details]
Test script run from qpid/java/tools/bin directory

When the JMS client is run with the attached script to measure topic scalability, the test fails at the 100-subscriber step with a Connection reset error. The test uses a single producer and increases the number of subscribers to a single topic in the progression 1, 3, 10, 30, 100, 300, ..., but it has never succeeded beyond the 30-subscriber step. This bug is one of two possible failure outcomes of the 100-subscriber step: that step fails on every run, and this particular outcome occurs in approximately 50% of runs.
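For reference, the subscriber counts in that progression are 1 and 3 times each power of ten. The sketch below generates them up to a chosen limit. It is only an illustration: the class and method names are hypothetical, it is not taken from the attached script, and it assumes the pattern continues past 300 in the same way (1000, 3000, ...).

```java
import java.util.ArrayList;
import java.util.List;

public class SubscriberProgression {
    /**
     * Returns the subscriber counts 1, 3, 10, 30, ... that do not exceed max.
     * Hypothetical helper; the attached test script may compute its steps differently.
     */
    public static List<Integer> steps(int max) {
        List<Integer> out = new ArrayList<>();
        // Use long for the loop variable so that decade * 10 cannot overflow int.
        for (long decade = 1; decade <= max; decade *= 10) {
            out.add((int) decade);
            if (3 * decade <= max) {
                out.add((int) (3 * decade));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the steps up to 300, the largest count named in the description.
        System.out.println(steps(300));
    }
}
```

With a limit of 300 this yields the six steps listed in the description; the failure reported here occurs at the fifth one.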

Note: it seems odd that the failure is reported with error code 200 (reply success).

Error when running test Connection reset
org.apache.qpid.AMQConnectionFailureException: Connection reset [error code 200: reply success]
	at org.apache.qpid.client.AMQConnection.<init>(AMQConnection.java:472)
	at org.apache.qpid.client.AMQConnection.<init>(AMQConnection.java:246)
	at org.apache.qpid.tools.PerfBase.setUp(PerfBase.java:55)
	at org.apache.qpid.tools.PerfConsumer.setUp(PerfConsumer.java:105)
	at org.apache.qpid.tools.PerfConsumer.test(PerfConsumer.java:222)
	at org.apache.qpid.tools.PerfConsumer$1.run(PerfConsumer.java:301)
	at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.qpid.AMQException: Cannot connect to broker: Connection reset [error code 200: reply success]
	at org.apache.qpid.client.AMQConnectionDelegate_0_10.makeBrokerConnection(AMQConnectionDelegate_0_10.java:197)
	at org.apache.qpid.client.AMQConnection.makeBrokerConnection(AMQConnection.java:617)
	at org.apache.qpid.client.AMQConnection.<init>(AMQConnection.java:396)
	... 6 more
Caused by: org.apache.qpid.transport.ConnectionException: Connection reset
	at org.apache.qpid.transport.ConnectionException.rethrow(ConnectionException.java:67)
	at org.apache.qpid.transport.Connection.connect(Connection.java:267)
	at org.apache.qpid.client.AMQConnectionDelegate_0_10.makeBrokerConnection(AMQConnectionDelegate_0_10.java:178)
	... 8 more
Caused by: org.apache.qpid.transport.ConnectionException: Connection reset
	at org.apache.qpid.transport.Connection.exception(Connection.java:511)
	at org.apache.qpid.transport.network.Assembler.exception(Assembler.java:105)
	at org.apache.qpid.transport.network.InputHandler.exception(InputHandler.java:197)
	at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:145)
	... 1 more
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:185)
	at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:123)
	... 1 more
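The trace above nests four levels of "Caused by", and the innermost cause is a java.net.SocketException rather than anything carrying an AMQP reply code. As a rough sketch of how a test harness could report that innermost cause directly, the helper below follows getCause() links to the end of the chain. The class and method names are hypothetical, and the exceptions built in main are plain JDK stand-ins that only mimic the nesting; the real Qpid exception classes are not used.

```java
import java.net.SocketException;

public class RootCauseDemo {
    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mimic the nesting seen in the trace.
        Exception socket = new SocketException("Connection reset");
        Exception transport = new RuntimeException("Connection reset", socket);
        Exception outer = new Exception(
                "Connection reset [error code 200: reply success]", transport);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging the root cause alongside the outer message would make it clearer that the reset happened at the socket level, whatever reply code the outer exception carries.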

Comment 1 Kim van der Riet 2011-02-28 15:36:35 UTC
The other failure outcome mentioned in the description (connection timed out) is tracked in bug 680943.

Comment 2 Kim van der Riet 2011-03-31 19:55:18 UTC
While testing on a 2-machine config (i.e. broker on mrg43, JMS client on mrg42) and running transient-only topic tests, I see the client freeze up in the 100, 300, and 1000-client sections. If I run jdb against the client, I see that several instances of PerfConsumer are waiting for the end of the test, even though the producer has long since exited. See the patch attachment for the modifications and scripts that produce this result. In the example below, the 1000-client test hung, and 5 threads are still waiting for the end of the test (although I have seen up to 100 waiting threads on some hangs):

[kpvdr@mrg42 java]$ jdb -attach 8000
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...
> threads
Group system:
  (java.lang.ref.Reference$ReferenceHandler)0x7a4 Reference Handler             cond. waiting
  (java.lang.ref.Finalizer$FinalizerThread)0x7a5  Finalizer                     cond. waiting
  (java.lang.Thread)0x7a6                         Signal Dispatcher             running
Group main:
  (java.lang.Thread)0x1                           main                          cond. waiting
  (java.lang.Thread)0x7a8                         Thread-34                     cond. waiting
  (java.lang.Thread)0x7a9                         Thread-123                    cond. waiting
  (java.lang.Thread)0x7aa                         Thread-160                    cond. waiting
  (java.lang.Thread)0x7ab                         Thread-299                    cond. waiting
  (java.lang.Thread)0x7ac                         Thread-732                    cond. waiting
  (java.lang.Thread)0x7ad                         IoSender - /20.0.10.43:5672   cond. waiting
  (java.lang.Thread)0x7ae                         IoSender - /20.0.10.43:5672   cond. waiting
  (java.lang.Thread)0x7af                         IoSender - /20.0.10.43:5672   cond. waiting
  (java.lang.Thread)0x7b0                         IoReceiver - /20.0.10.43:5672 running
  (java.lang.Thread)0x7b1                         IoReceiver - /20.0.10.43:5672 running
  (java.lang.Thread)0x7b2                         IoSender - /20.0.10.43:5672   cond. waiting
  (java.lang.Thread)0x7b3                         IoSender - /20.0.10.43:5672   cond. waiting
  (java.lang.Thread)0x7b4                         IoReceiver - /20.0.10.43:5672 running
  (java.lang.Thread)0x7b5                         IoReceiver - /20.0.10.43:5672 running
  (java.lang.Thread)0x7b6                         IoReceiver - /20.0.10.43:5672 running
  (java.util.TimerThread)0x7b7                    ack-flusher                   cond. waiting
  (java.lang.Thread)0x7b8                         Dispatcher-Channel-0          cond. waiting
  (java.lang.Thread)0x7b9                         Dispatcher-Channel-0          cond. waiting
  (java.lang.Thread)0x7ba                         Dispatcher-Channel-0          cond. waiting
  (java.lang.Thread)0x7bb                         Dispatcher-Channel-0          cond. waiting
  (java.lang.Thread)0x7bc                         Dispatcher-Channel-0          cond. waiting
> suspend
All threads suspended.
> thread 0x1
main[1] where
  [1] java.lang.Object.wait (native method)
  [2] java.lang.Thread.join (Thread.java:1,160)
  [3] java.lang.Thread.join (Thread.java:1,213)
  [4] org.apache.qpid.tools.PerfConsumer.main (PerfConsumer.java:322)
main[1] thread 0x7a8
Thread-34[1] where
  [1] java.lang.Object.wait (native method)
  [2] java.lang.Object.wait (Object.java:502)
  [3] org.apache.qpid.tools.PerfConsumer.calcResults (PerfConsumer.java:148)
  [4] org.apache.qpid.tools.PerfConsumer.test (PerfConsumer.java:225)
  [5] org.apache.qpid.tools.PerfConsumer$1.run (PerfConsumer.java:301)
  [6] java.lang.Thread.run (Thread.java:636)
Thread-34[1] thread 0x7ad
IoSender - /20.0.10.43:5672[1] where
  [1] java.lang.Object.wait (native method)
  [2] java.lang.Object.wait (Object.java:502)
  [3] org.apache.qpid.transport.network.io.IoSender.run (IoSender.java:247)
  [4] java.lang.Thread.run (Thread.java:636)
IoSender - /20.0.10.43:5672[1] thread 0x7b0
IoReceiver - /20.0.10.43:5672[1] where
  [1] java.net.SocketInputStream.socketRead0 (native method)
  [2] java.net.SocketInputStream.read (SocketInputStream.java:146)
  [3] org.apache.qpid.transport.network.io.IoReceiver.run (IoReceiver.java:123)
  [4] java.lang.Thread.run (Thread.java:636)
IoReceiver - /20.0.10.43:5672[1] thread 0x7b8
Dispatcher-Channel-0[1] where
  [1] java.lang.Object.wait (native method)
  [2] java.lang.Object.wait (Object.java:502)
  [3] org.apache.qpid.client.util.FlowControllingBlockingQueue.take (FlowControllingBlockingQueue.java:92)
  [4] org.apache.qpid.client.AMQSession$Dispatcher.run (AMQSession.java:3,242)
  [5] java.lang.Thread.run (Thread.java:636)
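A similar list of waiting threads can be collected from inside the JVM, without attaching jdb, by using the standard Thread.getAllStackTraces() call. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the code would have to run inside the client process itself (for example from a watchdog thread), and the JVM states WAITING and TIMED_WAITING only roughly correspond to what jdb prints as "cond. waiting".

```java
import java.util.ArrayList;
import java.util.List;

public class WaitingThreads {
    /** Returns the names of all live threads currently in a waiting state. */
    public static List<String> waitingThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            Thread.State state = t.getState();
            if (state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Print one line per waiting thread, comparable to the "threads" listing.
        for (String name : waitingThreadNames()) {
            System.out.println(name + " is waiting");
        }
    }
}
```

Counting the returned names that match the consumer thread naming (Thread-N in the listing above) would give the number of consumers still waiting for the end of the test.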

Comment 3 Kim van der Riet 2011-03-31 20:01:41 UTC
Created attachment 489213 [details]
Diff to produce modified test setup allowing multiple consumers

This is the patch for producing the symptoms in Comment #2 above. It includes all the changes to the Java test source and the test script perf-topic.sh (which has its durable section commented out at present so that all runs are transient only).

Comment 4 Kim van der Riet 2011-04-01 18:27:28 UTC
The symptoms and attachment from comment 2 and comment 3 above are not directly related to this bug; my mistake. Please ignore.

Comment 5 Kim van der Riet 2011-04-01 18:49:10 UTC
Comment on attachment 489213 [details]
Diff to produce modified test setup allowing multiple consumers

Not related to this bug.