Bug 1017559

Summary: lack of synchronization on cluster.copy versus message delivery
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Tom Ross <tom.ross>
Component: HornetQAssignee: Clebert Suconic <csuconic>
Status: CLOSED CURRENTRELEASE QA Contact: Martin Svehla <msvehla>
Severity: urgent Docs Contact: Russell Dickenson <rdickens>
Priority: urgent    
Version: 6.1.0CC: jdoyle, msvehla, myarboro, vincent.kirsch
Target Milestone: ER6   
Target Release: EAP 6.2.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1392427 (view as bug list) Environment:
Last Closed: 2013-12-15 16:15:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1016141    
Bug Blocks: 1392427    

Description Tom Ross 2013-10-10 07:31:45 UTC
Customer sees the following exception in the log file:

2013-10-03 17:11:26,941 WARN  [org.hornetq.core.server] (Thread-22 (HornetQ-server-HornetQServerImpl::serverUUID=f9b84d7b-2c39-11e3-8592-fb893fe81f19-6652469)) HQ222151: removing consumer which did not handle a message, consumer=ClusterConnectionBridge@5277a49f [name=sf.CON_IVUrail.9e8e06ce-2c3a-11e3-a54e-e380885b2d1e, queue=QueueImpl[name=sf.CON_IVUrail.9e8e06ce-2c3a-11e3-a54e-e380885b2d1e, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=f9b84d7b-2c39-11e3-8592-fb893fe81f19]]@55b0fa43 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@5277a49f [name=sf.CON_IVUrail.9e8e06ce-2c3a-11e3-a54e-e380885b2d1e, queue=QueueImpl[name=sf.CON_IVUrail.9e8e06ce-2c3a-11e3-a54e-e380885b2d1e, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=f9b84d7b-2c39-11e3-8592-fb893fe81f19]]@55b0fa43 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=21060&host=10-248-144-184], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1264950074[nodeUUID=f9b84d7b-2c39-11e3-8592-fb893fe81f19, connector=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=21010&host=10-248-144-184, address=jms, server=HornetQServerImpl::serverUUID=f9b84d7b-2c39-11e3-8592-fb893fe81f19])) [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=21060&host=10-248-144-184], discoveryGroupConfiguration=null]], message=Reference[253234]:RELIABLE:ServerMessage[messageID=253234,priority=4, bodySize=3000,expiration=0, durable=true, address=jms.topic.topic/NotificationDpl-MB.31301201,properties=TypedProperties[{_JBM_DUPL_ID=68a62d2f-7c23-4bd6-bfd4-cdcebef5ed8b, __HQ_CID=8cb065ed-2c3b-11e3-8592-fb893fe81f19, _HQ_ROUTE_TOsf.CON_IVUrail.9e8e06ce-2c3a-11e3-a54e-e380885b2d1e=[B@689692e4}]]@1971711072: java.lang.IndexOutOfBoundsException: Invalid writerIndex: 1506 - Maximum is 0 or 1500
        at org.jboss.netty.buffer.AbstractChannelBuffer.setIndex(AbstractChannelBuffer.java:63) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
        at org.hornetq.core.buffers.impl.ChannelBufferWrapper.setIndex(ChannelBufferWrapper.java:484) [hornetq-commons-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.message.impl.MessageImpl.<init>(MessageImpl.java:170) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.message.impl.MessageImpl.<init>(MessageImpl.java:134) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.ServerMessageImpl.<init>(ServerMessageImpl.java:89) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.ServerMessageImpl.copy(ServerMessageImpl.java:209) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.cluster.impl.ClusterConnectionBridge.beforeForward(ClusterConnectionBridge.java:182) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:548) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:2435) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1922) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl.access$1200(QueueImpl.java:79) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2695) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_25]
        at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_25]

2013-10-03 17:11:26,943 ERROR [org.hornetq.core.server] (Thread-22 (HornetQ-server-HornetQServerImpl::serverUUID=f9b84d7b-2c39-11e3-8592-fb893fe81f19-6652469)) HQ224041: Failed to deliver: java.util.NoSuchElementException
        at org.hornetq.utils.PriorityLinkedListImpl$PriorityLinkedListIterator.repeat(PriorityLinkedListImpl.java:181) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1944) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl.access$1200(QueueImpl.java:79) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2695) [hornetq-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_25]
        at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_25]

Comment 1 Miroslav Novak 2013-10-10 09:14:55 UTC
I believe this could be related to bz#997890. In the test there is received mix of small and large messages by one subscriber (in standalone JMS client) from topic when IndexOutOfBoundsException is thrown. 

It was hit in failover test with 2 servers in collocated HA topology with replicated journal but IndexOutOfBoundsException was thrown after successful failback of subscriber when no crash was in progress.

Stack trace from the subscriber (it's different then you posted). I didn't have time to look to the code but there could be some race condition:
12:32:38,562 Thread-253 ERROR [org.jboss.qa.hornetq.apps.clients.SubscriberTransAck:138] Exception was thrown during receiving messages:
java.lang.IndexOutOfBoundsException: Not enough readable bytes - Need 10, maximum is 6
	at org.jboss.netty.buffer.AbstractChannelBuffer.checkReadableBytes(AbstractChannelBuffer.java:661)
	at org.jboss.netty.buffer.AbstractChannelBuffer.readBytes(AbstractChannelBuffer.java:338)
	at org.jboss.netty.buffer.AbstractChannelBuffer.readBytes(AbstractChannelBuffer.java:344)
	at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readBytes(ChannelBufferWrapper.java:334)
	at org.hornetq.utils.TypedProperties.decode(TypedProperties.java:472)
	at org.hornetq.jms.client.HornetQMapMessage.doBeforeReceive(HornetQMapMessage.java:440)
	at org.hornetq.jms.client.HornetQMessageConsumer.getMessage(HornetQMessageConsumer.java:232)
	at org.hornetq.jms.client.HornetQMessageConsumer.receive(HornetQMessageConsumer.java:129)
	at org.jboss.qa.hornetq.apps.clients.SubscriberTransAck.receiveMessage(SubscriberTransAck.java:391)
	at org.jboss.qa.hornetq.apps.clients.SubscriberTransAck.run(SubscriberTransAck.java:110)

Comment 2 Clebert Suconic 2013-10-10 14:46:17 UTC
We can't release without this..

Comment 3 Clebert Suconic 2013-10-10 14:51:02 UTC
(In reply to Miroslav Novak from comment #1)

I don't think it would be releated.. there's no message.copy called and no cluster on the test you mentioned

Comment 4 JBoss JIRA Server 2013-10-16 16:21:06 UTC
Clebert Suconic <clebert.suconic> updated the status of jira HORNETQ-1264 to Closed

Comment 5 Martin Svehla 2013-10-17 12:07:45 UTC
Tested with EAP 6.2.0.ER6 / HornetQ 2.3.9.Final

Comment 7 Vincent Kirsch 2015-08-14 12:08:49 UTC
Hello,

I have seen this kind of behavior happen in HornetQ 2.2.18.

Is there any chance the fix for this problem could be backported to a 2.2.X version? We have this happening on a customer's production site. Long story short, they prefer to not do what they consider hazardous major upgrades of the installed components, so basically we're stuck with HornetQ 2.2 :(

Thanks,
Vincent.