Created attachment 910795
logs.zip

A large message is lost in a cluster when there are network failures. This can have a negative impact on customers.

Test scenario:
1. Start 2 servers in an HQ cluster with "queue/InQueue" deployed (reconnect-attempts for the cluster connection is set to -1)
2. Start a producer which sends large messages (1 MB) to InQueue on the 1st server
3. Start a consumer on the 2nd server which reads messages from InQueue
4. During steps 2. and 3., disconnect the network between the servers, wait 2 minutes and reconnect
5. Stop the producer and wait for the consumer to receive all messages
6. Verify that the number of sent and received messages is equal

Sometimes 1 large message is missing.

I'm attaching logs from the test, server1 and server2. The ID of the lost large message is:
ID:96424df7-f870-11e3-a232-dd298b5ff7de
and it has _HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220 set.

I can see that server1 acked delivery of the large message to server2. But server2 did not receive the whole message, only the first packet with the initial large message header and none of the following chunks:

From server2:
13:49:25,301 TRACE [org.hornetq.core.server] (Old I/O server worker (parentId: 590887547, [id: 0x23383a7b, /192.168.40.2:5445])) sendLarge::LargeServerMessage[messageID=3770,priority=4,expiration=[null], durable=true, address=jms.queue.InQueue,properties=TypedProperties[{_HQ_BRIDGE_DUP=[B@562798e8, _HQ_LARGE_SIZE=2147811, counter=904, _HQ_ROUTE_TO=[B@41a7d388, _HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220, count=903, color=RED, __HQ_CID=0e83a18b-f870-11e3-a232-dd298b5ff7de}]]@1899013988

From server1:
13:49:25,551 TRACE [org.hornetq.core.server] (Old I/O client worker ([id: 0x1ab03f84, /127.0.0.1:50286 => localhost/127.0.0.1:43812])) ClusterConnectionBridge@5b0508f3 [name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, queue=QueueImpl[name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7]]@40882c7f targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@5b0508f3 [name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, queue=QueueImpl[name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7]]@40882c7f targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=connector-to-proxy-directing-to-this-server, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=43812&host=localhost], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@605474553[nodeUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7, connector=TransportConfiguration(name=connector-to-proxy-directing-to-this-server, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=43821&host=localhost, address=jms, server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7])) [initialConnectors=[TransportConfiguration(name=connector-to-proxy-directing-to-this-server, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=43812&host=localhost], discoveryGroupConfiguration=null]] Acking Reference[2845]:RELIABLE:LargeServerMessage[messageID=2845,priority=4,expiration=[null], durable=true, address=jms.queue.InQueue,properties=TypedProperties[{counter=904, _HQ_LARGE_SIZE=2147811, count=903, _HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220, _HQ_ROUTE_TOsf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472=[B@3918e589, color=RED, __HQ_CID=0e83a18b-f870-11e3-a232-dd298b5ff7de}]]@1915834324 on queue QueueImpl[name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7]]@40882c7f

The network was disconnected at:
13:49:25,756 main INFO [org.jboss.qa.hornetq.test.bridges.NetworkFailuresHornetQCoreBridges:616] Stop all proxies.
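
For anyone reproducing this, the final verification step of the scenario (comparing sent and received messages) can be made to report which message is missing rather than only that the counts differ. Below is a minimal sketch, assuming the producer and consumer each record the _HQ_DUPL_ID value of every message they handle. The class LostMessageCheck and the method findMissing are hypothetical names used for illustration; they are not part of HornetQ or of the test suite mentioned in the logs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of HornetQ or the test suite.
final class LostMessageCheck {

    // Returns the IDs that were sent but never received, preserving send order.
    // Assumption: each list holds one identifier per message, for example the
    // _HQ_DUPL_ID property value recorded by the producer and the consumer.
    static Set<String> findMissing(List<String> sentIds, List<String> receivedIds) {
        Set<String> missing = new LinkedHashSet<>(sentIds);
        missing.removeAll(new HashSet<>(receivedIds));
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder IDs, not values taken from the attached logs.
        List<String> sent = Arrays.asList("dup-1", "dup-2", "dup-3");
        List<String> received = Arrays.asList("dup-1", "dup-3");
        System.out.println("Missing: " + findMissing(sent, received));
    }
}
```

Comparing identifiers instead of counts also keeps the check meaningful if a duplicate delivery happens to offset a lost message in the totals.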