Bug 1111628 - Lost large message in cluster with network failures
Summary: Lost large message in cluster with network failures
Keywords:
Status: CLOSED EOL
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: HornetQ
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: EAP 6.4.0
Assignee: Clebert Suconic
QA Contact: Miroslav Novak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-20 14:39 UTC by Miroslav Novak
Modified: 2019-08-19 12:45 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-08-19 12:45:52 UTC
Type: Bug
Embargoed:


Attachments
logs.zip (6.02 MB, application/x-zip-compressed)
2014-06-20 14:39 UTC, Miroslav Novak

Description Miroslav Novak 2014-06-20 14:39:48 UTC
Created attachment 910795
logs.zip

A large message is lost in a cluster when network failures occur. This can have a negative impact on customers.

Test scenario:
1. Start 2 servers in an HQ cluster with "queue/InQueue" deployed (reconnect-attempts for the cluster connection is set to -1).
2. Start a producer that sends large messages (1 MB) to InQueue on the 1st server.
3. Start a consumer on the 2nd server that reads messages from InQueue.
4. During steps 2 and 3, disconnect the network between the servers, wait 2 minutes, and reconnect.
5. Stop the producer and wait for the consumer to receive all messages.
6. Verify that the numbers of sent and received messages are equal.
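
The check in step 6 can be made more precise by comparing message IDs rather than only counts, so that the missing message is identified directly. The following is a minimal sketch, not the actual test code: the class name LostMessageCheck, the method findLost, and the sample IDs are all hypothetical, and it assumes the test harness records the JMS message ID of every sent and received message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LostMessageCheck {

    /**
     * Returns the IDs that were sent but never received, in send order.
     * Hypothetical helper; the real test may track messages differently.
     */
    public static Set<String> findLost(List<String> sentIds, List<String> receivedIds) {
        // LinkedHashSet keeps the order in which the messages were sent.
        Set<String> lost = new LinkedHashSet<>(sentIds);
        lost.removeAll(new HashSet<>(receivedIds));
        return lost;
    }

    public static void main(String[] args) {
        // Illustrative IDs only, not taken from the attached logs.
        List<String> sent = Arrays.asList("ID:msg-1", "ID:msg-2", "ID:msg-3");
        List<String> received = Arrays.asList("ID:msg-1", "ID:msg-3");

        Set<String> lost = findLost(sent, received);
        System.out.println("Sent: " + sent.size() + ", received: " + received.size());
        System.out.println("Lost messages: " + lost);
    }
}
```

An empty result means every sent message was received; any remaining ID can then be searched for in the server logs.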

Occasionally, 1 large message is missing. I'm attaching logs from the test, server1, and server2.

The ID of the lost large message is ID:96424df7-f870-11e3-a232-dd298b5ff7de, and it has _HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220 set.

I can see that server1 acked delivery of the large message to server2. However, server2 did not receive the whole message: it received only the first packet with the initial large message header, and none of the subsequent chunks:

From server2:
13:49:25,301 TRACE [org.hornetq.core.server] (Old I/O server worker (parentId: 590887547, [id: 0x23383a7b, /192.168.40.2:5445])) sendLarge::LargeServerMessage[messageID=3770,priority=4,expiration=[null], durable=true, address=jms.queue.InQueue,properties=TypedProperties[{_HQ_BRIDGE_DUP=[B@562798e8, _HQ_LARGE_SIZE=2147811, counter=904, _HQ_ROUTE_TO=[B@41a7d388, _HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220, count=903, color=RED, __HQ_CID=0e83a18b-f870-11e3-a232-dd298b5ff7de}]]@1899013988

From server1:
13:49:25,551 TRACE [org.hornetq.core.server] (Old I/O client worker ([id: 0x1ab03f84, /127.0.0.1:50286 => localhost/127.0.0.1:43812])) ClusterConnectionBridge@5b0508f3 [name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, queue=QueueImpl[name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7]]@40882c7f targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@5b0508f3 [name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, queue=QueueImpl[name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7]]@40882c7f targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=connector-to-proxy-directing-to-this-server, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=43812&host=localhost], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@605474553[nodeUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7, connector=TransportConfiguration(name=connector-to-proxy-directing-to-this-server, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=43821&host=localhost, address=jms, server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7])) [initialConnectors=[TransportConfiguration(name=connector-to-proxy-directing-to-this-server, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=43812&host=localhost], discoveryGroupConfiguration=null]] Acking Reference[2845]:RELIABLE:LargeServerMessage[messageID=2845,priority=4,expiration=[null], durable=true, address=jms.queue.InQueue,properties=TypedProperties[{counter=904, _HQ_LARGE_SIZE=2147811, count=903, _HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220, _HQ_ROUTE_TOsf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472=[B@3918e589, color=RED, 
__HQ_CID=0e83a18b-f870-11e3-a232-dd298b5ff7de}]]@1915834324 on queue QueueImpl[name=sf.my-cluster.03a9ec30-f870-11e3-8ee0-afecb4b03472, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=ff82ff19-f86f-11e3-839a-4d2347c499b7]]@40882c7f
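
The two trace entries above refer to the same message even though the messageID differs between the servers (3770 on server2, 2845 on server1); the shared _HQ_DUPL_ID property is what ties them together. The following is a minimal sketch of how such entries could be correlated when searching the attached logs. The class DuplIdExtractor and its methods are hypothetical helpers, not part of HornetQ, and the sample lines are abbreviated versions of the entries above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplIdExtractor {

    // Matches the value of _HQ_DUPL_ID up to the next comma, brace, or whitespace.
    private static final Pattern DUPL_ID = Pattern.compile("_HQ_DUPL_ID=([^,}\\s]+)");

    /** Returns the _HQ_DUPL_ID value found in a log line, or null if there is none. */
    public static String extractDuplId(String logLine) {
        Matcher m = DUPL_ID.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    /** True if both log lines carry the same, non-null _HQ_DUPL_ID. */
    public static boolean sameMessage(String lineA, String lineB) {
        String a = extractDuplId(lineA);
        return a != null && a.equals(extractDuplId(lineB));
    }

    public static void main(String[] args) {
        // Abbreviated from the server2 and server1 trace entries in this report.
        String server2 = "sendLarge::LargeServerMessage[messageID=3770, "
                + "_HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220, count=903]";
        String server1 = "Acking Reference[2845]:RELIABLE:LargeServerMessage[messageID=2845, "
                + "_HQ_DUPL_ID=a4640014-8f4d-4912-9ab2-273b495cccd21403264819220, color=RED]";

        System.out.println("server2 dupl id: " + extractDuplId(server2));
        System.out.println("same message: " + sameMessage(server1, server2));
    }
}
```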

The network was disconnected at:
13:49:25,756 main INFO  [org.jboss.qa.hornetq.test.bridges.NetworkFailuresHornetQCoreBridges:616] Stop all proxies.

