After a certain sequence of events, a large message deletion may be called twice, resulting in an erroneous warning message on a backup HornetQ server, similar to below:
----
11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164 is not available on backup server. Ignoring replication message
----
This issue was fixed by stopping the erroneous warning from appearing in the JBoss EAP 6 logs. As a result, on a backup HornetQ server there should no longer be erroneous warning messages related to large messages not being available.
Description of problem:
There are warning messages:
11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164 is not available on backup server. Ignoring replication message
during sending/receiving messages to/from live server which has replicated backup. Warning messages are in log of backup server.
Test scenario:
1. Start live/backup pair - replicated journal
2. Start producer and consumer (with transacted session) which are connected to live and sending/receiving messages from queue. Producer sends mix of small and large messages.
3. Check logs of backup server
Additional info:
I added some trace logs to ReplicationEndpoint class and noticed that ReplicationEndpoint.handleLargeMessageEnd is called twice:
11:33:23,824 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Writing buffer for channelID=2
11:33:23,824 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) handling packet PACKET(ReplicationLargeMessageEndMessage)[type=100, channelI
D=2, packetObject=ReplicationLargeMessageEndMessage]
11:33:23,825 DEBUG [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Invocation of interceptor org.hornetq.core.server.impl.ReplicationError on P
ACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] returned true
11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222015: mnovak: large Message 164 was deleted from largeMessages on backup.
11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Sending packet nonblocking PACKET(ReplicationResponseMessage)[type=90, channelID=2, packetObject=ReplicationResponseMessage] on channeID=2
11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Writing buffer for channelID=2
11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) handling packet PACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage]
11:33:23,825 DEBUG [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Invocation of interceptor org.hornetq.core.server.impl.ReplicationError on PACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] returned true
11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222015: mnovak: large Message 164 was deleted from largeMessages on backup.
11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164 is not available on backup server. Ignoring replication message
Trace logs from live and backup server attached.
Expected results:
There should be no warnings.
I just got a notice of this one.. and I understand it being new, right?
Francisco is back early next week.. I've asked him to work on it... I'm also out this week.
I've looked into it and my conclusion is that this is not a bug. Ok. Let me rephrase it: "not a real bug".
What happens is that on certain circumstances HQ will try to delete the same large-message twice. The "local disk" part of the HQ code (i.e. the non-replicated part) just ignores these, while the replication code will log a warning.
Since HQ seems to have chosen to ignore these locally, I think the best option is to just remove this warning from the replication code.