Description of problem: There are warning messages: 11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164 is not available on backup server. Ignoring replication message during sending/receiving messages to/from live server which has replicated backup. Warning messages are in log of backup server. Test scenario: 1. Start live/backup pair - replicated journal 2. Start producer and consumer (with transacted session) which are connected to live and sending/receiving messages from queue. Producer sends mix of small and large messages. 3. Check logs of backup server Additional info: I added some trace logs to ReplicationEndpoint class and noticed that ReplicationEndpoint.handleLargeMessageEnd is called twice: 11:33:23,824 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Writing buffer for channelID=2 11:33:23,824 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) handling packet PACKET(ReplicationLargeMessageEndMessage)[type=100, channelI D=2, packetObject=ReplicationLargeMessageEndMessage] 11:33:23,825 DEBUG [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Invocation of interceptor org.hornetq.core.server.impl.ReplicationError on P ACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] returned true 11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222015: mnovak: large Message 164 was deleted from largeMessages on backup. 11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Sending packet nonblocking PACKET(ReplicationResponseMessage)[type=90, channelID=2, packetObject=ReplicationResponseMessage] on channeID=2 11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Writing buffer for channelID=2 11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) handling packet PACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] 11:33:23,825 DEBUG [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Invocation of interceptor org.hornetq.core.server.impl.ReplicationError on PACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] returned true 11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222015: mnovak: large Message 164 was deleted from largeMessages on backup. 11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164 is not available on backup server. Ignoring replication message Trace logs from live and backup server attached. Expected results: There should be no warnings.
Created attachment 778169 [details] server_logs.zip
Francisco Borges is in Paternity leave this week. I'm asking him to look at this as soon as he's back on next week.
I just got a notice of this one.. and I understand it being new, right? Francisco is back early next week.. I've asked him to work on it... I'm also out this week.
I've looked into it and my conclusion is that this is not a bug. Ok. Let me rephrase it: "not a real bug". What happens is that on certain circumstances HQ will try to delete the same large-message twice. The "local disk" part of the HQ code (i.e. the non-replicated part) just ignores these, while the replication code will log a warning. Since HQ seems to have chosen to ignore these locally, I think the best option is to just remove this warning from the replication code.
Agreed. It's really called twice. It's just needed to get rid of this warning because it appears in EAP logs and customers would be nervous about it.
Fix is not present in 6.2.0.ER1/2. Will be verified with EAP 6.2.0.ER3.
Verified in EAP 6.2.0.ER3. There are no more "Replication Large MessageID..." warnings in the log. Setting as verified.