Bug 988321

Summary: WARN - Replication Large MessageID 164 is not available on backup server. Ignoring replication message
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Miroslav Novak <mnovak>
Component: HornetQAssignee: Clebert Suconic <csuconic>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.1.1CC: anmiller, ataylor, csuconic, lcosti, msvehla, myarboro, pslavice, rsvoboda
Target Milestone: ER1   
Target Release: EAP 6.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
After a certain sequence of events, a large message deletion may be called twice, resulting in an erroneous warning message on a backup HornetQ server, similar to below: ---- 11:33:23,825 WARN [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164 is not available on backup server. Ignoring replication message ---- This issue was fixed by stopping the erroneous warning from appearing in the JBoss EAP 6 logs. As a result, on a backup HornetQ server there should no longer be erroneous warning messages related to large messages not being available.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-15 16:14:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
server_logs.zip none

Description Miroslav Novak 2013-07-25 09:57:19 UTC
Description of problem:
There are warning messages:
11:33:23,825 WARN  [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164  is not available on backup server. Ignoring replication message

during sending/receiving messages to/from live server which has replicated backup. Warning messages are in log of backup server.

Test scenario:
1. Start live/backup pair - replicated journal
2. Start producer and consumer (with transacted session) which are connected to live and sending/receiving messages from queue. Producer sends mix of small and large messages.
3. Check logs of backup server

Additional info:
I added some trace logs to ReplicationEndpoint class and noticed that ReplicationEndpoint.handleLargeMessageEnd is called twice:
11:33:23,824 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Writing buffer for channelID=2
11:33:23,824 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) handling packet PACKET(ReplicationLargeMessageEndMessage)[type=100, channelI
D=2, packetObject=ReplicationLargeMessageEndMessage]
11:33:23,825 DEBUG [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Invocation of interceptor org.hornetq.core.server.impl.ReplicationError on P
ACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] returned true
11:33:23,825 WARN  [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222015: mnovak: large Message 164 was deleted from largeMessages on backup.
11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Sending packet nonblocking PACKET(ReplicationResponseMessage)[type=90, channelID=2, packetObject=ReplicationResponseMessage] on channeID=2
11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Writing buffer for channelID=2
11:33:23,825 TRACE [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) handling packet PACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage]
11:33:23,825 DEBUG [org.hornetq.core.client] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) Invocation of interceptor org.hornetq.core.server.impl.ReplicationError on PACKET(ReplicationLargeMessageEndMessage)[type=100, channelID=2, packetObject=ReplicationLargeMessageEndMessage] returned true
11:33:23,825 WARN  [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222015: mnovak: large Message 164 was deleted from largeMessages on backup.
11:33:23,825 WARN  [org.hornetq.core.server] (Old I/O client worker ([id: 0x2b37b4ea, /192.168.40.1:60844 => /192.168.40.1:5445])) HQ222090: Replication Large MessageID 164  is not available on backup server. Ignoring replication message

Trace logs from live and backup server attached.

Expected results:
There should be no warnings.

Comment 1 Miroslav Novak 2013-07-25 09:59:59 UTC
Created attachment 778169 [details]
server_logs.zip

Comment 2 Clebert Suconic 2013-07-25 10:33:46 UTC
Francisco Borges is in Paternity leave this week. I'm asking him to look at this as soon as he's back on next week.

Comment 4 Clebert Suconic 2013-07-25 22:30:34 UTC
I just got a notice of this one.. and I understand it being new, right?


Francisco is back early next week.. I've asked him to work on it... I'm also out this week.

Comment 7 Francisco Borges 2013-08-21 14:06:26 UTC
I've looked into it and my conclusion is that this is not a bug. Ok. Let me rephrase it: "not a real bug".

What happens is that on certain circumstances HQ will try to delete the same large-message twice. The "local disk" part of the HQ code (i.e. the non-replicated part) just ignores these, while the replication code will log a warning.

Since HQ seems to have chosen to ignore these locally, I think the best option is to just remove this warning from the replication code.

Comment 8 Miroslav Novak 2013-08-21 14:14:16 UTC
Agreed. It's really called twice. It's just needed to get rid of this warning because it appears in EAP logs and customers would be nervous about it.

Comment 10 Miroslav Novak 2013-09-23 10:03:19 UTC
Fix is not present in 6.2.0.ER1/2. Will be verified with EAP 6.2.0.ER3.

Comment 11 Miroslav Novak 2013-10-04 09:42:33 UTC
Verified in EAP 6.2.0.ER3. There are no more "Replication Large MessageID..." warnings in the log. Setting as verified.