Bug 1168712

Summary: HQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile...
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Miroslav Novak <mnovak>
Component: HornetQAssignee: Clebert Suconic <csuconic>
Status: CLOSED CURRENTRELEASE QA Contact: Peter Mackay <pmackay>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 6.4.0CC: ataylor, csuconic, jawilson, jbertram, mnovak, msvehla, mtaylor, pmackay, rstancel
Target Milestone: CR1   
Target Release: EAP 6.4.13   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-03 16:42:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1386335, 1390788    
Attachments:
Description Flags
server-logs.zip none

Description Miroslav Novak 2014-11-27 16:04:58 UTC
Created attachment 962151 [details]
server-logs.zip

Description of problem:
Sometimes there is "Critical IO Error" in log of backup server after failover from live to backup.

Test scenario:
1. Start 2 EAP 6.4.0.DR11 in collocated HA topology with shared store (journal-type NIO)
2. Start producer and consumer with transacted session connected to 1st server server
3. Kill 1st EAP server so clients failover to collocated backup on 2nd server
4. Wait for clients to finish

Result:
Sometimes happens that after step 3. collocated backup on 2nd EAP server does not activate and logs warning "HQ222010: Critical IO Error, shutting down the server...". (see full stack trace below).

We saw similar problem in the past in bz#989520

How reproducible:
We don't have a reproducer at this moment.

Expected results:
Backup should start without problem. 

Additional info: 
I'm attaching trace logs from EAP servers.

Full stack trace from backup:
08:59:39,083 WARN  [org.hornetq.core.server] (Old I/O server worker (parentId: 1934013939, [id: 0x7346b5f3, /127.0.0.1:7446])) HQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile /mnt/j
bm/hudson/hornetq-journal-A-colocated/large-messages/2147484245.msg, message=null: HornetQException[errorType=IO_ERROR message=null]
        at org.hornetq.core.journal.impl.NIOSequentialFile.open(NIOSequentialFile.java:111) [hornetq-journal-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.journal.impl.NIOSequentialFile.open(NIOSequentialFile.java:96) [hornetq-journal-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.persistence.impl.journal.LargeServerMessageImpl.openFile(LargeServerMessageImpl.java:434) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.persistence.impl.journal.LargeServerMessageImpl.validateFile(LargeServerMessageImpl.java:406) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.persistence.impl.journal.LargeServerMessageImpl.addBytes(LargeServerMessageImpl.java:111) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.server.impl.ServerSessionImpl.sendContinuations(ServerSessionImpl.java:1370) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:472) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:641) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:557) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:533) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:587) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:72) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:122) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.socket.oio.OioWorker.process(OioWorker.java:71) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:175) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
        at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184) [rt.jar:1.6.0_45]
        at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:304) [rt.jar:1.6.0_45]
        at org.hornetq.core.journal.impl.NIOSequentialFile.open(NIOSequentialFile.java:107) [hornetq-journal-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        ... 30 more

Comment 1 Miroslav Novak 2014-11-27 16:06:03 UTC
This relates to jira - HORNETQ-1241

Comment 2 Miroslav Novak 2014-11-28 08:13:07 UTC
I can see this issue in other failover tests as well. Thus increasing severity.

Comment 3 Clebert Suconic 2014-12-04 14:55:51 UTC
Did you wait the server to backup its file before failback?

Comment 4 Miroslav Novak 2014-12-04 15:00:33 UTC
In this scenario is used shared store. There is no need to wait for synchronization.

Comment 5 Clebert Suconic 2015-06-29 17:17:40 UTC
I believe this would be    BZ-1193085

Comment 6 Radovan STANCEL 2016-12-06 14:46:29 UTC
PR https://github.com/hornetq/hornetq/pull/2105

this issue is already merged -> moving to CP 13

Comment 7 Radovan STANCEL 2016-12-09 11:07:57 UTC
7.1 and WF branch are solved by ARTEMIS-831.
7.0 is not containing the issue - confirmed by Tom Ross and Clebert

Comment 8 Peter Mackay 2017-01-24 17:21:42 UTC
Verified with EAP 6.4.13.CP.CR2

Comment 9 Petr Penicka 2017-02-03 16:42:07 UTC
Released with EAP 6.4.13 on Feb 02 2017.