Bug 1168712 - HQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile...
Summary: HQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: HornetQ
Version: 6.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: CR1
Target Release: EAP 6.4.13
Assignee: Clebert Suconic
QA Contact: Peter Mackay
URL:
Whiteboard:
Depends On:
Blocks: eap6413-payload 1390788
 
Reported: 2014-11-27 16:04 UTC by Miroslav Novak
Modified: 2017-02-03 16:42 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-03 16:42:07 UTC
Type: Bug
Embargoed:


Attachments
server-logs.zip (514.32 KB, application/zip)
2014-11-27 16:04 UTC, Miroslav Novak


Links
Apache JIRA ARTEMIS-831 - last updated 2016-12-09 11:07:57 UTC
Red Hat Issue Tracker HORNETQ-1241 (Major, Open): Closing replication channel makes Netty send disconnect which causes backup to interrupt - last updated 2018-09-27 12:37:15 UTC

Description Miroslav Novak 2014-11-27 16:04:58 UTC
Created attachment 962151
server-logs.zip

Description of problem:
Sometimes a "Critical IO Error" appears in the log of the backup server after failover from live to backup.

Test scenario:
1. Start 2 EAP 6.4.0.DR11 servers in a collocated HA topology with a shared store (journal-type NIO)
2. Start a producer and a consumer with a transacted session connected to the 1st server (a rough sketch of such a client is below the scenario)
3. Kill the 1st EAP server so the clients fail over to the collocated backup on the 2nd server
4. Wait for the clients to finish
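
For illustration only, a minimal transacted JMS client along the lines of step 2 might look roughly like the sketch below. This is not the actual test code; the JNDI URL, lookup names, queue name and credentials are placeholders.

import java.util.Properties;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;

public class TransactedClientSketch {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jboss.naming.remote.client.InitialContextFactory");
        env.put(Context.PROVIDER_URL, "remote://127.0.0.1:4447"); // 1st server (placeholder address)
        Context ctx = new InitialContext(env);

        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/queue/testQueue"); // assumed queue name

        Connection connection = cf.createConnection("user", "password"); // placeholder credentials
        connection.start();
        try {
            // Transacted session: sends and acknowledgements only take effect on commit,
            // which is what exercises transaction failover when the live server is killed.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(queue);
            MessageConsumer consumer = session.createConsumer(queue);

            producer.send(session.createTextMessage("payload"));
            session.commit();

            consumer.receive(5000);
            session.commit();
        } finally {
            connection.close();
            ctx.close();
        }
    }
}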

Result:
It sometimes happens that after step 3 the collocated backup on the 2nd EAP server does not activate and logs the warning "HQ222010: Critical IO Error, shutting down the server..." (see the full stack trace below).

We saw a similar problem in the past in bz#989520.

How reproducible:
We don't have a reproducer at this moment.

Expected results:
The backup should activate without problems.

Additional info: 
I'm attaching trace logs from the EAP servers.

Full stack trace from backup:
08:59:39,083 WARN  [org.hornetq.core.server] (Old I/O server worker (parentId: 1934013939, [id: 0x7346b5f3, /127.0.0.1:7446])) HQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile /mnt/jbm/hudson/hornetq-journal-A-colocated/large-messages/2147484245.msg, message=null: HornetQException[errorType=IO_ERROR message=null]
        at org.hornetq.core.journal.impl.NIOSequentialFile.open(NIOSequentialFile.java:111) [hornetq-journal-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.journal.impl.NIOSequentialFile.open(NIOSequentialFile.java:96) [hornetq-journal-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.persistence.impl.journal.LargeServerMessageImpl.openFile(LargeServerMessageImpl.java:434) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.persistence.impl.journal.LargeServerMessageImpl.validateFile(LargeServerMessageImpl.java:406) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.persistence.impl.journal.LargeServerMessageImpl.addBytes(LargeServerMessageImpl.java:111) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.server.impl.ServerSessionImpl.sendContinuations(ServerSessionImpl.java:1370) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:472) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:641) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:557) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:533) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:587) [hornetq-server-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:72) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:122) [hornetq-core-client-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.socket.oio.OioWorker.process(OioWorker.java:71) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:175) [netty-3.6.10.Final-redhat-1.jar:3.6.10.Final-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
        at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184) [rt.jar:1.6.0_45]
        at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:304) [rt.jar:1.6.0_45]
        at org.hornetq.core.journal.impl.NIOSequentialFile.open(NIOSequentialFile.java:107) [hornetq-journal-2.3.24.Final-redhat-1.jar:2.3.24.Final-redhat-1]
        ... 30 more
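
For context, the "Caused by" above is standard NIO interruptible-channel behaviour: FileChannel extends AbstractInterruptibleChannel, so if the calling thread is interrupted while (or just before) it performs a blocking operation such as size(), the JDK closes the channel and throws ClosedByInterruptException. A minimal standalone sketch of that mechanism (plain JDK code, not HornetQ code; the file is just a temp file):

import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClosedByInterruptSketch {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("large-message", ".msg");
        Thread worker = new Thread(() -> {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Simulate the interrupt that the Netty/HornetQ worker thread received
                // (e.g. while the remoting/replication channel is being shut down).
                Thread.currentThread().interrupt();
                channel.size(); // same call as in the stack trace; channel gets closed
            } catch (ClosedByInterruptException expected) {
                System.out.println("ClosedByInterruptException: channel closed by interrupt");
            } catch (IOException unexpected) {
                unexpected.printStackTrace();
            }
        });
        worker.start();
        worker.join();
        Files.deleteIfExists(file);
    }
}

In the bug, the interrupted thread is the one calling NIOSequentialFile.open() for the large-message file, so the open fails and HornetQ escalates it to a critical IO error.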

Comment 1 Miroslav Novak 2014-11-27 16:06:03 UTC
This relates to JIRA HORNETQ-1241.

Comment 2 Miroslav Novak 2014-11-28 08:13:07 UTC
I can see this issue in other failover tests as well, so I'm increasing the severity.

Comment 3 Clebert Suconic 2014-12-04 14:55:51 UTC
Did you wait for the server to back up its files before failback?

Comment 4 Miroslav Novak 2014-12-04 15:00:33 UTC
A shared store is used in this scenario, so there is no need to wait for synchronization.

Comment 5 Clebert Suconic 2015-06-29 17:17:40 UTC
I believe this would be BZ-1193085.

Comment 6 Radovan STANCEL 2016-12-06 14:46:29 UTC
PR https://github.com/hornetq/hornetq/pull/2105

The fix for this issue is already merged -> moving to CP 13.

Comment 7 Radovan STANCEL 2016-12-09 11:07:57 UTC
The 7.1 and WF branches are fixed by ARTEMIS-831.
7.0 does not contain the issue - confirmed by Tom Ross and Clebert.

Comment 8 Peter Mackay 2017-01-24 17:21:42 UTC
Verified with EAP 6.4.13.CP.CR2

Comment 9 Petr Penicka 2017-02-03 16:42:07 UTC
Released with EAP 6.4.13 on Feb 02 2017.

