Bug 1158920
Summary: | InternalError exception thrown after fail-over
---|---
Product: | [JBoss] JBoss Enterprise Application Platform 6
Component: | HornetQ
Version: | 6.4.0
Status: | CLOSED CURRENTRELEASE
Severity: | urgent
Priority: | unspecified
Reporter: | Miroslav Novak <mnovak>
Assignee: | Clebert Suconic <csuconic>
QA Contact: | Miroslav Novak <mnovak>
CC: | ataylor, cdewolf, csuconic, dandread, jbertram, kkhan, msvehla, rsvoboda
Target Milestone: | DR12
Target Release: | EAP 6.4.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | dev_best_effort, dev_no_blocker
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2019-08-19 12:43:46 UTC
Bug Blocks: | 1166735
Description
Miroslav Novak
2014-10-30 14:02:32 UTC
Created attachment 952144: logs.zip
Link to failed test in Jenkins: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-ha-failover-dedicated/226/testReport/org.jboss.qa.hornetq.test.failover/DedicatedFailoverTestCase/testFailoverClientAckTopic/

We hit this issue again during EAP 6.4.0.DR9 testing, but in a slightly different scenario:

1. Start 2 EAP 6.4.0.DR9 servers in dedicated topology with shared store, with queues InQueue and OutQueue deployed.
2. Send 2000 messages to InQueue on the 1st EAP server (live).
3. Start a 3rd EAP 6.4.0.DR9 server with an MDB deployed. The MDB reads messages from InQueue through remote JCA and sends them to OutQueue (in an XA transaction).
4. While the MDB is processing messages, cleanly shut down the 1st server (live).
5. The MDB fails over to the 2nd server (backup).
6. Wait for the MDB to finish processing and read all messages from OutQueue.
7. Check the number of sent and received messages.

The problem occurred in step 5: the MDB did not receive any new messages from the backup after failover. I can see the following errors and warnings in the log of the 2nd EAP (backup) server:

14:13:31,320 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) Exception in thread "hornetq-discovery-group-thread-dg-group1" java.lang.InternalError: unhandled utf8 byte 0
14:13:31,323 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) at org.hornetq.utils.UTF8Util.readUTF(UTF8Util.java:164)
14:13:31,323 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readUTF(ChannelBufferWrapper.java:105)
14:13:31,323 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readStringInternal(ChannelBufferWrapper.java:95)
14:13:31,329 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readString(ChannelBufferWrapper.java:77)
14:13:31,330 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) at org.hornetq.core.cluster.DiscoveryGroup$DiscoveryRunnable.run(DiscoveryGroup.java:303)
14:13:31,331 ERROR [stderr] (hornetq-discovery-group-thread-dg-group1) at java.lang.Thread.run(Thread.java:745)
...
14:13:32,771 WARN [org.hornetq.core.server] (Thread-20 (HornetQ-server-HornetQServerImpl::serverUUID=bf00dd22-69d6-11e4-8bd5-8513819ff1c5-266116125)) HQ222015: Internal error! Delivery logic has identified a non delivery and still handled a consumer!

Attaching logs-mdb-failover.zip with info and trace logs from the servers. Because this issue breaks HA, I'm increasing severity and setting blocker ?.

Created attachment 956655: logs-mdb-failover.zip
Couldn't this be caused by the fact that we compiled the latest release with Java 8?

We should be more resilient to failures in the DiscoveryRunnable. Any exception would interrupt the loop, as identified by Miroslav.

Miro was spot on about the issue. The fix here will be simple:

    while (started)
    {
       try
       {
          // existing loop body
       }
       catch (Throwable e) // do the exception treatment inside the while, don't interrupt it if any exception happened
       {
          // logging
       }
    }

I'm not sure how to test this though... we would need a Byteman test interrupting the send and running it in a loop.

I have a fix for this one already... leave it with me.

Solved by the HQ 2.3.24.Final upgrade.

During the EAP 6.4.0.DR11 testing cycle this issue was not hit. Still, I'll not set this as verified yet and will check again with DR12 to have some confidence that the issue is gone and there are no more problems.

Moving to DR12 to have it in the priority filter.

No related issue was found during EAP 6.4.0.DR12 testing. Setting as verified.
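
For readers following the discussion, here is a minimal, self-contained Java sketch of the idea described in the comments: handling the exception inside the loop so that one bad packet does not end the thread. It is not the HornetQ code or the actual patch. The class `ResilientLoopSketch`, the interface `PacketSource`, and the methods `failingOnCall`, `runGuarded` and `runUnguarded` are hypothetical names invented for illustration, and an iteration count stands in for the `started` flag. The point it shows is that `java.lang.InternalError` is an `Error`, so only a handler as wide as `Throwable` (or `Error`) placed inside the loop keeps the loop alive.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ResilientLoopSketch {

    /** Hypothetical stand-in for "read and decode one discovery packet". */
    public interface PacketSource {
        String next() throws Exception;
    }

    /** Builds a source whose Nth call fails the way the log in this report shows. */
    public static PacketSource failingOnCall(int badCall) {
        AtomicInteger calls = new AtomicInteger();
        return () -> {
            if (calls.incrementAndGet() == badCall) {
                throw new InternalError("unhandled utf8 byte 0");
            }
            return "packet";
        };
    }

    /** Loop without a handler: the first Throwable escapes and ends the loop. */
    public static void runUnguarded(PacketSource source, int iterations,
                                    AtomicInteger handled) throws Exception {
        for (int i = 0; i < iterations; i++) {
            source.next();
            handled.incrementAndGet();
        }
    }

    /**
     * Loop with the try/catch inside it, as proposed in the comments:
     * a failing iteration is logged and skipped, later iterations still run.
     * Returns the number of packets handled successfully.
     */
    public static int runGuarded(PacketSource source, int iterations) {
        int handled = 0;
        for (int i = 0; i < iterations; i++) { // stands in for while (started)
            try {
                source.next();
                handled++;
            } catch (Throwable t) {
                // Throwable also covers Error subclasses such as InternalError,
                // which a catch (Exception e) block would not intercept.
                System.err.println("Ignoring bad packet: " + t);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        AtomicInteger unguarded = new AtomicInteger();
        try {
            runUnguarded(failingOnCall(2), 5, unguarded);
        } catch (Throwable t) {
            // On a real background thread nothing would catch this;
            // the thread would print the stack trace and terminate.
        }
        System.out.println("unguarded handled: " + unguarded.get());
        System.out.println("guarded handled:   " + runGuarded(failingOnCall(2), 5));
    }
}
```

With the second of five calls failing, the unguarded loop stops after one packet while the guarded loop handles the remaining four. Catching `Throwable` is deliberately broad here; in production code it would also swallow errors such as `OutOfMemoryError`, so the handler should at least log what it caught.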