Bug 1169418
Summary: | [QA](6.4.z) Calling last session.commit() does not get a response and throws "javax.jms.JMSException: HQ119014: Timed out waiting for response when sending packet 43" to client | ||
---|---|---|---|
Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Miroslav Novak <mnovak> |
Component: | HornetQ | Assignee: | Yong Hao Gao <hgao> |
Status: | CLOSED WONTFIX | QA Contact: | Miroslav Novak <mnovak> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4.0 | CC: | ataylor, bmaxwell, hgao, jbertram, mnovak, msvehla |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-03-01 12:29:03 UTC | Type: | Bug |
Description
Miroslav Novak
2014-12-01 15:47:44 UTC
Just like with failover or failback, you are expected to handle errors and retry... right?

Clients retry only on JMSException. IndexOutOfBoundsException is not a subtype of it, so this exception ends the client.

There should be JMSException in the Actual and Expected results instead of IndexOutOfBoundsException. I accidentally copied it from bz#1169348. The problem here is with the last commit, which times out, and a JMSException is thrown for every retry.

I'm a bit confused about this issue. Clebert and you both seem to agree that the client should retry on a JMSException, and that is the kind of exception which is thrown, a fact obscured by a cut/paste mistake. So, is the client retrying when this JMSException is thrown? If not, why not? Please clarify.

The client retries commit() if a JMSException is thrown. There are ~5 retries before giving up. However, none of those retries is successful.

Hi Howard, I can see the same exception when the backup is synchronizing with live. The test scenario is:
1. Start 2 EAP 6.4.0.ER1 servers in dedicated topology with replicated journal
2. Start a producer which sends large messages to a queue
3. Wait 4 minutes (we need syncing with the backup to take a long time, > 30s)
4. Start the backup server, which starts syncing with live (it must take longer than call-timeout=30s)

You will see that the producer gets a JMSException:

javax.jms.JMSException: HQ119014: Timed out waiting for response when sending packet 43
    at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:390)
    at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:570)
    at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:156)
    at org.hornetq.jms.client.HornetQSession.commit(HornetQSession.java:229)
    at org.jboss.qa.hornetq.apps.clients.ProducerTransAck.commitSession(ProducerTransAck.java:210)
    at org.jboss.qa.hornetq.apps.clients.ProducerTransAck.run(ProducerTransAck.java:100)
Caused by: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 43]

Synchronization causes all blocking client calls to time out. When the backup is synced and announced, everything returns to normal. Is it possible to send a packet to the client with something like "wait, I'm syncing with backup now", so that the client does not throw a JMSException?

This can be reproduced with a test from the HQ upstream test suite, BackupSyncJournalTest#testReserveFileIdValuesOnBackup, when it is run with a low call-timeout. Add the line below to the setUp() method:

locator.setCallTimeout(5000);

Here it fails when the producer tries to send more messages during synchronization (block-on-durable-send=true).

Hi Mirek, I'm still debugging it. So far I have observed that something strange happens during the replication. At some point a replication request takes more than 30 sec to get a response back, and the 30 sec does not seem to have been spent in replication. I'm tracing the netty layer code now to see if I can find more. Howard

Hi Howard, BackupSyncJournalTest#testReserveFileIdValuesOnBackup uses the BackupSyncDelay class to slow down replication between live and backup. Maybe it's not well implemented. Anyway, I can see the same behaviour without it. (By the way, in step 1 only the live server should be started, not both.) Thanks, Mirek
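For readers following the retry discussion, here is a minimal sketch of the client-side behaviour described in this thread: retry commit() a bounded number of times on the expected checked exception, and let anything else (such as IndexOutOfBoundsException) propagate and end the client. This is not the actual ProducerTransAck code. The names CommitRetry, CommitAction and TransientCommitException are hypothetical, and TransientCommitException only stands in for javax.jms.JMSException so that the sketch compiles without a JMS jar.

```java
// Hypothetical helper illustrating a bounded commit retry; not HornetQ or test-suite code.
final class CommitRetry {

    // Stand-in for javax.jms.JMSException (assumption made to keep the sketch self-contained).
    static final class TransientCommitException extends Exception {
        TransientCommitException(String message) {
            super(message);
        }
    }

    // Wraps the call to retry; a real client would call session.commit() here.
    @FunctionalInterface
    interface CommitAction {
        void commit() throws TransientCommitException;
    }

    // Returns true if a commit attempt succeeded, false if all attempts failed.
    // Unchecked exceptions are deliberately not caught, so they end the caller.
    static boolean commitWithRetry(CommitAction action, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.commit();
                return true;
            } catch (TransientCommitException e) {
                // Pause before the next attempt, but not after the final one.
                if (attempt < maxAttempts && pauseMillis > 0) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        return false;
    }
}
```

With maxAttempts = 5 this mirrors the "~5 retries before giving up" reported above. It does not address the problem in this bug: while the backup is syncing, each attempt can still block for the full call-timeout and fail, so all retries may be exhausted before synchronization finishes.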