Description of problem:
If connection-ttl is equal to the check-period, there is only one opportunity to validate the live / backup connection, cluster connection or replication connection. If a system is under high load or there is a minor network delay, there is a high probability of a cluster split / failure due to connection timeout.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Set connection-ttl or override to 30000ms and check-period to 30000 (default)
2. Add 500ms of latency to the network interface: tc qdisc add dev lo root handle 1:0 netem delay 500msec
3. Cluster will split

Actual results:
Cluster splits, causing more than 1 live server in a live / backup pair.

Expected results:
Add a WARN log recommending not setting connection-ttl / override to the same value as the check-period, for the reasons noted above.

Additional info:
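
The check requested under "Expected results" could look roughly like the sketch below. This is not the actual HornetQ patch: the class name `TtlCheckSketch`, its method names and the message wording are hypothetical, and the sketch assumes that non-positive values (such as -1) mean the TTL or the check is disabled and should not be flagged. It also flags a TTL smaller than the check-period, which goes slightly beyond the equal-values case described in this report.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of HornetQ.
final class TtlCheckSketch {

    // Rough count of check-period ticks that fit into one TTL window.
    // With ttl == check-period this is 1: a single chance to validate the connection.
    static long checksWithinTtl(long connectionTtlMs, long checkPeriodMs) {
        if (connectionTtlMs <= 0 || checkPeriodMs <= 0) {
            throw new IllegalArgumentException("both values must be positive");
        }
        return connectionTtlMs / checkPeriodMs;
    }

    // Assumption: non-positive values mean "disabled" and are never flagged.
    static boolean isRisky(long connectionTtlMs, long checkPeriodMs) {
        return connectionTtlMs > 0 && checkPeriodMs > 0 && connectionTtlMs <= checkPeriodMs;
    }

    // Returns the warning text to log, or empty if the configuration looks safe.
    static Optional<String> warning(long connectionTtlMs, long checkPeriodMs) {
        if (!isRisky(connectionTtlMs, checkPeriodMs)) {
            return Optional.empty();
        }
        return Optional.of("WARN: connection-ttl (" + connectionTtlMs
                + "ms) should be greater than check-period (" + checkPeriodMs
                + "ms); a single delayed check can time out the connection");
    }

    public static void main(String[] args) {
        // Values from the reproduction steps: both set to 30000.
        System.out.println(checksWithinTtl(30000, 30000));
        warning(30000, 30000).ifPresent(System.out::println);
        // A TTL of twice the check-period is not flagged.
        System.out.println(warning(60000, 30000).isPresent());
    }
}
```

Returning the message instead of logging it directly keeps the sketch independent of any logging framework; a real fix would presumably route it through the broker's own logger at WARN level.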
Ryan Emerson <remerson> updated the status of jira HORNETQ-1482 to Coding In Progress
Ryan Emerson <remerson> updated the status of jira HORNETQ-1482 to Resolved
2.3.x PR: https://github.com/hornetq/hornetq/pull/2033
2.3.25.x Commit: https://github.com/hornetq/hornetq/commit/20f9edd17b5673e94b24953295c54c71f319f190
Hi, this fix is also needed for EAP/WildFly, not only for the standalone broker. The parseMainConfig() method is never called when EAP starts.
Ryan Emerson <remerson> updated the status of jira HORNETQ-1482 to Reopened
Verified with 6.4.5.CP.CR1
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.