Bug 1159162
Summary: | Possible false positive suspect of FD_HOST when the number of hosts is large | |||
---|---|---|---|---|
Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Osamu Nagano <onagano> | |
Component: | JGroups | Assignee: | Tristan Tarrant <ttarrant> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Martin Gencur <mgencur> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.3.1 | CC: | bban, dstahl, gsheldon, ksuzumur, mhusnain, pslavice, rvansa, sjacobs, slaskawi, tkimura, wfink | |
Target Milestone: | CR1 | Flags: | ksuzumur: needinfo+ | |
Target Release: | 6.3.2 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Previously in Red Hat JBoss Data Grid, when the JGroups FD_HOST protocol was used for node failure detection (it checks whether a node is alive using ICMP pings), a node could be suspected to be dead even though it was responsive. This issue was more likely to occur in larger clusters. This issue is now fixed in JBoss Data Grid 6.3.2. |
Story Points: | --- | |
Clone Of: | ||||
: | 1161529 | Environment: | |
Last Closed: | 2015-01-26 14:03:42 UTC | Type: | Bug | |
Bug Blocks: | 1161529 |
Description
Osamu Nagano
2014-10-31 06:00:38 UTC
Currently we have two loops: a ping loop and a timeout-checking loop. A Full GC during the ping loop affects multiple hosts and raises unnecessarily many false suspicions. To reduce the impact of a Full GC, we can combine the two loops into one that pings each host and checks its timeout in the same iteration, so that a Full GC delay affects only a single host and never the others. Will send a pull request later.

Takayoshi Kimura <tkimura> updated the status of jira JGRP-1898 to Coding In Progress

Takayoshi Kimura <tkimura> updated the status of jira JGRP-1898 to Open

Thanks for the PR! I applied it (changing the logic slightly, see comments on the PR) and backported it to the 3.5 and 3.4 branches. I suggest creating a JGroups JAR from a snapshot of the 3.4 branch and testing this change. Once you tell me it works, I can release a 3.4.7.Final.

Bela Ban <bela> updated the status of jira JGRP-1898 to Resolved

Tested the 3.4 branch, but it doesn't work. Looking at the source code, the change was not applied correctly. Will send a PR shortly.

Fixed, tested, verified and sent PRs for 3.4 and 3.5.
https://github.com/belaban/JGroups/pull/181
https://github.com/belaban/JGroups/pull/182

fixed typo

jdg-6.3.x PR: https://github.com/infinispan/jdg/pull/366

Unit test attached to JGroups JIRA.
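
For readers who want to see why merging the two loops helps, here is a minimal simulation of the idea from the description. It is not the JGroups FD_HOST code and not the patch from the PRs above (which was applied with slightly different logic). The class `FdHostLoopSketch`, the fake clock `now`, and the helpers `ping` and `maybePause` are all hypothetical names, and the Full GC is modelled as a clock jump that happens right after one host's timestamp has been recorded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FdHostLoopSketch {
    /** Fake clock in ms, so a stop-the-world pause can be simulated deterministically. */
    static long now;

    /** Hypothetical ping: takes 1 ms and always succeeds, i.e. every host is really alive. */
    static boolean ping(String host) {
        now += 1;
        return true;
    }

    /** Assumption: the simulated Full GC hits right after pauseHost's timestamp was recorded. */
    static void maybePause(String host, String pauseHost, long pauseMs) {
        if (host.equals(pauseHost)) now += pauseMs;
    }

    /** Two loops: ping all hosts first, then check all timestamps afterwards. */
    static List<String> twoLoops(List<String> hosts, long timeoutMs, String pauseHost, long pauseMs) {
        now = 0;
        Map<String, Long> lastSeen = new HashMap<>();
        for (String h : hosts) {
            if (ping(h)) lastSeen.put(h, now);
            maybePause(h, pauseHost, pauseMs);
        }
        List<String> suspects = new ArrayList<>();
        for (String h : hosts) {
            Long ts = lastSeen.get(h);
            // Every timestamp recorded before the pause now looks stale.
            if (ts == null || now - ts > timeoutMs) suspects.add(h);
        }
        return suspects;
    }

    /** One loop: ping a host and check its timeout in the same iteration. */
    static List<String> singleLoop(List<String> hosts, long timeoutMs, String pauseHost, long pauseMs) {
        now = 0;
        Map<String, Long> lastSeen = new HashMap<>();
        List<String> suspects = new ArrayList<>();
        for (String h : hosts) {
            if (ping(h)) lastSeen.put(h, now);
            maybePause(h, pauseHost, pauseMs);
            Long ts = lastSeen.get(h);
            // Only the host being processed when the pause hit can look stale.
            if (ts == null || now - ts > timeoutMs) suspects.add(h);
        }
        return suspects;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("h1", "h2", "h3", "h4", "h5");
        // 5 s timeout, 10 s simulated pause while processing h3
        System.out.println("two loops:   " + twoLoops(hosts, 5000, "h3", 10000));
        System.out.println("single loop: " + singleLoop(hosts, 5000, "h3", 10000));
    }
}
```

In this simulation the two-loop variant suspects every host pinged up to the pause (h1, h2 and h3), while the single-loop variant suspects at most the one host that was being processed when the pause occurred (h3). Where the pause lands relative to the timestamp update is an assumption of the sketch; the real protocol also pings on a timer and across several rounds, which is not modelled here.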