Bug 882162
| Summary: | Segment transfer not restarted if the owner fails | ||
|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Radim Vansa <rvansa> |
| Component: | Infinispan | Assignee: | Tristan Tarrant <ttarrant> |
| Status: | CLOSED UPSTREAM | QA Contact: | Nobody <nobody> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.1.0 | CC: | jdg-bugs, nobody |
| Target Milestone: | ER10 | ||
| Target Release: | 6.1.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2025-02-10 03:27:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Radim Vansa
2012-11-30 10:04:40 UTC
Adrian Nistor <anistor> updated the status of jira ISPN-2574 to Coding In Progress

Dan Berindei <dberinde> made a comment on jira ISPN-2574
Fix integrated in master, leaving the issue open until we add a test as well.

Tristan Tarrant <ttarrant> made a comment on jira ISPN-2574
Can't we close this issue and create a new one just for the test?

Mircea Markus <mmarkus> made a comment on jira ISPN-2574
That's if Adrian is confident that it's fixed without a test.

Adrian Nistor <anistor> made a comment on jira ISPN-2574
Ok, let's close this and create a separate issue for the unit test.

Adrian Nistor <anistor> made a comment on jira ISPN-2574
Closing this so it can go to QE. Created a separate issue for the unit test: ISPN-2569

Adrian Nistor <anistor> made a comment on jira ISPN-2574
Closing this so it can go to QE. Created a separate issue for the unit test: ISPN-2596

Michal Linhard <mlinhard> updated the status of jira ISPN-2574 to Reopened

Michal Linhard <mlinhard> made a comment on jira ISPN-2574
Adrian, please check out my test case: https://github.com/mlinhard/infinispan/commit/8681c35c95aeba128ae28a1c2aba9609b2b9e2b8
It doesn't work on current master. The test scenario is a simpler one:

config: distribution, num owners 2

1. create cluster {A,B}, fill 1000 entries
2. join C
3. when B is about to send StateResponseCommand to C, fail B, never send the command
4. C should restart the state transfer and ask the same segments from A
5. cluster {A, C} will form with all segments properly backed up on both A and C

Beta4 didn't restart the state transfer, which meant some entries weren't properly transferred to C.
Beta6 handled this correctly.
Current master again fails to restart the state transfer from A.

Michal Linhard <mlinhard> made a comment on jira ISPN-2574
On current master this test crashes on this line:

{code}
final Cache<Object, Object> c2 = cache(2);
{code}

but when I catch the exception (by replacing that line with):

{code}
Cache<Object, Object> aCache = null;
// Retry until the cache can be obtained; each failure is logged.
while (aCache == null) {
    try {
        aCache = cache(2);
    } catch (Exception e) {
        log.error("Problem obtaining cache: ", e);
    }
}
final Cache<Object, Object> c2 = aCache;
{code}

I still can't see the StateRequestCommand being sent from C to A (after B is killed).

This is fine in ER6, but this functionality has changed and might not be fine in ER8.

Michal Linhard <mlinhard> made a comment on jira ISPN-2574
Just checked, also fails for 5.2.0.CR1.

Fails for ER8.

Adrian Nistor <anistor> made a comment on jira ISPN-2574
Thanks for the unit test! I'm looking at this issue right now. Surefire wants it renamed to *Test. Will rename and integrate it.

Adrian Nistor <anistor> made a comment on jira ISPN-2574
Two things were wrong:

1. An in-progress task that was fetching from a leaver was replaced by a new task from a new source, but the existing task should also be interrupted; otherwise the transfer thread is blocked forever.
2. The check in StateConsumerImpl.startTransferThread() that prevents two threads from running at the same time was unsafe and could result in no thread running at all, which means tasks pile up in taskQueue but are not processed.

StateTransferRestartTest, which fails in ER9, passes in ER10.

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.
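
The second problem in Adrian Nistor's last comment (an unsafe "is a transfer thread already running?" check that can leave tasks stranded in taskQueue) is a common race, and a generic sketch may make it easier to picture. The code below is not the Infinispan fix and does not use any Infinispan API; the class TransferWorkerSketch and its runDemo helper are hypothetical names invented for illustration. Only taskQueue and startTransferThread echo names from the comment. It assumes the race has the usual shape: the worker decides to exit at the same moment a producer sees it as still running and therefore starts nothing.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; this is NOT Infinispan's StateConsumerImpl. */
final class TransferWorkerSketch {
    private final Queue<Runnable> taskQueue = new ConcurrentLinkedQueue<>();
    // true while some thread owns the right to drain taskQueue
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Enqueue first, then make sure a worker exists to process the task. */
    void addTask(Runnable task) {
        taskQueue.add(task);
        startTransferThread();
    }

    private void startTransferThread() {
        // compareAndSet turns "check the flag, then claim it" into one atomic
        // step, so two callers can never both start a worker.
        if (running.compareAndSet(false, true)) {
            Thread t = new Thread(this::drain, "transfer-worker-sketch");
            t.setDaemon(true);
            t.start();
        }
    }

    private void drain() {
        while (true) {
            Runnable task;
            while ((task = taskQueue.poll()) != null) {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // A failing task must not kill the worker while the flag is set.
                }
            }
            running.set(false);
            // Re-check after releasing the flag: a task may have been added
            // between the last poll() and set(false), by a producer that saw
            // running == true and therefore started no thread. Without this
            // re-check that task would sit in the queue with no worker.
            if (taskQueue.isEmpty() || !running.compareAndSet(false, true)) {
                return; // nothing left, or another worker has taken over
            }
        }
    }

    /** Runs several producers concurrently and returns how many tasks were processed. */
    static int runDemo(int producers, int tasksPerProducer) {
        TransferWorkerSketch worker = new TransferWorkerSketch();
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(producers * tasksPerProducer);
        for (int p = 0; p < producers; p++) {
            new Thread(() -> {
                for (int i = 0; i < tasksPerProducer; i++) {
                    worker.addTask(() -> {
                        processed.incrementAndGet();
                        done.countDown();
                    });
                }
            }).start();
        }
        try {
            // Bounded wait, so a stranded task shows up as a short count, not a hang.
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed " + runDemo(4, 1000) + " of 4000 tasks");
    }
}
```

The design point is the ordering: producers enqueue before checking the flag, and the worker clears the flag before its final emptiness check. With that ordering, any task is seen either by the exiting worker or by the producer's own attempt to start a new one.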