Bug 809042
| Summary: | View change causes state transfer timeout time to be reduced to RPC timeout time. | ||
|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Michal Linhard <mlinhard> |
| Component: | Infinispan | Assignee: | Tristan Tarrant <ttarrant> |
| Status: | CLOSED NOTABUG | QA Contact: | Michal Linhard <mlinhard> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.0.0 | CC: | dberinde, jdg-bugs, mhusnain, nobody |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: |
RPC timeout is the maximum time allowed for remote user operations such as GET and PUT. State transfer timeout is the time allowed for internal remote operations used by the state transfer process (for example, APPLY_STATE). DistributedSyncTimeout applies to cache view installations, which include state transfers. As a result, distributedSyncTimeout should be set to a value greater than stateTransfer.timeout.
</para>
<para>
However, the timeout for user operations does not change during a state transfer. If a user operation fails with a TimeoutException, it is because rpcTimeout milliseconds have passed, regardless of whether a state transfer was in progress at the time.
</para>
<para>
In this instance, the cluster was unstable: at several points, nodes 2 and 3 considered themselves part of the cluster while the remaining nodes did not. Because it took more than one minute to establish a common, stable JGroups cluster view and to perform a state transfer, all pending user operations failed, which is expected behavior.
|
| Story Points: | --- | ||
| Clone Of: | | Environment: | |
| Last Closed: | 2012-04-20 11:53:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
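The ordering rule in the Doc Text (distributedSyncTimeout must exceed stateTransfer.timeout, because a cache view installation includes the state transfer) can be expressed as a simple check. The following is a minimal Java sketch, assuming a hypothetical `TimeoutSettings` class that is not part of the Infinispan or JBoss Data Grid API; its field names only mirror rpcTimeout, stateTransfer.timeout and distributedSyncTimeout, and the durations are illustrative values, not product defaults.

```java
import java.time.Duration;

/**
 * Hypothetical holder for the three timeouts discussed in this bug.
 * NOT part of the Infinispan API; it only illustrates the ordering rule.
 */
public final class TimeoutSettings {
    final Duration rpcTimeout;             // user operations such as GET and PUT
    final Duration stateTransferTimeout;   // internal state transfer RPCs such as APPLY_STATE
    final Duration distributedSyncTimeout; // whole cache view installation

    TimeoutSettings(Duration rpcTimeout, Duration stateTransferTimeout, Duration distributedSyncTimeout) {
        this.rpcTimeout = rpcTimeout;
        this.stateTransferTimeout = stateTransferTimeout;
        this.distributedSyncTimeout = distributedSyncTimeout;
    }

    /** A cache view installation includes the state transfer, so it needs the longer budget. */
    boolean isConsistent() {
        return distributedSyncTimeout.compareTo(stateTransferTimeout) > 0;
    }

    public static void main(String[] args) {
        // Illustrative values only, not recommended defaults.
        TimeoutSettings ok = new TimeoutSettings(
                Duration.ofSeconds(15), Duration.ofMinutes(4), Duration.ofMinutes(5));
        TimeoutSettings bad = new TimeoutSettings(
                Duration.ofSeconds(15), Duration.ofMinutes(4), Duration.ofMinutes(4));
        System.out.println(ok.isConsistent());  // prints "true"
        System.out.println(bad.isConsistent()); // prints "false"
    }
}
```

Note that rpcTimeout plays no part in the check: per the Doc Text, it governs only user operations and is independent of the state transfer budget.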
Description
Michal Linhard
2012-04-02 10:33:25 UTC
Who is assigned to provide CCFR information for this? I'd like to know who I should follow up with if this information remains blank.

RPC timeout means the maximum amount of time allowed for remote user operations like GET and PUT. State transfer timeout means the amount of time allowed for internal remote operations that are used by the state transfer process (like APPLY_STATE). DistributedSyncTimeout is used for cache view installations, and cache view installation currently includes state transfer, so distributedSyncTimeout should be greater than stateTransfer.timeout.

But the timeout for user operations doesn't change during state transfer. So if a particular user operation fails with a TimeoutException, it's because rpcTimeout milliseconds passed; it doesn't matter whether that happened during a state transfer or not.

In this particular instance we had an unstable cluster, and at several points in time nodes 2 and 3 thought they were part of the cluster but the other nodes didn't. Since it took more than 1 minute to establish a common/stable JGroups cluster view and perform state transfer, it is expected behaviour that all pending user operations would fail.
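To illustrate why the pending operations failed, here is a minimal Java sketch of the behaviour described in the comment above: a user operation's budget is rpcTimeout whether or not a state transfer is running. It assumes a hypothetical `PendingOperation.run` helper (not Infinispan code) and an assumed rpcTimeout of 15 seconds; the 75-second wait is a stand-in for the "more than 1 minute" it took the cluster to stabilize.

```java
import java.time.Duration;

/**
 * Hypothetical model, not Infinispan code: a pending user operation gets
 * the same rpcTimeout budget whether or not a state transfer is in progress.
 */
public final class PendingOperation {

    enum Outcome { COMPLETED, TIMEOUT_EXCEPTION }

    /**
     * @param rpcTimeout           budget for the user operation (GET, PUT, ...)
     * @param blockedFor           how long the operation waits for a stable view and state transfer
     * @param stateTransferRunning deliberately ignored: it does not extend the budget
     */
    static Outcome run(Duration rpcTimeout, Duration blockedFor, boolean stateTransferRunning) {
        // The deadline is fixed at rpcTimeout; a running state transfer does not change it.
        return blockedFor.compareTo(rpcTimeout) <= 0 ? Outcome.COMPLETED : Outcome.TIMEOUT_EXCEPTION;
    }

    public static void main(String[] args) {
        Duration rpcTimeout = Duration.ofSeconds(15); // assumed value
        Duration unstable = Duration.ofSeconds(75);   // stand-in for "more than 1 minute"
        System.out.println(run(rpcTimeout, unstable, true));              // TIMEOUT_EXCEPTION
        System.out.println(run(rpcTimeout, unstable, false));             // TIMEOUT_EXCEPTION
        System.out.println(run(rpcTimeout, Duration.ofSeconds(2), true)); // COMPLETED
    }
}
```

The outcome depends only on whether the wait exceeds rpcTimeout, which is why the report was closed as NOTABUG rather than treated as state transfer shortening the timeout.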
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
RPC timeout indicates the maximum timeout allowed for remote user operations like GET and PUT. State transfer timeout contains the timeout value allowed for internal remote operations used by the state transfer process (for example, APPLY_STATE). DistributedSyncTimeout is used for cache view installations, which include state transfers. As a result, the value set for distributedSyncTimeout should be greater than the value of stateTransfer.timeout.
However, the state transfer timeout does not change during the state transfer. As a result, if a user operation fails with a TimeoutException, it is because rpcTimeout milliseconds passes, regardless of whether it occurs during a state transfer or not.
In this instance, there is an unstable cluster in which nodes 2 and 3 consider themselves part of the cluster, while the remaining nodes in the cluster do not view them as part of the cluster. It takes over one minute to establish a common/stable JGroups cluster view and to perform a state transfer, therefore all pending operations fail.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,5 +1,7 @@
RPC timeout indicates the maximum timeout allowed for remote user operations like GET and PUT. State transfer timeout contains the timeout value allowed for internal remote operations used by the state transfer process (for example, APPLY_STATE). DistributedSyncTimeout is used for cache view installations, which include state transfers. As a result, the value set for distributedSyncTimeout should be greater than the value of stateTransfer.timeout.
-
+</para>
+<para>
However, the state transfer timeout does not change during the state transfer. As a result, if a user operation fails with a TimeoutException, it is because rpcTimeout milliseconds passes, regardless of whether it occurs during a state transfer or not.
-
+</para>
+<para>
In this instance, there is an unstable cluster in which nodes 2 and 3 consider themselves part of the cluster, while the remaining nodes in the cluster do not view them as part of the cluster. It takes over one minute to establish a common/stable JGroups cluster view and to perform a state transfer, therefore all pending operations fail.
Dan Berindei <dberinde> updated the status of jira ISPN-1961 to Resolved