Bug 963980 - Session failover does not work for "read-only" sessions
Status: CLOSED WONTFIX
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering
Version: 6.0.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Release: EAP 6.2.0
Assigned To: Radoslav Husar
QA Contact: Jitka Kozana
 
Reported: 2013-05-16 19:27 EDT by dereed
Modified: 2013-10-21 21:25 EDT
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2013-09-19 17:30:56 EDT
Type: Bug


Attachments: None
Description dereed 2013-05-16 19:27:21 EDT
Description of problem:

If a session is not updated (only read from) for at least the session timeout, it will not fail over to another node.

This is because the session timeout for a clustered session is calculated on every node in the cluster from the last update (replication) of the session rather than from the last access, which is incorrect.

On failover, one of two things appears to happen:
1.  the session timestamp in Infinispan is checked, and the session is expired
2.  the session no longer exists in Infinispan because it was incorrectly removed
Both of these are wrong.

Version-Release number of selected component (if applicable):
EAP 6.0.1

How reproducible:
Consistently

Steps to Reproduce:
1.  Deploy a <distributable/> application to a cluster with Clustered SSO enabled via <sso cache-container="web" cache-name="sso"/>, optionally with <session-timeout>1</session-timeout> for quicker testing (see the configuration sketch after this list)
2.  Access a page that creates a session
3.  Access other pages that read from, but do not write to, the session (don't let the session time out)
4.  At least one <session-timeout> interval after #2, fail over to another node
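
For reference, a minimal sketch of the configuration used in step 1. The <distributable/>, <session-timeout> and <sso .../> elements come from the steps above; the web subsystem namespace version and the "default-host" virtual-server name are assumptions and may differ per installation.

web.xml of the test application:

  <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
      <!-- mark the application as clusterable -->
      <distributable/>
      <session-config>
          <!-- 1 minute, only to speed up testing -->
          <session-timeout>1</session-timeout>
      </session-config>
  </web-app>

standalone-ha.xml, web subsystem, to enable Clustered SSO:

  <subsystem xmlns="urn:jboss:domain:web:1.4">
      <!-- connectors and other subsystem configuration omitted -->
      <virtual-server name="default-host">
          <!-- Clustered SSO backed by the "sso" cache of the "web" cache container -->
          <sso cache-container="web" cache-name="sso"/>
      </virtual-server>
  </subsystem>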
  
Actual results:
#4 creates a new session

Expected results:
#4 accesses the still valid session

Additional info:
Comment 1 dereed 2013-09-19 17:30:56 EDT
Limitation of the current session replication implementation.

It can be worked around (at a performance cost) by using <replication-trigger>ACCESS</replication-trigger>.
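
For illustration, a sketch of how that workaround is typically applied, assuming the element is placed inside <replication-config> in the application's WEB-INF/jboss-web.xml (the surrounding descriptor layout is an assumption; only the <replication-trigger>ACCESS</replication-trigger> value comes from this comment):

  <jboss-web>
      <replication-config>
          <!-- replicate session data on every access, not only when it is modified;
               this keeps the timestamp current on the other nodes, at a performance cost -->
          <replication-trigger>ACCESS</replication-trigger>
      </replication-config>
  </jboss-web>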
Comment 2 Radoslav Husar 2013-09-20 09:10:51 EDT
Seems to me this is too big an issue to just let go like this. Even if none of the fixes below is viable, we need to make sure this gets documented properly.

To fill in a little background: IncomingDistributableSessionData, which encapsulates the timestamp from which the validity of the session is calculated, is not replicated on every access. As a result, on failover a "read-only" session that has been accessed within the expiration time will still be claimed invalid and a new session is created instead (org.apache.catalina.connector.Request.doGetSession(...)).

One of the mechanisms to mitigate this is "maxUnreplicatedInterval", which replicates the timestamp even if the rest of the session is not dirty. It is turned off by default (value -1); a value of 0 makes it replicate on every access, and so on. It also ensures that HttpSession.getLastAccessedTime() returns correct values following failover.
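
For illustration, a sketch of how this setting would typically be applied per application, assuming maxUnreplicatedInterval maps to a <max-unreplicated-interval> element under <replication-config> in WEB-INF/jboss-web.xml (the element name and placement are assumptions based on the usual descriptor layout; the -1 and 0 semantics are as described above):

  <jboss-web>
      <replication-config>
          <!-- replicate the timestamp even when the session is otherwise not dirty;
               -1 = off (default), 0 = replicate on every access -->
          <max-unreplicated-interval>0</max-unreplicated-interval>
      </replication-config>
  </jboss-web>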

Here are a few suggestions to fix or mitigate the issue:

1/ On failover, do not perform a validity check based on the timestamp. This would leave a rather small window open -- between the session timing out and the session actually being expired and removed from the cache -- in which a session that failed over would be treated as valid even though it should have expired. (Compare this to the current level of correctness, where we discard a session even though we should not.)

2/ Replicate the access time on every access or more frequently, e.g. by changing the default for maxUnreplicatedInterval to ~60; needs to be checked for viability.

3/ In the case of failover for reasons other than node failure, the correct access time can be fetched from the remote node. This covers only a minority of cases; needs to be checked for viability.
