Bug 963980

Summary: Session failover does not work for "read-only" sessions
Product: [JBoss] JBoss Enterprise Application Platform 6
Component: Clustering
Version: 6.0.1
Reporter: dereed
Assignee: Radoslav Husar <rhusar>
QA Contact: Jitka Kozana <jkudrnac>
CC: jkudrnac, paul.ferraro, rhusar
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Target Release: EAP 6.2.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2013-09-19 21:30:56 UTC

Description dereed 2013-05-16 23:27:21 UTC
Description of problem:

If a session is not updated (only read from) for at least the session timeout, it will not fail over to another node.

This is because the session timeout for a clustered session is calculated on every node in the cluster from the time of the last update (write) to the session rather than the last access, which is incorrect.

On failover, one of two things appears to happen:
1.  the session timestamp in Infinispan is checked, and the session is considered expired
2.  the session no longer exists in Infinispan because it was incorrectly removed
Both of these are wrong.

Version-Release number of selected component (if applicable):
EAP 6.0.1

How reproducible:
Consistently

Steps to Reproduce:
1.  Deploy a <distributable/> application to a cluster with Clustered SSO enabled: <sso cache-container="web" cache-name="sso"/> (optionally with <session-timeout>1</session-timeout> for quicker testing); a configuration sketch follows these steps
2.  Access a page that creates a session
3.  Access other pages that read, but do not write to the session (don't let the session time out)
4.  At least <session-timeout> after #2, fail over to another node
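
For illustration, a minimal configuration sketch for step 1. The file placement and surrounding elements are assumptions based on a standard EAP 6 HA profile and a Servlet 3.0 deployment descriptor, not quoted from this report:

  <!-- web subsystem of the HA profile (e.g. standalone-ha.xml): enable clustered SSO -->
  <virtual-server name="default-host">
      <sso cache-container="web" cache-name="sso"/>
  </virtual-server>

  <!-- WEB-INF/web.xml of the application: mark it distributable and shorten the timeout -->
  <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
      <distributable/>
      <session-config>
          <session-timeout>1</session-timeout>
      </session-config>
  </web-app>

<session-timeout> is in minutes, so with a value of 1 the failover in step 4 only needs to be delayed by a minute or two.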
  
Actual results:
#4 creates a new session

Expected results:
#4 accesses the still valid session

Additional info:

Comment 1 dereed 2013-09-19 21:30:56 UTC
Limitation of the current session replication implementation.

It can be worked around (at a performance cost) by using <replication-trigger>ACCESS</replication-trigger>.
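
For anyone applying the workaround, the trigger is set per application in WEB-INF/jboss-web.xml; the surrounding <replication-config> element is my reading of the EAP 6 jboss-web descriptor, not something quoted in this report:

  <jboss-web>
      <replication-config>
          <replication-trigger>ACCESS</replication-trigger>
      </replication-config>
  </jboss-web>

With ACCESS, the session is treated as dirty on every access, so the timestamp is replicated even for read-only requests; that is where the performance cost comes from.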

Comment 2 Radoslav Husar 2013-09-20 13:10:51 UTC
Seems to me this is too much of an issue to just let go like this. Even if none of the fixes below is viable, we need to make sure this gets documented properly.

To fill in a little background: IncomingDistributableSessionData, which encapsulates the timestamp from which the validity of the session is calculated, is not replicated on every access. The result is that, on failover, a "read-only" session that has been accessed within the expiration time is still claimed invalid and a new session is created instead (org.apache.catalina.connector.Request.doGetSession(...)).

One of the mechanisms to mitigate this is "maxUnreplicatedInterval", which replicates the timestamp even if the rest of the session is not dirty. It is turned off by default (value -1); a value of 0 makes the timestamp replicate on every access, etc. It also makes HttpSession.getLastAccessedTime() return correct values for calls following failover.
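
If the interval route were taken per application rather than by changing the default, the value can presumably be set in WEB-INF/jboss-web.xml as well; the element name below (max-unreplicated-interval, in seconds) is assumed from earlier JBoss releases and should be verified against the jboss-web schema shipped with EAP 6:

  <jboss-web>
      <replication-config>
          <!-- replicate the access timestamp whenever it has not been replicated for 60 seconds -->
          <max-unreplicated-interval>60</max-unreplicated-interval>
      </replication-config>
  </jboss-web>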

Here are a few suggestions to fix/mitigate this:

1/ On failover, do not perform a validity check based on the timestamp. This would leave a rather small window open (between the session timing out and the session actually being expired and removed from the cache) in which a session that fails over would be treated as valid even though it should have expired. (Compare that to the current level of correctness, where we discard a session even though we should not.)

2/ Replicate the access time on every access or more frequently, or change the default for maxUnreplicatedInterval to ~60; needs to be checked for viability.

3/ In the case of failover for reasons other than node failure, the correct access time can be fetched from the remote node. This covers only a minority of cases; needs to be checked for viability.