963980 – Session failover does not work for "read-only" sessions

Summary: Session failover does not work for "read-only" sessions

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Clustering
Sub Component:
Version:	6.0.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	EAP 6.2.0
Assignee:	Radoslav Husar
QA Contact:	Jitka Kozana
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-05-16 23:27 UTC by dereed
Modified:	2013-10-22 01:25 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-09-19 21:30:56 UTC
Type:	Bug
Embargoed:

Description dereed 2013-05-16 23:27:21 UTC

Description of problem:

If a session is not updated (only read from) for at least the session timeout, it will not fail over to another node.

This is because every node in the cluster calculates the timeout for a clustered session from the last update to the session rather than from the last access, which is incorrect.

On failover, one of two things appears to happen:
1.  the session timestamp in Infinispan is checked, and the session is expired
2.  the session no longer exists in Infinispan because it was incorrectly removed
Both of these are wrong.
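
The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch, not the EAP implementation: the class `SessionModel` and its fields are hypothetical, and the only assumption taken from this report is that the timestamp visible to other nodes advances on writes but not on reads.

```java
public class ReadOnlyFailoverSketch {

    /** Hypothetical model of one clustered session; not an EAP class. */
    static final class SessionModel {
        final long maxInactiveMillis;
        long localLastAccessedMillis;  // known only to the node serving the requests
        long lastReplicatedMillis;     // timestamp the other nodes can see

        SessionModel(long createdMillis, long maxInactiveMillis) {
            this.maxInactiveMillis = maxInactiveMillis;
            this.localLastAccessedMillis = createdMillis;
            this.lastReplicatedMillis = createdMillis;
        }

        /** A write marks the session dirty, so the timestamp is replicated too. */
        void write(long nowMillis) {
            localLastAccessedMillis = nowMillis;
            lastReplicatedMillis = nowMillis;
        }

        /** A read updates only the local access time; nothing is replicated. */
        void read(long nowMillis) {
            localLastAccessedMillis = nowMillis;
        }

        /** Validity as seen by the node that has been serving the session. */
        boolean validOnOwner(long nowMillis) {
            return nowMillis - localLastAccessedMillis < maxInactiveMillis;
        }

        /** Validity as seen by a node taking over after failover. */
        boolean validOnFailoverNode(long nowMillis) {
            return nowMillis - lastReplicatedMillis < maxInactiveMillis;
        }
    }

    public static void main(String[] args) {
        long timeoutMillis = 60_000L; // one minute, as with <session-timeout>1</session-timeout>
        SessionModel session = new SessionModel(0L, timeoutMillis);

        // Read-only requests every 20 seconds keep the session alive locally.
        for (long t = 20_000L; t <= 120_000L; t += 20_000L) {
            session.read(t);
        }

        long failoverAt = 130_000L;
        System.out.println("owner sees valid:         " + session.validOnOwner(failoverAt));
        System.out.println("failover node sees valid: " + session.validOnFailoverNode(failoverAt));
    }
}
```

In this model the owning node still considers the session valid at the moment of failover, while the node taking over sees a timestamp older than the timeout and treats the session as expired.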

Version-Release number of selected component (if applicable):
EAP 6.0.1

How reproducible:
Consistently

Steps to Reproduce:
1.  Deploy a <distributable/> application to a cluster with Clustered SSO enabled: <sso cache-container="web" cache-name="sso"/> (optionally with <session-timeout>1</session-timeout> for quicker testing)
2.  Access a page that creates a session
3.  Access other pages that read, but do not write to the session (don't let the session time out)
4.  At least <session-timeout> after #2, fail over to another node
  
Actual results:
#4 creates a new session

Expected results:
#4 accesses the still valid session

Additional info:

Comment 1 dereed 2013-09-19 21:30:56 UTC

Limitation of the current session replication implementation.

It can be worked around (at a performance cost) by using <replication-trigger>ACCESS</replication-trigger>.

Comment 2 Radoslav Husar 2013-09-20 13:10:51 UTC

Seems to me this is too much of an issue to let go just like this. Even if none of the fixes is viable, we need to make sure this gets documented properly.

To fill in some background: IncomingDistributableSessionData, which encapsulates the timestamp from which the validity of the session is calculated, is not replicated on every access. As a result, on failover a "read-only" session that has been accessed within the expiration time is still treated as invalid, and a new session is created instead (org.apache.catalina.connector.Request.doGetSession(...)).

One of the mechanisms to mitigate this is "maxUnreplicatedInterval", which replicates the timestamp even if the rest of the session is not dirty. It is turned off by default (value -1); a value of 0 makes it replicate on every access, and so on. With it enabled, HttpSession.getLastAccessedTime() also returns correct values following failover.
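
As a rough illustration of the "maxUnreplicatedInterval" semantics described here, the decision could be sketched as below. This is not the actual EAP code: the method `shouldReplicateTimestamp` and its parameters are hypothetical, and treating positive values as an interval in seconds is an assumption.

```java
public class TimestampReplicationSketch {

    /**
     * Hypothetical decision helper; not part of any EAP API.
     *
     * @param maxUnreplicatedIntervalSeconds -1 disables timestamp-only replication,
     *        0 replicates on every access, N > 0 replicates once at least N seconds
     *        have passed since the last replication (seconds are an assumption)
     * @param lastReplicatedMillis time the timestamp was last replicated
     * @param nowMillis time of the current access
     * @return true if the timestamp should be replicated even though the
     *         session itself is not dirty
     */
    static boolean shouldReplicateTimestamp(int maxUnreplicatedIntervalSeconds,
                                            long lastReplicatedMillis,
                                            long nowMillis) {
        if (maxUnreplicatedIntervalSeconds < 0) {
            return false; // turned off (the default, -1)
        }
        if (maxUnreplicatedIntervalSeconds == 0) {
            return true;  // replicate on every access
        }
        long elapsedMillis = nowMillis - lastReplicatedMillis;
        return elapsedMillis >= maxUnreplicatedIntervalSeconds * 1000L;
    }

    public static void main(String[] args) {
        long lastReplicated = 0L;
        long now = 45_000L; // 45 seconds after the last replication
        for (int interval : new int[] {-1, 0, 30, 60}) {
            System.out.println("interval=" + interval + " -> replicate: "
                    + shouldReplicateTimestamp(interval, lastReplicated, now));
        }
    }
}
```

Under these assumptions, the longest a replicated timestamp can lag behind the real last access is roughly the configured interval, so an interval well below the session timeout would keep "read-only" sessions valid on failover.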

Here are a few suggestions to fix/mitigate:

1/ On failover, do not perform a validity check based on the timestamp. This would leave a rather small window open -- between the session timing out and the session being expired and removed from the cache -- in which a failed-over session would be treated as valid even though it should have expired. (Compare this to the current level of correctness, where we remove a session even though we should not.)

2/ Replicate the access time on every access or more frequently, or change the default for maxUnreplicatedInterval to ~60; needs to be checked for viability.

3/ In case of failover for reasons other than node failure, the correct access time can be fetched from the remote node. Covers only a minority of cases; needs to be checked for viability.
