Bug 111560
| Summary: | BaseDispatcher cache of applications is not invalidated when Application objects change | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Web Application Framework | Reporter: | Daniel Berrangé <berrange> |
| Component: | other | Assignee: | Vadim Nasardinov <vnasardinov> |
| Status: | CLOSED RAWHIDE | QA Contact: | Jon Orris <jorris> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | nightly | CC: | bche, richardl |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2004-01-27 21:08:38 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 102913 | ||
Description
Daniel Berrangé
2003-12-05 14:28:15 UTC
There are two ways to address this ticket that have been brought up
and partially discussed. The first approach is to forcibly expire
cache entries every 15 minutes or so, and recompute them on demand.
The second approach is to do the same thing we do in
com.arsdigita.kernel.SiteNode, which is to use CacheTable/CacheServlet
as the caching mechanism that attempts to provide some degree of cache
coherence across nodes in a cluster.
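To make the first approach concrete, here is a minimal sketch of a time-limited cache that recomputes entries on demand once they are older than a fixed age. It is not the BaseDispatcher code: the class name ExpiringCache, the injected clock, and the loader function are all hypothetical, and it is written in modern Java for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical time-limited cache; an illustration, not the CCM implementation. */
class ExpiringCache<K, V> {
    /** A cached value together with the time at which it was loaded. */
    private static final class Slot<T> {
        final T value;
        final long loadedAt;

        Slot(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Slot<V>> slots = new HashMap<>();
    private final long maxAgeMillis;
    private final LongSupplier clock;      // assumption: injected so tests can control time
    private final Function<K, V> loader;   // assumption: stands in for the database lookup

    ExpiringCache(long maxAgeMillis, LongSupplier clock, Function<K, V> loader) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if it is missing or too old. */
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Slot<V> slot = slots.get(key);
        if (slot == null || now - slot.loadedAt >= maxAgeMillis) {
            slot = new Slot<>(loader.apply(key), now);
            slots.put(key, slot);
        }
        return slot.value;
    }
}
```

With a 15-minute age limit, a lookup made 14 minutes after the mapping changed can still return the old application, while a lookup made at 15 minutes or later reloads the mapping. That is the stale window discussed in the use cases.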
Let's look at the first approach. There are two use cases that need
to be addressed.
Use Case 1.
An application is unmounted and is (optionally) mounted on a
different URL.
The user may continue to be able to access the unmounted
application at its old URL for 7.5 minutes on average (but no
longer than 15 minutes). That would seem to be acceptable in many
situations.
Once the stale cache entry is expired, the old URL will give a 404
or some such, which is the correct behavior.
If the application is remounted on a new URL, it will be
immediately available at this new path. (This is in addition to
possibly also being accessible at the old URL for up to 15
minutes.)
Use Case 2.
Same as previous, but the user mounts a different application
instance on the old URL.
They may continue to see the old unmounted application instead of
the new one at this URL for up to 15 minutes (averaging 7.5
minutes). This may cause confusion. The situation will
straighten itself out no later than in 15 minutes.
It is Use Case #2 that makes us consider the alternative
CacheTable/CacheServlet-based solution. Before examining the
alternative, let's take another look at Use Case #2.
The first question to ask is, how often is it going to happen? If it
happens twice a year, I think 15 minutes of confusion is not a
terribly high price to pay for an otherwise vastly more reliable
solution. The second thing to do is notice that our Portal Server
currently makes Use Case #2 impossible. The URLs on which the portal
server mounts applications currently seem to always include the
application id, which is a unique key. Therefore, it is impossible to
mount application Y on a URL previously occupied by application X.
The second, CacheTable-based approach is inherently less reliable, as
it requires correct interoperation of two or more nodes. The only
place where CacheTable is currently used in our system is
com.arsdigita.kernel.SiteNode. Note that we currently have no idea
whether this caching code actually works or not. Why don't we know?
Two reasons. Number one, I don't think our QA and Scaling currently
test multi-node configurations. Number two, SiteNode has been
deprecated for some time now. I don't think we have any apps where
site nodes are mounted and remounted at runtime. Therefore, even in
production systems that do use multi-node setups, the cache coherence
code is never actually exercised.
Even though I could go ahead and implement a CacheTable-based solution
for this ticket, I will not be able to test it to my (or anyone else's)
satisfaction. It _may_ work by coincidence.
I can and will test the time-limited cache solution, if we choose to
take this route.
So, it comes down to this. Do we want to implement an inherently
unreliable, untested, and untestable (by me) solution for a use case
that never comes up in our non-APLAWS product and only occasionally
comes up in APLAWS? Or do we want to implement a much simpler, easily
testable (and therefore vastly more reliable) solution, whose only
drawback is the potential to cause 15 minutes of confusion per install
per year, and which can be remedied by documenting the (rarely
encountered) confusing edge case?
> Number one, I don't think our QA and Scaling currently
> test multi-node configurations.
If true, then this is a severe shortcoming of the QA / Scaling tests,
which must be rectified ASAP, since every single production deployment
of CCM runs in a multi-node configuration.
> Number two, SiteNode has been
> deprecated for some time now. I don't think we have any apps where
> site nodes are mounted and remounted at runtime. Therefore, even in
> production systems that do use multi-node setups, the cache coherence
> code is never actually exercised.
SiteNodes are still used and are created/destroyed/changed at runtime:
the Application backs its URLs with site node instances. The cache
table code in the site node cache is exercised on creation, update,
and deletion of site nodes. Both APLAWS portals and PortalServer have
the capability to create Applications and thus implicitly create
SiteNodes, in turn exercising the CacheTable code.
The cache table is also used in the CMS dispatcher:
dan@camden$ find cms/src -name '*.java' | xargs grep -l CacheTable
cms/src/com/arsdigita/cms/ContentSectionServlet.java
cms/src/com/arsdigita/cms/dispatcher/ContentItemDispatcher.java
dan@camden$
Marking as QA_READY as of change 39457, based on the following.
The ticket requirements are:
1.
> There does not appear to be any invalidation of this cache when
> the URL on which an application is mounted is changed.
[meaning: cache invalidation needs to be implemented]
2.
> This invalidation will need to be done across JVM instances in a
> cluster too.
Based on discussion that happened partially in the above posts and
partly elsewhere, it was decided that for consistency with the
existing product, BaseDispatcher's caching mechanism needed to be
implemented in the same way that SiteNode did it, i.e. via the
CacheTable class.
While working on this ticket, I discovered the following things:
(a) The SiteNode implementation did not do any cache invalidation
either. However, based on Dan's feedback, this has not caused
any problems in production to date. The word on the street is
that the current (i.e. pre-39457) implementation works fine.
(b) The CacheTable thingy is not as bad as I originally thought.
First and foremost, it implements a size-limited, time-limited
cache with the LRU eviction policy. The current default age
limit is 300 seconds (that is, five minutes). Secondly,
CacheTable tries to be smart and invalidate peer cache tables as
early as possible via HTTP "messaging". Whether or not this
early invalidation actually works under real load is largely
irrelevant, because local LRU invalidation due to size and age
limits appears to happen quite reliably.
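The behaviour described in (b) can be illustrated with a small sketch of a size-limited, time-limited cache with LRU eviction. This is not the CacheTable source and leaves out the HTTP messaging entirely. The class name LruAgeCache, its constructor parameters, and the injected clock are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical size- and age-limited LRU cache; not the CacheTable source. */
class LruAgeCache<K, V> {
    /** A cached value together with the time at which it was stored. */
    private static final class Slot<T> {
        final T value;
        final long storedAt;

        Slot(T value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final long maxAgeMillis;
    private final LongSupplier clock;   // assumption: injected so tests can control time
    private final LinkedHashMap<K, Slot<V>> map;

    LruAgeCache(final int maxSize, long maxAgeMillis, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.map = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
                // Size limit: drop the least recently used entry on overflow.
                return size() > maxSize;
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Slot<>(value, clock.getAsLong()));
    }

    /** Returns the value, or null if it is absent or older than the age limit. */
    synchronized V get(K key) {
        Slot<V> slot = map.get(key);   // also marks the entry as recently used
        if (slot == null) {
            return null;
        }
        if (clock.getAsLong() - slot.storedAt >= maxAgeMillis) {
            map.remove(key);           // age limit: treat the entry as expired
            return null;
        }
        return slot.value;
    }
}
```

An age limit of 300000 milliseconds would correspond to the 300-second default mentioned above. Because both limits are enforced locally, eviction in this sketch does not depend on any message from a peer node arriving.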
So, basically, CacheTable already implements the simple and reliable
approach of forcible periodic cache eviction that I was advocating in
comment #1. The fact that it also tries to add a half-assed messaging
mechanism on top of the reliable LRU policy seems to do no harm.
I changed SiteNode's (and BaseDispatcher's) implementation to
invalidate the underlying cache table in its entirety, whenever URL
mappings change. This takes care of requirement #1. The CacheTable's
LRU policy comes as close as practically feasible to satisfying
requirement #2. In any case, the current implementation is no worse
than the old one when it comes to #2.
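The wholesale invalidation described above might look roughly like the following sketch. It is not the code from change 39457: the class MountTable and its methods are hypothetical, and an in-memory map stands in for the persistent URL mappings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical URL-to-application lookup with whole-cache invalidation. */
class MountTable {
    // Assumption: this map stands in for the authoritative mappings in the database.
    private final Map<String, String> mounts = new HashMap<>();
    // Local lookup cache; it also remembers misses as null values.
    private final Map<String, String> cache = new HashMap<>();

    synchronized void mount(String url, String appId) {
        mounts.put(url, appId);
        cache.clear();   // any mapping change drops every cached entry
    }

    synchronized void unmount(String url) {
        mounts.remove(url);
        cache.clear();
    }

    /** Returns the application id mounted at url, or null if there is none. */
    synchronized String resolve(String url) {
        if (!cache.containsKey(url)) {
            cache.put(url, mounts.get(url));
        }
        return cache.get(url);
    }
}
```

Clearing the whole table is coarser than removing a single key, but it avoids having to work out which cached paths a remount affects. In this sketch the clear only reaches the local JVM, which matches the point above that other nodes are covered by the cache's age limit rather than by the invalidation itself.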
P.S. A fairly massive cleanup of the CacheTable and its friends happened
as part of this ticket. For details, see
$ p4 changes //core-platform/dev/src/com/arsdigita/caching/...@39293,39457
As a follow-up to comment #3
> Whether or not this early invalidation actually works under real
> load is largely irrelevant,
Jon Orris reports in bug 115066 that early invalidation works incorrectly.