Bug 111560
Summary: | BaseDispatcher cache of applications is not invalidated when Application objects change | |
---|---|---|---
Product: | [Retired] Red Hat Web Application Framework | Reporter: | Daniel Berrangé <berrange>
Component: | other | Assignee: | Vadim Nasardinov <vnasardinov>
Status: | CLOSED RAWHIDE | QA Contact: | Jon Orris <jorris>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | nightly | CC: | bche, richardl
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2004-01-27 21:08:38 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 102913 | |
Description
Daniel Berrangé 2003-12-05 14:28:15 UTC
There are two ways to address this ticket that have been brought up and partially discussed. The first approach is to forcibly expire cache entries every 15 minutes or so and recompute them on demand. The second approach is to do the same thing we do in com.arsdigita.kernel.SiteNode, which is to use CacheTable/CacheServlet as the caching mechanism that attempts to provide some degree of cache coherence across nodes in a cluster.

Let's look at the first approach. There are two use cases that need to be addressed.

Use Case 1. An application is unmounted and is (optionally) mounted on a different URL. The user may continue to be able to access the unmounted application at its old URL for 7.5 minutes on average (but no longer than 15 minutes). That would seem to be acceptable in many situations. Once the stale cache entry is expired, the old URL will give a 404 or some such, which is the correct behavior. If the application is remounted on a new URL, it will be immediately available at this new path. (This is in addition to possibly also being accessible at the old URL for up to 15 minutes.)

Use Case 2. Same as the previous one, but the user mounts a different application instance on the old URL. They may continue to see the old unmounted application instead of the new one at this URL for up to 15 minutes (averaging 7.5 minutes). This may cause confusion. The situation will straighten itself out no later than in 15 minutes.

It is Use Case #2 that makes us consider the alternative CacheTable/CacheServlet-based solution. Before examining the alternative, let's take another look at Use Case #2. The first question to ask is: how often is it going to happen? If it happens twice a year, I think 15 minutes of confusion is not a terribly high price to pay for an otherwise vastly more reliable solution. The second thing to notice is that our Portal Server currently makes Use Case #2 impossible. The URLs on which the portal server mounts applications currently seem to always include the application id, which is a unique key. Therefore, it is impossible to mount application Y on a URL previously occupied by application X.

The second, CacheTable-based approach is inherently less reliable, as it requires correct interoperation of two or more nodes. The only place where CacheTable is currently used in our system is com.arsdigita.kernel.SiteNode. Note that we currently have no idea whether this caching code actually works or not. Why don't we know? Two reasons. Number one, I don't think our QA and Scaling currently test multi-node configurations. Number two, SiteNode has been deprecated for some time now. I don't think we have any apps where site nodes are mounted and remounted at runtime. Therefore, even in production systems that do use multi-node setups, the cache coherence code is never actually exercised.

Even though I could go ahead and implement a CacheTable-based solution for this ticket, I will not be able to test it to my (or anyone else's) satisfaction. It _may_ work by coincidence. I can and will test the time-limited cache solution, if we choose to take this route.

So, it comes down to this. Do we want to implement an inherently unreliable, untested, and untestable (by me) solution for a use case that never comes up in our non-APLAWS product and only occasionally comes up in APLAWS? Or do we want to implement a much simpler, easily testable (and therefore vastly more reliable) solution, whose only drawback is the potential to cause 15 minutes of confusion per install per year, and which can be remedied by documenting the (rarely encountered) confusing edge case?
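To make the first (time-limited) approach concrete, here is a minimal sketch in modern Java of a lookup cache whose entries expire after 15 minutes and are recomputed on demand. It is purely illustrative: the class name, method names, and the database lookup are invented for this example and are not the actual BaseDispatcher code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only -- not the real BaseDispatcher code.
// Cached URL-to-application lookups are treated as stale after 15 minutes
// and are recomputed on demand.
public final class TimeLimitedLookupCache {

    private static final long MAX_AGE_MS = 15 * 60 * 1000L; // 15 minutes

    private static final class Entry {
        final String application;
        final long createdAt;
        Entry(String application, long createdAt) {
            this.application = application;
            this.createdAt = createdAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    /** Returns the application mounted at the given URL, or null (404). */
    public String lookup(String url) {
        long now = System.currentTimeMillis();
        Entry entry = cache.get(url);
        if (entry == null || now - entry.createdAt > MAX_AGE_MS) {
            String fresh = queryMappingFromDatabase(url); // placeholder query
            if (fresh == null) {
                cache.remove(url); // unmounted application: stop serving the old URL
                return null;
            }
            entry = new Entry(fresh, now);
            cache.put(url, entry);
        }
        return entry.application;
    }

    // Placeholder for the real query against the application/site-node tables.
    private String queryMappingFromDatabase(String url) {
        return null;
    }
}
```

Under a scheme like this, the confusion window in Use Case #2 is bounded by MAX_AGE_MS, which is exactly the trade-off weighed above.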
> Number one, I don't think our QA and Scaling currently
> test multi-node configurations.

If true, then this is a severe shortcoming of the QA / Scaling tests which must be rectified asap, since every single production deployment of CCM runs in a multi-node configuration.

> Number two, SiteNode has been
> deprecated for some time now. I don't think we have any apps where
> site nodes are mounted and remounted at runtime. Therefore, even in
> production systems that do use multi-node setups, the cache coherence
> code is never actually exercised.

SiteNodes are still used and are created/destroyed/changed at runtime: the Application backs its URLs with site node instances. The cache table code in the site node cache is exercised on creation, update, and deletion of site nodes. Both APLAWS portals and PortalServer have the capability to create Applications and thus implicitly create SiteNodes, in turn exercising the CacheTable code. The cache table is also used in the CMS dispatcher:

    dan@camden$ find cms/src -name '*.java' | xargs grep -l CacheTable
    cms/src/com/arsdigita/cms/ContentSectionServlet.java
    cms/src/com/arsdigita/cms/dispatcher/ContentItemDispatcher.java
    dan@camden$

Marking as QA_READY as of change 39457, based on the following. The ticket requirements are:

1. > There does not appear to be any invalidation of this cache when
   > the URL on which an application is mounted is changed.

   [meaning: cache invalidation needs to be implemented]

2. > This invalidation will need to be done across JVM instances in a
   > cluster too.

Based on discussion that happened partly in the above posts and partly elsewhere, it was decided that, for consistency with the existing product, BaseDispatcher's caching mechanism needed to be implemented in the same way that SiteNode did it, i.e. via the CacheTable class.

While working on this ticket, I discovered the following things:

(a) The SiteNode implementation did not do any cache invalidation either. However, based on Dan's feedback, this has not caused any problems in production to date. The word on the street is that the current (i.e. pre-39457) implementation works fine.

(b) The CacheTable thingy is not as bad as I originally thought. First and foremost, it implements a size-limited, time-limited cache with an LRU eviction policy. The current default age limit is 300 seconds (that is, five minutes). Secondly, CacheTable tries to be smart and invalidate peer cache tables as early as possible via HTTP "messaging". Whether or not this early invalidation actually works under real load is largely irrelevant, because local LRU invalidation due to the size and age limits appears to happen quite reliably.

So, basically, CacheTable already implements the simple and reliable approach of forcible periodic cache eviction that I was advocating in comment #1. The fact that it also tries to add a half-assed messaging mechanism on top of the reliable LRU policy seems to do no harm.

I changed SiteNode's (and BaseDispatcher's) implementation to invalidate the underlying cache table in its entirety whenever URL mappings change. This takes care of requirement #1. The CacheTable's LRU policy comes as close as practically feasible to satisfying requirement #2. In any case, the current implementation is no worse than the old one when it comes to #2.
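For reference, here is a minimal, self-contained sketch in modern Java of a cache with the properties described above: a size limit with LRU eviction, an age limit of 300 seconds, and a removeAll-style operation that drops the whole table when URL mappings change. This is not the com.arsdigita.caching.CacheTable API; the class name, the size limit, and the method names are assumptions made for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only -- not the actual CacheTable implementation.
public final class LruAgeLimitedCache<K, V> {

    private static final class Timestamped<V> {
        final V value;
        final long createdAt;
        Timestamped(V value, long createdAt) {
            this.value = value;
            this.createdAt = createdAt;
        }
    }

    private final int maxSize;       // assumed size limit
    private final long maxAgeMillis; // 300 seconds by default, per comment above
    private final Map<K, Timestamped<V>> map;

    public LruAgeLimitedCache(int maxSize, long maxAgeMillis) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        // accessOrder=true gives LRU iteration order; removeEldestEntry
        // enforces the size limit on every put.
        this.map = new LinkedHashMap<K, Timestamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
                return size() > LruAgeLimitedCache.this.maxSize;
            }
        };
    }

    public synchronized V get(K key) {
        Timestamped<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (System.currentTimeMillis() - entry.createdAt > maxAgeMillis) {
            map.remove(key); // age-based invalidation
            return null;
        }
        return entry.value;
    }

    public synchronized void put(K key, V value) {
        map.put(key, new Timestamped<>(value, System.currentTimeMillis()));
    }

    /** Drop every entry. In this sketch, this is the analogue of the change
     *  described above: when a URL mapping changes, the whole table is
     *  invalidated and repopulated lazily on subsequent lookups. */
    public synchronized void removeAll() {
        map.clear();
    }
}
```

In this sketch, the fix corresponds to calling removeAll() from whatever code path remounts an application, so that the next request repopulates the table from the database; the age and size limits then bound how long any node can keep serving a stale mapping even if cross-node "messaging" fails.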
P.S. A fairly massive cleanup of the CacheTable and its friends happened as part of this ticket. For details, see

    $ p4 changes //core-platform/dev/src/com/arsdigita/caching/...@39293,39457

As a follow-up to comment #3:

> Whether or not this early invalidation actually works under real
> load is largely irrelevant,

Jon Orris reports in bug 115066 that early invalidation works incorrectly.