Bug 704551

Summary: RFE: [apache] Improve the performance of discovery by caching the parsed config files
Product: [Other] RHQ Project
Reporter: Lukas Krejci <lkrejci>
Component: Plugins
Assignee: Nobody <nobody>
Status: NEW
Severity: medium
Priority: medium
Version: 4.1
CC: hrupp
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement

Description Lukas Krejci 2011-05-13 14:55:26 UTC
Description of problem:

Currently, each server discovery as well as each vhost discovery parses the whole set of configuration files into an "AST". This is quite an expensive operation and would benefit from caching.

We already have a "ConfigurationTimestamp" class that can be used to check whether the configuration changed or not based on timestamps of the files.

Having a static cache that could be reused by all discoveries, configuration reads, etc. would improve the performance of the plugin.

This can become an issue in large environments with tens of Apache instances per server.
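
A minimal sketch of what such a static cache could look like. All names here (ParsedConfigCache, get, the fingerprint parameter) are hypothetical and not taken from the plugin; the parsed tree is represented by a generic type, and the "fingerprint" is assumed to be any value with a meaningful equals(), such as a record of the config files' timestamps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical static cache; none of these names come from the RHQ code base. */
final class ParsedConfigCache {

    /** A cached parse result plus the fingerprint of the files it was parsed from. */
    private static final class Entry {
        final Object fingerprint;
        final Object tree;

        Entry(Object fingerprint, Object tree) {
            this.fingerprint = fingerprint;
            this.tree = tree;
        }
    }

    // Keyed by the path of the main configuration file (an assumption of this sketch).
    private static final Map<String, Entry> CACHE = new HashMap<>();

    private ParsedConfigCache() {
    }

    /**
     * Returns the cached tree if the fingerprint is unchanged; otherwise runs
     * the parser and stores the fresh result under the new fingerprint.
     */
    @SuppressWarnings("unchecked")
    static synchronized <T> T get(String mainConfigPath, Object currentFingerprint, Supplier<T> parser) {
        Entry cached = CACHE.get(mainConfigPath);
        if (cached != null && cached.fingerprint.equals(currentFingerprint)) {
            return (T) cached.tree; // cache hit: no parsing needed
        }
        T tree = parser.get(); // cache miss or stale entry: parse again
        CACHE.put(mainConfigPath, new Entry(currentFingerprint, tree));
        return tree;
    }
}
```

With something like this, server discovery, vhost discovery and configuration reads for the same instance would share one parse for as long as the fingerprint stays the same. The single synchronized method keeps the sketch simple; it would serialize parsing across instances, which a real implementation might want to avoid.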

Comment 1 Charles Crouch 2011-05-13 17:36:59 UTC
I'm concerned about the impact this would have on the accuracy and complexity of our apache discovery; cached data is just a place for data to go stale :-/.
I'm -1 on this without some specific data points highlighting the problem. Also, vhost (service) discovery should not be happening very frequently anyway, right?

Comment 2 Lukas Krejci 2011-05-16 08:57:33 UTC
I ran a test with 40 apache instances, each with 4 vhosts. The discovery spiked the CPU to something like 70-90% because the configuration files were read and parsed 200 times (40 server discoveries + 160 vhost discoveries). Moreover, if the ServerRoot directive is present in the config files (which it is 99% of the time), the server discovery has to be done twice: once with the ServerRoot hardcoded in the apache binary and a second time with the server root actually detected.

The ConfigurationTimestamp class was put in place to prevent the data going stale during metric collection, because during that we need to read the configuration files as well. Basically we remember the timestamps of all the configuration files from the last time we read them. Until any of those timestamps changes, there is no need to re-read the configuration files.
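
To illustrate the idea, here is a rough sketch of a timestamp-based staleness check. It is a stand-in written for this comment, not the actual ConfigurationTimestamp class; the name FileTimestampSnapshot and its constructor are assumptions.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in; not the plugin's actual ConfigurationTimestamp class. */
final class FileTimestampSnapshot {

    // Absolute path -> last-modified time in milliseconds at snapshot time.
    private final Map<String, Long> lastModified = new LinkedHashMap<>();

    FileTimestampSnapshot(List<File> files) {
        for (File f : files) {
            // File.lastModified() returns 0L when the file does not exist,
            // so a deleted file also shows up as a change.
            lastModified.put(f.getAbsolutePath(), f.lastModified());
        }
    }

    /** Two snapshots are equal when they cover the same files with the same timestamps. */
    @Override
    public boolean equals(Object other) {
        return other instanceof FileTimestampSnapshot
            && lastModified.equals(((FileTimestampSnapshot) other).lastModified);
    }

    @Override
    public int hashCode() {
        return lastModified.hashCode();
    }
}
```

The caller would keep the snapshot taken at the last read and compare it with a fresh one; only when they differ do the files need to be parsed again. Note that a check like this only sees files in the list it is given, so the list itself has to be recomputed if included files can be added or removed.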

Obviously, computing an MD5 checksum would be more robust than checking last-modified times, because those can be modified manually, but my aim was to keep the computational cost as low as possible while being reasonably sure that the data isn't stale.
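
For comparison, the checksum alternative could look roughly like the sketch below. The class and method names (ConfigDigest, md5Of) are made up for illustration, and it uses java.nio.file for brevity. The point is the cost difference: this has to read every byte of every configuration file on each check, whereas a timestamp check only needs the file metadata.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper; not part of the plugin. */
final class ConfigDigest {

    private ConfigDigest() {
    }

    /** Returns the hex MD5 digest of the concatenated contents of the given files. */
    static String md5Of(List<Path> files) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    md.update(buffer, 0, read); // full content read: the expensive part
                }
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because the contents are simply concatenated, this sketch cannot tell where one file ends and the next begins; a more careful version would also feed the file names or lengths into the digest.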