Bug 704551 - RFE: [apache] Improve the performance of discovery by caching the parsed config files
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugins
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-13 14:55 UTC by Lukas Krejci
Modified: 2022-03-31 04:28 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Lukas Krejci 2011-05-13 14:55:26 UTC
Description of problem:

Currently, each server discovery and each vhost discovery parses the whole set of configuration files into an "AST". This is quite an expensive operation and would benefit from caching.

We already have a "ConfigurationTimestamp" class that can be used to check whether the configuration has changed, based on the timestamps of the files.

Having a static cache that could be reused by all discoveries, configuration reads, etc. would improve the performance of the plugin.
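A minimal sketch of such a shared cache, in Java since the RHQ plugins are written in Java. It does not use the real ConfigurationTimestamp API. Instead it assumes a hypothetical parser that returns the AST together with the list of files it read, and it keeps its own map of last-modified times for those files. The names ParsedConfigCache, ParseResult and CachedConfig are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: a static, process-wide cache of parsed Apache configurations.
 * An entry is reused until the last-modified time of any file that contributed to it
 * (the main config plus everything it included) changes.
 */
public final class ParsedConfigCache {

    /** What the (hypothetical) parser returns: the AST and every file it read. */
    public record ParseResult<T>(T ast, List<Path> filesRead) {}

    /** A cached AST together with the timestamps of the files it was built from. */
    private record CachedConfig(Object ast, Map<Path, Long> timestamps) {}

    private static final ConcurrentHashMap<Path, CachedConfig> CACHE = new ConcurrentHashMap<>();

    private ParsedConfigCache() {}

    /** Returns the AST for mainConfig, reparsing only if one of its files changed. */
    @SuppressWarnings("unchecked")
    public static <T> T get(Path mainConfig, Function<Path, ParseResult<T>> parser) {
        Path key = mainConfig.toAbsolutePath().normalize();
        // compute() is atomic per key, so concurrent discoveries of the same
        // instance parse it once; the other callers wait and reuse the result.
        CachedConfig entry = CACHE.compute(key, (k, cached) ->
                (cached != null && !isStale(cached.timestamps())) ? cached : parse(k, parser));
        return (T) entry.ast();
    }

    private static <T> CachedConfig parse(Path mainConfig, Function<Path, ParseResult<T>> parser) {
        ParseResult<T> result = parser.apply(mainConfig);
        // Timestamps are taken right after parsing; an edit that lands during
        // the parse itself could be missed until the next change.
        Map<Path, Long> stamps = new HashMap<>();
        for (Path file : result.filesRead()) {
            stamps.put(file, lastModified(file));
        }
        return new CachedConfig(result.ast(), Map.copyOf(stamps));
    }

    /** Stale as soon as any remembered timestamp differs from the current one. */
    private static boolean isStale(Map<Path, Long> timestamps) {
        for (Map.Entry<Path, Long> e : timestamps.entrySet()) {
            if (lastModified(e.getKey()) != e.getValue().longValue()) {
                return true;
            }
        }
        return false;
    }

    private static long lastModified(Path file) {
        try {
            return Files.getLastModifiedTime(file).toMillis();
        } catch (IOException e) {
            return -1L; // missing or unreadable file: recorded as a distinct value
        }
    }
}
```

If wildcard Include directives are used, the parser should ideally also report the directories it scanned, because adding a new file to such a directory changes only the directory's timestamp, not that of any file already in the cache entry.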

This can become an issue in large environments with tens of apache instances per server.

Comment 1 Charles Crouch 2011-05-13 17:36:59 UTC
I'm concerned about the impact this would have on the accuracy and complexity of our apache discovery; cached data is just a place for data to go stale :-/.
I'm -1 on this without some specific data points highlighting the problem. Also, vhost (service) discovery should not be happening very frequently anyway, right?

Comment 2 Lukas Krejci 2011-05-16 08:57:33 UTC
I ran a test with 40 apache instances, each with 4 vhosts. The discovery spiked the CPU to something like 70-90% because the configuration files were read and parsed 200 times (40 server discoveries + 160 vhost discoveries). Moreover, if the ServerRoot directive is present in the config files (which it is 99% of the time), then the server discovery has to be done twice: once with the ServerRoot hardcoded in the apache binary and a second time with the server root actually detected.
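The parse counts from this test can be tallied as follows, with S = 40 instances and V = 4 vhosts per instance. The second line reads the ServerRoot remark as one extra parse per server discovery. The last line is only an estimate, assuming every configuration stays unchanged and each instance needs a single cache entry (up to 2S if the two ServerRoot passes have to be cached separately).

```latex
\begin{aligned}
N_{\text{parses}}  &= S + S\,V = 40 + 40 \cdot 4 = 200 \\
N'_{\text{parses}} &= 2S + S\,V = 80 + 160 = 240 \quad \text{(ServerRoot present)} \\
N_{\text{cached}}  &\approx S = 40 \quad \text{(one full parse per unchanged instance)}
\end{aligned}
```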

The ConfigurationTimestamp class was put in place to prevent the data from going stale during metric collection, because we need to read the configuration files then as well. Basically, we remember the timestamps of all the configuration files from the last time we read them. Until any of those timestamps changes, there is no need to re-read the configuration files.

Obviously, computing an md5 checksum would be more robust than checking last-modified times, because those can be modified manually, but my aim was to keep the computational cost as low as possible while still being reasonably sure that the data isn't stale.
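For comparison, a content-based check could look like the following Java sketch, which hashes the configuration files with MD5 via java.security.MessageDigest. The class and method names are hypothetical. Unlike a last-modified check, it notices an edit even if the file's timestamp is set back afterwards, but it has to read every byte of every file on each check instead of doing one stat per file, which is the cost trade-off described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

/** Hypothetical helper: a content fingerprint for a set of configuration files. */
public final class ConfigFingerprint {

    private ConfigFingerprint() {}

    /** MD5 over every file's path and contents, in the given order, as a hex string. */
    public static String md5(List<Path> files) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // The standard JDK providers include MD5; fail loudly if this runtime does not.
            throw new IllegalStateException("MD5 not available", e);
        }
        for (Path file : files) {
            byte[] name = file.toString().getBytes(StandardCharsets.UTF_8);
            byte[] content = Files.readAllBytes(file); // reads the whole file: the extra cost
            // Length-prefix both fields so different file sets cannot yield the same byte stream.
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(name.length).array());
            md.update(name);
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(content.length).array());
            md.update(content);
        }
        return HexFormat.of().formatHex(md.digest());
    }
}
```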

