Bug 760289

Summary: Excessive file scanning in drift detection when using includes filters
Product: [Other] RHQ Project
Reporter: Jay Shaughnessy <jshaughn>
Component: drift
Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: medium
Priority: medium    
Version: 4.2   
Target Milestone: ---   
Target Release: RHQ 4.3.0   
Hardware: All   
OS: All   
Fixed In Version: 4.3
Doc Type: Bug Fix
Clones: 786613
Last Closed: 2013-08-31 06:11:09 EDT
Bug Blocks: 760116, 786613    

Description Jay Shaughnessy 2011-12-05 13:59:38 EST
I may be wrong, and I'm still looking at this, but it looks to me like
the drift detector processes all files, recursively, under the base
directory, looking for files that match the filters.  That doesn't mean
we digest them all, but it does visit each file in order to see whether
it matches the filters.

This means that if you use a broad base directory, like c:/ on Windows, and
your includes filters are subdir1 and subdir2, we'll actually scan the
entire file system looking for files. It should, I would think, only look in
the subdir1 and subdir2 directories, recursively.
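
To make the difference concrete, here is a minimal sketch in Python (not
the agent's actual Java code). The function names full_scan and
included_scan are hypothetical, and includes filters are assumed to be
plain top-level subdirectory names rather than patterns. Both return the
same matches; only the number of files visited differs.

```python
import os


def full_scan(base_dir, include_dirs):
    """Suspected behavior: visit every file under base_dir, then filter."""
    visited = 0
    matched = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            visited += 1  # every file is touched just to test the filter
            rel = os.path.relpath(os.path.join(root, name), base_dir)
            if rel.split(os.sep)[0] in include_dirs:
                matched.append(rel)
    return sorted(matched), visited


def included_scan(base_dir, include_dirs):
    """Expected behavior: only descend into the included subdirectories."""
    visited = 0
    matched = []
    for inc in include_dirs:
        # os.walk silently yields nothing if the included dir does not exist
        for root, _dirs, files in os.walk(os.path.join(base_dir, inc)):
            for name in files:
                visited += 1
                matched.append(
                    os.path.relpath(os.path.join(root, name), base_dir))
    return sorted(matched), visited
```

With a base directory like a file system root, the visited count of the
first function grows with the whole file system, while the second grows
only with the contents of the included subdirectories.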

This can most likely hang up an agent.  Researching more now...
Comment 1 Jay Shaughnessy 2011-12-05 14:19:26 EST
Still looking at this but assuming it is as described above, the workaround
would be to delete the offending definition and create multiple more
specific definitions.  So, instead of:


use two defs not using includes filters:


Note that you can always apply a pattern-based filter to the basedir
itself by using an includes filter with "." as the path.
Comment 2 Jay Shaughnessy 2011-12-05 16:05:02 EST
master commit 6f3d99d160c4910bffe16ada89b625fe251bea44

Now, when includes file paths are used, directory scanning is limited to
only those included directories.

Note that using "." as an includes path basically translates to using
the base directory, in which case the scan will be as it is now.
A future enhancement may be to analyze the pattern and decide
whether a recursive scan is necessary.  For now, using includes
patterns just to look for certain files in the base directory will
still expose you to the full scan.
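
The rule described above can be sketched roughly as follows. This is an
illustration in Python, not the code from the commit; the function name
scan_roots is hypothetical, and two details are assumptions of mine: that
a definition with no includes scans the whole base directory, and that
once any includes path resolves to the base directory the other roots are
redundant.

```python
import os


def scan_roots(base_dir, include_paths):
    """Return the directories a scan would start from (illustrative only)."""
    base = os.path.normpath(base_dir)
    if not include_paths:
        # Assumption: no includes filters means scan the base directory.
        return [base]
    roots = []
    for path in include_paths:
        # "." normalizes to the base directory itself.
        root = os.path.normpath(os.path.join(base, path))
        if root not in roots:
            roots.append(root)
    if base in roots:
        # Assumption: a full scan of the base directory already covers
        # every other included directory.
        return [base]
    return roots
```

So includes paths of subdir1 and subdir2 give two narrow scan roots, while
an includes path of "." collapses back to the single, full scan of the
base directory.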

Test Notes:
This is not obvious to test as it's mainly a performance fix. But,
prior to the fix, creating a drift definition with a basedir of the
file system root and an includes subdir would take a very long time
to complete on a sizeable file system, with a lot of disk/cpu
activity.  It should now complete very quickly, assuming the
included subdir has a reasonable number of total files.
Comment 3 Mike Foley 2011-12-12 13:15:00 EST
Verified by testing the positive use case around filters.  RHQ 3.  master
Comment 4 Heiko W. Rupp 2013-08-31 06:11:09 EDT
Bulk close of old bugs in VERIFIED state.