Bug 756100 - RFE: use timestamp and file size during drift detection scans
Summary: RFE: use timestamp and file size during drift detection scans
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: drift
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHQ 4.3.0
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 707225 jon30-sprint10, rhq43-sprint10
Reported: 2011-11-22 17:14 UTC by John Sanda
Modified: 2013-08-31 10:16 UTC (History)

Fixed In Version: 4.3
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-31 10:16:35 UTC
Embargoed:



Description John Sanda 2011-11-22 17:14:06 UTC
Description of problem:
Currently, drift detection always generates SHA digests of files and compares them against the SHAs recorded in a snapshot file. Calculating a digest is a CPU-intensive operation, and the time required grows with file size and density. Always doing the digest calculation could significantly increase the agent's footprint in terms of CPU utilization as well as I/O overhead.

One fairly safe way to avoid always doing the digest calculation but also still detecting changes is to look at file timestamps and sizes. If neither the file's timestamp nor its size has changed, we can skip recalculating the digest. 
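
The check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the RHQ agent code: the `SnapshotEntry` record, the `has_drifted` and `sha256_of` helpers, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class SnapshotEntry:
    """Hypothetical per-file snapshot record (not the RHQ changeset format)."""
    sha256: str
    mtime_ns: int
    size: int


def sha256_of(path):
    """Stream the file through SHA-256 so large files are not read whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_drifted(path, entry):
    """Return True if the file content differs from the snapshot entry."""
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size:
        # Fast path: timestamp and size are unchanged, so the digest is skipped.
        return False
    # Slow path: metadata changed, so confirm with a digest comparison.
    return sha256_of(path) != entry.sha256
```

A file that was merely touched takes the slow path and is reported as unchanged because its digest still matches. The trade-off is that an edit preserving both size and timestamp would go unnoticed, which is why the approach is only "fairly safe".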

We still need to calculate and store SHAs when generating the initial snapshot, or any time the file timestamp or size might not otherwise be available. This enhancement could significantly reduce the time for drift detection runs and the overall agent footprint.
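
As a rough sketch of the initial-snapshot case, the Python below records a digest together with the timestamp and size of every file under a directory. The `build_snapshot` name, the dictionary layout, and the use of SHA-256 are assumptions for illustration only and do not reflect the actual RHQ changeset format.

```python
import hashlib
import os


def build_snapshot(root):
    """Walk root and record digest, mtime and size for each regular file.

    Keys are paths relative to root. The layout is hypothetical.
    """
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Stat first so the recorded metadata is never newer than the digest.
            st = os.stat(path)
            with open(path, "rb") as f:
                sha = hashlib.sha256(f.read()).hexdigest()
            rel = os.path.relpath(path, root)
            snapshot[rel] = {
                "sha256": sha,
                "mtime_ns": st.st_mtime_ns,
                "size": st.st_size,
            }
    return snapshot
```

Every file is hashed once here; later scans can then compare timestamp and size against these entries before deciding whether to hash again.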

Comment 1 Jay Shaughnessy 2011-12-15 19:07:58 UTC
master commit e60b0694356357d6c73b04128606afe6d75e3041

Now using timestamp and filesize info to avoid SHA digest
generation when possible.  Jsanda added most of the support, working
the new info into the changeset files. Jshaughn added support for 
handling situations where the current changeset is supplied by the
server, due to pinning or agent sync. When supplied by the server no
timestamp info is available, so the non-timestamped changesets
must be replaced with timestamped versions as soon as possible.
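
One way to picture the replacement of non-timestamped changesets is sketched below in Python. The comment above does not say how the agent does this, so the mechanism shown (hash once, and on a match fill in the local timestamp and size) is an assumption, as are the `add_timestamps` and `file_sha256` names and the dictionary layout.

```python
import hashlib
import os


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def add_timestamps(root, changeset):
    """Return (upgraded, drifted) for a server-supplied changeset.

    changeset maps relative paths to dicts holding at least "sha256".
    Entries without "mtime_ns" and "size" are treated as server-supplied.
    """
    upgraded, drifted = {}, []
    for rel, entry in changeset.items():
        if "mtime_ns" in entry and "size" in entry:
            upgraded[rel] = dict(entry)  # already timestamped
            continue
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            upgraded[rel] = dict(entry)
            drifted.append(rel)  # missing locally
            continue
        st = os.stat(path)  # stat before hashing, to stay conservative
        if file_sha256(path) != entry["sha256"]:
            upgraded[rel] = dict(entry)
            drifted.append(rel)  # content differs from the server's record
            continue
        # Digest matches: record local metadata so later scans can skip hashing.
        upgraded[rel] = {**entry, "mtime_ns": st.st_mtime_ns, "size": st.st_size}
    return upgraded, drifted
```

Under these assumptions the digest cost is paid once per file after a pin or sync, and subsequent scans fall back to the cheaper timestamp and size comparison.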

Test Notes
Although the benefit is performance-oriented and not testable in
a standard fashion, there are still many scenarios that can be exercised
to ensure that the added support does not cause problems in the
various workflows.  In particular, exercise non-pinned definitions and
multiple snapshots with all varieties of drift, including, if the OS
allows it, making monitored files non-readable by the agent process.
(These files should be treated like removed files but follow some
different internal code paths.)  Also exercise pinned definitions, moving
them in and out of compliance with all types of drift.  Finally, start
the agent with --clean (or --purgedata) so that drift sync must be executed.

Comment 2 Mike Foley 2011-12-16 14:25:26 UTC
Documenting verification with Drift TCMS test case execution runs 

https://tcms.engineering.redhat.com/plan/4174/#testruns

Comment 3 Heiko W. Rupp 2013-08-31 10:16:35 UTC
Bulk close of old bugs in VERIFIED state.

