Bug 756100

Summary:	RFE: use timestamp and file size during drift detection scans
Product:	[Other] RHQ Project	Reporter:	John Sanda <jsanda>
Component:	drift	Assignee:	Jay Shaughnessy <jshaughn>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Mike Foley <mfoley>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	4.2	CC:	jshaughn
Target Milestone:	---	Keywords:	FutureFeature, Improvement
Target Release:	RHQ 4.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	4.3	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-08-31 10:16:35 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	707225, 760116

Description John Sanda 2011-11-22 17:14:06 UTC

Description of problem:
Currently, drift detection is always done by generating SHAs of files and comparing them against the SHAs recorded in a snapshot file. Calculating a digest is a CPU-intensive operation, and the time required depends on file size and density. Always doing the digest calculation could significantly increase the agent's footprint in terms of CPU utilization as well as IO overhead.

One fairly safe way to avoid always doing the digest calculation while still detecting changes is to look at file timestamps and sizes. If neither a file's timestamp nor its size has changed, we can skip recalculating its digest.

We still need to calculate and store SHAs when generating the initial snapshot, or any time the file timestamp or size might not otherwise be available. This enhancement could significantly reduce the time for drift detection runs and the overall agent footprint.
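
The check proposed above can be sketched roughly as follows. This is an illustration only, not the RHQ agent code: the names `SnapshotEntry`, `take_entry`, and `has_drifted` are hypothetical, and SHA-256 is assumed as the digest since the report only says "SHAs".

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SnapshotEntry:
    """Hypothetical record of one file in a snapshot."""
    sha256: str
    mtime_ns: Optional[int] = None  # None when no timestamp was recorded
    size: Optional[int] = None      # None when no size was recorded


def file_digest(path: str) -> str:
    """Read the file in chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def take_entry(path: str) -> SnapshotEntry:
    """Initial snapshot: the digest is always calculated and stored."""
    st = os.stat(path)
    return SnapshotEntry(file_digest(path), st.st_mtime_ns, st.st_size)


def has_drifted(path: str, entry: SnapshotEntry,
                digest_fn: Callable[[str], str] = file_digest) -> bool:
    """Return True if the file content differs from the snapshot entry."""
    st = os.stat(path)
    if entry.mtime_ns is not None and entry.size is not None:
        # Cheap check: same timestamp and same size, so skip the digest.
        if st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size:
            return False
    # Timestamp/size changed or unavailable: fall back to the digest.
    return digest_fn(path) != entry.sha256
```

The shortcut is only "fairly safe", as noted above: a change that preserves both the size and the modification time would go unnoticed by this sketch.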

Comment 1 Jay Shaughnessy 2011-12-15 19:07:58 UTC

master commit e60b0694356357d6c73b04128606afe6d75e3041

Now using timestamp and filesize info to avoid SHA digest
generation when possible.  Jsanda added most of the support, working
the new info into the changeset files. Jshaughn added support for 
handling situations where the current changeset is supplied by the
server, due to pinning or agent sync. When supplied by the server no
timestamp info is available, so the non-timestamped changesets
must be replaced with timestamped versions as soon as possible.
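
One way to picture the replacement of non-timestamped changesets is the sketch below. It is an assumption-laden illustration, not the code from the commit: the dict layout and the names `sha256_of` and `backfill_metadata` are hypothetical, SHA-256 is assumed as the digest, and missing or unreadable files are not handled.

```python
import hashlib
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def backfill_metadata(entries):
    """Add timestamp and size to entries that lack them.

    entries maps a path to a dict holding 'sha256' and, optionally,
    'mtime_ns' and 'size' (a hypothetical layout). Entries supplied by
    the server carry only the digest. Returns (refreshed, drifted); the
    input mapping is not modified.
    """
    refreshed, drifted = {}, []
    for path, entry in entries.items():
        if "mtime_ns" in entry and "size" in entry:
            refreshed[path] = dict(entry)  # already timestamped
            continue
        st = os.stat(path)
        if sha256_of(path) == entry["sha256"]:
            # Content matches: record metadata so later scans can skip
            # the digest for this file.
            refreshed[path] = dict(entry, mtime_ns=st.st_mtime_ns,
                                   size=st.st_size)
        else:
            # Content differs: report drift and leave the entry as is.
            drifted.append(path)
            refreshed[path] = dict(entry)
    return refreshed, drifted
```

In this sketch the digest has to be calculated once for every server-supplied entry; only after the metadata is backfilled can the cheaper timestamp and size comparison take over.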

Test Notes
Although the benefit is performance oriented and not testable in
a standard fashion, there are still many scenarios that can be exercised
to ensure that the added support does not cause problems in the
various workflows.  In particular, exercise non-pinned definitions and
multiple snapshots with all varieties of drift, including, if possible
on the OS, making monitored files non-readable by the agent process.
(These files should be treated like removed files but go through some
different internal code paths.)  Also exercise pinned defs, moving them
in and out of compliance with all types of drift.  Finally, start the
agent with --clean (or --purgedata) so that drift sync must be executed.

Comment 2 Mike Foley 2011-12-16 14:25:26 UTC

Documenting verification with Drift TCMS test case execution runs 

https://tcms.engineering.redhat.com/plan/4174/#testruns

Comment 3 Heiko W. Rupp 2013-08-31 10:16:35 UTC

Bulk close of old bugs in VERIFIED state.