Bug 535283 - (RHQ-1996) fix mtime-based inventory sync logic
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Inventory
Version: 1.3
Hardware: All  OS: All
Priority: urgent  Severity: medium
Assigned To: Ian Springer
QA Contact: Jeff Weiss
http://jira.rhq-project.org/browse/RH...
: CodeChange
Depends On: RHQ-2246
Blocks: JON231
Reported: 2009-04-21 17:55 EDT by Joseph Marques
Modified: 2014-11-09 17:49 EST
See Also:
Fixed In Version: 2.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-08-12 12:52:41 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Joseph Marques 2009-04-21 17:55:00 EDT
once upon a time, updating measurement schedules required that the agent be online to receive those changes; otherwise the flow would fail.  this might not have seemed so bad for updating schedules against a single resource, but we also support updating schedules for compatible groups, auto groups, and across resource types (measurement templates).  so, this was corrected in RHQ-792, which allowed the agent to be down but still receive the updates once it came back.

the fix for RHQ-792 piggybacked on the logic that the agent uses for determining which resources it needs to merge, namely the mtime.  if the last modified time of the resource on the server is LATER in time than the one stored on the agent, then something about that resource changed and it will be synced...and one of the items in the synchronization payload is the set of measurement schedules for that resource.
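
a minimal java sketch of the comparison described above.  the class, field, and method names are hypothetical and are not taken from the RHQ code base; only the rule itself (sync when the server mtime is later than the agent mtime) comes from this report.

```java
class MtimeSyncCheck {
    /** Hypothetical stand-in for a resource: just an id and a last-modified time. */
    static final class ResourceStamp {
        final int id;
        final long mtime; // last-modified time in epoch milliseconds

        ResourceStamp(int id, long mtime) {
            this.id = id;
            this.mtime = mtime;
        }
    }

    /** A resource is synced only when the server copy is strictly later than the agent copy. */
    static boolean needsSync(ResourceStamp server, ResourceStamp agent) {
        return server.mtime > agent.mtime;
    }
}
```

under this rule, any server-side change that should reach the agent (including a schedule update) has to move the server mtime forward, or the agent will never ask for it.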

as part of general performance fixes for the 1.2 release, it was determined that the mtime on the server was being reset too often.  this was because we were using a hibernate @OnUpdate persistence hook to update the mtime field to the current time each time the resource was merged through the entityManager.  however, the agent doesn't care about every single update made to a resource, related resource object, or misuse of entityManager.merge(Resource) calls.  instead, it only cares about whether the basic properties on a resource change, whether its corresponding measurement schedules have been updated, or whether its plugin configuration has been updated.

to remedy this, the @OnUpdate hook was removed and "System.currentTimeMillis()" was placed into the setter methods for the properties of concern.  the mtime was also manually set in the code paths that update the measurement schedules (note: resetting the mtime for plugin configuration updates is not necessary at the time of this writing because those updates are synchronous).

unfortunately, this solution ignored one very evident issue - the domain entities are shared between the agent and server.  so, although the side effect of resetting the mtime when certain setter methods were called on the resource on the server side was correct, it posed problems when the mtime was reset on the agent side.  remember, we just mentioned that differing mtimes are the primary criterion that determines whether a resource needs to be synced / merged between agent and server.
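
a sketch of one way the shared setter can skew the comparison, assuming the strict "server is later" rule from earlier in this description.  the Resource class below is a hypothetical stand-in, not the RHQ entity, and the injected clock exists only to make the example deterministic.

```java
import java.util.function.LongSupplier;

class SharedEntitySideEffect {
    /** Hypothetical stand-in for the shared entity, with the side-effecting setter. */
    static final class Resource {
        private String name;
        private long mtime;
        private final LongSupplier clock; // injected clock keeps the sketch deterministic

        Resource(String name, long mtime, LongSupplier clock) {
            this.name = name;
            this.mtime = mtime;
            this.clock = clock;
        }

        void setName(String name) {
            this.name = name;
            this.mtime = clock.getAsLong(); // fires on whichever side calls the setter
        }

        long getMtime() {
            return mtime;
        }
    }

    /** The sync rule from the description: the server copy must be strictly later. */
    static boolean needsSync(Resource server, Resource agent) {
        return server.getMtime() > agent.getMtime();
    }

    public static void main(String[] args) {
        Resource server = new Resource("cpu0", 1_000L, () -> 3_000L);
        Resource agent = new Resource("cpu0", 1_000L, () -> 5_000L);

        agent.setName("cpu0");   // agent-side call bumps the agent mtime to 5000
        server.setName("CPU 0"); // genuine server-side change at time 3000

        // server mtime (3000) is not later than agent mtime (5000), so no sync happens
        System.out.println("needs sync: " + needsSync(server, agent)); // prints "needs sync: false"
    }
}
```

in this scenario the server-side rename is never picked up, because the agent copy already carries a later mtime that it gave itself.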

consequently, the mtime was being reset too often, which ended up being the root cause of RHQ-1990.  although this was fixed with a one-off solution (namely resetting the mtime back down to 0 right at the end of the manual add workflow on the agent side), it's possible this mtime thing could come back to bite us later.  accordingly, the proper fix here is to remove the side effect of updating the mtime in the setter methods of the Resource entity, and instead do it explicitly in the server-side code paths that update the resource.
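
a minimal sketch of the proposed shape, assuming a plain entity plus a server-only update path.  the Resource class and the renameOnServer method are hypothetical names used for illustration; they are not the actual RHQ classes or session bean methods.

```java
class ExplicitMtimeUpdate {
    /** Hypothetical shared entity: setters no longer touch the mtime. */
    static final class Resource {
        private String name;
        private long mtime;

        Resource(String name, long mtime) {
            this.name = name;
            this.mtime = mtime;
        }

        void setName(String name) {
            this.name = name; // plain setter, safe to call on the agent
        }

        String getName() {
            return name;
        }

        void setMtime(long mtime) {
            this.mtime = mtime;
        }

        long getMtime() {
            return mtime;
        }
    }

    /** Hypothetical server-only code path: the mtime is touched explicitly, alongside the change. */
    static void renameOnServer(Resource resource, String newName, long now) {
        resource.setName(newName);
        resource.setMtime(now);
    }
}
```

with this split, agent-side code can call the setters freely without disturbing the comparison, and each server-side path decides for itself whether a change is worth a sync.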
Comment 1 Joseph Marques 2009-07-10 11:29:44 EDT
not enough time left in this release to address this with an appropriate amount of QA to give us confidence that we haven't regressed...and inventory synchronization mechanisms are too important to just brush over.  let's try to get this into the next release.
Comment 2 Charles Crouch 2009-09-18 15:58:32 EDT
Automated tests as described in RHQ-792 should be added to confirm this fix.
Comment 3 Ian Springer 2009-10-19 16:41:14 EDT
REPRO STEPS
=============
Use the following procedure:

1) Inventory only the platform.
2) Execute "inventory --xml --export=export-1.xml" on the agent command line.
3) Shut down the agent.
4) Go to Administration>System Configuration>Templates.
5) Edit Metric Templates for CPU.
6) Check "Update schedules for existing resources" and set the collection interval to 10 minutes for the "Idle" metric (this will enable that metric as well).
7) Check that this metric got updated for the individual CPU resources in their Monitor>Schedule sub tabs.
8) Start up the agent.
9) Wait 30s (shouldn't be necessary).
10) Execute "inventory --xml --export=export-2.xml" on the agent command line.
11) Locate the <schedule> element with name CpuPerc.idle for the same CPU resource in both export-1.xml and export-2.xml.

Expected results:

The schedule in export-1.xml should be disabled and the interval should be 1200000 (20 minutes, in milliseconds).
The schedule in export-2.xml should be enabled and the interval should be 600000 (10 minutes, in milliseconds).

Actual results (prior to this bug being fixed):

The schedule looks the same in both files. 
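
The comparison in step 11 could be automated along these lines. This is only a sketch: it assumes each <schedule> element carries "name", "enabled", and "interval" attributes, which is a guess at the export layout and not confirmed by this report, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ScheduleExportDiff {
    /** Finds the first <schedule> element whose (assumed) name attribute matches. */
    static Element findSchedule(String xml, String scheduleName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList schedules = doc.getElementsByTagName("schedule");
            for (int i = 0; i < schedules.getLength(); i++) {
                Element schedule = (Element) schedules.item(i);
                if (scheduleName.equals(schedule.getAttribute("name"))) {
                    return schedule;
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse export XML", e);
        }
        throw new IllegalArgumentException("no schedule named " + scheduleName);
    }

    /** True when the (assumed) enabled or interval attributes differ between the two exports. */
    static boolean scheduleChanged(String beforeXml, String afterXml, String scheduleName) {
        Element before = findSchedule(beforeXml, scheduleName);
        Element after = findSchedule(afterXml, scheduleName);
        return !before.getAttribute("enabled").equals(after.getAttribute("enabled"))
                || !before.getAttribute("interval").equals(after.getAttribute("interval"));
    }
}
```

With the contents of export-1.xml and export-2.xml loaded as strings (for example via java.nio.file.Files.readString on Java 11 or later), scheduleChanged(..., "CpuPerc.idle") returning false would correspond to the "Actual results" above. Note that this picks the first matching schedule in each file, so with several CPU resources the input would need to be narrowed to the same resource first.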
Comment 4 Ian Springer 2009-10-19 17:28:37 EDT
Fixed by updating the mtime externally from SLSB methods, rather than inside the Resource entity's setters (git commit e22d84ed25a635fc2cbc416bdee87dc340166749). Tested the fix using the above repro steps. Also tested that name/description/location updates done while the Agent is offline get synced when the Agent comes back online.

Comment 5 Ian Springer 2009-10-19 18:26:01 EDT
Fixed in RHQ_1_2_0_GA_PERF branch - r5255.
Comment 6 Charles Crouch 2009-10-20 00:00:33 EDT
Ian, can you merge this into 1.3.1 too? Thanks
Comment 7 Charles Crouch 2009-10-20 00:27:27 EDT
Associated case: 344497
Comment 8 Ian Springer 2009-10-20 10:11:46 EDT
Fixed in 1.3 CP branch - r5256. Charles, is that what you meant by 1.3.1?
Comment 9 Ian Springer 2009-10-20 13:35:54 EDT
r5258 - When updating metric schedules, if pushing the updated schedules to the Agents fails, throw an Exception, rather than just touching the mtime on the corresponding Resources; ***NOTE*** THIS IS A SPECIAL FIX FOR THIS BRANCH ONLY AND SHOULD NOT BE MERGED INTO TRUNK!

git commit for fix to trunk: e22d84ed25a635fc2cbc416bdee87dc340166749

Comment 10 Ian Springer 2009-10-21 13:09:56 EDT
It turns out the mtime also needs to be set on Resources when inventoryStatus is updated, otherwise inventory statuses never get synced to Agents... This is fixed by the following commits:

master - commit fdd23ec2bb428e6bde53e70d9b31c6ae22465da0	
1.3 CP branch - r5262
1.2 PERF branch - r5261
Comment 11 Red Hat Bugzilla 2009-11-10 15:55:41 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1996
This bug relates to RHQ-792
Comment 12 Jay Shaughnessy 2009-12-14 14:22:52 EST
The backward patching was incomplete in the RHQ_1_3_0_GA_CP branch.  This has been resolved in r5269.
Comment 14 Corey Welton 2010-01-22 19:09:25 EST
qa -> gneelaka
Comment 15 Jeff Weiss 2010-02-01 12:33:22 EST
This has the same repro steps as 
https://bugzilla.redhat.com/show_bug.cgi?id=535564

QA Verified.
Comment 16 Corey Welton 2010-08-12 12:52:41 EDT
Mass-closure of verified bugs against JON.
