Bug 535283 (RHQ-1996) - fix mtime-based inventory sync logic
Summary: fix mtime-based inventory sync logic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: RHQ-1996
Product: RHQ Project
Classification: Other
Component: Inventory
Version: 1.3
Hardware: All
OS: All
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ian Springer
QA Contact: Jeff Weiss
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On: RHQ-2246
Blocks: JON231
Reported: 2009-04-21 21:55 UTC by Joseph Marques
Modified: 2018-10-27 16:10 UTC (History)

Fixed In Version: 2.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-12 16:52:41 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 772771 | 0 | urgent | CLOSED | Agent not syncing updated plugin config at startup | 2021-02-22 00:41:40 UTC

Internal Links: 772771

Description Joseph Marques 2009-04-21 21:55:00 UTC
once upon a time measurement schedules used to require that an agent be online to receive those changes, otherwise the flow would fail.  this might not have seemed so bad for updating schedules against a single resource, but we also support updating schedules for compatible groups, auto groups, and across resource types (measurement templates).  so, this was corrected in RHQ-792, which allowed the agent to be down but still receive the updates.

the fix for RHQ-792 piggybacked on the logic that the agent uses for determining which resources it needs to merge, namely the mtime.  if the last modified time of the resource on the server is LATER in time than the one stored on the agent, then something about that resource changed and it will be synced...and one of the items in the synchronization payload is the set of measurement schedules for that resource.
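
for illustration, the mtime comparison described above could look roughly like the sketch below.  this is not the actual RHQ code: the class, method, and field names are hypothetical, and treating a resource the agent has never seen as needing a sync is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the mtime-based sync decision; not RHQ's real classes.
final class MtimeSyncSketch {

    /** Minimal stand-in for a resource: an id plus a last-modified time in epoch millis. */
    static final class ResourceStamp {
        final int id;
        final long mtime;

        ResourceStamp(int id, long mtime) {
            this.id = id;
            this.mtime = mtime;
        }
    }

    /**
     * Returns the ids of resources whose server-side mtime is LATER than the
     * mtime the agent has stored. Resources missing on the agent are also
     * returned (an assumption made for this sketch).
     */
    static List<Integer> resourcesNeedingSync(List<ResourceStamp> serverSide,
                                              Map<Integer, Long> agentMtimes) {
        List<Integer> stale = new ArrayList<>();
        for (ResourceStamp server : serverSide) {
            Long agentMtime = agentMtimes.get(server.id);
            if (agentMtime == null || server.mtime > agentMtime) {
                stale.add(server.id);
            }
        }
        return stale;
    }
}
```

anything returned by resourcesNeedingSync would then be synced, with its measurement schedules carried in the payload.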

as part of general performance fixes for the 1.2 release, it was determined that the mtime on the server was being reset too often.  this was because we were using a hibernate @OnUpdate persistence hook to update the mtime field to the current time each time the resource was merged through the entityManager.  however, the agent doesn't care about every single update made to a resource, related resource object, or misuse of entityManager.merge(Resource) calls.  instead, it only cares about whether the basic properties on a resource change, whether its corresponding measurement schedules have been updated, or whether its plugin configuration has been updated.

to remedy this, the @OnUpdate hook was removed and "System.currentTimeMillis()" was placed into the setter methods for the properties of concern.  the mtime was also manually set in the code paths that update the measurement schedules (note: resetting the mtime for plugin configuration updates is not necessary at the time of this writing because those updates are synchronous).

unfortunately, this solution ignored one very evident issue - the domain entities are shared between the agent and server.  so, although the side effect of resetting the mtime when certain setter methods were called on the resource on the server side was correct, it posed problems when the mtime was reset on the agent side.  remember, we just mentioned that differing mtimes are the primary qualification that determines whether a resource needs to be synced / merged between agent and server.
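
a minimal sketch of the problem, assuming a simplified shared entity (the names below are hypothetical and not the real Resource class).  the real setters called System.currentTimeMillis() directly; the clock is injected here only to keep the sketch deterministic.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a domain entity shared by agent and server whose
// setter bumps the mtime as a side effect.
final class SetterSideEffectSketch {

    static final class SharedResource {
        private final LongSupplier clock; // stands in for System.currentTimeMillis()
        private String name;
        private long mtime;

        SharedResource(LongSupplier clock) {
            this.clock = clock;
        }

        /** Side effect: every call resets the mtime, no matter which side of the wire runs it. */
        void setName(String name) {
            this.name = name;
            this.mtime = clock.getAsLong();
        }

        void setMtime(long mtime) {
            this.mtime = mtime;
        }

        String getName() {
            return name;
        }

        long getMtime() {
            return mtime;
        }
    }

    /** True when the two copies no longer agree on the mtime. */
    static boolean mtimesDiffer(SharedResource serverCopy, SharedResource agentCopy) {
        return serverCopy.getMtime() != agentCopy.getMtime();
    }
}
```

if the agent calls setName on its own copy, for example while building the resource locally, its mtime drifts away from the server's even though nothing changed on the server.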

consequently, the mtime was being reset too often, which ended up being the root cause of RHQ-1990.  although this was fixed with a one-off solution (namely resetting the mtime back down to 0 right at the end of the manual add workflow on the agent side) it's possible this mtime thing could come back to bite us later.  accordingly, the proper fix here is to remove the side-effect of updating the mtime in the setter methods of Resource entity, and instead do it explicitly in the server-side code paths that update the resource.
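
the proposed fix could be sketched as follows.  this is only an illustration under assumptions: ServerResourceService and updateName are hypothetical names rather than the real server-side beans, and the clock is injected only so the sketch can be exercised deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: setters are plain, and only server-side code paths touch the mtime.
final class ExplicitMtimeSketch {

    static final class Resource {
        private String name;
        private long mtime;

        /** Plain setter: no mtime side effect, so it is safe to call on the agent. */
        void setName(String name) {
            this.name = name;
        }

        void setMtime(long mtime) {
            this.mtime = mtime;
        }

        String getName() {
            return name;
        }

        long getMtime() {
            return mtime;
        }
    }

    /** Stand-in for a server-side code path that updates a resource. */
    static final class ServerResourceService {
        private final LongSupplier clock; // e.g. System::currentTimeMillis

        ServerResourceService(LongSupplier clock) {
            this.clock = clock;
        }

        /** Updates the name and explicitly marks the resource as modified. */
        void updateName(Resource resource, String newName) {
            resource.setName(newName);
            resource.setMtime(clock.getAsLong());
        }
    }
}
```

with this shape, agent-side code can call the setters freely, and the mtime only moves when a server-side update path decides the agent needs to hear about the change.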

Comment 1 Joseph Marques 2009-07-10 15:29:44 UTC
not enough time left in this release to address this with an appropriate amount of QA to give us confidence that we haven't regressed...and inventory synchronization mechanisms are too important to just brush over.  let's try to get this into the next release.

Comment 2 Charles Crouch 2009-09-18 19:58:32 UTC
Automated tests as described in RHQ-792 should be added to confirm this fix.

Comment 3 Ian Springer 2009-10-19 20:41:14 UTC
REPRO STEPS
=============
Use the following procedure:

1) Inventory only the platform.
2) execute "inventory --xml --export=export-1.xml" on the agent command line
3) shut down the agent
4) Go to Administration>System Configuration>Templates
5) Edit Metric Templates for CPU
6) check "Update schedules for existing resources", set collection interval to 10 minutes for "Idle" metrics (this will enable that metric as well)
7) Check that this metric got updated for the individual CPU resources in their Monitor>Schedule sub tabs
8) start up the agent
9) wait 30s (shouldn't be necessary)
10) execute "inventory --xml --export=export-2.xml" on the agent command line
11) Locate the <schedule> element with name CpuPerc.idle for the same CPU resource in both export-1.xml and export-2.xml

Expected results:

The schedule in export-1.xml should be disabled and the interval should be 1200000.
The schedule in export-2.xml should be enabled and interval should be 600000.

Actual results (prior to this bug being fixed):

The schedule looks the same in both files. 


Comment 4 Ian Springer 2009-10-19 21:28:37 UTC
Fixed by updating the mtime externally from SLSB methods, rather than inside Resource entity's setters (git commit e22d84ed25a635fc2cbc416bdee87dc340166749). Tested fix using the above repro steps. Also tested that name/description/location updates done while the Agent is offline get synced when the Agent comes back online.



Comment 5 Ian Springer 2009-10-19 22:26:01 UTC
Fixed in RHQ_1_2_0_GA_PERF branch - r5255.


Comment 6 Charles Crouch 2009-10-20 04:00:33 UTC
Ian, can you merge this into 1.3.1 too? Thanks

Comment 7 Charles Crouch 2009-10-20 04:27:27 UTC
Associated case: 344497

Comment 8 Ian Springer 2009-10-20 14:11:46 UTC
Fixed in 1.3 CP branch - r5256. Charles, is that what you meant by 1.3.1?


Comment 9 Ian Springer 2009-10-20 17:35:54 UTC
r5258 - When updating metric schedules, if pushing the updated schedules to the Agents fails, throw an Exception, rather than just touching the mtime on the corresponding Resources; ***NOTE*** THIS IS A SPECIAL FIX FOR THIS BRANCH ONLY AND SHOULD NOT BE MERGED INTO TRUNK!

git commit for fix to trunk: e22d84ed25a635fc2cbc416bdee87dc340166749



Comment 10 Ian Springer 2009-10-21 17:09:56 UTC
It turns out the mtime also needs to be set on Resources when inventoryStatus is updated, otherwise inventory statuses never get synced to Agents... This is fixed by the following commits:

master - commit fdd23ec2bb428e6bde53e70d9b31c6ae22465da0	
1.3 CP branch - r5262
1.2 PERF branch - r5261


Comment 11 Red Hat Bugzilla 2009-11-10 20:55:41 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1996
This bug relates to RHQ-792


Comment 12 Jay Shaughnessy 2009-12-14 19:22:52 UTC
The backward patching was incomplete in the RHQ_1_3_0_GA_CP branch.  This has been resolved in r5269.

Comment 14 Corey Welton 2010-01-23 00:09:25 UTC
qa -> gneelaka

Comment 15 Jeff Weiss 2010-02-01 17:33:22 UTC
This has the same repro steps as 
https://bugzilla.redhat.com/show_bug.cgi?id=535564

QA Verified.

Comment 16 Corey Welton 2010-08-12 16:52:41 UTC
Mass-closure of verified bugs against JON.

