Bug 702440 - Generate synthetic agent deletes in cumin based on heartbeats and agent list [RFE]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: cumin
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 2.0.1
Target Release: ---
Assignee: Trevor McKay
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks: 723887 736494
Reported: 2011-05-05 16:51 UTC by Trevor McKay
Modified: 2012-03-01 11:47 UTC (History)

Fixed In Version: cumin-0.1.4840-1
Doc Type: Bug Fix
Doc Text:
Cause: Cumin did not have a way to recognize inactive agents in its database.
Consequence: Stopping agents while cumin was shut down, or configuring cumin to point to a different broker, could cause objects from missing agents to display in the UI when cumin was restarted. These stale objects would never be deleted.
Fix: All dynamic data in cumin is deleted when cumin starts. Agents and objects are rediscovered as cumin runs.
Result: Cumin will only display data from active agents. Overall performance is not discernibly affected by deletion of dynamic data on startup.
Clone Of:
: 736494 (view as bug list)
Environment:
Last Closed: 2011-09-07 16:43:39 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 595774 | 0 | high | CLOSED | Settle on a solution for transient object deletes | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2011:1249 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update | 2011-09-07 16:40:45 UTC

Description Trevor McKay 2011-05-05 16:51:46 UTC
Description of problem:

This issue is closely related to several that have been addressed in the past but it is slightly different.

It is possible for stale data to collect in the cumin database that will never be deleted even if all agents are using stable ids.  This can happen whenever cumin sees a population of agents, cumin is shut down, and cumin is restarted in an environment where the agent population is not a superset of the population at shutdown time (because cumin is started pointing at a different broker, or configuration changes were made, or machines/agents went down unexpectedly, etc).
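The condition described above can be stated as a set difference: any agent recorded before shutdown that does not reappear after restart leaves stale objects behind. A minimal sketch follows; the function name and agent ids are hypothetical and are not taken from cumin.

```python
def stale_agents(known_at_shutdown, seen_after_restart):
    """Return agent ids stored before shutdown that never reappeared."""
    return set(known_at_shutdown) - set(seen_after_restart)


# Hypothetical agent ids, for illustration only.
before = {"sesame-a", "sesame-b", "sesame-c"}
after = {"sesame-a", "sesame-d"}

# The new population is not a superset of the old one, so some agents are stale.
assert not after >= before
print(sorted(stale_agents(before, after)))  # ['sesame-b', 'sesame-c']
```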

Cumin needs a mechanism to find references in its data to agents that do not exist as of its start time.  This can be done in a background thread, with a configurable timeout and run interval, that garbage collects old agents and their objects.

How reproducible:

100%

Steps to Reproduce:
1.  Start cumin pointed at a broker and let it run for a while.
2.  Start one or more sesame agents pointed at the same broker.
3.  Shut cumin down.
4.  Restart cumin pointed at a different broker, or restart cumin pointed at the same broker after shutting down some of the sesame agents.
5.  Look at the inventory tab: it shows systems that do not exist in the current environment.

Actual results:

Those systems never go away, no matter how long you wait.

Expected results:

Cumin should detect stale entities.

Additional info:

If agents are restarted with the same ids as the original and then removed while cumin is running, the systems will disappear from the display.

Comment 1 Trevor McKay 2011-05-05 16:53:25 UTC
BZ595774 is related to this, and contains a bunch of links to other related BZs.

Comment 2 Trevor McKay 2011-05-05 17:00:27 UTC
Possible solution outline:

Add a table in the database which tracks agents and the last time a heartbeat was heard from each agent.  Periodically scan the table and delete agents from which no heartbeat has been heard in N seconds (the first run of the thread needs to be offset from cumin start time to give agents a chance to "show up").  Also delete any objects associated with each deleted agent, as we do on agent delete or agent creation.
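The outline above can be sketched in a few lines. This is only an illustration under stated assumptions: the class, its parameters (`timeout`, `grace`), and the in-memory dict standing in for the database table are all hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class HeartbeatTable:
    """In-memory stand-in for the proposed agent/heartbeat table (hypothetical)."""

    def __init__(self, timeout, grace, clock=time.time, last_seen=None):
        self.timeout = timeout              # N seconds without a heartbeat
        self.grace = grace                  # offset of the first scan from start time
        self.clock = clock
        self.started = clock()
        # agent id -> time of last heartbeat, e.g. rows left from a previous run
        self.last_seen = dict(last_seen or {})

    def heartbeat(self, agent_id):
        """Record that a heartbeat was just heard from agent_id."""
        self.last_seen[agent_id] = self.clock()

    def scan(self):
        """Drop and return agents silent for more than `timeout` seconds."""
        now = self.clock()
        if now - self.started < self.grace:
            return []                       # too early: agents may not have shown up yet
        expired = [a for a, t in self.last_seen.items() if now - t > self.timeout]
        for agent_id in expired:
            # A real implementation would delete the agent's objects here too.
            del self.last_seen[agent_id]
        return sorted(expired)


# Usage with a fake clock: one agent left over from before the restart.
now = [1000.0]
table = HeartbeatTable(timeout=60, grace=30, clock=lambda: now[0],
                       last_seen={"old-agent": 100.0})
table.heartbeat("live-agent")
assert table.scan() == []                   # still inside the grace period
now[0] += 45
table.heartbeat("live-agent")
print(table.scan())                         # ['old-agent']
```

In a running service, `scan` would be called from the periodic background thread mentioned in the description.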

Comment 3 Trevor McKay 2011-06-10 17:19:15 UTC
We already delete all objects associated with an agent when we first see that agent after cumin starts up (restricted by bound classes).  This covers objects that we will see again, as well as objects that we would not have seen, associated with that agent.

We also delete objects when a broker tells us an agent went away.

The only group left is objects associated with agents that we will never see during a given session (if they show up late, we will delete their objects, noted above).  This is in fact the group that we are targeting in this BZ.

The union of the first and last groups (objects of agents we will see, and objects of agents we will never see) is simply all objects of bound classes for a particular cumin-data instance.  So doesn't handling phantom data just resolve to deleting all objects of all bound classes when cumin-data starts?

I think yes.  Will try.

(sample data is not deleted except by the expiration thread, when it is 24 hours old)

Comment 4 Trevor McKay 2011-06-10 17:22:53 UTC
One additional note: we do not delete the Collector object when we see its agent created or deleted.  I don't understand why not; maybe there is some historical reason.  Especially with collector filtering turned off, I can't see that this is too much of a problem.

Comment 5 Trevor McKay 2011-06-13 14:35:42 UTC
Fixed in revision 4808.

All non-sample data associated with bound classes is deleted by cumin-data instances on startup.  The user is presented with a friendly banner noting the absence of the collector until the collector object is seen.

There is no longer any need to delete data when an agent create is seen.  Objects are still deleted when an agent delete is seen.

User data is also preserved.
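The startup behaviour described in this comment can be illustrated with a small sketch. It is not the code from revision 4808: the table names, the split into "bound class" and "preserved" tables, and the use of SQLite are assumptions made only so the example is self-contained and runnable.

```python
import sqlite3

# Hypothetical table names; cumin's real schema is not shown in this report.
BOUND_CLASS_TABLES = ["system", "slot"]          # dynamic objects reported by agents
PRESERVED_TABLES = ["users", "system_samples"]   # user data and sample data


def purge_dynamic_data(conn, tables=BOUND_CLASS_TABLES):
    """Delete every row of each bound-class table in a single transaction."""
    with conn:
        for table in tables:
            conn.execute(f'DELETE FROM "{table}"')


def count(conn, table):
    """Return the number of rows currently in `table`."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE system (id TEXT, agent_id TEXT);
    CREATE TABLE slot (id TEXT, agent_id TEXT);
    CREATE TABLE users (name TEXT);
    CREATE TABLE system_samples (system_id TEXT, load_avg REAL);
    INSERT INTO system VALUES ('host1', 'agent-gone');
    INSERT INTO slot VALUES ('slot1@host1', 'agent-gone');
    INSERT INTO users VALUES ('admin');
    INSERT INTO system_samples VALUES ('host1', 0.5);
""")

purge_dynamic_data(conn)   # what a data instance would do once at startup

for table in BOUND_CLASS_TABLES + PRESERVED_TABLES:
    print(table, count(conn, table))
```

After the purge the dynamic tables are empty and are repopulated only as live agents report in, while user and sample rows are untouched.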

Comment 6 Jan Sarenik 2011-07-21 10:51:43 UTC
Verified with cumin-0.1.4878-1.el5

Comment 7 Trevor McKay 2011-07-25 15:46:04 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Cause
    Cumin did not have a way to recognize inactive agents in its database.

Consequence
    Stopping agents while cumin was shut down, or configuring cumin to point to a different broker, could cause objects from missing agents to display in the UI when cumin was restarted.  These stale objects would never be deleted.

Fix
    All dynamic data in cumin is deleted when cumin starts.  Agents and objects are rediscovered as cumin runs.

Result
    Cumin will only display data from active agents.  Overall performance is not discernibly affected by deletion of dynamic data on startup.

Comment 8 errata-xmlrpc 2011-09-07 16:43:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html

