Bug 1073201 - Agent does a discovery loop if the inventory contains an unknown resource
Summary: Agent does a discovery loop if the inventory contains an unknown resource
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: GA
Target Release: RHQ 4.11
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1170525
Reported: 2014-03-06 01:56 UTC by Elias Ross
Modified: 2015-01-15 23:21 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-07-21 10:13:24 UTC
Embargoed:


Attachments
Patch for master (16.17 KB, patch)
2014-03-11 01:40 UTC, Elias Ross


Links
System            ID       Private  Priority  Status  Summary                               Last Updated
Red Hat Bugzilla  1077634  0        urgent    CLOSED  Failed to update connection settings  2021-02-22 00:41:40 UTC

Internal Links: 1077634

Description Elias Ross 2014-03-06 01:56:31 UTC
Description of problem:

2014-03-06 01:50:03,912 INFO  [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185
2014-03-06 01:50:05,555 INFO  [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185
2014-03-06 01:50:09,911 INFO  [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185
2014-03-06 01:50:11,546 INFO  [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185

I accidentally disabled the Apache HTTP plugin while the agent still had this resource in inventory. What seems to happen is that 'unknown resource' is logged, and then a full inventory discovery runs in a loop.

Version-Release number of selected component (if applicable): 4.9


How reproducible: Always (not tested)


Steps to Reproduce:
1. Add a resource to inventory
2. Disable its corresponding plugin
3. Restart the agent and watch the agent log

Actual results:

Discovery looping

Expected results:

Discovery runs only periodically, not in a loop. A full inventory scan should also be avoided, even when this happens.

Additional info:

Comment 1 Elias Ross 2014-03-06 21:52:15 UTC
More logs:

2014-03-06 21:48:41,953 INFO  [InventoryManager.discovery-1] (InventoryManager)- Sending [runtime] inventory report to Server...
2014-03-06 21:48:42,911 INFO  [InventoryManager.discovery-1] (InventoryManager)- Syncing local inventory with Server inventory...
2014-03-06 21:48:42,912 INFO  [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 109201
2014-03-06 21:48:42,938 INFO  [InventoryManager.discovery-1] (RuntimeDiscoveryExecutor)- Executing runtime discovery scan rooted at [platform]...
....
2014-03-06 21:48:43,357 INFO  [InventoryManager.discovery-1] (RuntimeDiscoveryExecutor)- Scanned platform and 0 server(s) and discovered 0 new descendant Resource(s).
2014-03-06 21:48:43,357 INFO  [InventoryManager.discovery-1] (InventoryManager)- Sending [runtime] inventory report to Server...
2014-03-06 21:48:43,409 INFO  [InventoryManager.discovery-1] (InventoryManager)- Syncing local inventory with Server inventory...
2014-03-06 21:48:43,410 INFO  [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 109201

You can see it happens about 2 seconds apart.

Patch pending.

Comment 2 Heiko W. Rupp 2014-03-07 08:41:47 UTC
I have seen the same, and here the "bad resource" was actually sitting in the discovery queue and thus in state NEW.

Comment 3 Elias Ross 2014-03-11 01:40:50 UTC
Created attachment 872924
Patch for master

Comment 4 Jay Shaughnessy 2014-03-13 19:43:51 UTC
Assigning to myself, will review the patch, thanks.

Comment 5 Jay Shaughnessy 2014-03-16 04:53:49 UTC
master commit cdc471aee9fd89f9a5226a19f92e0bdfb0a11f3a
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 14 16:19:24 2014 -0400

 It's possible to disable a plugin on an agent after resources of that plugin's
 types are already in server-side inventory.  The server will report those
 resources to the agent during an inventory sync.  The agent will treat them
 as "unknown" resources because those resources will not have containers.

 This fixes the handling for unknown resources with disabled resource types.
 It ensures they are not merged into agent-side inventory and also do not
 trigger further discovery scans.  Additionally, it now generates a
 ResourceError for the server-side resource to help notify the user that the
 resource is no longer being managed, and should be uninventoried.  Only
 uninventory will stop the inventory sync overhead.
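The sync handling this commit describes can be pictured with the minimal sketch below. It is not RHQ's InventoryManager code: the `InventorySyncSketch` class, its nested `SyncedResource` and `SyncOutcome` records, and the plain-string "resource errors" are hypothetical stand-ins, and the sketch assumes the agent can tell which resource types belong to disabled plugins. Resources of disabled types are not merged and do not request a discovery scan; each one yields an error message instead. Treating any other unknown type as a reason to rescan is an illustrative assumption (records need Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class InventorySyncSketch {

    // Hypothetical stand-in for a resource reported by the server during sync.
    public record SyncedResource(int id, String typeName) {}

    // Hypothetical result of one sync pass.
    public record SyncOutcome(List<SyncedResource> merged,
                              List<String> resourceErrors,
                              boolean discoveryNeeded) {}

    /**
     * Sorts server-reported resources: known types are merged, disabled types
     * produce an error (and no rescan), anything else requests a rescan.
     */
    public static SyncOutcome sync(List<SyncedResource> fromServer,
                                   Set<String> knownTypes,
                                   Set<String> disabledTypes) {
        List<SyncedResource> merged = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        boolean discoveryNeeded = false;

        for (SyncedResource r : fromServer) {
            if (disabledTypes.contains(r.typeName())) {
                // Plugin disabled: never merge, never trigger discovery, just report.
                errors.add("Resource " + r.id() + " (" + r.typeName()
                        + ") is no longer managed; uninventory it");
            } else if (knownTypes.contains(r.typeName())) {
                merged.add(r);
            } else {
                // Assumption: other unknown resources may still justify a rescan.
                discoveryNeeded = true;
            }
        }
        return new SyncOutcome(merged, errors, discoveryNeeded);
    }

    public static void main(String[] args) {
        SyncOutcome out = sync(
                List.of(new SyncedResource(1, "Linux"),
                        new SyncedResource(104185, "Apache HTTP Server")),
                Set.of("Linux"),
                Set.of("Apache HTTP Server"));
        System.out.println(out.merged().size() + " merged, "
                + out.resourceErrors().size() + " error(s), rescan=" + out.discoveryNeeded());
    }
}
```

Running `main` prints `1 merged, 1 error(s), rescan=false`: the disabled-type resource is reported but neither merged nor used as a reason to scan again.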


master commit 72550f24284c25f467107b920f94770041dae117
Author: Jay Shaughnessy <jshaughn>
Date:   Sun Mar 16 00:50:03 2014 -0400

 Patch supplied by <elias_ross>.  The patch worked around the issue, although
 I had already fixed the core issue in a prior commit.  The patch is still
 useful, so I'm applying parts of it manually; applying it verbatim was not
 possible after my initial change.

  Thanks Elias!

 -----------------------------------
 Original Patch Comment:

 The main fix is to keep a reference to the scheduled service rescan Future and
 to prevent a scan from being scheduled again before it executes. This also
 makes executeServiceScanDeferred() work the same way.

 The other related fixes are for:
 1) Only do availability checking for synched/merged/deleted resources, not a
 full scan.  As we have the references for this, it seems worthwhile to prevent
 lots of scans (and avoid overloading the server) if an unknown resource shows up.
 2) Concurrency. Setters/getters should be synchronized if accessed across
    threads.
 3) Use 0 for scan time, as we can then avoid reading the system time.
 4) Where ExecutorService.submit(Callable) is used and the Future isn't needed,
    use Runnable instead.
 -----------------------------------
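The main fix in the patch comment (keep the rescan Future and refuse to schedule another rescan before the pending one starts), together with fixes 2) and 4), can be illustrated with the minimal sketch below. `RescanSchedulerSketch` and its methods are hypothetical; `runScan()` only counts invocations in place of the agent's real service scan. It relies on the standard `java.util.concurrent.ScheduledExecutorService`, guards the shared Future with `synchronized`, and schedules a plain `Runnable` because no caller needs a result.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RescanSchedulerSketch {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger scansRun = new AtomicInteger();

    // Handle to a rescan that is scheduled but has not started yet; guarded by "this".
    private ScheduledFuture<?> pendingRescan;

    /** Returns true if a new rescan was scheduled, false if one is already waiting. */
    public synchronized boolean requestRescan(long delayMillis) {
        if (pendingRescan != null) {
            return false; // already queued: don't pile up another scan
        }
        // A Runnable is enough here; no caller needs a result from the Future.
        pendingRescan = executor.schedule(this::runScan, delayMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    private void runScan() {
        synchronized (this) {
            pendingRescan = null; // from now on, a new request may schedule another scan
        }
        scansRun.incrementAndGet(); // stand-in for the actual service scan
    }

    public int scansRun() {
        return scansRun.get();
    }

    /** Lets already-scheduled rescans run, then stops the executor. */
    public void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RescanSchedulerSketch s = new RescanSchedulerSketch();
        boolean first = s.requestRescan(200);
        boolean second = s.requestRescan(200);
        boolean third = s.requestRescan(200);
        s.shutdownAndWait();
        System.out.println(first + " " + second + " " + third + ", scans run: " + s.scansRun());
    }
}
```

Three back-to-back requests collapse into one scan, so `main` prints `true false false, scans run: 1`. Clearing the Future at the start of `runScan()` means a request that arrives while a scan is already executing still schedules a follow-up, matching "before execution" rather than "before completion".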

 A couple of modifications:
 - set availabilityCheck time to 1, as opposed to 0, because 0 indicates
   an initialized state and does not guarantee that an avail check is performed.
 - when requesting an avail check for an unknown resource, make it
   recursive so the unknown children also get checked.
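The two modifications above can be modelled with the small sketch below. It is not the agent's availability code: `AvailCheckSketch`, its `Node` tree and the "0 is never due" rule are hypothetical, chosen only to mirror the commit note that 0 marks an initialized state while 1 (a time already in the past) forces a check. The recursive flag shows why an avail request for an unknown resource should also reach its children.

```java
import java.util.ArrayList;
import java.util.List;

public class AvailCheckSketch {
    // Hypothetical encoding of the convention in the commit note:
    // 0 means "initialized, no check scheduled"; any other value <= now means "due".
    public static final long UNINITIALIZED = 0L;
    public static final long CHECK_ASAP = 1L; // 1 ms after the epoch is always in the past

    public static final class Node {
        final List<Node> children = new ArrayList<>();
        long nextAvailCheck = UNINITIALIZED;

        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Marks a resource as due for an avail check, optionally its whole subtree. */
    public static void requestAvailCheck(Node node, boolean recursive) {
        node.nextAvailCheck = CHECK_ASAP;
        if (recursive) {
            for (Node child : node.children) {
                requestAvailCheck(child, true);
            }
        }
    }

    /** Counts resources a checker would process at time "now"; 0 is never treated as due. */
    public static int dueCount(Node node, long now) {
        boolean due = node.nextAvailCheck != UNINITIALIZED && node.nextAvailCheck <= now;
        int count = due ? 1 : 0;
        for (Node child : node.children) {
            count += dueCount(child, now);
        }
        return count;
    }

    public static void main(String[] args) {
        // Root with two children, one of which has its own child: four nodes in total.
        Node root = new Node().add(new Node().add(new Node())).add(new Node());
        long now = System.currentTimeMillis();
        requestAvailCheck(root, false);
        System.out.println("non-recursive: " + dueCount(root, now));
        requestAvailCheck(root, true);
        System.out.println("recursive: " + dueCount(root, now));
    }
}
```

`main` prints `non-recursive: 1` and then `recursive: 4`: only the recursive request makes the unknown resource's descendants due as well.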

 Additionally, now when updating plugin config, root the ensuing discovery at

Comment 6 Jay Shaughnessy 2014-03-18 01:38:18 UTC
commit 65e78e6280ca54db1e659c47f6f77a603559ce42
Author: Jay Shaughnessy <jshaughn>
Date:   Mon Mar 17 17:44:07 2014 -0400

revisions due to test failures
- Go back to providing a full avail report if there are inventory sync
  changes.  This isn't really inefficient agent-side because it's not
  a full scan, just a full report.  We still only check avail for
  resources that have not yet provided avail, plus the regularly
  scheduled checks.
- Don't just skip a service scan if one is in progress. We should scan again
  from the top to guarantee nothing gets missed.  So, instead, cancel the
  current scan and add interrupt logic such that it reports what it has
  discovered to that point. And then start a new scan.
- Update DiscoveryTest to explicitly wait for discovery to complete; the
  changes seem to have altered the dynamics of the test.

These changes still need to be monitored to ensure things are behaving...
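The revised in-progress behavior in the second bullet (cancel the running scan, let it report what it found so far, then start again from the top) is sketched below under stated assumptions. `RestartableScanSketch`, `startScan` and the string "servers" are hypothetical, and `Thread.sleep` stands in for real discovery work. Cancellation uses the standard `Future.cancel(true)`, which interrupts the worker thread; the scan's `finally` block always files a report, so an interrupted scan still delivers its partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class RestartableScanSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<List<String>> reports = new CopyOnWriteArrayList<>();
    private Future<?> current; // guarded by "this"

    /** Starts a scan from the top, cancelling (interrupting) any scan still in progress. */
    public synchronized void startScan(List<String> servers, long perServerMillis) {
        if (current != null && !current.isDone()) {
            current.cancel(true); // the running scan stops early and reports partial results
        }
        current = executor.submit(() -> scan(servers, perServerMillis));
    }

    private void scan(List<String> servers, long perServerMillis) {
        List<String> discovered = new ArrayList<>();
        try {
            for (String server : servers) {
                if (Thread.currentThread().isInterrupted()) {
                    break;
                }
                Thread.sleep(perServerMillis); // stand-in for discovering one server's services
                discovered.add(server);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // cancelled mid-scan: stop early
        } finally {
            reports.add(List.copyOf(discovered)); // report whatever was found so far
        }
    }

    public List<List<String>> reports() {
        return List.copyOf(reports);
    }

    public void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RestartableScanSketch s = new RestartableScanSketch();
        s.startScan(List.of("A", "B", "C"), 1_000);
        s.startScan(List.of("X"), 50); // restart before the first scan can finish
        s.shutdownAndWait();
        System.out.println(s.reports());
    }
}
```

The output depends on timing: the first scan is either cancelled before it starts (no report) or interrupted mid-scan (a partial report), and in both cases the restarted scan runs to completion and reports `[X]` last.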

Comment 7 Heiko W. Rupp 2014-07-21 10:13:24 UTC
Bulk closing of RHQ 4.11 issues, now that RHQ 4.12 is out.

If you find an issue with those, please open a new BZ, linking to the old one.

