Description of problem:

2014-03-06 01:50:03,912 INFO [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185
2014-03-06 01:50:05,555 INFO [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185
2014-03-06 01:50:09,911 INFO [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185
2014-03-06 01:50:11,546 INFO [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 104185

I accidentally disabled the Apache HTTP plugin and the agent still had this resource in inventory. What seems to happen is 'unknown resource' is shown, then full inventory discovery happens in a loop.

Version-Release number of selected component (if applicable): 4.9

How reproducible: Always (not tested)

Steps to Reproduce:
1. Add a resource to inventory
2. Disable its corresponding plugin
3. Restart the agent and see this

Actual results: Discovery looping

Expected results: Discovery only happens somewhat frequently, not in a loop. Full inventory scan also should be avoided, even if this happens.

Additional info:
More logs:

2014-03-06 21:48:41,953 INFO [InventoryManager.discovery-1] (InventoryManager)- Sending [runtime] inventory report to Server...
2014-03-06 21:48:42,911 INFO [InventoryManager.discovery-1] (InventoryManager)- Syncing local inventory with Server inventory...
2014-03-06 21:48:42,912 INFO [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 109201
2014-03-06 21:48:42,938 INFO [InventoryManager.discovery-1] (RuntimeDiscoveryExecutor)- Executing runtime discovery scan rooted at [platform]...
....
2014-03-06 21:48:43,357 INFO [InventoryManager.discovery-1] (RuntimeDiscoveryExecutor)- Scanned platform and 0 server(s) and discovered 0 new descendant Resource(s).
2014-03-06 21:48:43,357 INFO [InventoryManager.discovery-1] (InventoryManager)- Sending [runtime] inventory report to Server...
2014-03-06 21:48:43,409 INFO [InventoryManager.discovery-1] (InventoryManager)- Syncing local inventory with Server inventory...
2014-03-06 21:48:43,410 INFO [InventoryManager.discovery-1] (InventoryManager)- Got unknown resource: 109201

You can see it happens about 2 seconds apart. Patch pending.
I have seen the same - and here the "bad resource" was actually sitting in the discovery queue and thus in state NEW.
Created attachment 872924: Patch for master
Assigning to myself, will review the patch, thanks.
master

commit cdc471aee9fd89f9a5226a19f92e0bdfb0a11f3a
Author: Jay Shaughnessy <jshaughn>
Date: Fri Mar 14 16:19:24 2014 -0400

    It's possible to disable a plugin on an agent after resources of that plugin's types are already in server-side inventory. The server will report those resources to the agent during an inventory sync. The agent will treat them as "unknown" resources because those resources will not have containers.

    This fixes the handling for unknown resources with disabled resource types. It ensures they are not merged into agent-side inventory and also do not trigger further discovery scans. Additionally, it now generates a ResourceError for the server-side resource to help notify the user that the resource is no longer being managed, and should be uninventoried. Only uninventory will stop the inventory sync overhead.

master

commit 72550f24284c25f467107b920f94770041dae117
Author: Jay Shaughnessy <jshaughn>
Date: Sun Mar 16 00:50:03 2014 -0400

    Patch supplied by <elias_ross>. The patch worked around the issue although in a prior commit I put in a fix for the core issue. But the patch is also useful. I'm applying parts of it, manually, as applying it verbatim was not applicable after my initial change. Thanks Elias!

    -----------------------------------
    Original Patch Comment:

    The main fix is keeping a reference to the scheduled service rescan Future and preventing a scan from being scheduled again before execution. This also has executeServiceScanDeferred() work the same way.

    The other related fixes are for:
    1) Only do availability checking for synched/merged/deleted resources, not a full scan. As we have the references for this, it seems worthwhile to prevent lots of scans (and overloading the server) if an unknown resource shows up.
    2) Concurrency. Setters/getters should be synchronized if accessed across Threads.
    3) Use 0 for scan time, as we can then avoid getting system time.
    4) In cases where Executor.submit(Callable) is used and the Future isn't needed, use Runnable instead.
    -----------------------------------

    A couple of modifications:
    - set availabilityCheck time to 1, as opposed to 0, because 0 indicates an initialized state and does not guarantee an avail check is performed.
    - when requesting an avail check for an unknown resource, make it recursive so the unknown children also get checked.

    Additionally, now when updating plugin config, root the ensuing discovery at
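For readers following the "main fix" in the original patch comment (keep a reference to the scheduled rescan Future and do not schedule another scan before that one executes), here is a minimal Java sketch of the idea. It is not the RHQ code: the class name DeferredScanScheduler and the methods requestScan(), hasPending() and clearPending() are hypothetical names chosen for illustration, and the real InventoryManager logic may differ.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the RHQ InventoryManager implementation.
final class DeferredScanScheduler {
    private final ScheduledExecutorService executor;
    private final Runnable scan;
    private final long delayMillis;

    // The scan that has been scheduled but has not started yet; guarded by "this".
    private ScheduledFuture<?> pending;

    DeferredScanScheduler(ScheduledExecutorService executor, Runnable scan, long delayMillis) {
        this.executor = executor;
        this.scan = scan;
        this.delayMillis = delayMillis;
    }

    /** Schedules a scan unless one is already waiting. Returns true if a new scan was scheduled. */
    synchronized boolean requestScan() {
        if (pending != null) {
            return false; // a scan is already queued and will cover this request
        }
        pending = executor.schedule(() -> {
            clearPending(); // once execution starts, later requests may schedule again
            scan.run();
        }, delayMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    synchronized boolean hasPending() {
        return pending != null;
    }

    private synchronized void clearPending() {
        pending = null;
    }
}
```

With this shape, a burst of "unknown resource" notifications results in one queued scan rather than one scan per notification. The pending reference is cleared as the scan starts, so a request arriving mid-scan still gets a follow-up scan.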
commit 65e78e6280ca54db1e659c47f6f77a603559ce42
Author: Jay Shaughnessy <jshaughn>
Date: Mon Mar 17 17:44:07 2014 -0400

    revisions due to test failures
    - Go back to providing a full avail report if there are inventory sync changes. This isn't really that inefficient agent-side because it's not a full scan, but rather a full report. We still only check avail for those resources that have not yet provided avail, and the regularly scheduled checks.
    - Don't just skip a service scan if one is in progress. We should scan again from the top to guarantee nothing gets missed. So, instead, cancel the current scan and add interrupt logic such that it reports what it has discovered to that point. And then start a new scan.
    - update DiscoveryTest to explicitly wait for discovery to complete, the changes seem to have changed the dynamics of the test.

These changes still need to be monitored to ensure things are behaving...
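The second revision above (cancel the in-progress scan, have it report what it has discovered so far, then start a new scan from the top) can be sketched roughly as follows. This is only an illustration under stated assumptions: RestartableScan, restart(), the probe predicate and the reporter callback are hypothetical names, and resources are modelled as plain strings rather than RHQ Resource objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch, not the RHQ discovery executor.
final class RestartableScan {
    private final ExecutorService executor;
    private final List<String> candidates;
    private final Predicate<String> probe;          // assumed: returns true if the candidate is discovered
    private final Consumer<List<String>> reporter;  // assumed: receives whatever a scan found

    // The most recently submitted scan; guarded by "this".
    private Future<?> current;

    RestartableScan(ExecutorService executor, List<String> candidates,
                    Predicate<String> probe, Consumer<List<String>> reporter) {
        this.executor = executor;
        this.candidates = candidates;
        this.probe = probe;
        this.reporter = reporter;
    }

    /** Interrupts any running scan and submits a fresh one that starts from the top. */
    synchronized void restart() {
        if (current != null) {
            current.cancel(true); // true = interrupt the thread if the scan is running
        }
        current = executor.submit(this::scan);
    }

    private void scan() {
        List<String> found = new ArrayList<>();
        try {
            for (String candidate : candidates) {
                if (Thread.currentThread().isInterrupted()) {
                    break; // stop early, but still report the partial result below
                }
                if (probe.test(candidate)) {
                    found.add(candidate);
                }
            }
        } finally {
            reporter.accept(found); // partial when interrupted, complete otherwise
        }
    }
}
```

Note that a scan cancelled before it starts never runs and so reports nothing, which is harmless here because the replacement scan covers the same candidates.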
Bulk closing of RHQ 4.11 issues, now that RHQ 4.12 is out. If you find an issue with those, please open a new BZ, linking to the old one.