Bug 1015734

Summary: RuntimeDiscoveryExecutor can execute discovery scans in a loop
Product: [Other] RHQ Project
Reporter: Elias Ross <genman>
Component: Agent
Assignee: Nobody <nobody>
Status: NEW ---
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.9
CC: bkramer, hrupp
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Elias Ross 2013-10-05 03:27:02 UTC
Description of problem:

After doing a server upgrade, I experienced a looping issue where some agents (not all) were doing inventory syncing fairly continuously. The root cause isn't clear, but the following was observed:

2013-10-05 01:33:26,275 INFO  [InventoryManager.discovery-1] (InventoryManager)- Syncing local inventory with Server inventory...
2013-10-05 01:33:26,279 INFO  [InventoryManager.discovery-1] (RuntimeDiscoveryExecutor)- Executing runtime discovery scan rooted at [platform]...
...
2013-10-05 01:33:27,333 INFO  [InventoryManager.discovery-1] (RuntimeDiscoveryExecutor)- Scanned platform and 0 server(s) and discovered 0 new descendant Resource(s).
2013-10-05 01:33:27,333 INFO  [InventoryManager.discovery-1] (InventoryManager)- Sending [runtime] inventory report to Server...
2013-10-05 01:33:27,860 INFO  [InventoryManager.discovery-1] (InventoryManager)- Syncing local inventory with Server inventory...

As you can see, about 1 second later the same sync occurred again. This appears to be caused by the following loop.

In InventoryManager,
    private void synchInventory(ResourceSyncInfo syncInfo, boolean partialInventory) {

calls
    public boolean handleReport(InventoryReport report) {

which calls
    synchInventory(...) again in a new thread, scheduled by

                this.inventoryThreadPoolExecutor.schedule((Callable<? extends Object>) this.serviceScanExecutor,
                    configuration.getChildResourceDiscoveryDelay(), TimeUnit.SECONDS);

^^ the delay here is only 5 seconds...

and the cycle repeats itself.
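
The cycle described above can be reproduced in isolation. The following is only a minimal sketch, not RHQ code: the class SyncLoopSketch, the method countScans, and the millisecond delay are hypothetical, and it assumes that every report causes the server to request another sync (the actual trigger was not identified in this report).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the agent's InventoryManager; not RHQ code.
class SyncLoopSketch {
    private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    private final AtomicInteger scans = new AtomicInteger();
    private final long delayMillis;

    SyncLoopSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Like synchInventory(): every sync unconditionally schedules a scan.
    void synchInventory() {
        Runnable scan = this::runScan;
        try {
            pool.schedule(scan, delayMillis, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // The pool was shut down; the demo is over.
        }
    }

    // Like the runtime scan: it finishes by sending a report.
    private void runScan() {
        scans.incrementAndGet();
        handleReport();
    }

    // Assumption: the server always answers "out of sync", so each
    // report triggers another sync, which schedules another scan.
    boolean handleReport() {
        synchInventory();
        return true;
    }

    // Starts one sync, lets the loop run for runMillis, and returns
    // how many scans were executed in that time.
    static int countScans(long delayMillis, long runMillis) {
        SyncLoopSketch sketch = new SyncLoopSketch(delayMillis);
        sketch.synchInventory();
        try {
            Thread.sleep(runMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        sketch.pool.shutdownNow();
        return sketch.scans.get();
    }

    public static void main(String[] args) {
        System.out.println("scans in 300 ms: " + countScans(10, 300));
    }
}
```

A single initial sync keeps producing scans until the pool is shut down, which matches the repeating log pattern above.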

I added additional pooled EJB instances and increased the communication concurrency setting; this did *not* fix the problem.

The problem fixed itself when I stopped most of the other agents.


Version-Release number of selected component (if applicable): 4.9


How reproducible: 

Steps to Reproduce:
1. Occurs in cases where the server appears busy; I'm guessing there's just not enough time for the proper sync to happen.
2. There may be changes in resources/plugins before/after upgrade.

Additional info:

Comment 1 Elias Ross 2013-10-06 20:33:58 UTC
This approach may fix the symptom but probably not the cause.

diff --git a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/InventoryManager.java b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/InventoryManager.java
index 2e4d52a..d2b1604 100644
--- a/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/InventoryManager.java
+++ b/modules/core/plugin-container/src/main/java/org/rhq/core/pc/inventory/InventoryManager.java
@@ -41,6 +41,7 @@
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Executor;
 import java.util.concurrent.Future;
+import java.util.concurrent.ScheduledFuture;
 import java.util.concurrent.ScheduledThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
@@ -216,6 +217,11 @@
      */
     private ResourceUpgradeDelegate resourceUpgradeDelegate = new ResourceUpgradeDelegate(this);
 
+    /**
+     * Prevent service scans from looping.
+     */
+    private volatile ScheduledFuture<? extends Object> serviceScan;
+
     public InventoryManager() {
         super(DiscoveryAgentService.class);
     }
@@ -1271,8 +1277,11 @@ private void synchInventory(ResourceSyncInfo syncInfo, boolean partialInventory)
                 // requestAvailabilityCheck on each unknown or modified resource.
                 requestFullAvailabilityReport();
 
-                this.inventoryThreadPoolExecutor.schedule((Callable<? extends Object>) this.serviceScanExecutor,
-                    configuration.getChildResourceDiscoveryDelay(), TimeUnit.SECONDS);
+                // Don't schedule yet another scan if one is already pending
+                if (serviceScan == null || serviceScan.isDone()) {
+                    this.serviceScan = this.inventoryThreadPoolExecutor.schedule((Callable<? extends Object>) this.serviceScanExecutor,
+                        configuration.getChildResourceDiscoveryDelay(), TimeUnit.SECONDS);
+                }
             }
         } catch (Throwable t) {
             log.warn("Failed to synchronize local inventory with Server inventory for Resource [" + syncInfo.getId()
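
The guard this patch aims for (schedule a service scan only when none is pending) can be sketched on its own. This is a hedged illustration rather than the RHQ implementation: ScanGuardSketch, scheduleScanIfIdle and awaitIdle are hypothetical names, and the sketch uses a synchronized method instead of the patch's volatile field so that the check and the schedule call happen atomically.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the "at most one pending scan" guard.
class ScanGuardSketch {
    private final ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
    private final AtomicInteger scansRun = new AtomicInteger();
    private ScheduledFuture<?> serviceScan; // guarded by "this"

    // Schedules a scan unless one is already pending or running.
    // Returns true if a new scan was scheduled.
    synchronized boolean scheduleScanIfIdle(long delay, TimeUnit unit) {
        if (serviceScan != null && !serviceScan.isDone()) {
            return false; // a scan is still pending: do not pile up another
        }
        Runnable scan = () -> scansRun.incrementAndGet();
        serviceScan = pool.schedule(scan, delay, unit);
        return true;
    }

    // Waits for the pending scan, if any; returns false on timeout or failure.
    boolean awaitIdle(long timeout, TimeUnit unit) {
        ScheduledFuture<?> pending;
        synchronized (this) {
            pending = serviceScan;
        }
        if (pending == null) {
            return true;
        }
        try {
            pending.get(timeout, unit);
            return true;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            return false;
        }
    }

    int scansRun() {
        return scansRun.get();
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        ScanGuardSketch guard = new ScanGuardSketch();
        System.out.println("first request scheduled:  " + guard.scheduleScanIfIdle(200, TimeUnit.MILLISECONDS));
        System.out.println("second request scheduled: " + guard.scheduleScanIfIdle(200, TimeUnit.MILLISECONDS));
        guard.awaitIdle(5, TimeUnit.SECONDS);
        System.out.println("request after completion: " + guard.scheduleScanIfIdle(200, TimeUnit.MILLISECONDS));
        guard.shutdown();
    }
}
```

Note that the guard must pass when the field is still null, otherwise the first scan is never scheduled. As the comment above says, this only limits the symptom; it does not explain why the server keeps requesting syncs.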