Bug 829350

Summary: Unable to configure the cadence of child discovery
Product: [Other] RHQ Project
Reporter: Lukas Krejci <lkrejci>
Component: Agent
Assignee: Lukas Krejci <lkrejci>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Docs Contact:
Priority: high
Version: unspecified
CC: fbrychta, hrupp, jshaughn, mazz
Target Milestone: ---   
Target Release: JON 3.1.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 802550
Environment:
Last Closed: 2013-09-03 15:11:35 UTC
Type: ---
Bug Depends On: 802550    
Bug Blocks: 782579    

Description Lukas Krejci 2012-06-06 14:23:24 UTC
+++ This bug was initially created as a clone of Bug #802550 +++

Description of problem:
The discovery of child resources is currently hardcoded to occur 5 seconds after the agent receives the newly committed resources. This delay should be made configurable because it affects the load the plugin container generates during the initial import of inventories.

Version-Release number of selected component (if applicable):
4.4.0-SNAPSHOT

How reproducible:
always

Steps to Reproduce:
1. Configure the agent to have both server and service discovery periods set to an hour (3600s).
2. Watch the agent.log
3. Import a single server (with some child servers or services) into an otherwise empty platform
4. At the time you import the server in the UI, there should be a message in the agent.log stating:

Syncing local inventory with Server inventory...

5. 5 seconds after that message (but no sooner), there should be a message:

Executing runtime discovery scan rooted at [platform]

Actual results:
There is no way to change the 5-second interval.

Expected results:
There should be a configuration property to set this interval.

Additional info:
Having this configuration property would also be useful for plugin and plugin container tests that require deep hierarchies of resources.
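
A minimal sketch of the requested behaviour, not the actual patch: the delay is read from configuration and falls back to the previously hardcoded 5 seconds. The property name is the one mentioned in the later comments on this bug. The class name, the helper method, and the fallback handling for invalid values are hypothetical and only illustrate the idea.

```java
import java.util.Properties;

public class ChildDiscoveryDelay {
    // Property name as quoted in the later comments on this bug.
    static final String DELAY_PROP = "rhq.agent.plugins.child-discovery.delay-secs";
    // The value that was hardcoded before the fix, per the description above.
    static final long DEFAULT_DELAY_SECS = 5L;

    /**
     * Hypothetical helper: returns the configured delay in seconds, or the
     * old default when the property is missing, negative, or not a number.
     */
    static long childDiscoveryDelaySecs(Properties config) {
        String raw = config.getProperty(DELAY_PROP);
        if (raw == null) {
            return DEFAULT_DELAY_SECS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value >= 0 ? value : DEFAULT_DELAY_SECS;
        } catch (NumberFormatException e) {
            return DEFAULT_DELAY_SECS;
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        System.out.println(childDiscoveryDelaySecs(config)); // property unset

        config.setProperty(DELAY_PROP, "120");
        System.out.println(childDiscoveryDelaySecs(config)); // property set to 120
    }
}
```

Falling back to 5 keeps the existing behaviour for agents that never set the property.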

--- Additional comment from lkrejci on 2012-03-12 15:48:22 EDT ---

Created attachment 569496 [details]
proposed patch

Attaching a patch that adds such a configuration property to the agent and plugin container configuration.

--- Additional comment from ccrouch on 2012-03-19 11:55:56 EDT ---

If this is tested and easily testable, then let's apply the patch.

--- Additional comment from mazz on 2012-04-04 15:16:04 EDT ---

master commit: de1000f

--- Additional comment from fbrychta on 2012-06-01 11:06:56 EDT ---

I followed this scenario:
1. Configure the agent to have both server and service discovery periods set to an hour (3600s) and set rhq.agent.plugins.child-discovery.delay-secs to 120s
2. import rhq-agent
3. check all imported resources -> the platform's and rhq-agent's child resources were imported
4. after 120s I can see in agent.log 'Executing runtime discovery scan rooted at [platform]' -> after this, the child resources of the agent's child resources were imported

Example: right after I imported the agent, I could see the JVM resource as a child of the agent. The JVM had no child resources. After 120s the JVM's child resources were imported.

According to the description of rhq.agent.plugins.child-discovery.delay-secs in agent-configuration.xml, I would expect all of the agent's child resources (including the JVM) to be imported only after 120s.

--- Additional comment from mazz on 2012-06-05 11:50:05 EDT ---

bug 823942 has very recently changed the same area in InventoryManager that this patch changed. I'm not sure how it affected it, but it's possible.

See this commit:

http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commit;h=14d53ea73b219a85d1b54584457ed48a60e1a556

--- Additional comment from mazz on 2012-06-05 13:11:00 EDT ---

(In reply to comment #5)
> bug 823942 has very recently changed the same area in InventoryManager that
> this patch changed. I'm not sure how it affected it, but it's possible.
> 
> See this commit:
> 
> http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commit;h=14d53ea73b219a85d1b54584457ed48a60e1a556

That was Jay's commit. We still aren't sure if that commit has anything to do with this. Plus, the patch for this issue was very simple: it just added a new config option to avoid hardcoding the "5" in the code.

Jay and I aren't sure if there is anything wrong here. Looking at this further, but we may need more input from Lukas to see if this really is still broken or not. In fact, I'll add a NEEDINFO here for Lukas to chime in since it was his patch that introduced this new config option to fix the issue.

--- Additional comment from lkrejci on 2012-06-05 13:22:22 EDT ---

<lkrejci> mazz: i thought that was more of a documentation issue actually... i think filip's expectation was that *all* the child resources are going to get discovered after that delay
<lkrejci> but because our discovery is incremental per level, the delay is applied before *each* child discovery, at each level
<lkrejci> i think that was his complaint... and i think that it therefore works as designed, only the docs are not clear enough

Filip, can you confirm this is what you're seeing/expecting?

--- Additional comment from jshaughn on 2012-06-05 16:08:49 EDT ---


I think maybe he is wondering why the JVM child is discovered immediately and not deferred like the other children.

--- Additional comment from fbrychta on 2012-06-06 08:04:55 EDT ---

Yes Jay, I expected that even the immediate children would be discovered after the defined delay. Lukas clarified that the following is the correct and expected behaviour:
1- after a manual import of a resource, its immediate children are discovered immediately
2- children on the next level are discovered after rhq.agent.plugins.child-discovery.delay-secs
3- children on the level after that are discovered after another rhq.agent.plugins.child-discovery.delay-secs
4-... recursively
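
The per-level behaviour listed above can be sketched as a simple timeline. This is an idealized illustration, assuming each scan itself takes no time and ignoring agent/server round trips. The class and method names are hypothetical and are not part of RHQ.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscoveryTimeline {
    /**
     * Returns, for each level below the manually imported resource
     * (index 0 = immediate children), the idealized number of seconds
     * after the import at which that level is discovered.
     */
    static List<Long> discoveryOffsetsSecs(int levels, long delaySecs) {
        List<Long> offsets = new ArrayList<>();
        for (int level = 0; level < levels; level++) {
            // Immediate children are discovered right away; every deeper
            // level waits one more delay period than the level above it.
            offsets.add(level * delaySecs);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three levels with the 120s delay used in the verification scenario.
        System.out.println(discoveryOffsetsSecs(3, 120)); // prints [0, 120, 240]
    }
}
```

With the delay set to 120s, a hierarchy three levels deep would be fully discovered roughly 240s after the import under these assumptions, which matches the observation that the JVM appeared immediately and its children about 120s later.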

--- Additional comment from lkrejci on 2012-06-06 08:57:45 EDT ---

Leaving in ON_DEV until we decide what JON version this is going to go in.

master http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=27109e402d86deb7804249b870e7c15de7263491
Author: Lukas Krejci <lkrejci>
Date:   Wed Jun 6 14:54:38 2012 +0200

    [BZ 802550] - rewording the docs on rhq.agent.plugins.child-discovery.delay-secs
    to better explain what it actually means.

Comment 1 Lukas Krejci 2012-06-06 14:24:58 UTC
Retargeting to JON 3.1.1.

We need to merge the commit 27109e402d86deb7804249b870e7c15de7263491 into JON 3.1.1 release branch.

Comment 2 Lukas Krejci 2012-08-07 10:02:17 UTC
release/jon3.1.x http://git.fedorahosted.org/cgit/rhq/rhq.git/diff/?id=a00038bf941b2c719a133beb7e61aeddefb3803d
Author: Lukas Krejci <lkrejci>
Date:   Wed Jun 6 14:54:38 2012 +0200

    [BZ 802550] - rewording the docs on rhq.agent.plugins.child-discovery.delay-secs
    to better explain what it actually means.
    (cherry picked from commit 27109e402d86deb7804249b870e7c15de7263491)

Comment 3 John Sanda 2012-08-14 02:16:33 UTC
Moving to ON_QA since the JON 3.1.1 ER2 build is available - https://brewweb.devel.redhat.com/buildinfo?buildID=228250

Comment 4 Lukas Krejci 2012-08-14 08:04:43 UTC
Moving to ON_QA

Comment 5 Filip Brychta 2012-08-14 11:34:13 UTC
Verified on JON 3.1.1 ER2

Comment 6 Heiko W. Rupp 2013-09-03 15:11:35 UTC
Bulk closing of old issues in VERIFIED state.