Bug 802550

Summary: Unable to configure the cadence of child discovery
Product: [Other] RHQ Project
Reporter: Lukas Krejci <lkrejci>
Component: Agent
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Filip Brychta <fbrychta>
Severity: unspecified
Docs Contact:
Priority: high
Version: unspecified
CC: fbrychta, hrupp, jshaughn, mazz
Target Milestone: ---   
Target Release: JON 3.1.0   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=829355
Fixed In Version:
Doc Type: Bug Fix
Clones: 829350
Last Closed: 2013-09-03 11:07:56 EDT
Bug Depends On:    
Bug Blocks: 782579, 829350    
Attachments: proposed patch (flags: none)

Description Lukas Krejci 2012-03-12 15:45:15 EDT
Description of problem:
The discovery of child resources is currently hardcoded to occur 5 seconds after the agent receives the newly committed resources. This delay should be configurable because it affects the load the plugin container generates during the initial import of inventories.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Configure the agent to have both server and service discovery periods set to an hour (3600s).
2. Watch the agent.log
3. Import a single server (with some child servers or services) into an otherwise empty platform
4. At the time you import the server in the UI, there should be a message in the agent.log stating:

Syncing local inventory with Server inventory...

5. 5 seconds after that message (but no sooner), there should be a message:

Executing runtime discovery scan rooted at [platform]

Actual results:
There is no way to change the 5-second interval.

Expected results:
There should be a configuration property to set this interval.

Additional info:
This configuration property would also be useful for plugin and plugin container tests that require deep hierarchies of resources.
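
A minimal sketch of what is being requested, not the attached patch: the delay is read from configuration and falls back to the previously hardcoded 5 seconds. The property name is the one quoted later in this report; the class name, method names, the use of java.util.Properties, and the fallback behaviour for invalid values are assumptions made for illustration only.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; this is not RHQ's InventoryManager code.
final class ChildDiscoveryDelaySketch {
    // Property name as quoted in this report.
    static final String DELAY_PROP = "rhq.agent.plugins.child-discovery.delay-secs";
    // The formerly hardcoded value becomes the default.
    static final long DEFAULT_DELAY_SECS = 5L;

    // Returns the configured delay, or the default when the value is
    // missing, not a number, or negative (assumed fallback policy).
    static long resolveDelaySecs(Properties config) {
        String raw = config.getProperty(DELAY_PROP);
        if (raw == null) {
            return DEFAULT_DELAY_SECS;
        }
        try {
            long parsed = Long.parseLong(raw.trim());
            return parsed >= 0 ? parsed : DEFAULT_DELAY_SECS;
        } catch (NumberFormatException e) {
            return DEFAULT_DELAY_SECS;
        }
    }

    // Schedules a single child discovery scan after the resolved delay.
    static ScheduledFuture<?> scheduleChildDiscovery(
            ScheduledExecutorService executor, Properties config, Runnable scan) {
        return executor.schedule(scan, resolveDelaySecs(config), TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.setProperty(DELAY_PROP, "1");
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            // Blocks until the delayed scan has run.
            scheduleChildDiscovery(executor, config,
                    () -> System.out.println("Executing runtime discovery scan")).get();
        } finally {
            executor.shutdown();
        }
    }
}
```

Keeping 5 as the default means agents that never set the property behave as before, while tests with deep resource hierarchies can lower the value.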
Comment 1 Lukas Krejci 2012-03-12 15:48:22 EDT
Created attachment 569496
proposed patch

Attaching a patch that adds such a configuration property to the agent and plugin container configuration.
Comment 2 Charles Crouch 2012-03-19 11:55:56 EDT
If this is tested and easily testable, then let's apply the patch.
Comment 3 John Mazzitelli 2012-04-04 15:16:04 EDT
master commit: de1000f
Comment 4 Filip Brychta 2012-06-01 11:06:56 EDT
I followed this scenario:
1. Configure the agent to have both server and service discovery periods set to an hour (3600s) and set rhq.agent.plugins.child-discovery.delay-secs to 120s
2. Import rhq-agent
3. Check all imported resources -> the platform's and rhq-agent's child resources were imported
4. After 120s I can see 'Executing runtime discovery scan rooted at [platform]' in agent.log -> after this, the child resources of the agent's child resources were imported

Example: right after I imported the agent, I could see the JVM resource as a child of the agent. The JVM had no child resources. After 120s the JVM's child resources were imported.

According to the description of rhq.agent.plugins.child-discovery.delay-secs in agent-configuration.xml, I would expect all of the agent's child resources (including the JVM) to be imported only after 120s.
Comment 5 John Mazzitelli 2012-06-05 11:50:05 EDT
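Bug 823942 has very recently changed the same area in InventoryManager that this patch changed. I'm not sure whether it affected this, but it's possible.

See this commit:
http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commit;h=14d53ea73b219a85d1b54584457ed48a60e1a556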
Comment 6 John Mazzitelli 2012-06-05 13:11:00 EDT
(In reply to comment #5)
> bug 823942 has very recently changed the same area in InventoryManager that
> this patch changed. I'm not sure how it affected it, but its possible.
> See this commit:
> http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commit;h=14d53ea73b219a85d1b54584457ed48a60e1a556

That was Jay's commit. We still aren't sure if that commit has anything to do with this. Plus, the patch for this issue was very simple: it just added a new config option to avoid hardcoding the "5" in the code.

Jay and I aren't sure if there is anything wrong here. Looking at this further, but we may need more input from Lukas to see if this really is still broken or not. In fact, I'll add a NEEDINFO here for Lukas to chime in since it was his patch that introduced this new config option to fix the issue.
Comment 7 Lukas Krejci 2012-06-05 13:22:22 EDT
<lkrejci> mazz: i thought that was more of a documentation issue actually... i think filip's expectation was that *all* the child resources are going to get discovered after that delay
<lkrejci> but because our discovery is incremental per level, the delay is applied before *each* child discovery, at each level
<lkrejci> i think that was his complaint... and i think that it therefore works as designed, only the docs are not clear enough

Filip, can you confirm this is what you're seeing/expecting?
Comment 8 Jay Shaughnessy 2012-06-05 16:08:49 EDT
I think maybe he is wondering why the JVM child is discovered immediately and not deferred like the other children.
Comment 9 Filip Brychta 2012-06-06 08:04:55 EDT
Yes Jay, I expected that even the immediate children would be discovered only after the defined delay. Lukas clarified that the following is the correct and expected behaviour:
1- after a manual import of a resource, its immediate children are discovered immediately
2- children on the next level are discovered after rhq.agent.plugins.child-discovery.delay-secs
3- children on the level after that are discovered after another rhq.agent.plugins.child-discovery.delay-secs
4-... recursively
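
The behaviour listed above can be sketched as a small timeline calculation. This is an idealized illustration, not RHQ code: it assumes each scan itself takes no time, so level n below the imported resource is discovered (n - 1) delays after the import. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the per-level delay described in this comment.
final class DiscoveryTimelineSketch {
    // Returns, for each level below the imported resource (level 1 first),
    // the offset in seconds after the import at which that level is discovered.
    // Level 1 is discovered immediately; each deeper level waits one more delay.
    static List<Long> discoveryOffsets(int levels, long delaySecs) {
        List<Long> offsets = new ArrayList<>();
        for (int level = 1; level <= levels; level++) {
            offsets.add((level - 1) * delaySecs);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three levels with the 120s delay used in the test scenario above.
        List<Long> offsets = discoveryOffsets(3, 120);
        for (int i = 0; i < offsets.size(); i++) {
            System.out.println("level " + (i + 1) + " children: +" + offsets.get(i) + "s");
        }
    }
}
```

With a 120s delay this matches what was observed: the JVM (an immediate child of the agent) appears at once, and the JVM's own children appear about 120s later.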
Comment 10 Lukas Krejci 2012-06-06 08:57:45 EDT
Leaving in ON_DEV until we decide what JON version this is going to go in.

master http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=27109e402d86deb7804249b870e7c15de7263491
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Wed Jun 6 14:54:38 2012 +0200

    [BZ 802550] - rewording the docs on rhq.agent.plugins.child-discovery.delay-secs
    to better explain what it actually means.
Comment 11 Lukas Krejci 2012-06-06 10:38:47 EDT
Pushing back to ON_QA.

This seems to work as expected, only the documentation wording was not clear enough.

I created bug 829350 to track inclusion of the updated XML comments for this setting in JON 3.1.1, and bug 829355 to update the JON 3.1.0 docs with the more accurate wording.
Comment 12 Heiko W. Rupp 2013-09-03 11:07:56 EDT
Bulk closing of old issues in VERIFIED state.