Bug 1118091 - RHQ Agent resources are not being auto-clustered in resource group due to their unique resource keys
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Resource Grouping
Version: JON 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: DR01
Target Release: JON 3.3.0
Assignee: Jay Shaughnessy
QA Contact: Filip Brychta
URL:
Whiteboard:
Duplicates: 1029598
Depends On:
Blocks:
Reported: 2014-07-10 01:33 UTC by Larry O'Leary
Modified: 2018-12-06 17:16 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, each RHQ Agent resource used its managing agent's name as its resource key, so every agent appeared as a distinct node in the resource group navigation tree. Agent discovery now sets the resource key of a newly discovered agent to the value "RHQ Agent" if the managing agent's name is the same as the platform's resource key. As a result, agents have the same resource key by default, regardless of which platform they are on. This allows agents to be auto-clustered and to appear as a single node in the resource group navigation tree.
Clone Of:
Environment:
Last Closed: 2014-12-11 14:04:31 UTC
Type: Bug
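The resource-key rule in the Doc Text above can be illustrated with a short sketch. It assumes that the agent name and the platform's resource key are both available as plain strings at discovery time, and that by default the two match because the agent name typically defaults to the hostname (comment 4). The class `AgentResourceKeySketch` and method `chooseResourceKey` are hypothetical names for illustration only; this is not the RHQ agent plugin's actual discovery code.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the RHQ Agent resource-key rule from the Doc Text.
 * Not the real RHQ agent plugin discovery component.
 */
final class AgentResourceKeySketch {

    /** Shared key given to an agent whose name matches its platform's key. */
    static final String SHARED_AGENT_KEY = "RHQ Agent";

    private AgentResourceKeySketch() {
    }

    /**
     * Chooses the resource key for a newly discovered RHQ Agent resource.
     * Returns the shared key when the agent name equals the platform resource
     * key; otherwise falls back to the agent name (the pre-fix behavior).
     */
    static String chooseResourceKey(String agentName, String platformResourceKey) {
        Objects.requireNonNull(agentName, "agentName");
        Objects.requireNonNull(platformResourceKey, "platformResourceKey");
        return agentName.equals(platformResourceKey) ? SHARED_AGENT_KEY : agentName;
    }

    public static void main(String[] args) {
        // Default setup on two platforms: each agent name equals its platform key.
        System.out.println(chooseResourceKey("host-a.example.com", "host-a.example.com")); // RHQ Agent
        System.out.println(chooseResourceKey("host-b.example.com", "host-b.example.com")); // RHQ Agent
        // A second agent with a custom name on host-a keeps its unique name as key.
        System.out.println(chooseResourceKey("host-a-agent2", "host-a.example.com"));      // host-a-agent2
    }
}
```

Under this sketch, both default agents receive the key `RHQ Agent` and can share one auto-cluster node, while an agent whose name does not match its platform key keeps its own name and so does not take the shared key.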


Links
System                             ID       Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  909763   None      None    None     Never
Red Hat Bugzilla                   1118083  None      None    None     Never

Internal Links: 1118083

Description Larry O'Leary 2014-07-10 01:33:52 UTC
Description of problem:
A recursive compatible group whose members contain a singleton child resource shows multiple instances of that resource at the root (base) node:

    DynaGroup - All Platforms
    ├── CPUs
    ├── File Systems
    ├── JBossAS7 Standalone Servers
    ├── Network Adapters
    ├── RHQ Agent
    │   ├── Agent Measurement Subsystem
    │   ├── Agent Plugin Container
    │   ├── JVM
    │   ├── rhq-agent-env.sh
    │   └── RHQ Agent Launcher Script
    ├── RHQ Agent
    │   ├── Agent Measurement Subsystem
    │   ├── Agent Plugin Container
    │   ├── JVM
    │   ├── rhq-agent-env.sh
    │   └── RHQ Agent Launcher Script
    └── Bundle Handler - Ant

Although this may be expected when viewing the parent resource in the normal inventory view, it should not happen in the group view. Instead, the resources, even singleton ones, should be auto-clustered.

Version-Release number of selected component (if applicable):
3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Install and start a JBoss ON system with two or more agents/platforms in inventory.
2. Create a new dynamic group definition named _All Platforms_:

    *   *Expression*: `resource.type.category = PLATFORM`
    *   *Recursive*: _True_

3. Navigate to the newly created group.

Actual results:
There are two RHQ Agent nodes in the navigation tree at the left of the group view.

Expected results:
The two RHQ Agent nodes should appear under a parent auto-cluster node named RHQ Agents:

    DynaGroup - All Platforms
    ├── CPUs
    ├── File Systems
    ├── JBossAS7 Standalone Servers
    ├── Network Adapters
    ├── RHQ Agents                <--- GROUPING NODE
    │   ├── RHQ Agent
    │   │   ├── Agent Measurement Subsystem
    │   │   ├── Agent Plugin Container
    │   │   ├── JVM
    │   │   ├── rhq-agent-env.sh
    │   │   └── RHQ Agent Launcher Script
    │   └── RHQ Agent
    │       ├── Agent Measurement Subsystem
    │       ├── Agent Plugin Container
    │       ├── JVM
    │       ├── rhq-agent-env.sh
    │       └── RHQ Agent Launcher Script
    └── Bundle Handler - Ant


Additional info:
It seems that for group views, singletons should be ignored at the root level? I am not sure whether that also holds under any of the child nodes. I would imagine not.

Comment 1 Jay Shaughnessy 2014-07-10 19:13:30 UTC
I'm not sure how easy it would be to do, but I agree that it might be marginally nicer to group these under an "RHQ Agents" resource type node.

Although, just to be clear, realize that since all the agents likely have different cluster keys, they would still show up individually under the proposed parent node.
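To illustrate why distinct keys keep agents apart, the sketch below groups child resources by a (resource type, resource key) pair, assuming, as this comment implies, that the auto-cluster key is built from those two values. `AutoClusterSketch`, `Child`, and `clusterChildren` are hypothetical names; this is not RHQ's group tree code. It uses a Java `record`, so it needs Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of auto-clustering children by (type, resource key). */
final class AutoClusterSketch {

    /** A child resource of some group member (hypothetical model). */
    record Child(String typeName, String resourceKey) {
    }

    private AutoClusterSketch() {
    }

    /**
     * Groups children by their (typeName, resourceKey) pair and returns how
     * many children fall under each auto-cluster node, in encounter order.
     */
    static Map<List<String>, Integer> clusterChildren(List<Child> children) {
        Map<List<String>, Integer> clusters = new LinkedHashMap<>();
        for (Child child : children) {
            List<String> clusterKey = List.of(child.typeName(), child.resourceKey());
            clusters.merge(clusterKey, 1, Integer::sum);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Keys derived from unique agent names: two separate "RHQ Agent" nodes.
        List<Child> uniqueKeys = List.of(
                new Child("RHQ Agent", "host-a"),
                new Child("RHQ Agent", "host-b"));
        // Shared key on every platform: one auto-cluster node holding both agents.
        List<Child> sharedKey = List.of(
                new Child("RHQ Agent", "RHQ Agent"),
                new Child("RHQ Agent", "RHQ Agent"));

        System.out.println(clusterChildren(uniqueKeys)); // {[RHQ Agent, host-a]=1, [RHQ Agent, host-b]=1}
        System.out.println(clusterChildren(sharedKey));  // {[RHQ Agent, RHQ Agent]=2}
    }
}
```

Adding a parent "RHQ Agents" node would not change the first result: the two children would still land in two separate clusters beneath it.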

Comment 2 Larry O'Leary 2014-07-11 00:56:59 UTC
(In reply to Jay Shaughnessy from comment #1)
> Although, just to be clear, realize that since all the agents likely have
> different clusterkeys they would still show up individually under the
> proposed parent node.

Right, which is probably the real issue here. Perhaps the singleton flag is actually helpful here? Perhaps the auto-cluster can fall back to the resource type when the type is marked as a singleton, and therefore ignore the key?

The user's issue here is that RHQ Agent is actually deployed on 100% of the cluster, so I believe I may have mixed up the issue a bit. What is expected is that RHQ Agent is grouped/displayed as a single auto-cluster, because there is one on each resource. Because it is a singleton, we know that each resource can have exactly one, which means auto-clustering could be done safely and meaningfully?

The final tree:

    DynaGroup - All Platforms
    ├── CPUs
    ├── File Systems
    ├── JBossAS7 Standalone Servers
    ├── Network Adapters
    ├── RHQ Agents                <--- CLUSTER NODE
    │   └── RHQ Agent
    │       ├── Agent Measurement Subsystem
    │       ├── Agent Plugin Container
    │       ├── JVM
    │       ├── rhq-agent-env.sh
    │       └── RHQ Agent Launcher Script
    └── Bundle Handler - Ant

And in the event that one of the platforms did not have an RHQ Agent resource in inventory:

    DynaGroup - All Platforms
    ├── CPUs
    ├── File Systems
    ├── JBossAS7 Standalone Servers
    ├── Network Adapters
    ├── RHQ Agents                <--- CLUSTER NODE
    │   └── RHQ Agent (50%)
    │       ├── Agent Measurement Subsystem
    │       ├── Agent Plugin Container
    │       ├── JVM
    │       ├── rhq-agent-env.sh
    │       └── RHQ Agent Launcher Script
    └── Bundle Handler - Ant

Comment 3 Larry O'Leary 2014-07-11 02:40:01 UTC
Once again, I am updating the title of this BZ. This is to reflect the issue actually raised by the user.

Specifically, the RHQ Agent resource appears multiple times in the resource group view with a less than 100% group member distribution, even though the resource is identical on all platforms.

This, of course, is due to each RHQ Agent having a unique resource key. The following shows what the user expects to see. It is what was described in comment 2, with one correction: the cluster node is moved to the root:

    DynaGroup - All Platforms
    ├── CPUs
    ├── File Systems
    ├── JBossAS7 Standalone Servers
    ├── Network Adapters
    ├── RHQ Agent                       <--- AUTO-CLUSTER NODE
    │   ├── Agent Measurement Subsystem
    │   ├── Agent Plugin Container
    │   ├── JVM
    │   ├── rhq-agent-env.sh
    │   └── RHQ Agent Launcher Script
    └── Bundle Handler - Ant


Again, here is a group with two platforms where RHQ Agent is deployed to only one of them:

    DynaGroup - All Platforms
    ├── CPUs
    ├── File Systems
    ├── JBossAS7 Standalone Servers
    ├── Network Adapters
    ├── RHQ Agent (50%)                  <--- AUTO-CLUSTER NODE
    │   ├── Agent Measurement Subsystem
    │   ├── Agent Plugin Container
    │   ├── JVM
    │   ├── rhq-agent-env.sh
    │   └── RHQ Agent Launcher Script
    └── Bundle Handler - Ant


To be clear, this issue doesn't have anything to do with the singleton, contrary to what I originally thought. I am also not sure which other resource types would be affected by this issue, but in this case RHQ Agent is the offender.
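The member-distribution percentage shown in the trees in this comment (for example `RHQ Agent (50%)`) can be sketched as the share of group members that have the cluster, with no suffix at 100%, as the example trees suggest. This is an assumption drawn from those trees, not RHQ's actual labeling code; `ClusterLabelSketch` and `label` are hypothetical names.

```java
import java.util.Set;

/** Hypothetical sketch of an auto-cluster node label with member distribution. */
final class ClusterLabelSketch {

    private ClusterLabelSketch() {
    }

    /**
     * Builds a label such as "RHQ Agent (50%)".
     *
     * @param typeName           resource type shown on the node
     * @param membersWithCluster IDs of group members that have this cluster
     * @param groupSize          total number of group members (must be positive)
     */
    static String label(String typeName, Set<String> membersWithCluster, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        long percent = Math.round(100.0 * membersWithCluster.size() / groupSize);
        // Present on every member: no percentage suffix, as in the 100% tree.
        return percent >= 100 ? typeName : typeName + " (" + percent + "%)";
    }

    public static void main(String[] args) {
        System.out.println(label("RHQ Agent", Set.of("platform-1", "platform-2"), 2)); // RHQ Agent
        System.out.println(label("RHQ Agent", Set.of("platform-1"), 2));               // RHQ Agent (50%)
    }
}
```

With unique keys, each agent's cluster covers only one of the two platforms, which matches the "less than 100%" distribution described at the start of this comment.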

Comment 4 Jay Shaughnessy 2014-07-11 17:31:57 UTC
To get the behavior they want, agents would need to have the same resource key across different platforms. If that were true, they would auto-cluster as desired under the platform group. But as it stands, RHQ Agent resource keys incorporate the agent name (which typically defaults to the hostname), and agent names are unique among all agents. That means that for this use case you'll have a separate RHQ Agent child node in the tree for every imported RHQ Agent.

I agree that this is annoying.  But since we support multiple agents on a platform, we can't just use a static resource key because then multiple agents under the same platform would get the same key and we'd have a conflict.  Moreover, keys need to be predictable, so the same agent must get the same key every time.

Any solution would need to somehow make a change to the RHQ Agent reskey generation.  Trying to solve this problem in the tree code is likely a bad idea full of special case code.  I do have an idea.  Thinking about it some more...
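The two constraints above (keys must not collide among agents on the same platform, and the same agent must always get the same key) can be checked with a short sketch. It compares a static key for every agent with a conditional rule of the kind described in the Doc Text (shared key only when the agent name matches the platform key), and it relies on agent names being unique among all agents, as stated in this comment. `AgentKeyConstraintSketch` and its members are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Hypothetical check of per-platform key uniqueness for two key rules. */
final class AgentKeyConstraintSketch {

    /** Rejected idea: every agent gets the same static key. */
    static final BinaryOperator<String> STATIC_RULE = (agentName, platformKey) -> "RHQ Agent";

    /**
     * Conditional rule: shared key only for the agent whose name equals the
     * platform key. It is a pure function of its inputs, so the same agent on
     * the same platform always gets the same key (predictability).
     */
    static final BinaryOperator<String> CONDITIONAL_RULE =
            (agentName, platformKey) -> agentName.equals(platformKey) ? "RHQ Agent" : agentName;

    private AgentKeyConstraintSketch() {
    }

    /** Returns true if no two agents on the platform receive the same key. */
    static boolean keysUniqueOnPlatform(String platformKey, List<String> agentNames,
                                        BinaryOperator<String> rule) {
        Set<String> seen = new HashSet<>();
        for (String agentName : agentNames) {
            if (!seen.add(rule.apply(agentName, platformKey))) {
                return false; // two agents under one platform would conflict
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two agents on one platform; agent names are unique, as required.
        List<String> agents = List.of("host-a", "host-a-agent2");
        System.out.println(keysUniqueOnPlatform("host-a", agents, STATIC_RULE));      // false
        System.out.println(keysUniqueOnPlatform("host-a", agents, CONDITIONAL_RULE)); // true
    }
}
```

Because agent names are unique, at most one agent on a platform can match the platform key, so the conditional rule hands out the shared key at most once per platform. The sketch would still report a conflict if a second agent on the same platform were literally named `RHQ Agent`.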

Comment 7 Jay Shaughnessy 2014-07-14 18:58:40 UTC

The proposed solution has been posted for review as:

   https://github.com/rhq-project/rhq/pull/86

Comment 8 Jay Shaughnessy 2014-07-21 20:33:39 UTC
master commit 26e5712b8cefc7601f6ee95091922de667ee3752
Author: Jay Shaughnessy <jshaughn@redhat.com>
Date:   Fri Jul 18 13:25:47 2014 -0400

    Another round of scalability enhancements for updating plugin metadata.  In
    the past we broke the update of each Plugin into its own Tx.  Later we split
    registeringTypes and removingTypes into separate Tx and applied a 30 minute
    timeout to the type registration.  With this pass we now update each type
    in its own Tx and allow up to 30 minutes per type.  This can be necessary
    if updating plugin configurations for a large existing resource population.

Comment 9 Simeon Pinder 2014-07-31 15:52:04 UTC
Moving to ON_QA as available to test with brew build of DR01: https://brewweb.devel.redhat.com//buildinfo?buildID=373993

Comment 10 Jay Shaughnessy 2014-09-05 02:40:55 UTC
Comment 8 has what looks like the wrong commit. The correct commit is:

Master commit d571b9caa4522eb1420898307b24cd1d97fb1e17

Comment 11 Jay Shaughnessy 2014-09-05 02:42:39 UTC
*** Bug 1029598 has been marked as a duplicate of this bug. ***

Comment 12 Filip Brychta 2014-09-10 09:23:19 UTC
Verified on
Version: 3.3.0.ER02
Build Number: 4fbb183:7da54e2

Verified the behaviour described in comment 3.

