Bug 1127029 - Agent [null] would like to connect to this server
Keywords:
Status: ON_QA
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHQ 4.13
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1127092
 
Reported: 2014-08-06 01:24 UTC by Elias Ross
Modified: 2014-08-08 14:01 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1127092
Environment:
Last Closed:



Description Elias Ross 2014-08-06 01:24:18 UTC
Description of problem:

I have an agent in my network (I don't know which one) that is connecting to RHQ with poor results:

01:16:45,788 INFO  [org.rhq.enterprise.server.core.CoreServerServiceImpl] (http-/0.0.0.0:7080-13) Agent [null][4.12.0(422702f)] would like to connect to this server

It would help if the message included the remote client's IP address. As it stands I can't tell which agent this is, and it's hard to find out when there are hundreds of existing agents.


Version-Release number of selected component (if applicable): 4.12


How reproducible: Unclear

Steps to Reproduce:
1. Unclear

Additional info:

diff --git a/modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/core/CoreServerServiceImpl.java b/modules/enterprise/server/jar/src/main/java/org/rhq/enterpris
index 2e4ddfc..162e44c 100644
--- a/modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/core/CoreServerServiceImpl.java
+++ b/modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/core/CoreServerServiceImpl.java
@@ -336,6 +336,11 @@ public ConnectAgentResults connectAgent(ConnectAgentRequest request) throws Agen
         String agentName = request.getAgentName();
         AgentVersion agentVersion = request.getAgentVersion();
 
+        if (agentName == null) {
+            String msg = "Agent [" + request.getAddress() + ':' + request.getPort() + "] is connecting without a name";
+            throw new AgentNotSupportedException(msg);
+        }
+
         log.info("Agent [" + agentName + "][" + agentVersion + "] would like to connect to this server");
 
         if (!getAgentManager().isAgentVersionSupported(agentVersion)) {

This can't work as written because there is no way to get the IP/port from this request. The IP/port should probably be available for every request, although it's not clear from an API standpoint how that might work. (I don't think the client should report its own IP; the server already knows it.)
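To illustrate the idea above, here is a minimal, self-contained sketch of a name check that takes the peer address as a separate argument, on the assumption that the transport layer (not the agent) supplies it. The class AgentConnectGuard and its methods are hypothetical and are not part of the RHQ code base; IllegalArgumentException stands in for AgentNotSupportedException so the example compiles on its own.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper for illustration only; not an RHQ class. */
final class AgentConnectGuard {

    private AgentConnectGuard() {
    }

    /**
     * Builds the text used for logging a connecting agent.
     * The remote address is assumed to come from the server-side
     * transport layer, so the agent cannot misreport it.
     */
    static String describe(String agentName, String agentVersion, InetSocketAddress remote) {
        String origin = (remote == null)
            ? "unknown address"
            : remote.getHostString() + ':' + remote.getPort();
        return "Agent [" + agentName + "][" + agentVersion + "] from [" + origin + "]";
    }

    /** Rejects a connection attempt from an agent that has no name. */
    static void requireName(String agentName, String agentVersion, InetSocketAddress remote) {
        if (agentName == null || agentName.trim().isEmpty()) {
            // Stand-in for AgentNotSupportedException in the real server.
            throw new IllegalArgumentException(
                describe(agentName, agentVersion, remote) + " is connecting without a name");
        }
    }
}
```

With something like this, the existing log line could include the origin even when the name is null, which is the piece of information that was missing here. How the real server would obtain the peer address per request is still the open API question.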

Comment 1 Elias Ross 2014-08-06 01:47:30 UTC
Okay, here is what happened: while installing a new storage node, the installer tried to install a new agent even though one was already installed on the host in a different location. It blew away the preferences directory of the already-installed agent and caused trouble.

Comment 2 Heiko W. Rupp 2014-08-06 19:15:56 UTC
Elias,
do you recall the exact command line you used to install the storage node?

Comment 3 Jay Shaughnessy 2014-08-06 19:59:55 UTC
I'm guessing it was just 'rhqctl install --storage'.

I think this is a missing use case in that installer.  When laying down a server component (storage or actual server) we want any agent on that machine to be located as a sibling to the rhq-server directory, and to be controlled by rhqctl.

Since a storage node requires an agent, 'rhqctl install --storage' implicitly installs an agent as well.

We support 'rhqctl upgrade --from-agent-dir' to allow for moving an existing agent to the expected location.  But we don't support that option for 'rhqctl install'.  I guess the idea of installing a server on a former agent-only machine didn't cross our minds or just didn't get done.

I think we may need to support something like 'rhqctl install --from-agent-dir' which would relocate an existing agent to the expected location.

If not, we should likely just document that in this scenario you should shut down the legacy agent and start using the new agent.  That's a bit weak.

Comment 4 Elias Ross 2014-08-06 21:27:42 UTC
I don't mind it trying to do its install and just failing, but the problem is it actually causes the existing agent to forget its name.

It's also a bug that an agent would try to register with no name.

The command was just as Jay S. described.

Comment 5 Jay Shaughnessy 2014-08-08 14:01:51 UTC
commit 1106141fb59aefa027602c138f59b33f4347466a
Author: Jay Shaughnessy <jshaughn@redhat.com>
Date:   Fri Aug 8 09:59:02 2014 -0400

 The issue here is that 'rhqctl install' does not make provisions for an
 existing agent on the machine.  When installing an RHQ Server or Storage
 Node it is then expected that RHQ components be managed by the rhqctl
 facility.  This is made possible when upgrading from pre-rhqctl versions
 by the 'rhqctl upgrade' --from-server-dir and --from-agent-dir options,
 which allow an upgrade to "pull in" the existing components on the machine.
 The missing use case is this one, when not upgrading but rather installing.
 So, this commit adds the --from-agent-dir option to 'rhqctl install',
 allowing a formerly installed RHQ Agent to be updated (as needed) and
 pulled under rhqctl, moving it to the expected location of sibling
 directory to the rhq-server installation directory (and not installing
 a new agent, which caused the reported issue).
 - Also, fix an issue in 'rhqctl upgrade --from-agent-dir' where the new
   agent was not actually moved to the proper sibling directory.  I believe
   this affected RHQ 4.12 (and possibly JON 3.2)
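
The relocation described in the commit message can be sketched roughly as follows. This is not the rhqctl implementation; the class AgentRelocationSketch, its methods, and the sibling directory name "rhq-agent" are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "pull in an existing agent" step; not rhqctl code. */
final class AgentRelocationSketch {

    /** Assumed name of the agent directory next to the server directory. */
    static final String AGENT_DIR_NAME = "rhq-agent";

    private AgentRelocationSketch() {
    }

    /** The expected agent location: a sibling of the server install directory. */
    static Path expectedAgentDir(Path serverDir) {
        return serverDir.toAbsolutePath().normalize().resolveSibling(AGENT_DIR_NAME);
    }

    /**
     * Moves an existing agent into the expected location instead of
     * installing a second agent alongside it.
     */
    static Path adoptExistingAgent(Path serverDir, Path fromAgentDir) throws IOException {
        Path target = expectedAgentDir(serverDir);
        Path source = fromAgentDir.toAbsolutePath().normalize();
        if (source.equals(target)) {
            return target; // already where it is expected to be
        }
        if (Files.exists(target)) {
            throw new IOException("Refusing to overwrite existing agent at " + target);
        }
        // A plain move is a rename on the same file system; moving a
        // non-empty directory across file systems would need a copy instead.
        return Files.move(source, target);
    }
}
```

The point of the sketch is the ordering: the existing agent is adopted first, so nothing new is laid down that could clobber its stored settings.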

