Description of problem:
Starting the agent with --clean will not discover the RHQ Server running on the same box.

Version-Release number of selected component (if applicable):

How reproducible:
Very.

Steps to Reproduce:
1. Fresh DB
2. Start agent with --clean
3. Answer setup prompt with localhost / localhost for both hosts

Actual results:
The RHQ server running on the same box does NOT appear in the AD portlet.

Expected results:
The RHQ server running on the same box as the agent gets discovered.

Additional info:
Neither of the following remedied the situation:
1) "pc stop" / "pc start" from the agent console
2) "discovery -f" from the agent console / forcing the detailed discovery from the Platform resource in inventory
When I ran "discovery -f" I was able to discover a JBoss 4.2.2 instance. When I had previously reproduced this, I was only running "discovery" without the -f option from the agent prompt. The default server discovery scan interval is 15 minutes. I changed that to two minutes and was again able to discover my JBoss server. Like Joseph, I have seen this behavior as well, so I was a bit surprised when discovery worked for me. Maybe the issue was the server scan only running every 15 minutes. In any event, I am going to move this to ON_QA because I'd like QE to take a look and see whether they think there is any issue.
That's odd, because the normal discovery (not using the '-f' switch) should still be able to find top-level servers like the JBossAS instance in question here. On my box, regardless of whether I run "discovery" or "discovery -f" the results are the same:

Full discovery run in [27] ms
=== Server Scan Inventory Report ===
Start Time: Apr 27, 2010 5:06:40 PM
Finish Time: Apr 27, 2010 5:06:40 PM
Resource Count: 0
=== Service Scan Inventory Report ===
Start Time: Apr 27, 2010 5:06:40 PM
Finish Time: Apr 27, 2010 5:06:40 PM
Resource Count: 0

In other words, it consistently finds nothing after starting the agent --clean. It also finds nothing after I execute "pc stop" and "pc start". The only thing that seems to fix it is bouncing the agent. After that, it successfully finds the RHQ Server instance.
OK, so I finally whipped out a debugger since I wanted to get to the bottom of this already. The root cause, as indicated by the discovery report results shown above, was that the JBossAS process was not being discovered via the process table scan. Upon further inspection, the cause appears to be process visibility between different users on a system at the Sigar layer. Once I restarted the agent as root, my issues went away. We've seen this before with several other plugins, notably Postgres, so I'm chalking this up to PEBCAK and closing the issue out.
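For anyone who wants to check whether they are hitting the same cross-user visibility problem, here is a rough sketch of the kind of check that can be done by hand. It is not what the agent or Sigar actually does; it only assumes a Linux-style /proc layout, where resolving /proc/<pid>/exe generally requires being the process owner or root. The function name classify_process_visibility and its proc_root parameter are made up for illustration.

```python
import os


def classify_process_visibility(proc_root="/proc"):
    """Split numeric PID directories under proc_root into two sorted lists:
    PIDs whose 'exe' link the current user can resolve, and PIDs where it cannot.

    Hypothetical helper for illustration only; it does not mirror Sigar's
    internals. Note that kernel threads have no resolvable 'exe' link even
    for root, so they always land in the second list.
    """
    readable, hidden = [], []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        # No /proc-style directory here (for example, a non-Linux platform).
        return readable, hidden
    for name in entries:
        if not name.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            os.readlink(os.path.join(proc_root, name, "exe"))
            readable.append(int(name))
        except OSError:
            # Typically a permission error for another user's process.
            hidden.append(int(name))
    return sorted(readable), sorted(hidden)


if __name__ == "__main__":
    visible, not_visible = classify_process_visibility()
    print("resolvable exe links:", len(visible))
    print("unresolvable exe links:", len(not_visible))
```

Running this as the agent's user and looking for the server's PID in the second list is one way to tell whether the agent account can see enough of the server process. Whether the plugin's process scan depends on exactly this link is an assumption on my part.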
Ah, I was running both agent and rhq server under the same user account. That would explain the discrepancy between our findings. Thanks for following up on this.
Just to clarify, this is not a shortcoming in Sigar; it is more of an issue with our plugins. See https://bugzilla.redhat.com/show_bug.cgi?id=534850