Bug 607001 - jboss-as-5 plugin: discovery for AS5/EAP5 JBossMessaging Queue Resources hangs and is timed out and blacklisted by the PC
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugins
Version: 1.3.1
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jan Martiska
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: jon-sprint12-bugs jon30-bugs
Reported: 2010-06-22 22:24 UTC by Ian Springer
Modified: 2013-09-02 07:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-02 07:14:20 UTC
Embargoed:


Attachments
logfile for agent which should contain a few references to inventorying eap 5.1 (1.57 MB, application/octet-stream)
2010-07-24 02:09 UTC, Corey Welton
Screenshot_EAP5.1.1 (102.03 KB, image/png)
2011-10-03 11:57 UTC, Sunil Kondkar

Description Ian Springer 2010-06-22 22:24:11 UTC
I saw this for a newly inventoried EAP 5.0.1 Resource. All of its descendant Resources were discovered, except for both Queues (DLQ and ExpiryQueue). I eventually realized it was because the Queue resource type had been blacklisted by the PC, which means the Queue discovery component's discoverResources() method took longer than 5 minutes to complete. This is most likely a bug in the as5 plugin. Unfortunately, I don't know why the discovery hung, because our logging did not capture that information and I did not get a thread dump while it was actually hung.
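
For readers unfamiliar with the mechanism described above, here is a minimal Java sketch of the general pattern: run a discovery call with a timeout, and blacklist the resource type if the call does not return in time. It is an illustration only, not the plugin container's actual code. The class name, the method names, and the use of plain strings for resource types are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; all names here are illustrative, not RHQ APIs. */
final class DiscoveryBlacklistSketch {
    // Resource type names whose discovery timed out (assumed data structure).
    private final Set<String> blacklist = ConcurrentHashMap.newKeySet();

    // Daemon threads, so a hung discovery cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "discovery-sketch");
        t.setDaemon(true);
        return t;
    });

    /** Runs the discovery; returns an empty set if the type is, or becomes, blacklisted. */
    Set<String> discover(String resourceType, Callable<Set<String>> discovery,
                         long timeoutMillis) throws Exception {
        if (blacklist.contains(resourceType)) {
            return Collections.emptySet(); // skipped until the blacklist is cleared
        }
        Future<Set<String>> future = executor.submit(discovery);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);          // interrupt the hung call if possible
            blacklist.add(resourceType);  // no further attempts for this type
            return Collections.emptySet();
        } catch (ExecutionException e) {
            throw new Exception("Discovery failed for " + resourceType, e.getCause());
        }
    }

    Set<String> getBlacklist() {
        return Collections.unmodifiableSet(blacklist);
    }

    void clearBlacklist() {
        blacklist.clear();
    }

    public static void main(String[] args) throws Exception {
        DiscoveryBlacklistSketch pc = new DiscoveryBlacklistSketch();
        // A discovery that hangs longer than the (shortened) timeout.
        pc.discover("Queue", () -> {
            Thread.sleep(5_000);
            return new HashSet<>(Arrays.asList("DLQ", "ExpiryQueue"));
        }, 100);
        System.out.println("Blacklisted: " + pc.getBlacklist());
    }
}
```

In this sketch a blacklisted type silently yields no resources on later scans, which would match the symptom above: every other descendant Resource appears, but the Queues never do.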

Comment 1 Ian Springer 2010-06-22 22:26:07 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=606999 adds better logging when discovery invocations time out. The stack trace of the hung discovery thread is now logged at WARN level. The next time the Queue discovery component hangs, this logging should help us ascertain the root cause of this issue.
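
As a rough illustration of the kind of logging described here (not the actual change made for bug 606999), the Java sketch below waits for a worker thread and, if it is still running when the timeout expires, logs that thread's current stack. It uses java.util.logging only to stay self-contained, so the level is WARNING rather than WARN. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of logging a hung thread's stack; names are illustrative. */
final class HungThreadLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(HungThreadLoggingSketch.class.getName());

    /** Formats the thread's current stack, one frame per line. */
    static String formatStack(Thread thread) {
        StringBuilder sb = new StringBuilder("Stack of thread [" + thread.getName() + "]:");
        for (StackTraceElement frame : thread.getStackTrace()) {
            sb.append(System.lineSeparator()).append("    at ").append(frame);
        }
        return sb.toString();
    }

    /**
     * Waits up to timeoutMillis (must be greater than 0) for the worker.
     * If it is still alive afterwards, logs its stack and returns true.
     */
    static boolean joinOrLogStack(Thread worker, long timeoutMillis)
            throws InterruptedException {
        worker.join(timeoutMillis);
        if (!worker.isAlive()) {
            return false;
        }
        LOG.log(Level.WARNING, "Discovery timed out after " + timeoutMillis
                + " ms. " + formatStack(worker));
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a discovery call that never returns in time.
        Thread hung = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // interrupted by main once the demo is done
            }
        }, "queue-discovery");
        hung.setDaemon(true);
        hung.start();
        joinOrLogStack(hung, 200);
        hung.interrupt();
    }
}
```

The logged frames show where the thread was blocked at the moment of the timeout, which is the information that was missing when the hang was first observed.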

Comment 2 Charles Crouch 2010-07-23 22:42:44 UTC
From corey
"This does occur in 5.1 versus 5.0.  Queues show up in the latter, but not the former."

(5:26:59 PM) ips: ccrouch - i have an eap 5.1 in inv and my 2 queues got discovered
(5:27:26 PM) ips: so i don't think it happens every time
(5:28:17 PM) ips: but i'm going to try to run discovery -f a bunch of times and check my agent log to see if any discovery component threads time out / get blacklisted

Comment 3 Charles Crouch 2010-07-23 23:10:47 UTC
(5:43:00 PM) ccrouch: ips: wrt to the component blacklisting, when does that get reset?
(5:47:47 PM) ccrouch: would it be possible for people to restart the agent say and try discovery again?
(5:57:18 PM) ips: ccrouch: yeah i think restarting the agent resets it
(5:57:38 PM) ips: there's also a way to reset it using an agent prompt command i think
(5:59:50 PM) ips: yep: "discovery --blacklist=list" to list, and "discovery --blacklist=clear" to clear
(6:01:22 PM) ccrouch: ok, i'll add that and ask corey to retest
(6:02:59 PM) ips: k, as long as he attaches his agent log first :)
(6:03:18 PM) ccrouch: ips +1
(6:03:20 PM) ips: i don't want to lose the stack trace from the hung thread
(6:05:10 PM) ips: the thing that perplexes me is that if you look at RemoteProfileServiceConnectionProvider, i set the jnp.timeout and jnp.sotimeout props to 60s
(6:05:54 PM) ips: so i would think those timeouts would timeout any profile service calls that are taking longer than a minute
(6:06:29 PM) ccrouch: right
(6:07:54 PM) ips: but the discovery component timeout is 5 mins ...
(6:08:26 PM) ips: so very anxious to see corey's log
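
For context on the two properties mentioned in the chat, the Java sketch below builds a JNDI environment with jnp.timeout and jnp.sotimeout set to 60 seconds. It is not the code from RemoteProfileServiceConnectionProvider. The class name, the helper method, and the provider URL are made up for the example, and the assumption that both properties take milliseconds should be checked against the JBoss Naming documentation.

```java
import java.util.Properties;
import javax.naming.Context;

/** Hypothetical sketch; not the actual RemoteProfileServiceConnectionProvider code. */
final class JnpTimeoutSketch {
    // Property names as quoted in the chat above.
    static final String JNP_TIMEOUT = "jnp.timeout";     // connect timeout
    static final String JNP_SOTIMEOUT = "jnp.sotimeout"; // socket read timeout

    /** Builds a JNDI environment; assumes both timeouts are given in milliseconds. */
    static Properties buildJndiEnv(String providerUrl, int timeoutSeconds) {
        Properties env = new Properties();
        env.setProperty(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.setProperty(Context.PROVIDER_URL, providerUrl);
        String millis = String.valueOf(timeoutSeconds * 1000L);
        env.setProperty(JNP_TIMEOUT, millis);
        env.setProperty(JNP_SOTIMEOUT, millis);
        return env;
    }

    public static void main(String[] args) {
        // The URL is a placeholder for a local AS5/EAP5 naming service.
        Properties env = buildJndiEnv("jnp://127.0.0.1:1099", 60);
        env.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A socket read timeout applies to each blocking read separately, so settings like these do not by themselves cap the total duration of a discovery that makes several remote calls. Whether they apply at all to Profile Service invocations made after the JNDI lookup is the open question raised in the chat.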

Comment 4 Corey Welton 2010-07-24 02:08:13 UTC
Have inventoried and reinventoried EAP 5.1 a few times, queues never got discovered.  Attaching (gargantuan) log.

Comment 5 Corey Welton 2010-07-24 02:09:55 UTC
Created attachment 434097
logfile for agent which should contain a few references to inventorying eap 5.1

Comment 6 John Mazzitelli 2010-07-26 13:48:00 UTC
discovery prompt command can be used to list and clear the blacklist

-b, --blacklist={list|clear} : Operates on the blacklist which determines
                               which resource types are not discoverable.
                               (note that specifying this option will not
                               run an actual discovery scan)
                               'list' prints blacklisted resource types.
                               'clear' delists all resource types which
                               re-enables all types to be discoverable.

So you can clear the blacklist and re-enable the discovery component via:

discovery --blacklist=clear

Comment 7 Ian Springer 2010-07-26 17:39:18 UTC
When Corey wasn't seeing the queues discovered, he ran "discovery --blacklist=list" and his blacklist was empty. So I don't think he's seeing the same thing I saw where my queue discovery component got blacklisted. Since I don't see any errors in his Agent log related to queue discovery, I'm guessing the discovery just returned an empty Set for some reason, because if it threw an exception, I think we'd see that in the Agent log.

Comment 8 Corey Welton 2010-07-27 12:08:30 UTC
In my initial EAP 5.1 run, I think/thought I repro'd the behavior: I was not seeing queues.  After stopping the agent and restarting it, they appeared to show up.  I never saw anything on a blacklist, but I am not sure if that was because I stopped and restarted the agent.

I retried twice on two separate machines to duplicate the behavior (one of which was a heavily resourced box), so that I could try clearing blacklists, etc., from the UI, but was unable to repro -- the queues appeared.  It did seem that the queues took longer to show up in inventory, but they did eventually show up.

I then tried repro'ing twice more using EAP 5.0.1 (versus a 5.1 build).  Again, queues showed up.

So in short, I am unable to get this to reproduce successfully.  Assuming I did reproduce it the first time (i.e., it wasn't user error/misdiagnosis), an agent restart and/or cleared blacklist seemed to do the trick.

Comment 9 Charles Crouch 2010-07-27 12:45:43 UTC
Dropping priority based on non-reproducibility and availability of a workaround

Comment 10 Charles Crouch 2010-08-04 12:37:35 UTC
Seems like a good candidate for automated testing, to determine if this is reproducible

Comment 11 Charles Crouch 2011-09-27 13:05:29 UTC
Pushing to ON_QA to see if this is reproducible on the latest EAP5.1.1 builds

Comment 12 Sunil Kondkar 2011-10-03 11:56:38 UTC
Verified on build#456 (Version: 4.1.0-SNAPSHOT Build Number: 702edd7)

Started EAP5.1.1 , discovered and imported EAP5.1.1 in RHQ.
The queues 'DLQ' and 'ExpiryQueue' are discovered successfully.

Please refer to the attached screenshot.

Comment 13 Sunil Kondkar 2011-10-03 11:57:52 UTC
Created attachment 526040
Screenshot_EAP5.1.1

Comment 15 Heiko W. Rupp 2013-09-02 07:14:20 UTC
Bulk closing of issues that were VERIFIED, had no target release and where the status changed more than a year ago.

