I saw this for a newly inventoried EAP 5.0.1 Resource. All of its descendant Resources were discovered except the two Queues (DLQ and ExpiryQueue). I eventually realized this was because the Queue resource type had been blacklisted by the plugin container (PC), which means the Queue discovery component's discoverResources() method took longer than 5 minutes to complete. This is most likely a bug in the as5 plugin. Unfortunately, I don't know why the discovery hung, because our logging did not capture that information and I did not get a thread dump while it was actually hung.
https://bugzilla.redhat.com/show_bug.cgi?id=606999 adds better logging when discovery invocations time out. The stack trace of the hung discovery thread is now logged at WARN level. The next time the Queue discovery component hangs, this logging should help us ascertain the root cause of this issue.
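To make the timeout mechanism concrete, here is a minimal Java sketch of running a discovery call with a timeout and logging the hung thread's stack trace when it expires. It is not the plugin container's actual code: the names DiscoveryTimeoutSketch and discoverWithTimeout are hypothetical, java.util.logging's WARNING level stands in for WARN, and the caller is assumed to handle blacklisting the type when Optional.empty() comes back.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;

// Hypothetical sketch; not RHQ's plugin container implementation.
public class DiscoveryTimeoutSketch {
    private static final Logger LOG = Logger.getLogger(DiscoveryTimeoutSketch.class.getName());

    /**
     * Runs discoveryCall on a daemon worker thread. Returns the discovered
     * resource keys, or Optional.empty() if the call did not finish within
     * timeoutMillis. On timeout, the worker's stack trace is logged at WARNING.
     */
    public static Optional<Set<String>> discoverWithTimeout(
            String resourceType, Callable<Set<String>> discoveryCall, long timeoutMillis) {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "discovery-" + resourceType);
            t.setDaemon(true); // a hung call must not keep the JVM alive
            return t;
        });
        try {
            Future<Set<String>> future = executor.submit(() -> {
                worker.set(Thread.currentThread()); // remember which thread runs the call
                return discoveryCall.call();
            });
            try {
                return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // Capture where the worker is stuck *before* interrupting it.
                StringBuilder trace = new StringBuilder();
                Thread t = worker.get();
                if (t != null) {
                    for (StackTraceElement frame : t.getStackTrace()) {
                        trace.append("\n\tat ").append(frame);
                    }
                }
                LOG.warning("Discovery of " + resourceType + " timed out after "
                        + timeoutMillis + " ms; hung thread stack:" + trace);
                future.cancel(true); // best effort: interrupt the worker
                return Optional.empty();
            } catch (ExecutionException e) {
                // The discovery call itself threw: surface it rather than hide it.
                throw new IllegalStateException("Discovery of " + resourceType + " failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return Optional.empty();
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The main design choice is reading the worker's stack with Thread.getStackTrace() before calling cancel(true), since interrupting first could let the thread unwind out of the very frames we want to see.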
From Corey: "This does occur in 5.1 versus 5.0. Queues show up in the latter, but not the former."
(5:26:59 PM) ips: ccrouch - i have an eap 5.1 in inv and my 2 queues got discovered
(5:27:26 PM) ips: so i don't think it happens every time
(5:28:17 PM) ips: but i'm going to try to run discovery -f a bunch of times and check my agent log to see if any discovery component threads time out / get blacklisted
(5:43:00 PM) ccrouch: ips: wrt the component blacklisting, when does that get reset?
(5:47:47 PM) ccrouch: would it be possible for people to restart the agent, say, and try discovery again?
(5:57:18 PM) ips: ccrouch: yeah i think restarting the agent resets it
(5:57:38 PM) ips: there's also a way to reset it using an agent prompt command i think
(5:59:50 PM) ips: yep: "discovery --blacklist=list" to list, and "discovery --blacklist=clear" to clear
(6:01:22 PM) ccrouch: ok, i'll add that and ask corey to retest
(6:02:59 PM) ips: k, as long as he attaches his agent log first :)
(6:03:18 PM) ccrouch: ips +1
(6:03:20 PM) ips: i don't want to lose the stack trace from the hung thread
(6:05:10 PM) ips: the thing that perplexes me is that if you look at RemoteProfileServiceConnectionProvider, i set the jnp.timeout and jnp.sotimeout props to 60s
(6:05:54 PM) ips: so i would think those timeouts would time out any profile service calls that are taking longer than a minute
(6:06:29 PM) ccrouch: right
(6:07:54 PM) ips: but the discovery component timeout is 5 mins ...
(6:08:26 PM) ips: so very anxious to see corey's log
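For reference, here is a sketch of the kind of JNDI environment ips describes: jnp.timeout (connect timeout) and jnp.sotimeout (socket read timeout) both at 60 s, with the 5-minute discovery timeout alongside for comparison. It assumes the JBoss JNP naming client reads both properties as millisecond values. It is an illustration, not the contents of RemoteProfileServiceConnectionProvider, and JnpTimeoutSketch is a hypothetical name.

```java
import java.util.Properties;
import javax.naming.Context;

// Hypothetical sketch; not the actual RemoteProfileServiceConnectionProvider code.
public class JnpTimeoutSketch {
    static final long JNP_TIMEOUT_MILLIS = 60_000L;            // 60 s, as set for the profile service connection
    static final long DISCOVERY_TIMEOUT_MILLIS = 5 * 60_000L;  // 5 min discovery component timeout

    /** Builds a JNDI environment for a JNP connection with 60 s connect and read timeouts. */
    public static Properties jnpEnvironment(String providerUrl) {
        Properties env = new Properties();
        env.setProperty(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.setProperty(Context.PROVIDER_URL, providerUrl);
        env.setProperty("jnp.timeout", String.valueOf(JNP_TIMEOUT_MILLIS));   // connect timeout (ms)
        env.setProperty("jnp.sotimeout", String.valueOf(JNP_TIMEOUT_MILLIS)); // socket read timeout (ms)
        return env;
    }
}
```

With a hypothetical provider URL such as jnp://127.0.0.1:1099, the resulting Properties would be passed to new InitialContext(env) with the JNP client jar on the classpath. Since 60 s is well under 5 minutes, one would expect a stalled call to fail long before the discovery component's timeout, which is the puzzle raised in the chat.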
I have inventoried and re-inventoried EAP 5.1 a few times, and the queues never got discovered. Attaching the (gargantuan) log.
Created attachment 434097: Agent log file, which should contain a few references to inventorying EAP 5.1
The discovery prompt command can be used to list and clear the blacklist:
  -b, --blacklist={list|clear} : Operates on the blacklist which determines which resource types are not discoverable. (note that specifying this option will not run an actual discovery scan) 'list' prints blacklisted resource types. 'clear' delists all resource types which re-enables all types to be discoverable.
So you can clear the blacklist, and get the discovery component re-activated, via:
  discovery --blacklist=clear
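The list/clear semantics of that option can be modeled with a minimal, hypothetical registry (DiscoveryBlacklistSketch is not RHQ's API): a type is added when its discovery component times out, skipped by later scans while it is listed, and re-enabled by clear. As the help text notes, clearing does not itself trigger a scan.

```java
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the discovery blacklist; not RHQ's actual classes.
public class DiscoveryBlacklistSketch {
    // Thread-safe set: discovery threads add entries while the prompt command reads/clears.
    private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();

    /** Called when a resource type's discovery component times out. */
    public void blacklist(String resourceType) {
        blacklisted.add(resourceType);
    }

    /** Discovery scans skip any type that is currently blacklisted. */
    public boolean isDiscoverable(String resourceType) {
        return !blacklisted.contains(resourceType);
    }

    /** Rough analogue of "discovery --blacklist=list": a sorted, read-only snapshot. */
    public SortedSet<String> list() {
        return Collections.unmodifiableSortedSet(new TreeSet<>(blacklisted));
    }

    /** Rough analogue of "discovery --blacklist=clear": re-enables every type; runs no scan. */
    public void clear() {
        blacklisted.clear();
    }
}
```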
When Corey wasn't seeing the queues get discovered, he ran "discovery --blacklist=list" and his blacklist was empty. So I don't think he's seeing the same thing I saw, where my queue discovery component got blacklisted. Since I don't see any errors related to queue discovery in his Agent log, I'm guessing the discovery just returned an empty Set for some reason; if it had thrown an exception, I think we'd see that in the Agent log.
In my initial EAP 5.1 run, I believe I reproduced the behavior: I was not seeing the queues. After stopping and restarting the agent, they appeared. I never saw anything on a blacklist, but I am not sure whether that was because I had stopped and restarted the agent. I then retried twice, on two separate machines (one of which was a heavily resourced box), to duplicate the behavior so that I could try clearing blacklists, etc., from the UI, but I was unable to reproduce it: the queues appeared. The queues did seem to take longer to show up in inventory, but they did eventually show up. I then tried reproducing it twice more using EAP 5.0.1 (versus a 5.1 build). Again, the queues showed up. In short, I am unable to reproduce this. Assuming I did reproduce it the first time (i.e., it wasn't user error or misdiagnosis), restarting the agent and/or clearing the blacklist seemed to do the trick.
Dropping priority based on non-reproducibility and the availability of a workaround.
Seems like a good candidate for automated testing, to determine if this is reproducible
Pushing to ON_QA to see whether this is reproducible on the latest EAP 5.1.1 builds.
Verified on build #456 (Version: 4.1.0-SNAPSHOT, Build Number: 702edd7). Started EAP 5.1.1, then discovered and imported it into RHQ. The queues 'DLQ' and 'ExpiryQueue' were discovered successfully. Please refer to the attached screenshot.
Created attachment 526040: Screenshot_EAP5.1.1
Bulk closing of issues that were VERIFIED, had no target release and where the status changed more than a year ago.