Bug 1380471
Summary: UI's live availability check results in repeated "A request timeout has expired after 30000 ms"
Product: [JBoss] JBoss Operations Network
Component: UI
Version: JON 3.3.7
Status: CLOSED ERRATA
Severity: low
Priority: medium
Target Milestone: CR01
Target Release: JON 3.3.9
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Larry O'Leary <loleary>
Assignee: Josejulio Martínez <jmartine>
QA Contact: Prachi <pyadav>
CC: fbrychta, jmartine, pyadav, spinder
URL: https://developer.jboss.org/message/963465
Last Closed: 2017-10-02 17:21:51 UTC
Description
Larry O'Leary
2016-09-29 16:50:25 UTC
I couldn't reproduce this bug using the "Steps to Reproduce". The reason is that the server pings the agent before trying to get the resource's availability. The ping uses a 5 s timeout; if the agent doesn't respond, the server reports AvailabilityType.UNKNOWN (or AvailabilityType.DOWN if the resource is of type Platform). Because port 16163 is configured to DROP packets, the ping check fails. The only way I could reproduce this was by setting a breakpoint [1] and keeping execution waiting there until the request timed out.

[1] https://github.com/josejulio/rhq/blob/012b4f48f0072a4df3995cc3279cdd0cabde6361/modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/resource/ResourceManagerBean.java#L2710-L2710

Interesting. I am not able to reproduce this using my original steps either. I also attempted the following:

1. Confirm no sockets/connections exist from server to agent.
2. Navigate to RHQ Server resource.
--
1. Confirm no sockets/connections exist from server to agent.
2. Navigate to RHQ Agent resource.
--
1. Unblock port 16163.
2. Import EAP 6 server into inventory using port 9990.
3. Confirm EAP 6 reports available (i.e. connection settings are valid).
4. Navigate away from EAP 6 resource in UI.
5. Block (DROP) packets to port 9990.
6. Navigate to EAP 6 resource and wait a couple of minutes.
^^ Repeat steps 1 to 5, then:
6. Navigate to EAP 6 resource's metric table to confirm live metrics aren't causing the UI timeout.

In all instances this worked without any warning/error in the UI. I would be okay with CLOSED/WORKSFORME unless you want to address the general timeout/hang issue you were able to reproduce by adding the breakpoint. Basically, my concern is that if it is taking longer than 15 seconds to get the live availability for a resource, then we should treat the availability as UNKNOWN without waiting for the 30 second generic UI timeout.
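The server-side flow described in the first comment (a 5 s agent ping before the live availability request, falling back to UNKNOWN, or DOWN for a platform) combined with the bounded wait proposed here can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not RHQ's ResourceManagerBean code: `LiveAvailabilitySketch`, `AgentClient`, and the executor-based timeout are hypothetical, and `AvailabilityType` is a local stand-in for RHQ's enum.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not RHQ's ResourceManagerBean: ping the agent first, then
// bound the live availability request and fall back instead of waiting indefinitely.
class LiveAvailabilitySketch {

    // Local stand-in for RHQ's AvailabilityType (only the values used here).
    enum AvailabilityType { UP, DOWN, UNKNOWN }

    // Hypothetical agent client; the real server uses its own agent communication layer.
    interface AgentClient {
        boolean ping(long timeoutMillis);
        AvailabilityType getLiveAvailability() throws Exception;
    }

    static final long PING_TIMEOUT_MS = 5_000; // the 5 s ping timeout described in the thread

    private final long availabilityTimeoutMs; // e.g. the 15 s bound proposed in the thread
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "live-availability");
        t.setDaemon(true); // a hung request must not keep the JVM alive
        return t;
    });

    LiveAvailabilitySketch(long availabilityTimeoutMs) {
        this.availabilityTimeoutMs = availabilityTimeoutMs;
    }

    AvailabilityType getLiveAvailability(AgentClient agent, boolean isPlatform) {
        // Fallback rule from the thread: DOWN for a platform, UNKNOWN for anything else.
        AvailabilityType fallback = isPlatform ? AvailabilityType.DOWN : AvailabilityType.UNKNOWN;

        // Step 1: quick ping; an unreachable agent never gets an availability request.
        if (!agent.ping(PING_TIMEOUT_MS)) {
            return fallback;
        }

        // Step 2: request availability, but wait at most availabilityTimeoutMs for it.
        Callable<AvailabilityType> request = agent::getLiveAvailability;
        Future<AvailabilityType> future = executor.submit(request);
        try {
            return future.get(availabilityTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hung request rather than leaving it running
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the request itself failed
        }
    }
}
```

Cancelling the `Future` on timeout only interrupts the worker thread; whether an underlying socket is actually released depends on the real transport, which this sketch does not model.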
My primary reason for suggesting a shorter bound is that if we wait 30 seconds, another 1 or 2 availability checks are already queued up that will also result in the same UI warning, potentially after the user has already navigated away. I'll defer to dev's expertise as it relates to simplicity of a fix and risk assessment.

I already have a fix; I just need to do a bit more testing. I'll try to change the timeout to 15 s. Currently, on timeout, the UI shows the previous availability. It makes sense to change it to UNKNOWN (or DOWN if the resource is a Platform [1]).

[1] https://github.com/josejulio/rhq/blob/012b4f48f0072a4df3995cc3279cdd0cabde6361/modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/resource/ResourceManagerBean.java#L2712-L2716

Larry, I don't think it is necessary to lower the timeout to 15 s; there is a countdown latch [1] that only allows one refresh at a given time.

[1] https://github.com/josejulio/rhq/blob/91291e00a58349c1c36166ac8d3a3c3bfc3bdc2f/modules/enterprise/gui/coregui/src/main/java/org/rhq/coregui/client/inventory/resource/detail/ResourceTitleBar.java#L150-L153

Okay. I think the reason I was concerned is that I can see a new socket being created every 15 seconds. If the socket is waiting for a connect or has hung, it seems to remain around for a couple of minutes. This resulted in 8 sockets to this single agent.

commit f1bbd51c69a67a9ff59e42b6d1c515d25526bd6f
Merge: 3f5df89 803dbc9
Author: Michael Burman <yak>
Date: Wed Aug 30 21:22:48 2017 +0300

    Merge pull request #317 from josejulio/bugs/1380471
    Bug 1380471 - Check for timeouts when getting live availability

commit 803dbc93d70b5b9136ebb2440a879eff621340ee
Author: Josejulio Martínez <jmartine>
Date: Tue Aug 22 13:30:53 2017 -0500

    Bug 1380471 - Check for timeouts when getting live availability
    - Set availability to unknown on timeout

Moving to ON_QA.
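The client-side behavior discussed in this thread, where a countdown latch allows only one refresh at a time and a timed-out refresh shows UNKNOWN instead of the previous availability, can be sketched as below. The names `AvailabilityRefresher` and `AvailabilityService` are hypothetical, and this is not the code from ResourceTitleBar or pull request #317; it uses `java.util.concurrent.CountDownLatch` only to mirror the latch idea, and it leaves out the DOWN-for-platform case.

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

// Hypothetical sketch, not the ResourceTitleBar code: at most one live availability
// refresh in flight, and UNKNOWN (not the previous value) when a refresh fails or times out.
class AvailabilityRefresher {

    enum AvailabilityType { UP, DOWN, UNKNOWN }

    // Hypothetical async service: calls exactly one of the two callbacks per request.
    interface AvailabilityService {
        void getLiveAvailability(Consumer<AvailabilityType> onSuccess, Consumer<Throwable> onFailure);
    }

    private final AvailabilityService service;
    private CountDownLatch inFlight; // null, or count 0, means no refresh is running
    private volatile AvailabilityType displayed = AvailabilityType.UNKNOWN;

    AvailabilityRefresher(AvailabilityService service) {
        this.service = service;
    }

    // Called on every timer tick; returns false when the tick is skipped.
    synchronized boolean refresh() {
        if (inFlight != null && inFlight.getCount() > 0) {
            return false; // previous request still pending: do not stack another behind it
        }
        CountDownLatch latch = new CountDownLatch(1);
        inFlight = latch;
        service.getLiveAvailability(
                availability -> {
                    displayed = availability;
                    latch.countDown();
                },
                error -> {
                    // A timeout lands here: show UNKNOWN instead of keeping the stale value.
                    displayed = AvailabilityType.UNKNOWN;
                    latch.countDown();
                });
        return true;
    }

    AvailabilityType displayed() {
        return displayed;
    }
}
```

Skipping a tick rather than queuing it is what keeps one hung request from turning into the 1 or 2 additional queued checks Larry describes; it does not by itself limit how many sockets the server opens to the agent.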
JON 3.3.9 CR01 artifacts are available for testing here:
http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/135/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip

*Note: jon-server-patch-3.3.0.GA.zip maps to the CR01 build of jon-server-3.3.0.GA-update-09.zip.

Verified
Version: 3.3.0.GA Update 09
Build Number: fcb34f1:80f74f5

Created attachment 1330473
screen-shot
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2846