Bug 1140289

Summary: Background requests made to the broker are done under a hard-coded timeout.
Product: OpenShift Container Platform
Reporter: Timothy Williams <tiwillia>
Component: Node
Assignee: Luke Meyer <lmeyer>
Status: CLOSED ERRATA
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.1.0
CC: jdetiber, jialiu, jokerman, jpazdziora, libra-onpremise-devel, lmeyer, mmccomas, xiama
Target Milestone: ---
Keywords: NeedsTestCase, Upstream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: rubygem-openshift-origin-console-1.31.3-1.git.62.44c654c.el6op openshift-origin-console-1.16.3-1.git.420.987e52a.el6op
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1144175
Environment:
Last Closed: 2014-11-03 19:54:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1144175, 1194323
Bug Blocks:

Description Timothy Williams 2014-09-10 15:55:51 UTC
Description of problem:

Background requests made from the console to the broker are performed under a hard-coded timeout. A customer recently hit this timeout when requesting the /console/applications page. The affected user was a member of many domains, each with many gears. The high number of gears and domains appears to cause the background requests to take over 10 seconds.

Below is the error received:
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
2014-09-08 15:55:09.410 [INFO ] Completed 500 Internal Server Error in 10002ms (pid:5886)
2014-09-08 15:55:09.414 [FATAL] AsyncAware::ThreadTimedOut (The thread #<Thread:0x007f605c2997f8 dead> (index=1) did not complete within 10 seconds.):
  openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:38:in `block in join'
  openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:34:in `map'
  openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:34:in `join'
  openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:47:in `join!'
  openshift-origin-console (1.23.4.4) app/controllers/applications_controller.rb:85:in `index'
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

The code that was hit:
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
  def index
    if params[:test]
      @applications = Fixtures::Applications.list
      @domains = Fixtures::Applications.list_domains
      (@applications + @domains).each{ |d| d.send(:api_identity_id=, '2') }
    else
      async{ @applications = Application.find :all, :as => current_user, :params => {:include => :cartridges} }
      async{ @domains = user_domains }
      join!(10)  # <-- the hard-coded 10-second timeout
    end
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

The 10-second timeout is hard-coded. Usually, hitting this timeout indicates a problem with the broker, which navigating to the broker endpoints (such as /broker/rest/domain//applications?include=cartridges) would reveal. In this particular case, however, there were no errors on the broker side: the logs are clean and all endpoints (eventually) return successfully. The call simply takes longer because of the high number of domains and gears that must be presented.
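
For illustration only, here is a minimal sketch of how a fixed join timeout produces this kind of failure even when every background request would eventually succeed. This is not the console's actual AsyncAware code: the method name run_with_deadline and the use of a single shared deadline are assumptions, and ThreadTimedOut is a local stand-in class named after the error in the log above.

```ruby
# Hypothetical stand-in for the console's error class (name taken from the log above).
class ThreadTimedOut < StandardError; end

# Run each task in its own thread, then wait for all of them against one
# shared deadline (an assumption; the real module may time each thread separately).
def run_with_deadline(tasks, timeout)
  threads = tasks.map { |task| Thread.new { task.call } }
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  threads.each_with_index.map do |thread, index|
    remaining = [deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC), 0].max
    # Thread#join(limit) returns nil if the thread is still running when the limit expires.
    if thread.join(remaining).nil?
      threads.each(&:kill)
      raise ThreadTimedOut,
            "The thread (index=#{index}) did not complete within #{timeout} seconds."
    end
    thread.value
  end
end
```

With a slow but healthy task, a generous limit returns every result, while a tight limit raises even though nothing is actually broken. That matches the situation described here: the broker is fine, the call is just slow.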

Another factor we believe contributes to this is splitting an OpenShift environment across multiple datacenters, which is now supported. This customer in particular splits their environment across two datacenters across the US. If the broker ends up making an MCollective request to one of several ActiveMQ instances in another datacenter, these broker calls could take even longer.

Version-Release number of selected component (if applicable):
2.1

Comment 3 Luke Meyer 2014-09-18 15:19:17 UTC
Investigate introducing a config value at the broker for this. PRs welcome.

Comment 4 Timothy Williams 2014-09-18 21:57:48 UTC
https://github.com/openshift/origin-server/pull/5823

Comment 5 Jason DeTiberus 2014-10-08 02:36:53 UTC
http://etherpad.corp.redhat.com/puddle-2-2-2014-10-07

Comment 6 Ma xiaoqiang 2014-10-08 07:18:30 UTC
Checked on puddle [2.2/2014-10-07.2]

1. Create some apps
#for i in {1..20};do rhc app create testapp$i jbossews-1 -s --no-git ; rhc cartridge scale jbossews-1 -a testapp$i --min 3;done
2. Set BACKGROUND_REQUEST_TIMEOUT to 1
#vim /etc/openshift/console.conf
<--snip-->
#RED_HAT_ACCOUNT_URL=https://www.redhat.com/wapps/ugc

#CONTACT_MAIL=openshift

BACKGROUND_REQUEST_TIMEOUT=1
<--snip-->
3. Request /console/applications
<--snip-->
The thread #<Thread:0x00000004d67000 dead> (index=0) did not complete within 1 seconds.
<--snip-->
4. Set BACKGROUND_REQUEST_TIMEOUT to 20
5. Request /console/applications

In step 5, all the applications are listed successfully.
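
As a rough sketch of the behavior exercised above, the following shows one way a KEY=VALUE file such as the console.conf excerpt could be read, with a fallback when BACKGROUND_REQUEST_TIMEOUT is unset or invalid. The helper names parse_conf and background_request_timeout are hypothetical, and the fallback of 10 seconds is an assumption based on the previously hard-coded value. The merged change may read the setting differently.

```ruby
# Assumed fallback, in seconds; mirrors the value that used to be hard-coded.
DEFAULT_BACKGROUND_REQUEST_TIMEOUT = 10

# Parse KEY=VALUE lines, skipping blank lines and lines commented out with '#'.
def parse_conf(text)
  text.each_line.with_object({}) do |line, conf|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    key, value = line.split('=', 2)
    next if value.nil?
    conf[key.strip] = value.strip
  end
end

# Return the configured timeout as an integer, or the default if the
# setting is missing or not a valid integer.
def background_request_timeout(conf)
  Integer(conf.fetch('BACKGROUND_REQUEST_TIMEOUT', DEFAULT_BACKGROUND_REQUEST_TIMEOUT))
rescue ArgumentError, TypeError
  DEFAULT_BACKGROUND_REQUEST_TIMEOUT
end
```

Commented-out entries such as #CONTACT_MAIL=openshift are ignored, so only the active BACKGROUND_REQUEST_TIMEOUT line affects the result.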

Comment 8 errata-xmlrpc 2014-11-03 19:54:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1796.html

Comment 9 Jan Pazdziora (Red Hat) 2015-01-05 12:29:32 UTC
*** Bug 1108246 has been marked as a duplicate of this bug. ***