Description of problem:

Background requests made to the broker from the console are performed under a hardcoded timeout. A customer recently hit this timeout when requesting the /console/applications page. The user hitting the timeout was a member of many domains, each with many gears. This high number of gears/domains appears to be causing the background requests to take over 10 seconds. Below is the error received:

-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
2014-09-08 15:55:09.410 [INFO ] Completed 500 Internal Server Error in 10002ms (pid:5886)
2014-09-08 15:55:09.414 [FATAL] AsyncAware::ThreadTimedOut (The thread #<Thread:0x007f605c2997f8 dead> (index=1) did not complete within 10 seconds.):
openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:38:in `block in join'
openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:34:in `map'
openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:34:in `join'
openshift-origin-console (1.23.4.4) app/controllers/async_aware.rb:47:in `join!'
openshift-origin-console (1.23.4.4) app/controllers/applications_controller.rb:85:in `index'
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

The code that was hit:

-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
def index
  if params[:test]
    @applications = Fixtures::Applications.list
    @domains = Fixtures::Applications.list_domains
    (@applications + @domains).each{ |d| d.send(:api_identity_id=, '2') }
  else
    async{ @applications = Application.find :all, :as => current_user, :params => {:include => :cartridges} }
    async{ @domains = user_domains }
--> join!(10)
  end
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

The 10 second timeout is hardcoded. Usually, hitting this timeout indicates an issue with the broker, which navigating to the broker endpoints (such as /broker/rest/domain//applications?include=cartridges) would have revealed. In this particular case, there were no errors on the broker side: the logs are clean and all endpoints (eventually) return successfully.
The call simply takes longer because of the high number of domains/gears that must be presented. Another factor we believe contributes is splitting an OpenShift environment across multiple datacenters, which is now supported. This particular customer splits their environment across two datacenters in the US. If the broker makes an mcollective request to one of several ActiveMQ instances located in another datacenter, these broker calls can take even longer.

Version-Release number of selected component (if applicable): 2.1
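For context, the async/join! pattern in AsyncAware boils down to joining worker threads against a shared deadline and raising ThreadTimedOut when one is still running. A minimal self-contained sketch of that pattern (names simplified and illustrative; this is not the actual console code):

```ruby
# Illustrative sketch of the join-with-deadline pattern behind AsyncAware#join!
# (class and method names are simplified; not the actual console implementation).
class ThreadTimedOut < StandardError; end

# Joins every thread against a single shared deadline, so the total wait is
# bounded by +timeout+ seconds no matter how many threads are in flight.
# Raises ThreadTimedOut if any thread is still alive when the deadline passes;
# otherwise returns the thread return values.
def join_all!(threads, timeout)
  deadline = Time.now + timeout
  threads.each_with_index do |thread, index|
    remaining = deadline - Time.now
    # Thread#join(limit) returns the thread on completion, nil on timeout.
    unless remaining > 0 && thread.join(remaining)
      raise ThreadTimedOut,
            "The thread #{thread.inspect} (index=#{index}) did not complete within #{timeout} seconds."
    end
  end
  threads.map(&:value)
end
```

Because the deadline is shared across all threads, a slow broker response on any one background request is enough to push the whole page over the limit, which matches the 500 seen at 10002ms above.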
Investigate introducing a config value at the broker for this. PRs welcome.
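As a sketch of what such a config value could look like, the snippet below reads a KEY=VALUE style setting from a conf file and falls back to the old hardcoded value of 10 seconds. The helper name and parsing here are hypothetical; the real console uses its own config loader, and the actual change landed in the PR linked below.

```ruby
# Hypothetical sketch: read BACKGROUND_REQUEST_TIMEOUT from a KEY=VALUE
# conf file (e.g. /etc/openshift/console.conf), falling back to the
# previous hardcoded default of 10 seconds. Illustration only.
def background_request_timeout(conf_path, default: 10)
  return default unless File.exist?(conf_path)
  File.foreach(conf_path) do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#')  # skip blanks and comments
    key, value = line.split('=', 2)
    return Integer(value.strip) if key == 'BACKGROUND_REQUEST_TIMEOUT' && value
  end
  default
end
```

The controller could then call join!(background_request_timeout(conf_path)) instead of the hardcoded join!(10).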
https://github.com/openshift/origin-server/pull/5823
http://etherpad.corp.redhat.com/puddle-2-2-2014-10-07
Check on puddle [2.2/2014-10-07.2]

1. Create some apps
# for i in {1..20}; do rhc app create testapp$i jbossews-1 -s --no-git; rhc cartridge scale jbossews-1 -a testapp$i --min 3; done

2. Set BACKGROUND_REQUEST_TIMEOUT to 1
# vim /etc/openshift/console.conf
<--snip-->
#RED_HAT_ACCOUNT_URL=https://www.redhat.com/wapps/ugc
#CONTACT_MAIL=openshift
BACKGROUND_REQUEST_TIMEOUT=1
<--snip-->

3. Request /console/applications
<--snip-->
The thread #<Thread:0x00000004d67000 dead> (index=0) did not complete within 1 seconds.
<--snip-->

4. Set BACKGROUND_REQUEST_TIMEOUT to 20

5. Request /console/applications

In step 5, all applications are listed successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2014-1796.html
*** Bug 1108246 has been marked as a duplicate of this bug. ***