Description of problem:
I've noticed that after idling, and often after releases, my JBoss application is busted: http://jenkins-cloudydemo.rhcloud.com/ Currently, it returns a 503.

How reproducible:
Fairly regularly - just try to access that application.

Steps to Reproduce:
1. Create a JBoss application
2. Wait a few days (past an idle boundary or a release)
3. Access the application and see if you get a 503 (see the check below)

Actual results:
Application returns a 503

Expected results:
Application is accessible (maybe after a pause for un-idling)
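For step 3, a quick command-line check (using the affected app URL from this report; -I fetches only the response headers so the status line is visible immediately). While the bug reproduces, the output should look something like:

$ curl -sI http://jenkins-cloudydemo.rhcloud.com/ | head -1
HTTP/1.1 503 Service Temporarily Unavailable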
Since we can't see the logs, this is an educated guess. The apps were returning a 404, which indicates that the application(s) (e.g. ROOT.war) did not deploy. We saw similar problems in the past, which is why we increased the timeout from 60s to 120s; it is now 300s. Mike and I are guessing that because so many instances are being restarted at once, each instance gets fewer resources than normal, making application deployment slower than normal and causing the deployment to fail and roll back. The new timeout will only affect new JBoss instances; we'd have to create a new xslt to update existing instances, and that is risky. I think documentation is our best option. Ideally we'd have several JBoss instances that we control in production and can monitor during an upgrade so we can see the logs. Perhaps we should create a US to set up a dozen production accounts with multiple JBoss instances and non-default applications? I have one instance that seemed fine - I was running my JUDCon demo on it yesterday.
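For anyone following along, the timeout in question is presumably JBoss's deployment-scanner deployment-timeout in the instance's standalone.xml. A minimal sketch of the relevant entry (the subsystem version and the other attribute values are assumptions; only the 300s figure comes from this comment):

<subsystem xmlns="urn:jboss:domain:deployment-scanner:1.1">
    <!-- deployment-timeout is in seconds; a deployment that exceeds it is rolled back -->
    <deployment-scanner path="deployments" relative-to="jboss.server.base.dir"
                        scan-interval="5000" deployment-timeout="300"/>
</subsystem>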
The JBoss application ("my") came up fine post-migration and is up and running without issue. The Jenkins application ("jenkins") was down; it was stopped on 4/9/2012 and never restarted.
Re-tested this bug on the current stage env (2012-6-8); it still reproduces.

1. Make sure my app is in idle status:

$ curl -k -X GET -H 'Accept: application/xml' --user jialiu+1:214214 https://stg.openshift.redhat.com/broker/rest/domains/jialiu1/applications/jenkins/gear_groups
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <gear-group>
      <cartridges>
        <cartridge>
          <name>jenkins-1.4</name>
        </cartridge>
      </cartridges>
      <gears>
        <gear>
          <state>idle</state>
          <id>14a8c2adc2b64658897a1db6f3b7be08</id>
        </gear>
      </gears>
      <name>@@app/cart-jenkins-1.4</name>
      <gear-profile>small</gear-profile>
    </gear-group>
  </data>
  <type>gear_groups</type>
  <messages/>
  <supported-api-versions>
    <supported-api-version>1.0</supported-api-version>
  </supported-api-versions>
  <version>1.0</version>
  <status>ok</status>
</response>

2. Access the app URL (jenkins-jialiu1.stg.rhcloud.com):

$ curl -k https://jenkins-jialiu1.stg.rhcloud.com/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Temporarily Unavailable</title>
</head><body>
<h1>Service Temporarily Unavailable</h1>
<p>The server is temporarily unable to service your request due to
maintenance downtime or capacity problems. Please try again later.</p>
<hr>
<address>Apache/2.2.15 (Red Hat) Server at jenkins-jialiu1.stg.rhcloud.com Port 443</address>
</body></html>

Even after accessing it several times, the app still cannot be woken up.
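As a cross-check, the idle state can also be read from inside the gear itself (a sketch assuming the standard gear layout, where the gear id from the XML above is the ssh user and the state file lives at app-root/runtime/.state):

$ ssh 14a8c2adc2b64658897a1db6f3b7be08@jenkins-jialiu1.stg.rhcloud.com 'cat app-root/runtime/.state'
idle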
Fixed with an updated deploy_httpd_proxy.sh for the jenkins cart. Also added a migrate-jenkins-httpdproxy script to update existing apps.
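For context, deploy_httpd_proxy.sh generates the Apache vhost fragment that fronts the gear. A minimal sketch of such a fragment (the server name, backend port, and directives shown are illustrative assumptions, not the actual cartridge output):

# Hypothetical vhost fragment; the real script's output differs.
<VirtualHost *:80>
    ServerName jenkins-example.rhcloud.com
    # Route all traffic to the Jenkins process listening inside the gear
    ProxyPass / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>

The 503s above would be consistent with a broken or missing fragment like this: with no usable backend route, Apache serves its own 503 page and the gear never receives the request that would un-idle it.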
Currently the latest devenv build is devenv_1827, and the fix is not integrated into this instance, so I am keeping this bug in ON_QA status. Once we get a new instance that integrates the fix, I will verify this bug.
Verified this bug on devenv-stage_217: PASS. After idling the jenkins app, accessing it wakes it up. I also checked my app on the stage env (http://jenkins-jialiu1.stg.rhcloud.com/), and it has now come back.
But my idle jenkins app in INT still returns a 503. Has this fix been pulled into INT? My jenkins app: https://jenkins-domint1.int.rhcloud.com/
Hey Xiaoli, sorry, I really don't know what was in the candidate build. Adam, is this something you can look into?
I think this was a timing issue for what made it into INT and what didn't. The INT push that is happening today should fix this.
My idle jenkins app at https://jenkins-domint1.int.rhcloud.com/ still returns a 503:

[jenkins-domint1.int.rhcloud.com runtime]\> cat .state
idle

One of the migrate scripts that fixes existing idle jenkins apps may not have been run.
There is a migrate script that does need to be run to correct the problem on existing jenkins apps - li/misc/maintenance/bin/migrate-jenkins-httpdproxy. New jenkins applications should contain the fix.
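For completeness, a sketch of how that migration might be invoked on a node host (the working directory, arguments, and log handling are assumptions; only the script path comes from this comment):

# Hypothetical invocation; check the script's usage before running in production.
$ cd li/misc/maintenance/bin
$ ./migrate-jenkins-httpdproxy 2>&1 | tee /tmp/migrate-jenkins-httpdproxy.log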
(In reply to comment #11)
> There is a migrate script that does need to be run to correct the problem on
> existing jenkins apps - li/misc/maintenance/bin/migrate-jenkins-httpdproxy.
> New jenkins applications should contain the fix.

Hi Bill,
Yeah, I know that, but the problem is that this script may not have been run on INT, so we need Thomas to help run it there. Let me needinfo him.
Thanks
Sorry about that, I didn't realize it needed to be run in INT. I've now run migrate-jenkins-httpdproxy in INT and it seemed to complete successfully. I've sent the migration logs to Bill so he can check whether it was truly successful.
12 jenkins applications have been migrated in INT. Thanks Thomas.
(In reply to comment #14)
> 12 jenkins applications have been migrated in INT. Thanks Thomas.

OK, my idle jenkins server has now come back. Thanks, all.