Description of problem:
I've noticed that after idling, and often after releases, my JBoss application is busted: http://jenkins-cloudydemo.rhcloud.com/ Currently, it returns a 503.

How reproducible:
Fairly regularly - just try to access that application.

Steps to Reproduce:
1. Create a JBoss application
2. Wait a few days (past an idle boundary or a release)
3. Access the application and see if you get a 503 (see the check below)

Actual results:
Application returns a 503

Expected results:
Application is accessible (maybe after a pause for un-idling)
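For step 3, a quick command-line check (using the affected app URL from this report; -I fetches only the response headers so the status line is visible immediately). While the bug reproduces, the output should look something like:

$ curl -sI http://jenkins-cloudydemo.rhcloud.com/ | head -1
HTTP/1.1 503 Service Temporarily Unavailable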
Since we can't see the logs, this is an educated guess. The apps were returning a 404, which indicates that the application(s) (e.g. ROOT.war) did not deploy. We saw similar problems in the past, which is why we increased the timeout from 60s to 120s; it is now 300s. Mike and I are guessing that because so many instances are being restarted at once, each instance gets fewer resources than normal, making application deployment slower than normal and causing the deployment to fail and roll back. The new timeout will only affect new JBoss instances; we'd have to create a new xslt to update existing instances, and that is risky. I think documentation is our best option. Ideally we'd have several JBoss instances that we control in production and can monitor during an upgrade so we can see the logs. Perhaps we should create a US to set up a dozen production accounts with multiple JBoss instances and non-default applications? I have one instance that seemed fine - I was running my JUDCon demo on it yesterday.
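For anyone following along, the timeout in question is presumably JBoss's deployment-scanner deployment-timeout in the instance's standalone.xml. A minimal sketch of the relevant entry (the subsystem version and the other attribute values are assumptions; only the 300s figure comes from this comment):

<subsystem xmlns="urn:jboss:domain:deployment-scanner:1.1">
    <!-- deployment-timeout is in seconds; a deployment that exceeds it is rolled back -->
    <deployment-scanner path="deployments" relative-to="jboss.server.base.dir"
                        scan-interval="5000" deployment-timeout="300"/>
</subsystem>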
The JBoss application ("my") came up fine post-migration and is up and running without issue. The Jenkins application ("jenkins") was down; it was stopped on 4/9/2012 and never restarted.
Re-tested this bug on the current stage env (2012-6-8); it still reproduces.

1. Make sure my app is in idle status:

$ curl -k -X GET -H 'Accept: application/xml' --user jialiu+1:214214 https://stg.openshift.redhat.com/broker/rest/domains/jialiu1/applications/jenkins/gear_groups
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <gear-group>
      <cartridges>
        <cartridge>
          <name>jenkins-1.4</name>
        </cartridge>
      </cartridges>
      <gears>
        <gear>
          <state>idle</state>
          <id>14a8c2adc2b64658897a1db6f3b7be08</id>
        </gear>
      </gears>
      <name>@@app/cart-jenkins-1.4</name>
      <gear-profile>small</gear-profile>
    </gear-group>
  </data>
  <type>gear_groups</type>
  <messages/>
  <supported-api-versions>
    <supported-api-version>1.0</supported-api-version>
  </supported-api-versions>
  <version>1.0</version>
  <status>ok</status>
</response>

2. Access the app URL (jenkins-jialiu1.stg.rhcloud.com):

$ curl -k https://jenkins-jialiu1.stg.rhcloud.com/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Temporarily Unavailable</title>
</head><body>
<h1>Service Temporarily Unavailable</h1>
<p>The server is temporarily unable to service your request due to
maintenance downtime or capacity problems. Please try again later.</p>
<hr>
<address>Apache/2.2.15 (Red Hat) Server at jenkins-jialiu1.stg.rhcloud.com Port 443</address>
</body></html>

Even after accessing it several times, the app still cannot be woken up.
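As a cross-check, the idle state can also be read from inside the gear itself (a sketch assuming the standard gear layout, where the gear id from the XML above is the ssh user and the state file lives at app-root/runtime/.state):

$ ssh 14a8c2adc2b64658897a1db6f3b7be08@jenkins-jialiu1.stg.rhcloud.com 'cat app-root/runtime/.state'
idle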
Fixed with an updated deploy_httpd_proxy.sh for the jenkins cart. Also added a migrate-jenkins-httpdproxy script to update existing apps.
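For context, deploy_httpd_proxy.sh generates the Apache vhost fragment that fronts the gear. A minimal sketch of such a fragment (the server name, backend port, and directives shown are illustrative assumptions, not the actual cartridge output):

# Hypothetical vhost fragment; the real script's output differs.
<VirtualHost *:80>
    ServerName jenkins-example.rhcloud.com
    # Route all traffic to the Jenkins process listening inside the gear
    ProxyPass / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>

The 503s above would be consistent with a broken or missing fragment like this: with no usable backend route, Apache serves its own 503 page and the gear never receives the request that would un-idle it.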
Currently the latest devenv build is devenv_1827, and the fix is not integrated into this instance, so I am keeping this bug in ON_QA status. Once we get a new instance that integrates the fix, I will verify this bug.
Verified this bug on devenv-stage_217: PASS. After idling the jenkins app, accessing it wakes it up. I also checked my app on the stage env (http://jenkins-jialiu1.stg.rhcloud.com/), and it has now come back.
But my idle jenkins app in INT still returns a 503. Has this fix been pulled into INT? My jenkins app: https://jenkins-domint1.int.rhcloud.com/
Hey Xiaoli, sorry, I really don't know what was in the candidate build. Adam, is this something you can look into?
I think this was a timing issue for what made it into INT and what didn't. The INT push that is happening today should fix this.
My idle jenkins app at https://jenkins-domint1.int.rhcloud.com/ still returns a 503:

[jenkins-domint1.int.rhcloud.com runtime]\> cat .state
idle

One of the migrate scripts that fixes existing idle jenkins apps may not have been run.
There is a migrate script that does need to be run to correct the problem on existing jenkins apps - li/misc/maintenance/bin/migrate-jenkins-httpdproxy. New jenkins applications should contain the fix.
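For completeness, a sketch of how that migration might be invoked on a node host (the working directory, arguments, and log handling are assumptions; only the script path comes from this comment):

# Hypothetical invocation; check the script's usage before running in production.
$ cd li/misc/maintenance/bin
$ ./migrate-jenkins-httpdproxy 2>&1 | tee /tmp/migrate-jenkins-httpdproxy.log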
(In reply to comment #11)
> There is a migrate script that does need to be run to correct the problem on
> existing jenkins apps - li/misc/maintenance/bin/migrate-jenkins-httpdproxy.
> New jenkins applications should contain the fix.

Hi Bill,
Yeah, I know that, but the problem is that this script may not have been run on INT, so we need Thomas to help run it there. Let me needinfo him.
Thanks
Sorry about that, I didn't realize it needed to be run in INT. I've now run migrate-jenkins-httpdproxy in INT and it seemed to complete successfully. I've sent the migration logs to Bill so he can check whether it was truly successful.
12 jenkins applications have been migrated in INT. Thanks Thomas.
(In reply to comment #14)
> 12 jenkins applications have been migrated in INT. Thanks Thomas.

OK, my idle jenkins server has now come back. Thanks, all.