488972 – Availability Monitoring needed for Issue Tracker: Gatekeeper/ (Hibernate)

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Issue-Tracker
Classification:	Retired
Component:	Performance
Sub Component:
Version:	MR10
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Issue-Tracker Bug Watch List
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	491699
Blocks:

Reported:	2009-03-06 15:52 UTC by Tom Mirc
Modified:	2009-04-23 14:44 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Clones:	496690
Environment:
Last Closed:	2009-04-20 18:32:37 UTC
Embargoed:

Description Tom Mirc 2009-03-06 15:52:32 UTC

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.2) Gecko/2008091816 Red Hat/3.0.2-3.el5 Firefox/3.0.2

On February 17th, GSS notified IT that it was not receiving notifications from Issue Tracker.  By reviewing the logs, Nathan Lugert determined that the Issue Tracker component Gatekeeper had been down since February 15th.  According to Nathan:

"Gatekeeper's connection to the database was terminated and needed a JBoss
restart to re-initialize the connection. Currently GK uses JBPM
hibernate session which relies on the JBPM session. For unknown reasons
this session dies and cannot re-establish connection without a server
bounce. We are using a very old version of JBPM which may be part of the
issue."

On February 25th, Grant Shipley asked that adding monitoring, using the existing Engineering Support tools, be raised at the next available Governance Board meeting:
--
Sounds good.  Tom, can you bring this to the Gov board next time we meet.  I am anxious to get monitoring in place for these apps.
--
grant

> On Feb 24, 2009, at 9:59 AM, Grant Shipley wrote:
>
> > I think we would just use the standard Engineering Support / CIS  
> > monitoring system in place.

Reproducible: Sometimes

Steps to Reproduce:
1. Gatekeeper fails intermittently and unexpectedly; no reliable trigger is known.
2. For unknown reasons the jBPM session dies and cannot re-establish its connection without a server bounce.

Actual Results:  
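The failure occurs only intermittently. The last outage began on February 15th and was not noticed until GSS reported it on February 17th.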

Expected Results:  
Engineering Support should be notified before jBPM fails, or immediately after it fails.  IT must learn of failures in critical apps as early as possible, ideally from warning signs before an outage, so that we can respond proactively rather than reactively.
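
To make the request concrete, here is a minimal sketch of an availability check that follows the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is only an illustration, not an existing Engineering Support check: the host, port, and thresholds are hypothetical, and the names classify, probe, and main are invented for this example.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(latency, warn_s, crit_s):
    """Map a connect latency in seconds (None = connect failed) to a status."""
    if latency is None:
        return CRITICAL, "CRITICAL - service unreachable"
    if latency >= crit_s:
        return CRITICAL, "CRITICAL - connect took %.2fs" % latency
    if latency >= warn_s:
        return WARNING, "WARNING - connect took %.2fs" % latency
    return OK, "OK - connect took %.2fs" % latency


def probe(host, port, timeout_s):
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        return None


def main(argv):
    """Run one check; argv is [script, host, port]. Returns the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_gatekeeper HOST PORT")
        return UNKNOWN
    # Hypothetical thresholds: warn at 2 seconds, critical at 5 seconds.
    latency = probe(argv[1], int(argv[2]), timeout_s=10.0)
    code, message = classify(latency, warn_s=2.0, crit_s=5.0)
    print(message)
    return code
```

A wrapper script would finish with sys.exit(main(sys.argv)) so that Nagios sees the status. Note that a port-level check like this only shows that JBoss is listening. It would not by itself catch the failure described above, where the server stays up but the jBPM session has lost its database connection; that would need an application-level check that exercises the session.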

Comment 1 Grant Shipley 2009-03-12 18:53:09 UTC

Approved.  Moving to XPlanner backlog.

Comment 2 Lisa Lu 2009-04-20 15:16:15 UTC

Followup from Marek Mahut <is-ops-tickets>:

---------------------------------
Nagios is checking at the moment if sendmail and apache are working
correctly on cspserverers. I'll forward your ticket to ES, maybe they
can suggest/build a way to monitor performance.
---------------------------------

Comment 3 Lisa Lu 2009-04-20 15:20:52 UTC

Followup from Greg Blomquist <it-support>:

---------------------------------
I have to reject this one...

The offending message in the log is apparently an artifact of the nagios
smtp check.  I'm not sure what we can really do about that.  As long as
it actually alerts when SMTP is down (which I'm assuming that it does),
then it appears to be working as designed.

The check_smtp module in nagios appears to be part of the nagios-plugins
package.  So, it's not something we maintain or even modify.

---------------------------------
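
For context on why an SMTP check can leave entries in the mail log, here is a rough sketch of what a probe of this kind does: it connects, reads the greeting, and disconnects without delivering any mail, which a mail server may record as an incomplete session. This is not the source of the check_smtp plugin; the function names are invented for illustration, and only the 220 "service ready" greeting code comes from the SMTP standard.

```python
import socket


def banner_is_ready(banner):
    """True if the SMTP greeting line carries the 220 'service ready' code."""
    return banner.startswith("220")


def smtp_greets(host, port=25, timeout_s=10.0):
    """Connect, read the greeting, then send QUIT without sending mail."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        banner = conn.recv(1024).decode("ascii", errors="replace")
        conn.sendall(b"QUIT\r\n")
    return banner_is_ready(banner)
```

Because the probe never starts a mail transaction, any log line it produces is a side effect of the check itself rather than a sign of a fault.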
