User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.2) Gecko/2008091816 Red Hat/3.0.2-3.el5 Firefox/3.0.2

On February 17th, GSS notified IT that they were not receiving notifications from Issue Tracker. By reviewing the logs, Nathan Lugert determined that the Issue Tracker component Gatekeeper had been down since Feb 15th. According to Nathan:

"Gatekeeper's connection to the database was terminated and needed a JBoss restart to re-initialize the connection. Currently GK uses JBPM hibernate session which relies on the JBPM session. For unknown reasons this session dies and cannot re-establish connection without a server bounce. We are using a very old version of JBPM which may be part of the issue."

On Feb. 25th, Grant Shipley advocated raising the addition of monitoring, using the existing Engineering Support tools, at the next available Governance Board meeting:

--
Sounds good. Tom, can you bring this to the Gov board next time we meet. I am anxious to get monitoring in place for these apps.

--
grant

> On Feb 24, 2009, at 9:59 AM, Grant Shipley wrote:
>
> > I think we would just use the standard Engineering Support / CIS
> > monitoring system in place.

Reproducible: Sometimes

Steps to Reproduce:
1. The failure is intermittent and unexpected; no reliable trigger is known.
2. For unknown reasons the jBPM session dies and cannot re-establish its connection without a server bounce.

Actual Results:
The failure occurs only intermittently.

Expected Results:
Engineering Support should be notified before jBPM fails, or immediately after failure. IT must know about failures of critical apps before or as they happen so that our response is proactive, not reactive.
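One way the requested notification could work is a freshness check that follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is only an illustration, not the Engineering Support / CIS setup. It assumes Gatekeeper (or a wrapper around it) touches a heartbeat file after each successful notification. That file, its path, the function names, and the thresholds are all hypothetical and do not come from this ticket.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_freshness(last_success, now, warn_after=900, crit_after=3600):
    """Map the age of the last successful notification to a Nagios status.

    last_success and now are Unix timestamps in seconds; the thresholds
    are hypothetical defaults and would need tuning.
    """
    if last_success is None:
        return UNKNOWN, "GATEKEEPER UNKNOWN - no successful notification recorded"
    age = int(now - last_success)
    if age >= crit_after:
        return CRITICAL, f"GATEKEEPER CRITICAL - last notification {age}s ago"
    if age >= warn_after:
        return WARNING, f"GATEKEEPER WARNING - last notification {age}s ago"
    return OK, f"GATEKEEPER OK - last notification {age}s ago"


def last_heartbeat(path):
    """Return the heartbeat file's modification time, or None if unreadable."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None


def main(argv, now):
    """Plugin entry point: print one status line and return the exit code."""
    if len(argv) != 2:
        print("usage: check_gatekeeper.py HEARTBEAT_FILE")
        return UNKNOWN
    status, message = check_freshness(last_heartbeat(argv[1]), now)
    print(message)
    return status
```

Run as a plugin, the script would end with `sys.exit(main(sys.argv, time.time()))`. Under these assumptions, an outage like the one that began on Feb 15th would have turned CRITICAL within an hour of the last notification, rather than being found two days later in the logs.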
Approved. Moving to XPlanner backlog.
Followup from Marek Mahut <is-ops-tickets>:
---------------------------------
Nagios is checking at the moment if sendmail and apache are working correctly on cspserverers. I'll forward your ticket to ES, maybe they can suggest/build a way to monitor performance.
---------------------------------
Followup from Greg Blomquist <it-support>:
---------------------------------
I have to reject this one...

The offending message in the log is apparently an artifact of the nagios smtp check. I'm not sure what we can really do about that. As long as it actually alerts when SMTP is down (which I'm assuming that it does), then it appears to be working as designed.

The check_smtp module in nagios appears to be part of the nagios-plugins package. So, it's not something we maintain or even modify.
---------------------------------
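For context on why a probe of this kind can leave lines in the mail log, here is a minimal Python sketch of an SMTP liveness check. It is not the check_smtp source from nagios-plugins and may differ from it in detail. The function name and the `smtp_factory` parameter are hypothetical and exist only to make the sketch testable. The probe opens a short session, sends NOOP, and quits without delivering mail, which is the sort of session a mail server may log.

```python
import smtplib

# Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def check_smtp_alive(host, port=25, timeout=10.0, smtp_factory=smtplib.SMTP):
    """Open an SMTP session, send NOOP, then QUIT; no message is delivered."""
    try:
        # smtplib.SMTP connects on construction and raises if the greeting fails.
        conn = smtp_factory(host, port, timeout=timeout)
        try:
            code, _reply = conn.noop()
        finally:
            conn.quit()  # end the session politely
    except OSError as exc:
        # Socket errors and smtplib.SMTPException are both OSError subclasses.
        return CRITICAL, f"SMTP CRITICAL - {host}:{port} failed: {exc}"
    if code != 250:
        return CRITICAL, f"SMTP CRITICAL - unexpected NOOP reply {code}"
    return OK, f"SMTP OK - {host}:{port} answered NOOP"
```

The useful property is the one noted above: the check only has to go CRITICAL when SMTP is actually down, and the log entries it produces are a side effect of the probe connecting at all.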