Bug 622468 - RFE: be able to restart the agent VM on OOM or core dump
Summary: RFE: be able to restart the agent VM on OOM or core dump
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.0.0
Hardware: All
OS: All
low vote
Target Milestone: ---
Assignee: John Mazzitelli
QA Contact: Mike Foley
Depends On:
Reported: 2010-08-09 13:21 UTC by John Mazzitelli
Modified: 2014-06-18 21:07 UTC (History)

Clone Of:
Last Closed:

Description John Mazzitelli 2010-08-09 13:21:45 UTC
Description of problem:

Sometimes the agent may core dump (e.g. due to an error in a third-party native library that causes a segfault) or may run out of memory (e.g. OutOfMemoryError).

It would be nice to have the agent VM restart whenever it encounters either of these. The Sun JVM has these options to easily enable this:

-XX:OnError="<cmd args>;<cmd args>"  Run user-defined commands on fatal error. (Introduced in 1.4.2 update 9.)

-XX:OnOutOfMemoryError="<cmd args>;<cmd args>"  Run user-defined commands when an OutOfMemoryError is first thrown. (Introduced in 1.4.2 update 12, 6)

All we'd need to do is add these arguments to the VM startup command in rhq-agent.bat and rhq-agent.sh.
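A rough sketch of what this could look like on the rhq-agent.sh side, using the two options quoted above. This is not taken from the actual script: the variable name RHQ_AGENT_RESTART_CMD, the function name rhq_restart_jvm_opts, the /opt/rhq-agent install path, and the "restart" argument to the wrapper script are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: names and the install path below are hypothetical,
# not taken from the real rhq-agent.sh.

# Command the VM should run on a fatal error or OutOfMemoryError.
# Assumes the wrapper script lives here and accepts "restart".
RHQ_AGENT_RESTART_CMD="${RHQ_AGENT_RESTART_CMD:-/opt/rhq-agent/bin/rhq-agent-wrapper.sh restart}"

# Print the two VM options, one per line, for the given restart command.
# Each printed line must reach java as ONE argument, because the
# command string contains a space.
rhq_restart_jvm_opts() {
    printf '%s\n' "-XX:OnError=$1" "-XX:OnOutOfMemoryError=$1"
}

rhq_restart_jvm_opts "$RHQ_AGENT_RESTART_CMD"
```

The startup script would then have to append each printed line to the java command line as a single quoted argument; rhq-agent.bat would need the equivalent quoting on Windows.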

The only difficulty would be in determining what command to invoke: do we need to pass in the full VM command line again (from "java" through all the VM options and command-line arguments)? To make it easy, I say we just start the VM using the service wrapper script, rhq-agent-wrapper.bat/sh.

Note that on Windows, we may already have configured the Java Service Wrapper (JSW) to restart the agent when OutOfMemoryError messages are logged. If these VM options work as advertised, we can unconfigure JSW and rely on the VM to restart itself.

Comment 1 Jay Shaughnessy 2014-06-18 21:07:05 UTC
We have the VM check thread, but this is different: this is about actually having the VM restart on OOM or crashes.
