Bug 622468

Summary: RFE: be able to restart the agent VM on OOM or core dump
Product: [Other] RHQ Project
Reporter: John Mazzitelli <mazz>
Component: Agent
Assignee: John Mazzitelli <mazz>
Status: NEW
QA Contact: Mike Foley <mfoley>
Severity: low
Priority: low
Version: 4.0.0
CC: hrupp, jshaughn
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Doc Type: Bug Fix

Description John Mazzitelli 2010-08-09 09:21:45 EDT
Description of problem:

Sometimes the agent may core dump (e.g. due to an error in a third-party native library that causes a segfault) or may run out of memory (e.g. OutOfMemoryError).

It would be nice to have the agent VM restart whenever it encounters either of these. The Sun JVM has the following options to enable this easily:

-XX:OnError="<cmd args>;<cmd args>"	Run user-defined commands on fatal error. (Introduced in 1.4.2 update 9.)

-XX:OnOutOfMemoryError="<cmd args>;<cmd args>"	Run user-defined commands when an OutOfMemoryError is first thrown. (Introduced in 1.4.2 update 12, 6.)
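
To illustrate the "<cmd args>;<cmd args>" form, here is a minimal shell sketch that joins several commands with ";" into a single option value. The helper name join_cmds and the two example commands are hypothetical and chosen only for illustration; they are not part of the JVM or of RHQ.

```shell
#!/bin/sh
# Join any number of commands with ";" into one option value.
# join_cmds is a hypothetical helper, not part of the JVM or RHQ.
join_cmds() {
    joined=""
    for cmd in "$@"; do
        if [ -z "$joined" ]; then
            joined="$cmd"
        else
            joined="${joined};${cmd}"
        fi
    done
    printf '%s\n' "$joined"
}

# Illustrative commands only; substitute whatever should run on a fatal error.
ON_ERROR_OPT="-XX:OnError=$(join_cmds "date" "logger agent VM hit a fatal error")"
printf '%s\n' "$ON_ERROR_OPT"
```

Because the value contains spaces, the whole option has to be quoted when it is handed to java so that it arrives as one argument.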

All we'd need to do is add these arguments to the VM startup command in rhq-agent.bat and rhq-agent.sh.

The only difficulty would be in determining what command to invoke: do we need to pass in the full VM command line again (from "java" through all the VM options and command-line arguments)? To make it easy, I say we just start the VM using the service wrapper script, rhq-agent-wrapper.bat/sh.
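
As a rough sketch of the idea above, the startup script could prepend both options so that each one runs the wrapper script. The RHQ_AGENT_HOME default, the install path, and the wrapper's "restart" argument are assumptions made for illustration and are not confirmed by this report. The function only prints the resulting argument list; it does not launch java.

```shell
#!/bin/sh
# Sketch only: RHQ_AGENT_HOME's default and the "restart" argument are assumptions.
RHQ_AGENT_HOME="${RHQ_AGENT_HOME:-/opt/rhq-agent}"
RESTART_CMD="${RHQ_AGENT_HOME}/bin/rhq-agent-wrapper.sh restart"

# Each option must reach the JVM as a single argument, so the whole
# "-XX:...=<command>" string is quoted; the space inside RESTART_CMD
# must not split the option in two.
add_restart_opts() {
    # Prepend the two options to the arguments passed in, then print
    # the resulting argument list, one argument per line.
    set -- "-XX:OnError=${RESTART_CMD}" "-XX:OnOutOfMemoryError=${RESTART_CMD}" "$@"
    printf '%s\n' "$@"
}

add_restart_opts -Xmx128m
```

In a real startup script the same quoted strings would be placed directly on the java command line instead of being printed. Whether restarting through the wrapper from inside a VM that is still running behaves well would need to be tested.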

Note that on Windows, we may already have configured the Java Service Wrapper (JSW) to restart the agent when OutOfMemoryError messages appear in the agent's output. If these VM options work as advertised, we can remove that JSW configuration and rely on the VM to restart itself.

Comment 1 Jay Shaughnessy 2014-06-18 17:07:05 EDT
We have the VM check thread, but this is different; this is about actually having the VM restart on OOM crashes.