Description of problem:
The agent crashes shortly after startup. I've attached the core dump file (hs_err_pid file) as well as the agent.log file.
Version-Release number of selected component (if applicable):
3.0.0 beta
How reproducible:
Very
Steps to Reproduce:
1. Build RHQ with all community plugins enabled
2. Start agent
3. Import resources using AD portlet
Actual results:
Agent crashes with core dump
Expected results:
Agent remains standing
Additional info:
I spoke to Ian and Mazz about this issue, and there is reason to believe these crashes are coming from either the virt plugin or the augeas plugin. If the issue is with the augeas plugin, that needs to be pinpointed soon because the Apache resource is now using augeas for the configuration facet.
Created attachment 399117 [details] core dump file
Created attachment 399118 [details] agent log file
The core dump identifies libjvm.so as the location of the crash, although that alone doesn't tell us much. I see that you are using JRE 6.0_18-b07 (is this a beta version?). Have you tried a different JRE/JDK?
Lukas, I believe "-bXX" is a build identifier used to denote the precise internal version that was released for the runtime environment as well as the hotspot.
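To make the "-bXX" point concrete, here is a minimal sketch that splits a version string of this shape into its base version, update number, and build number. The helper name split_jre_version and the regular expression are assumptions made for illustration; they are not part of RHQ or the JDK, and the pattern only covers strings shaped like the ones quoted in this report.

```python
import re

# Hypothetical helper for illustration only (not part of RHQ or the JDK).
# Assumes the legacy "<base>_<update>-b<build>" layout, e.g. "1.6.0_18-b07".
_VERSION_RE = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)"    # base version, e.g. "1.6.0" or "6.0"
    r"(?:_(?P<update>\d+))?"       # optional update number, e.g. "_18"
    r"(?:-b(?P<build>\d+))?$"      # optional build identifier, e.g. "-b07"
)


def split_jre_version(version):
    """Return the base, update, and build parts of a JRE version string."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    update = match.group("update")
    build = match.group("build")
    return {
        "base": match.group("base"),
        "update": int(update) if update is not None else None,
        "build": int(build) if build is not None else None,
    }


if __name__ == "__main__":
    # The two spellings that appear in this report.
    print(split_jre_version("1.6.0_18-b07"))
    print(split_jre_version("6.0_18-b07"))
```

Under this reading, "_18" is the update and "b07" is the build of that update, which would make 6.0_18-b07 a released build rather than a beta.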
FWIW, I don't seem to be having this problem in the enterprise build with openjdk:
lrwxrwxrwx. 1 root root 0 2010-05-13 08:22 /proc/2241/exe -> /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java*
Work with Joseph to try to reproduce.
Yes, this is an issue with Sun JDK, which is one of our supported JVMs on Linux. I would suggest testing with JRE 1.6.0_18-b07, but if you absolutely can't obtain that version, the latest 6.x version will have to do. However, if this issue goes unresolved, I would propose that we add a disclaimer warning customers against using _18. Note: there are many internet search results for "1.6.0_18-b07 crash".
Setting this back to high since I'm not aware this has been reproduced.
Dropping severity until able to reproduce
Just saw this again when rebuilding master/HEAD (commit 5911f875 @ Mon Aug 2 14:49:33 2010 -0400). Will upload the agent log and core files shortly. In this case, the crash came during an uninventory operation. Here are the messages that occurred in the server log at that time:
01:40:02,595 INFO [ResourceManagerBean] User [org.rhq.core.domain.auth.Subject[id=2,name=rhqadmin]] is marking resource [Resource[id=10001, type=Linux, key=marques-redhat, name=marques-redhat, parent=<null>, version=Linux 2.6.32.16-141.fc12.x86_64]] for asynchronous uninventory
01:40:03,206 WARN [ServerCommunicationsService] {Failed to truncate/delete spool for deleted agent [Agent[id=10001,name=marques-redhat,address=localhost,port=16163,remote-endpoint=socket://localhost:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-report=1280986617975]] please manually remove the file: null}!!! missing resource message key=[Failed to truncate/delete spool for deleted agent [Agent[id=10001,name=marques-redhat,address=localhost,port=16163,remote-endpoint=socket://localhost:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-report=1280986617975]] please manually remove the file: null] args=[java.lang.NullPointerException]
01:40:03,206 INFO [AgentManagerBean] Removed agent: Agent[id=10001,name=marques-redhat,address=localhost,port=16163,remote-endpoint=socket://localhost:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-report=1280986617975]
Created attachment 436756 [details] core dump file from aug 5th 2010
Created attachment 436757 [details] agent log file from aug 5th 2010