Red Hat Bugzilla – Bug 543888
JON server may drop collected metric data after outage
Last modified: 2010-05-18 09:30:56 EDT
Description of problem:
The JON server may drop agent-collected metric data after a server outage.
Version-Release number of selected component (if applicable):
How reproducible:
Hardly. This is the first time I have seen the issue; I am posting it to bring it to attention.
Steps to Reproduce:
1. Under Administration, System Settings, Config, set "Agent Max Quiet Time Allowed" to 4000 minutes.
2. Leave the agent running on a remote system, collecting data.
3. Shut down the JON server (and its database) normally, with a clean, non-disruptive shutdown.
4. After 10 hours, bring the server back up.
Actual results:
When the agent sent its backlog of metric data to be merged, some of the data was lost.
Expected results:
All of the data should be merged.
This is quite an unusual setup (it is a proof of concept).
The JON server runs on my work notebook, which is online only from 9h00 to 18h00, and during that window a remote server sends its metrics to my machine. It is connected to the internet via a 3G cellphone connection, which may fail a few times during the day. The remote server reaches the JON server through a dynamic DNS host name. By the way, this works perfectly. Today, however, there was a gap from 5 AM until the time I brought the server back up. Notice that JON correctly merged the data from 6 PM, when the server went down, until 5 AM. Then the server threw some spurious SQL errors.
See the attached agent.log and screenshot.
Created attachment 375756
This picture shows the approximate times when the server was shut down and when the merge stopped.
Created attachment 375757
JON Sysout showing the shutdown, startup and the errors.
Note that the read errors are due to the poor condition of the 3G cellphone connection.
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.
new = Tracking + FutureFeature + SubBug
making sure we're not missing any bugs in rhq_triage
Rodrigo, this looks like a problem with connecting to the JON database...
2009-12-03 09:33:30,820 WARN [org.jboss.resource.connectionmanager.TxConnectionManager] Connection error occured: org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@1b5b88c[state=NORMAL mc=org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@67946a handles=1 lastUse=1259839366711 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$OnePool@923ce0 context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@dacdcf xaResource=org.jboss.resource.connectionmanager.xa.JcaXAResourceWrapper@7618f8 txSync=null]
java.sql.BatchUpdateException: Batch entry 0 INSERT /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R02(schedule_id,time_stamp,value) VALUES(17548,1259823971366,9.5727616E7) was aborted. Call getNextException to see the cause.
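The BatchUpdateException above only reports that the batch entry was aborted and says to call getNextException to see the cause. As a minimal sketch of how that chain could be logged, the following walks java.sql.SQLException.getNextException() and collects each message. The class name, the helper name collectMessages, and the exception messages built in main are hypothetical stand-ins for illustration; this is not taken from the JON/RHQ code, and the real underlying cause of this failure is not known from the report.

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of JON/RHQ.
final class NextExceptionChain {

    // Collects the message of the given SQLException and of every
    // exception linked to it through getNextException().
    static List<String> collectMessages(SQLException first) {
        List<String> messages = new ArrayList<>();
        for (SQLException e = first; e != null; e = e.getNextException()) {
            messages.add(e.getMessage());
        }
        return messages;
    }

    public static void main(String[] args) {
        // Placeholder exceptions standing in for what a JDBC driver might throw;
        // the messages here are invented for the example.
        BatchUpdateException batch =
                new BatchUpdateException("Batch entry 0 was aborted (placeholder)", new int[0]);
        batch.setNextException(new SQLException("underlying cause (placeholder)"));

        for (String message : collectMessages(batch)) {
            System.out.println(message);
        }
    }
}
```

Logging the chained exceptions this way would show the error the database actually returned, rather than only the generic "was aborted" message.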
If you can provide steps that reproduce this issue, we can take a look; otherwise I'm going to close it.
I no longer support that environment and have no way to reproduce the error, so feel free to close the issue.