Bug 543888 - JON server may drop collected metric data after outage
Summary: JON server may drop collected metric data after outage
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: RHQ Project
Classification: Other
Component: Database
Version: unspecified
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: RHQ Project Maintainer
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: rhq_triage
Reported: 2009-12-03 12:31 UTC by Rodrigo A B Freire
Modified: 2010-05-18 13:30 UTC
CC List: 1 user

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2010-05-18 13:30:56 UTC
Embargoed:


Attachments
JON screenshot (160.46 KB, image/png)
2009-12-03 12:43 UTC, Rodrigo A B Freire
JON Sysout showing the shutdown, startup and the errors. (11.04 KB, application/zip)
2009-12-03 12:45 UTC, Rodrigo A B Freire

Description Rodrigo A B Freire 2009-12-03 12:31:29 UTC
Description of problem:
The JON server may drop agent collected metric data after server outage.

Version-Release number of selected component (if applicable):
JON 2.3.0GA

How reproducible:
Rarely. This is the first time the issue has occurred; I am posting it to bring it to attention.

Steps to Reproduce:
1. Under Administration, System Settings, Config, set "Agent Max Quiet Time Allowed" to 4000 minutes.
2. Leave the agent running on a remote system, collecting data.
3. Shut down the JON server and its database normally (a clean, non-disruptive shutdown).
4. After 10 hours, bring the server back up.
  
Actual results:
When the agent sent its collected metric data to be merged, some of the data was lost.

Expected results:
The full data should be merged.

Additional info:
This is a rather unusual setup (it is a proof of concept).
The JON server runs on my work notebook, which is online only from 9:00 to 18:00; during those hours, a remote server sends its metrics to my machine. The notebook is connected to the internet through a 3G cellphone connection, which may drop a few times a day, so the remote server reaches the JON server through a dynamic DNS host. This works perfectly. Today, however, there was a gap from 5 AM until the time I brought the server back up. Note that JON correctly merged the data from 6 PM, when the server went down, until 5 AM. After that, the server threw some spurious SQL errors.
See the attached agent.log and screenshot.

Comment 1 Rodrigo A B Freire 2009-12-03 12:43:11 UTC
Created attachment 375756
JON screenshot

This picture shows the approximate time when the server was shut down and when the merge stopped.

Comment 2 Rodrigo A B Freire 2009-12-03 12:45:09 UTC
Created attachment 375757
JON Sysout showing the shutdown, startup and the errors.

Note that the read errors are caused by the poor conditions of the 3G cellphone connection.

Comment 3 wes hayutin 2010-02-16 16:59:13 UTC
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.

keyword:
new = Tracking + FutureFeature + SubBug

Comment 4 wes hayutin 2010-02-16 17:04:01 UTC
making sure we're not missing any bugs in rhq_triage

Comment 5 Charles Crouch 2010-05-18 03:35:17 UTC
Rodrigo, this looks like a problem with connecting to the JON database...

2009-12-03 09:33:30,820 WARN  [org.jboss.resource.connectionmanager.TxConnectionManager] Connection error occured: org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@1b5b88c[state=NORMAL mc=org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@67946a handles=1 lastUse=1259839366711 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$OnePool@923ce0 context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@dacdcf xaResource=org.jboss.resource.connectionmanager.xa.JcaXAResourceWrapper@7618f8 txSync=null]
java.sql.BatchUpdateException: Batch entry 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R02(schedule_id,time_stamp,value) VALUES(17548,1259823971366,9.5727616E7) was aborted.  Call getNextException to see the cause.
	at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2531)
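The `time_stamp` value in the aborted insert appears to be milliseconds since the Unix epoch (an assumption based on its magnitude). A quick way to see which part of the outage window that row belongs to is to decode it with `java.time.Instant`. The `MeasTimestamp` class and its methods are hypothetical helpers, and the UTC-2 offset used for wall-clock time is an assumption: the report does not state the reporter's time zone.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class MeasTimestamp {
    // Render an epoch-millisecond value as an ISO-8601 UTC instant.
    public static String utc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Wall-clock time at a fixed UTC offset. The offset is an assumption,
    // not something recorded in the bug report.
    public static LocalTime localTime(long epochMillis, int offsetHours) {
        return Instant.ofEpochMilli(epochMillis)
                .atOffset(ZoneOffset.ofHours(offsetHours))
                .toLocalTime();
    }

    public static void main(String[] args) {
        long ts = 1259823971366L; // time_stamp from the aborted batch entry
        System.out.println(utc(ts));           // 2009-12-03T07:06:11.366Z
        System.out.println(localTime(ts, -2)); // 05:06:11.366 (assumed UTC-2)
    }
}
```

If the UTC-2 assumption holds, the aborted row was collected at about 05:06 local time. That would be consistent with the description, in which data up to 5 AM merged correctly and data after it did not.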

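The exception message says to call `getNextException` to see the cause, and that underlying cause is not part of the excerpt above. Below is a minimal sketch of walking that chain with the standard `java.sql.SQLException.getNextException()` method, for example inside a `catch (SQLException e)` block. The `SqlErrorChain` class is hypothetical and is not RHQ or JBoss code. The chain built in `main` is made up for illustration and is not the actual root cause.

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SqlErrorChain {
    // Collect the message of each exception linked through
    // setNextException/getNextException, outermost first.
    public static List<String> messages(SQLException e) {
        List<String> out = new ArrayList<>();
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            out.add(cur.getMessage());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical chain shaped like the log: a batch failure whose
        // real reason is attached as the "next" exception.
        BatchUpdateException batch =
                new BatchUpdateException("Batch entry 0 was aborted.", new int[0]);
        batch.setNextException(new SQLException("hypothetical root cause"));
        for (String m : messages(batch)) {
            System.err.println(m);
        }
    }
}
```

Logging every link in the chain, rather than only the outer `BatchUpdateException`, is what would have shown why the insert into `RHQ_MEAS_DATA_NUM_R02` was rejected.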

If you can provide steps that reproduce this issue, we can take a look; otherwise, I'm going to close it.
Thanks

Comment 6 Rodrigo A B Freire 2010-05-18 13:08:15 UTC
Hi Charles

I no longer support the environment and am unable to reproduce the error, so feel free to close the issue.

Huge thanks

- RF
