Bug 543888

Summary: JON server may drop collected metric data after outage
Product: [Other] RHQ Project
Reporter: Rodrigo A B Freire <rfreire>
Component: Database
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: low
Version: unspecified
CC: ccrouch
Keywords: SubBug
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-05-18 09:30:56 EDT
Bug Blocks: 565628
Attachments:
JON screenshot (flags: none)
JON Sysout showing the shutdown, startup and the errors. (flags: none)

Description Rodrigo A B Freire 2009-12-03 07:31:29 EST
Description of problem:
The JON server may drop agent-collected metric data after a server outage.

Version-Release number of selected component (if applicable):
JON 2.3.0GA

How reproducible:
Rarely. This is the first time I have seen the issue; posting it to bring it to attention.

Steps to Reproduce:
1. Under Administration, System Settings, Config, set "Agent Max Quiet Time Allowed" to 4000 minutes.
2. Leave the agent running on a remote system, collecting data.
3. Shut down the JON server (and its database) normally, i.e. a non-disruptive shutdown.
4. After about 10 hours, bring the server back up.
  
Actual results:
When the agent sent its collected metric data to be merged, some of the data was lost.

Expected results:
All of the collected data should be merged.

Additional info:
This is a quite unusual setup (it is a proof of concept).
The JON server runs on my work notebook (which is online only from 9:00 to 18:00), and in the meantime a remote server sends its metrics to my machine. It is connected to the internet via a 3G cellphone connection, which may fail a few times a day. To let the remote server reach the JON server, I use a dynamic DNS host. This works perfectly, by the way. Today, however, there was a gap from 5 AM until the time I brought the server back up. Note that JON correctly merged the data from 6 PM, when the server went down, until 5 AM. Then the server threw some spurious SQL errors.
See the attached agent.log and screenshot.
Comment 1 Rodrigo A B Freire 2009-12-03 07:43:11 EST
Created attachment 375756 [details]
JON screenshot

This picture shows the approximate time when the server was shut down and when the merge stopped.
Comment 2 Rodrigo A B Freire 2009-12-03 07:45:09 EST
Created attachment 375757 [details]
JON Sysout showing the shutdown, startup and the errors.

Note that the read errors are due to the poor condition of the 3G cellphone connection.
Comment 3 wes hayutin 2010-02-16 11:59:13 EST
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.

keyword:
new = Tracking + FutureFeature + SubBug
Comment 4 wes hayutin 2010-02-16 12:04:01 EST
making sure we're not missing any bugs in rhq_triage
Comment 5 Charles Crouch 2010-05-17 23:35:17 EDT
Rodrigo, this looks like a problem with connecting to the JON database...

2009-12-03 09:33:30,820 WARN  [org.jboss.resource.connectionmanager.TxConnectionManager] Connection error occured: org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@1b5b88c[state=NORMAL mc=org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@67946a handles=1 lastUse=1259839366711 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$OnePool@923ce0 context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@dacdcf xaResource=org.jboss.resource.connectionmanager.xa.JcaXAResourceWrapper@7618f8 txSync=null]
java.sql.BatchUpdateException: Batch entry 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R02(schedule_id,time_stamp,value) VALUES(17548,1259823971366,9.5727616E7) was aborted.  Call getNextException to see the cause.
	at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2531)
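
The BatchUpdateException above only says "Call getNextException to see the cause", so the actual database error is not visible in this excerpt. As a minimal sketch of how that chain can be walked with the standard java.sql API, the example below builds a simulated exception chain and collects every message in it. The class name BatchCauseDemo, the helper chainMessages, and both exception messages are hypothetical and only serve as illustration; this is not RHQ or JON code.

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of RHQ/JON.
final class BatchCauseDemo {

    /** Collects the message of every SQLException linked via getNextException(). */
    static List<String> chainMessages(SQLException first) {
        List<String> messages = new ArrayList<>();
        for (SQLException e = first; e != null; e = e.getNextException()) {
            messages.add(e.getMessage());
        }
        return messages;
    }

    public static void main(String[] args) {
        // Simulated failure: both messages are made-up stand-ins, not real driver output.
        BatchUpdateException batch =
                new BatchUpdateException("Batch entry 0 was aborted.", new int[0]);
        batch.setNextException(new SQLException("simulated underlying cause"));

        // Print the outer exception first, then each chained cause.
        for (String message : chainMessages(batch)) {
            System.out.println(message);
        }
    }
}
```

Logging the chained exceptions this way at the point where the batch insert fails would show whether the underlying error is a connection problem or something else.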


If you can provide steps that reproduce this issue, we can take a look; otherwise I'm going to close it.
Thanks
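
For anyone lining the log excerpt in this comment up with the outage timeline from the description: the time_stamp value in the failed INSERT and the lastUse value in the warning both look like epoch milliseconds. Under that assumption (it is not confirmed anywhere in this report), a short sketch converts them to UTC. The class name EpochMillisDemo and the helper toUtc are hypothetical.

```java
import java.time.Instant;

// Hypothetical demo class; assumes the logged values are epoch milliseconds.
final class EpochMillisDemo {

    /** Formats an epoch-millisecond value as an ISO-8601 UTC instant. */
    static String toUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // time_stamp of the rejected metric row in the INSERT statement
        System.out.println(toUtc(1259823971366L)); // 2009-12-03T07:06:11.366Z
        // lastUse of the managed connection in the WARN line
        System.out.println(toUtc(1259839366711L)); // 2009-12-03T11:22:46.711Z
    }
}
```

The results are in UTC; the server's local time zone is not stated in the report, so any comparison with the "5 AM" gap has to account for that offset.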
Comment 6 Rodrigo A B Freire 2010-05-18 09:08:15 EDT
Hi Charles

I no longer support that environment and am not in a position to reproduce the error, so feel free to close the issue.

Huge thanks

- RF