Bug 1147097 - The server needs to handle failures inserting raw data
Summary: The server needs to handle failures inserting raw data
Keywords:
Status: ASSIGNED
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server, Monitoring, Storage Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHQ 4.13
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1147098
Reported: 2014-09-26 21:50 UTC by John Sanda
Modified: 2022-03-31 04:27 UTC (History)

Fixed In Version:
Clone Of:
Clones: 1147098
Environment:
Last Closed:
Embargoed:


Description John Sanda 2014-09-26 21:50:28 UTC
Description of problem:
MetricsServer.addNumericData is the method responsible for storing raw metrics. If an error occurs, we log a warning and simply drop the data point(s) for which the error(s) occurred. Prior to RHQ 4.9, if errors occurred, the server threw an exception, and the agent would resend the report at some point in the future.

We could throw an exception as was done prior to RHQ 4.9, but that could be inefficient. Suppose we are inserting 10,000 data points and an error occurs while inserting the last one. We would incur the network I/O overhead of sending the whole measurement report back and forth again, and then we would re-insert data that has already been successfully inserted.

A better approach would be to let the server handle failures itself. We log any data that we fail to insert, and the log should be stored on disk so that we do not lose data across restarts. At some point after the failure(s), we go through the log and retry inserting the data. This will also help us better handle bursts in traffic, for example after agents with lots of spooled measurement reports reconnect to the server.
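
To make the proposal concrete, here is a minimal sketch of what such a disk-backed failure log might look like. It is not RHQ code: FailedInsertLog, RawPoint, the one-line-per-point file format, and the insert callback are all hypothetical names and choices made for illustration, and the real MetricsServer types and storage calls would differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical disk-backed log of raw data points that could not be inserted. */
class FailedInsertLog {

    /** Hypothetical stand-in for a raw metric: schedule id, timestamp, value. */
    static final class RawPoint {
        final int scheduleId;
        final long timestamp;
        final double value;

        RawPoint(int scheduleId, long timestamp, double value) {
            this.scheduleId = scheduleId;
            this.timestamp = timestamp;
            this.value = value;
        }

        String toLine() {
            return scheduleId + "," + timestamp + "," + value;
        }

        static RawPoint fromLine(String line) {
            String[] f = line.split(",");
            return new RawPoint(Integer.parseInt(f[0]), Long.parseLong(f[1]),
                Double.parseDouble(f[2]));
        }
    }

    private final Path file;

    FailedInsertLog(Path file) {
        this.file = file;
    }

    /** Appends one failed point to the file, so it survives a server restart. */
    synchronized void record(RawPoint point) throws IOException {
        Files.write(file, (point.toLine() + "\n").getBytes(StandardCharsets.UTF_8),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /**
     * Replays every logged point through the given insert function, which
     * returns true on success. Points that fail again stay in the log.
     * Returns the number of points still pending.
     */
    synchronized int retry(Predicate<RawPoint> insert) throws IOException {
        if (!Files.exists(file)) {
            return 0;
        }
        List<String> stillFailing = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue;
            }
            if (!insert.test(RawPoint.fromLine(line))) {
                stillFailing.add(line);
            }
        }
        // Write the survivors to a sibling temp file, then swap it in, so the
        // original log is never left half-rewritten in place.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, stillFailing, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        return stillFailing.size();
    }
}
```

In this sketch the insert path would call record() wherever it currently logs a warning and drops the point, and a periodic job would call retry() with a callback that attempts the insert again. If the server stops partway through retry(), points that were already re-inserted are replayed on the next pass, so this only works if repeating an insert is harmless.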
