Red Hat Bugzilla – Bug 1028623
Measurement data points can be supplied to condition processing in different order than generated on agent
Last modified: 2014-12-11 09:00:17 EST
Description of problem:
When measurement data arrives at the server, alert condition processing relies on the datapoints being in exactly the order in which they were generated on the agent side, i.e. no datapoint should be processed before another that was collected prior to it.
The persistence to the Cassandra storage node is asynchronous and done on a datapoint-by-datapoint basis. This means that the datapoints can be persisted in a different order than the one in which they were generated.
When supplying the data to alert condition processing, we use the order in which the datapoints were persisted, not the order in which they were created on the agent.
This can lead to subtle bugs in alerting, such as false-positive or false-negative dampening events.
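The reordering described above can be modeled in a few lines of plain Java. This is only an illustrative sketch, not RHQ code: the class `OutOfOrderPersistenceDemo`, the method `persistedOrder`, and the use of `CompletableFuture` to stand in for the asynchronous Cassandra writes are all assumptions made for the example, and datapoints are reduced to their timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of the problem; none of these names exist in RHQ.
class OutOfOrderPersistenceDemo {

    // Registers one completion callback per datapoint (as an asynchronous,
    // per-datapoint write would), then completes the simulated writes in the
    // given order. Returns the datapoints in the order the callbacks fired.
    static List<Long> persistedOrder(List<Long> report, int[] completionOrder) {
        List<CompletableFuture<Void>> writes = new ArrayList<>();
        List<Long> persisted = new ArrayList<>();
        for (Long timestamp : report) {
            CompletableFuture<Void> write = new CompletableFuture<>();
            // The callback runs when this particular write completes,
            // regardless of the datapoint's position in the report.
            write.thenRun(() -> persisted.add(timestamp));
            writes.add(write);
        }
        // Simulate the storage node acknowledging writes in arbitrary order.
        for (int index : completionOrder) {
            writes.get(index).complete(null);
        }
        return persisted;
    }

    public static void main(String[] args) {
        List<Long> report = List.of(1000L, 2000L, 3000L);
        // The third write happens to finish first.
        System.out.println(persistedOrder(report, new int[] {2, 0, 1}));
        // prints [3000, 1000, 2000]
    }
}
```

If a list built this way is what reaches condition processing, a dampening rule that counts consecutive matching datapoints would see them in the wrong sequence.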
Version-Release number of selected component (if applicable):
Hard to reproduce; the reordering is best observed in a debugger.
Steps to Reproduce:
1. Have a server, agent and some enabled metric schedules for some inventoried resources
2. Once a measurement report comes in, check the order of elements in the input parameter of the MeasurementDataManagerBean#addNumericData() method.
3. Check the order of elements of the data passed to the condition processing in the onFinish() callback inside addNumericData()
The order sometimes differs, depending on the order in which the Cassandra cluster happened to store the datapoints.
The order of elements passed to the condition processing should be the same as the one coming from agents.
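One possible way to get the expected behavior is to treat the persistence callbacks only as a record of which datapoints were stored, and to take the ordering from the original report. The sketch below illustrates that idea; it is not necessarily how the upstream fix was implemented. The class `ReportOrderRestorer` and the method `inReportOrder` are hypothetical, and the code assumes that the datapoint type has `equals()` and `hashCode()` implementations that distinguish individual datapoints.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of RHQ.
class ReportOrderRestorer {

    // Returns the successfully persisted datapoints in the order in which
    // they appeared in the agent's report, ignoring the order in which the
    // storage callbacks fired.
    static <T> List<T> inReportOrder(List<T> report, Collection<T> persisted) {
        // Assumption: equals()/hashCode() identify an individual datapoint.
        Set<T> stored = new HashSet<>(persisted);
        List<T> ordered = new ArrayList<>();
        for (T dataPoint : report) {
            if (stored.contains(dataPoint)) {
                ordered.add(dataPoint);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Long> report = List.of(1000L, 2000L, 3000L);
        // Callbacks fired out of order, and one write (2000) failed.
        List<Long> persisted = List.of(3000L, 1000L);
        System.out.println(inReportOrder(report, persisted));
        // prints [1000, 3000]
    }
}
```

Filtering the original report, rather than sorting by timestamp, keeps whatever order the agent chose, including for datapoints from different schedules that share a timestamp.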
The upstream fix was committed a while back. Moving to ON_QA, but this code-level change is not easily testable. I suggest just setting it to CLOSED/CURRENTRELEASE or something like that.
This is almost impossible to QE. Verified by code inspection.