Description of problem:
When measurement data comes in to the server, alert condition processing relies on the datapoints arriving in exactly the order in which the agent generated them. That is, no datapoint should be processed before another that was collected prior to it. Persistence to the Cassandra storage node is asynchronous and is done on a datapoint-by-datapoint basis, so the datapoints can be persisted in a different order than the original one. When supplying the data to alert condition processing, we use the order in which the datapoints were persisted, not the order in which they were created on the agent. This can lead to subtle alerting bugs such as false-positive or false-negative dampening events.

Version-Release number of selected component (if applicable):
JON 3.2.0.ER4

How reproducible:
Hardly; best observed in a debugger.

Steps to Reproduce:
1. Have a server, an agent, and some enabled metric schedules for some inventoried resources.
2. Once a measurement report comes in, check the order of elements in the input parameter of the MeasurementDataManagerBean#addNumericData() method.
3. Check the order of elements in the data passed to condition processing in the onFinish() callback inside addNumericData().

Actual results:
The order sometimes differs, depending on the order in which the Cassandra cluster happened to store the datapoints.

Expected results:
The order of elements passed to condition processing should be the same as the order coming from the agents.

Additional info:
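
For illustration only, here is a minimal Java sketch of one way the report order could be preserved across asynchronous persistence callbacks: each datapoint is stored into a slot keyed by its position in the incoming report, and the list is handed on only once the last callback has fired. This is not the committed upstream fix. OrderPreservingCollector, onPersisted() and inReportOrder() are hypothetical names, and the storage callbacks are simulated with a fixed completion order rather than a real Cassandra driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical helper: collects elements from async callbacks, keyed by report position. */
final class OrderPreservingCollector<T> {
    private final Object[] slots;
    private int remaining;

    OrderPreservingCollector(int reportSize) {
        this.slots = new Object[reportSize];
        this.remaining = reportSize;
    }

    /**
     * Called from a persistence callback, in whatever order the storage completes.
     * Returns true when this was the last outstanding element of the report.
     */
    synchronized boolean onPersisted(int reportIndex, T element) {
        Objects.requireNonNull(element, "element");
        if (slots[reportIndex] != null) {
            throw new IllegalStateException("index already persisted: " + reportIndex);
        }
        slots[reportIndex] = element;
        remaining--;
        return remaining == 0;
    }

    /** Returns the elements in the original report order, not in completion order. */
    @SuppressWarnings("unchecked")
    synchronized List<T> inReportOrder() {
        if (remaining != 0) {
            throw new IllegalStateException(remaining + " element(s) not yet persisted");
        }
        List<T> result = new ArrayList<>(slots.length);
        for (Object slot : slots) {
            result.add((T) slot);
        }
        return result;
    }
}

final class OrderPreservingDemo {
    public static void main(String[] args) {
        // Datapoints in the order the agent reported them (placeholder strings).
        List<String> report = Arrays.asList("t=1000", "t=2000", "t=3000");
        OrderPreservingCollector<String> collector =
            new OrderPreservingCollector<>(report.size());

        // Simulated storage completion order; differs from the report order.
        int[] completionOrder = {2, 0, 1};
        for (int index : completionOrder) {
            boolean last = collector.onPersisted(index, report.get(index));
            if (last) {
                // Only now would the data be passed on to condition processing.
                System.out.println(collector.inReportOrder());
            }
        }
    }
}
```

The collector is synchronized because the callbacks may run on different threads. Sorting the collected datapoints by timestamp in the final callback would be an alternative, assuming timestamps within one report are a reliable ordering key.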
The upstream fix was committed a while back. Moving to ON_QA, but this code-level change is not easily testable. I suggest just setting this to CLOSED/CURRENTRELEASE or something like that.
This is almost impossible to QE. Verified by code inspection.