Description of problem:
Some graphs fail to render, and the following exception is thrown:

Caused by: java.lang.IllegalArgumentException: highValue (1.1431E10) is not greater than or equal to value (1.1475666666666666E10).
    at org.rhq.core.domain.measurement.composite.MeasurementDataNumericHighLowComposite.<init>(MeasurementDataNumericHighLowComposite.java:44) [rhq-core-domain-ejb3.jar:4.9.0]
    at org.rhq.server.metrics.MetricsServer.createComposites(MetricsServer.java:297) [rhq-server-metrics-4.9.0.jar:4.9.0]
    at org.rhq.server.metrics.MetricsServer.findDataForResource(MetricsServer.java:171) [rhq-server-metrics-4.9.0.jar:4.9.0]

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Depending on the dataset, this can happen fairly frequently. When it does, the panel fails to render at all.

It would be nice if the server caught the exception, or fixed the data, so that the graph still showed something instead of nothing. I'm guessing the threshold check doesn't work for large numbers:

if (highValue < value && Math.abs(highValue - value) > THRESHOLD) {
^^^ might want to use division, not subtraction, here.
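The "division instead of subtraction" suggestion can be sketched as a relative tolerance that scales the gap by the magnitude of the values. The class and method names below are hypothetical, not RHQ code, and reusing 0.00001 as a ratio is an assumption. Note that for the values in the trace the gap is roughly 0.39% of the value, so even this relative check would still flag them; a relative check only stops rounding noise on large magnitudes from being treated as an error.

```java
/**
 * Hypothetical helper contrasting an absolute tolerance (as in the original
 * THRESHOLD check) with a relative tolerance scaled by magnitude.
 */
final class Tolerance {
    // Same numeric value as the original THRESHOLD, reused here as a ratio (assumption).
    static final double THRESHOLD = 0.00001d;

    private Tolerance() {
    }

    /** Mirrors the original check: a is below b by more than a fixed amount. */
    static boolean isSignificantlyLessAbsolute(double a, double b) {
        return a < b && Math.abs(a - b) > THRESHOLD;
    }

    /** Relative check: a is below b by more than THRESHOLD times their magnitude. */
    static boolean isSignificantlyLess(double a, double b) {
        if (!(a < b)) {
            return false; // not less at all; also returns false for NaN inputs
        }
        // a < b means they cannot both be zero, so scale > 0.
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return (b - a) / scale > THRESHOLD;
    }
}
```

For example, isSignificantlyLessAbsolute(1.0E10, 1.0E10 + 1.0) is true, while isSignificantlyLess returns false for the same pair. Conversely, for tiny values such as 1.0E-9 and 2.0E-9 the absolute check misses a 50% gap that the relative check catches.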
Even if you fix the threshold, I'd be highly annoyed if my graph just failed to come up. Why not just set highValue = value when the data would otherwise fail to render?
I am inclined to think that this is a server-side issue. There are a few things to consider in trying to find the root cause. There could be a bug in the code that computes the aggregate metrics requested by the client. There could be a bug in the code that "bucketizes" the metric data into 60 data points. Or an error may have occurred while computing or storing the aggregate metrics and gone unhandled. Elias, are you able to reproduce this fairly easily? If so, I could provide you with a patch that logs relevant details to give a better idea of what is happening.
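To make the "bucketizing" step concrete, here is a minimal sketch, assuming raw samples are split into equal time slices (60 in the case above) and each slice is summarized as low/avg/high. Sample, Bucket, and Bucketizer are hypothetical names, not the RHQ implementation. In a sketch like this, where every bucket is summarized from raw values, low <= avg <= high holds up to floating-point rounding; the invariant can only break if the inputs are themselves pre-aggregated with inconsistent min, max, and avg values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical raw sample: a timestamp in milliseconds and a value. */
final class Sample {
    final long timestamp;
    final double value;

    Sample(long timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

/** Hypothetical per-bucket summary with the same low/avg/high shape as the composite. */
final class Bucket {
    double low = Double.NaN;
    double high = Double.NaN;
    double sum = 0.0;
    int count = 0;

    void add(double v) {
        if (count == 0) {
            low = v;
            high = v;
        } else {
            // Update low and high independently (no else-if).
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        sum += v;
        count++;
    }

    double avg() {
        return count == 0 ? Double.NaN : sum / count;
    }
}

final class Bucketizer {
    private Bucketizer() {
    }

    /** Splits [begin, end) into numBuckets equal time slices and summarizes each one. */
    static List<Bucket> bucketize(List<Sample> samples, long begin, long end, int numBuckets) {
        if (numBuckets <= 0 || end <= begin) {
            throw new IllegalArgumentException("need numBuckets > 0 and end > begin");
        }
        List<Bucket> buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new Bucket());
        }
        long span = end - begin;
        for (Sample s : samples) {
            if (s.timestamp < begin || s.timestamp >= end) {
                continue; // outside the requested range
            }
            // (timestamp - begin) < span, so index is always in [0, numBuckets).
            int index = (int) ((s.timestamp - begin) * numBuckets / span);
            buckets.get(index).add(s.value);
        }
        return buckets;
    }
}
```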
I used to see this for things like disk usage or other sufficiently large numeric values, such as timestamps in milliseconds. I'm not sure I can reproduce the problem, because I have patched my server to avoid it entirely. I'm not sure what this check is trying to do, but even if it fails, the user should still get some data back.

diff --git a/modules/core/domain/src/main/java/org/rhq/core/domain/measurement/composite/MeasurementDataNumericHighLowComposite.java b/modules/
index d84744f..604cf5d 100644
--- a/modules/core/domain/src/main/java/org/rhq/core/domain/measurement/composite/MeasurementDataNumericHighLowComposite.java
+++ b/modules/core/domain/src/main/java/org/rhq/core/domain/measurement/composite/MeasurementDataNumericHighLowComposite.java
@@ -27,8 +27,6 @@ public class MeasurementDataNumericHighLowComposite implements Serializable {
 
     private static final long serialVersionUID = 1L;
 
-    private static final double THRESHOLD = 0.00001d;
-
     private long timestamp;
     private double value;
     private double highValue;
@@ -40,14 +38,12 @@ protected MeasurementDataNumericHighLowComposite() {
 
     public MeasurementDataNumericHighLowComposite(long timestamp, double value, double highValue, double lowValue) {
         if (!Double.isNaN(value)) {
-            if (highValue < value && Math.abs(highValue - value) > THRESHOLD) {
-                throw new IllegalArgumentException("highValue (" + highValue
-                    + ") is not greater than or equal to value (" + value + ").");
+            if (highValue < value) {
+                highValue = value;
             }
-            if (lowValue > value && Math.abs(lowValue - value) > THRESHOLD) {
-                throw new IllegalArgumentException("lowValue (" + lowValue + ") is not less than or equal to value ("
-                    + value + ").");
+            if (lowValue > value) {
+                lowValue = value;
             }
         }
Thanks Elias. Your patch makes sense. I would like to do some testing to determine the root cause and what, if anything, else we ought to do.
The problem was a bug in a method that calculates aggregate metrics. It could produce an incorrect max because of an if/else statement that should have been two separate if statements. The code looked like this:

if (metric.getMin() < min) {
    min = metric.getMin();
} else if (metric.getMax() > max) {
    max = metric.getMax();
}

Whenever a metric lowered min, its max was never checked. This bug only affects 6 hr and 24 hr metrics. For 6 hour data, for example, it manifests when both the min and max of the 1 hour data being aggregated fall on the same 1 hour aggregate metric.

Two things are needed to address this problem: 1) stop generating invalid aggregate metrics, and 2) handle existing, invalid aggregate metrics. The changes merged into master from the jsanda/metrics-schema branch already calculate the aggregates correctly. For existing data, I have committed a change to master that checks for invalid max values. If we come across one, we log a warning and "adjust" the metric: we set the max to the average and persist the updated value.

commit hash: b1b4eeef16
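A minimal sketch of the bug and both parts of the fix, assuming each aggregate carries min, max, and avg, that the running min/max start from the first input, and that the combined average is the unweighted mean of the inputs' averages (all assumptions; the real code may differ). Aggregate and Aggregation are hypothetical names, not RHQ classes. aggregateBuggy reproduces the else-if from the report, aggregate applies the fix, and adjustInvalidMax mirrors the handling described for existing data.

```java
import java.util.List;

/** Hypothetical stand-in for one aggregate metric (for example, a 1 hour aggregate). */
final class Aggregate {
    final double min;
    final double max;
    final double avg;

    Aggregate(double min, double max, double avg) {
        this.min = min;
        this.max = max;
        this.avg = avg;
    }
}

final class Aggregation {
    private Aggregation() {
    }

    private static void requireNonEmpty(List<Aggregate> metrics) {
        if (metrics.isEmpty()) {
            throw new IllegalArgumentException("no metrics to aggregate");
        }
    }

    /** Reproduces the reported bug: when a metric lowers min, its max is never checked. */
    static Aggregate aggregateBuggy(List<Aggregate> metrics) {
        requireNonEmpty(metrics);
        double min = metrics.get(0).min;
        double max = metrics.get(0).max;
        double sum = 0.0;
        for (Aggregate m : metrics) {
            if (m.min < min) {
                min = m.min;
            } else if (m.max > max) {
                max = m.max;
            }
            sum += m.avg;
        }
        return new Aggregate(min, max, sum / metrics.size());
    }

    /** Fixed version: min and max are updated independently. */
    static Aggregate aggregate(List<Aggregate> metrics) {
        requireNonEmpty(metrics);
        double min = metrics.get(0).min;
        double max = metrics.get(0).max;
        double sum = 0.0;
        for (Aggregate m : metrics) {
            if (m.min < min) {
                min = m.min;
            }
            if (m.max > max) {
                max = m.max;
            }
            sum += m.avg;
        }
        return new Aggregate(min, max, sum / metrics.size());
    }

    /** Handling for existing data: if max < avg, set max to the average. */
    static Aggregate adjustInvalidMax(Aggregate a) {
        if (a.max < a.avg) {
            // The committed change also logs a warning and persists the value; omitted here.
            return new Aggregate(a.min, a.avg, a.avg);
        }
        return a;
    }
}
```

With 1 hour inputs (min 5, max 6, avg 5.5) and (min 1, max 20, avg 10), aggregateBuggy returns max 6 against an average of 7.75, which is the highValue < value condition from the original exception; aggregate returns min 1, max 20, avg 7.75.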
There was an additional commit from the release/jon3.2.x branch that I needed to cherry-pick over to master. commit hash: a5afcb2b5f0
Bulk close of items fixed in RHQ 4.12. If you think this is not solved, please open a *new* BZ and link to this one.