Currently, two independent sessions (for example, two different browsers) querying metrics in the OpenShift console or the Prometheus UI can get different results.
This happens because each session may have affinity to a different Prometheus replica instance.
This is fixed by the central Thanos Querier deployment, which deduplicates data across the replicas.
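The mechanism can be sketched in a few lines. This is an illustration, not the actual Thanos code: two Prometheus replicas scrape the same target but each misses a different window (e.g. while being restarted during the upgrade), so a range function like sum_over_time() disagrees between them; a deduplicating query layer merges the samples by timestamp and recovers the full series. The sample sets and gap positions below are made up.

```python
def sum_over_time(samples):
    """Mimic PromQL sum_over_time() over one series of (timestamp, value)."""
    return sum(v for _, v in samples)

def deduplicate(a, b):
    """Merge two replicas' samples by timestamp (Thanos-Querier-style dedup)."""
    merged = dict(a)
    for t, v in b:
        merged.setdefault(t, v)  # fill gaps in a with samples from b
    return sorted(merged.items())

# The "true" series: the alert fires (value 1) at every scrape.
full = [(t, 1) for t in range(10)]

# Each replica missed a different window during the upgrade.
replica_a = [s for s in full if s[0] not in (3, 4, 5)]  # gap at t=3..5
replica_b = [s for s in full if s[0] != 8]              # gap at t=8

# With session affinity, each browser session sticks to one replica
# and sees a different total (7 vs 9 here).
print(sum_over_time(replica_a), sum_over_time(replica_b))

# A deduplicating querier merges both replicas and recovers the full sum.
print(sum_over_time(deduplicate(replica_a, replica_b)))  # 10
```

This is why the discrepancy shows up mainly after an upgrade, when replicas are restarted at different times, and why routing all queries through one deduplicating Thanos Querier removes it.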
Description of problem:
After upgrading OCP, searching the example metrics in the query browser and in the Prometheus UI returns different results.
AlertmanagerConfigInconsistent is 71 in the query browser but 32 in the Prometheus UI.
Some metrics are only found in the query browser, e.g. KubeStatefulSetReplicasMismatch.
From the query browser:
sum(sort_desc(sum_over_time(ALERTS{alertstate="firing"}[24h]))) by (alertname)
AggregatedLoggingSystemCPUHigh 1
AlertmanagerConfigInconsistent 71
ClusterAutoscalerUnschedulablePods 1091
ClusterIPTablesStale 1076
ClusterMonitoringOperatorErrors 2
ClusterOperatorDegraded 48
ClusterOperatorDown 48
FluentdNodeDown 1
KubeAPILatencyHigh 7
KubeDaemonSetMisScheduled 831
KubeDaemonSetRolloutStuck 1172
KubeDeploymentReplicasMismatch 8275
KubeNodeNotReady 34
KubePodCrashLooping 2612
KubePodNotReady 4631
KubeStatefulSetReplicasMismatch 7
KubeStatefulSetUpdateNotRolledOut 7
RsyslogNodeDown 1147
RsyslogQueueLengthBurst 1293
TargetDown 424
Watchdog 2839
etcdMembersDown 19
From the Prometheus UI:
sum(sort_desc(sum_over_time(ALERTS{alertstate="firing"}[24h]))) by (alertname)
{alertname="RsyslogQueueLengthBurst"} 1243
{alertname="KubeDaemonSetMisScheduled"} 504
{alertname="TargetDown"} 338
{alertname="KubeAPILatencyHigh"} 9
{alertname="AlertmanagerConfigInconsistent"} 32
{alertname="KubeNodeNotReady"} 36
{alertname="AggregatedLoggingSystemCPUHigh"} 5
{alertname="KubePodCrashLooping"} 2531
{alertname="ClusterAutoscalerUnschedulablePods"} 1032
{alertname="ClusterOperatorDown"} 14
{alertname="etcdMembersDown"} 14
{alertname="ClusterIPTablesStale"} 990
{alertname="KubeDaemonSetRolloutStuck"} 1101
{alertname="ClusterOperatorDegraded"} 14
{alertname="ClusterMonitoringOperatorErrors"} 2
{alertname="Watchdog"} 2769
{alertname="KubeDeploymentReplicasMismatch"} 7893
{alertname="KubePodNotReady"} 4423
{alertname="RsyslogNodeDown"} 1144
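The two result sets above can be diffed mechanically to confirm which alerts differ and which appear in only one view. The counts below are copied from this report (abridged to a few alerts); the helper itself is just an illustration.

```python
# Per-alertname totals as reported by each UI (subset of the data above).
query_browser = {
    "AlertmanagerConfigInconsistent": 71,
    "KubeStatefulSetReplicasMismatch": 7,
    "etcdMembersDown": 19,
}
prometheus_ui = {
    "AlertmanagerConfigInconsistent": 32,
    "etcdMembersDown": 14,
}

# Alerts visible only in the query browser.
only_in_browser = set(query_browser) - set(prometheus_ui)

# Alerts present in both but with different totals.
differing = {
    name: (query_browser[name], prometheus_ui[name])
    for name in query_browser.keys() & prometheus_ui.keys()
    if query_browser[name] != prometheus_ui[name]
}

print(only_in_browser)  # {'KubeStatefulSetReplicasMismatch'}
print(differing)
```

Both symptoms reported here (different totals and missing alerts) fall out of the same cause: each UI session querying a different replica with different gaps in its data.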
Version-Release number of selected component (if applicable):
upgrade from 4.2.0-0.nightly-2019-08-24-002347 to 4.2.0-0.nightly-2019-08-25-233755
How reproducible:
Reproducible in an upgraded environment; not seen in a fresh environment.
Steps to Reproduce:
1. See the description
Actual results:
Expected results:
Additional info:
The Query Browser UI dynamically updates with the latest values, but the Prometheus UI does not. So if both pages are left open for a while, the values will be different.
Does that explain what you are seeing here?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0062