Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1745910

Summary: different metrics result in query browser and prometheus UI after upgrade
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.2.0
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: Junqi Zhao <juzhao>
Assignee: Simon Pasquier <spasquie>
QA Contact: Junqi Zhao <juzhao>
Docs Contact:
CC: alegrand, anpicker, aos-bugs, erooth, jokerman, kakkoyun, kgeorgie, lcosic, mloibl, pkrupa, spadgett, surbania
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Previously, if two independent sessions (for example, two different browsers) were used to query metrics in the OpenShift console or the Prometheus UI, they could return different results. This was because each session could have affinity to a different Prometheus replica instance. This is fixed by the central Thanos Querier deployment.
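A minimal sketch of the root cause described above (hypothetical illustration, not code from the product): each Prometheus replica scrapes targets on its own schedule, so over a 24-hour window the two replicas can hold different numbers of samples for the same alert series. A range function such as `sum_over_time()` then returns different totals depending on which replica the session's affinity routes the query to. The sample counts below are taken from the AlertmanagerConfigInconsistent discrepancy reported in this bug.

```python
# Hypothetical sketch: why session affinity to different Prometheus
# replicas yields different sum_over_time() results for the same query.

def sum_over_time(samples):
    """Sum all sample values in the range window (PromQL sum_over_time)."""
    return sum(samples)

# Replica A and replica B scraped the ALERTS{alertstate="firing"} series
# independently; replica B holds fewer samples in the 24h window (for
# example, because it was restarted during the upgrade).
replica_a = [1] * 71  # samples on the replica behind the query browser
replica_b = [1] * 32  # samples on the replica behind the Prometheus UI

print(sum_over_time(replica_a))  # 71, as seen in the query browser
print(sum_over_time(replica_b))  # 32, as seen in the Prometheus UI
```

A query-layer component that fans out to all replicas and deduplicates (such as Thanos Querier) removes this dependence on which replica a session happens to reach.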
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-23 11:05:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Junqi Zhao 2019-08-27 08:18:55 UTC
Description of problem:
After upgrading OCP, searching for the same example metrics in the query browser and the Prometheus UI returns different results.
For example, AlertmanagerConfigInconsistent is 71 in the query browser but 32 in the Prometheus UI.
Some metrics appear only in the query browser, e.g. KubeStatefulSetReplicasMismatch.

From the query browser:
sum(sort_desc(sum_over_time(ALERTS{alertstate="firing"}[24h]))) by (alertname)
AggregatedLoggingSystemCPUHigh	1
AlertmanagerConfigInconsistent	71
ClusterAutoscalerUnschedulablePods	1091
ClusterIPTablesStale	1076
ClusterMonitoringOperatorErrors	2
ClusterOperatorDegraded	48
ClusterOperatorDown	48
FluentdNodeDown	1
KubeAPILatencyHigh	7
KubeDaemonSetMisScheduled	831
KubeDaemonSetRolloutStuck	1172
KubeDeploymentReplicasMismatch	8275
KubeNodeNotReady	34
KubePodCrashLooping	2612
KubePodNotReady	4631
KubeStatefulSetReplicasMismatch	7
KubeStatefulSetUpdateNotRolledOut	7
RsyslogNodeDown	1147
RsyslogQueueLengthBurst	1293
TargetDown	424
Watchdog	2839
etcdMembersDown	19

From the Prometheus UI:
sum(sort_desc(sum_over_time(ALERTS{alertstate="firing"}[24h]))) by (alertname)
{alertname="RsyslogQueueLengthBurst"}	1243
{alertname="KubeDaemonSetMisScheduled"}	504
{alertname="TargetDown"}	338
{alertname="KubeAPILatencyHigh"}	9
{alertname="AlertmanagerConfigInconsistent"}	32
{alertname="KubeNodeNotReady"}	36
{alertname="AggregatedLoggingSystemCPUHigh"}	5
{alertname="KubePodCrashLooping"}	2531
{alertname="ClusterAutoscalerUnschedulablePods"}	1032
{alertname="ClusterOperatorDown"}	14
{alertname="etcdMembersDown"}	14
{alertname="ClusterIPTablesStale"}	990
{alertname="KubeDaemonSetRolloutStuck"}	1101
{alertname="ClusterOperatorDegraded"}	14
{alertname="ClusterMonitoringOperatorErrors"}	2
{alertname="Watchdog"}	2769
{alertname="KubeDeploymentReplicasMismatch"}	7893
{alertname="KubePodNotReady"}	4423
{alertname="RsyslogNodeDown"}	1144

Version-Release number of selected component (if applicable):
upgrade from 4.2.0-0.nightly-2019-08-24-002347 to 4.2.0-0.nightly-2019-08-25-233755

How reproducible:
Reproducible in upgraded environments; not seen in fresh installations.

Steps to Reproduce:
1. Upgrade the cluster, then run the same query in the query browser and the Prometheus UI (see the description).

Actual results:
The same PromQL query returns different results in the query browser and the Prometheus UI.

Expected results:
Both UIs return the same results for the same query.

Additional info:




Comment 1 Andrew Pickering 2019-08-27 10:12:21 UTC
The query browser UI dynamically updates with the latest values, but the Prometheus UI does not. So if both pages are left open for a while, the values will diverge.

Does that explain what you are seeing here?

Comment 14 errata-xmlrpc 2020-01-23 11:05:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062