Description of problem:

Test failure: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_openshift-ansible/11801/pull-ci-openshift-openshift-ansible-master-e2e-aws-scaleup-rhel7/836

This appears to happen consistently on RHEL7 nodes that are being scaled up. RHEL7 nodes should probably be ignored for this check.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install 4.2 cluster
2. Scale up RHEL 7 nodes
3. Run tests

Actual results:

Expected results:

Additional info:
The failure was due to a separate issue with the MCO (machine-config operator) that caused nodes to fail to join the cluster. Once that issue was resolved, the alert was no longer raised.
Saw this in the CI test: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/4933

Failing tests:

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report less than two alerts in firing or pending state [Suite:openshift/conformance/parallel/minimal]

Reopening.
Seeing this at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.2/209

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report less than two alerts in firing or pending state [Suite:openshift/conformance/parallel/minimal] (9m48s)

fail [github.com/openshift/origin/test/extended/prometheus/prometheus_builds.go:135]: Expected
<map[string]error | len:1>: {
    "ALERTS{alertname!=\"Watchdog\",alertstate=\"firing\"} >= 1": {
        s: "promQL query: ALERTS{alertname!=\"Watchdog\",alertstate=\"firing\"} >= 1 had reported incorrect results:
ALERTS{alertname=\"ClusterOperatorDegraded\", alertstate=\"firing\", condition=\"Degraded\", endpoint=\"metrics\", instance=\"147.75.69.131:9099\", job=\"cluster-version-operator\", name=\"ingress\", namespace=\"openshift-cluster-version\", pod=\"cluster-version-operator-57556d999d-n8wpf\", reason=\"IngressControllersDegraded\", service=\"cluster-version-operator\", severity=\"critical\"} => 1 @[1574685135.377]
ALERTS{alertname=\"ClusterOperatorDown\", alertstate=\"firing\", endpoint=\"metrics\", instance=\"147.75.69.131:9099\", job=\"cluster-version-operator\", name=\"ingress\", namespace=\"openshift-cluster-version\", pod=\"cluster-version-operator-57556d999d-n8wpf\", service=\"cluster-version-operator\", severity=\"critical\", version=\"4.2.0-0.nightly-2019-11-25-111442\"} => 1 @[1574685135.377]
ALERTS{alertname=\"KubeDeploymentReplicasMismatch\", alertstate=\"firing\", deployment=\"router-default\", endpoint=\"https-main\", instance=\"10.130.0.8:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-5499974b5f-x95hx\", service=\"kube-state-metrics\", severity=\"critical\"} => 1 @[1574685135.377]
ALERTS{alertname=\"KubePodNotReady\", alertstate=\"firing\", namespace=\"openshift-ingress\", pod=\"router-default-5789d7d4c6-kkk8j\", severity=\"critical\"} => 1 @[1574685135.377]",
    },
}
to be empty
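Note that all four firing series in the output above relate to ingress: the ClusterOperatorDegraded/ClusterOperatorDown alerts name the "ingress" operator, and the replica-mismatch and pod-not-ready alerts point at router-default in the openshift-ingress namespace.

For reference, the check the test performs boils down to a single instant query against the in-cluster Prometheus. The sketch below is only an assumption about how to reproduce that query by hand (it is not the test code from prometheus_builds.go); the route host and bearer token values are placeholders that have to be looked up first, e.g. with "oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}'" and "oc whoami -t".

# Minimal sketch, assuming a reachable Prometheus route and a valid token.
import requests

PROM_HOST = "prometheus-k8s-openshift-monitoring.apps.example.com"  # placeholder, look up via oc
TOKEN = "<bearer token>"  # placeholder, e.g. output of `oc whoami -t`

# Same expression the test evaluates: any firing alert other than Watchdog.
QUERY = 'ALERTS{alertname!="Watchdog",alertstate="firing"} >= 1'

resp = requests.get(
    f"https://{PROM_HOST}/api/v1/query",
    params={"query": QUERY},
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # CI clusters often serve self-signed certificates
)
resp.raise_for_status()

# Prometheus returns matching series under data.result; an empty list means
# the check would pass (nothing but Watchdog is firing).
for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    print(labels.get("alertname"), labels.get("namespace"), labels.get("severity"))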
Also seen at: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.2/251
Closing as this seems to be a flake that we no longer see.