Description of problem:

Test failure: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_openshift-ansible/11801/pull-ci-openshift-openshift-ansible-master-e2e-aws-scaleup-rhel7/836

This appears to happen consistently on RHEL7 nodes that are being scaled up. RHEL7 nodes should probably be ignored for this check.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install 4.2 cluster
2. Scale up RHEL 7 nodes
3. Run tests

Actual results:

Expected results:

Additional info:
The failure was due to a separate issue with the MCO (machine-config operator) that caused nodes to fail to join the cluster. Once that issue was resolved, the alert was no longer raised.
Saw this in the CI test: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/4933

Failing tests:

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report less than two alerts in firing or pending state [Suite:openshift/conformance/parallel/minimal]

Reopening.
Seeing this at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.2/209

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report less than two alerts in firing or pending state [Suite:openshift/conformance/parallel/minimal] (9m48s)

fail [github.com/openshift/origin/test/extended/prometheus/prometheus_builds.go:135]: Expected
<map[string]error | len:1>: {
    "ALERTS{alertname!=\"Watchdog\",alertstate=\"firing\"} >= 1": {
        s: "promQL query: ALERTS{alertname!=\"Watchdog\",alertstate=\"firing\"} >= 1 had reported incorrect results:
ALERTS{alertname=\"ClusterOperatorDegraded\", alertstate=\"firing\", condition=\"Degraded\", endpoint=\"metrics\", instance=\"147.75.69.131:9099\", job=\"cluster-version-operator\", name=\"ingress\", namespace=\"openshift-cluster-version\", pod=\"cluster-version-operator-57556d999d-n8wpf\", reason=\"IngressControllersDegraded\", service=\"cluster-version-operator\", severity=\"critical\"} => 1 @[1574685135.377]
ALERTS{alertname=\"ClusterOperatorDown\", alertstate=\"firing\", endpoint=\"metrics\", instance=\"147.75.69.131:9099\", job=\"cluster-version-operator\", name=\"ingress\", namespace=\"openshift-cluster-version\", pod=\"cluster-version-operator-57556d999d-n8wpf\", service=\"cluster-version-operator\", severity=\"critical\", version=\"4.2.0-0.nightly-2019-11-25-111442\"} => 1 @[1574685135.377]
ALERTS{alertname=\"KubeDeploymentReplicasMismatch\", alertstate=\"firing\", deployment=\"router-default\", endpoint=\"https-main\", instance=\"10.130.0.8:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-5499974b5f-x95hx\", service=\"kube-state-metrics\", severity=\"critical\"} => 1 @[1574685135.377]
ALERTS{alertname=\"KubePodNotReady\", alertstate=\"firing\", namespace=\"openshift-ingress\", pod=\"router-default-5789d7d4c6-kkk8j\", severity=\"critical\"} => 1 @[1574685135.377]",
    },
}
to be empty
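Note that all four firing series in the output above relate to ingress: the ClusterOperatorDegraded/ClusterOperatorDown alerts name the "ingress" operator, and the replica-mismatch and pod-not-ready alerts point at router-default in the openshift-ingress namespace.

For reference, the check the test performs boils down to a single instant query against the in-cluster Prometheus. The sketch below is only an assumption about how to reproduce that query by hand (it is not the test code from prometheus_builds.go); the route host and bearer token values are placeholders that have to be looked up first, e.g. with "oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}'" and "oc whoami -t".

# Minimal sketch, assuming a reachable Prometheus route and a valid token.
import requests

PROM_HOST = "prometheus-k8s-openshift-monitoring.apps.example.com"  # placeholder, look up via oc
TOKEN = "<bearer token>"  # placeholder, e.g. output of `oc whoami -t`

# Same expression the test evaluates: any firing alert other than Watchdog.
QUERY = 'ALERTS{alertname!="Watchdog",alertstate="firing"} >= 1'

resp = requests.get(
    f"https://{PROM_HOST}/api/v1/query",
    params={"query": QUERY},
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # CI clusters often serve self-signed certificates
)
resp.raise_for_status()

# Prometheus returns matching series under data.result; an empty list means
# the check would pass (nothing but Watchdog is firing).
for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    print(labels.get("alertname"), labels.get("namespace"), labels.get("severity"))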
Also seen at: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.2/251
Closing as this seems to be a flake that we no longer see.