Bug 1861391

Summary:	[sig-instrumentation] Prometheus when installed on the cluster should provide named network metrics
Product:	OpenShift Container Platform	Reporter:	Micah Abbott <miabbott>
Component:	Networking	Assignee:	Ben Bennett <bbennett>
Networking sub component:	openshift-sdn	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	urgent
Priority:	urgent	CC:	alegrand, anpicker, erooth, fpaoline, kakkoyun, lcosic, mloibl, pkrupa, surbania, wking
Version:	4.6
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-07-29 07:26:53 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Micah Abbott 2020-07-28 13:43:10 UTC

test: 
[sig-instrumentation] Prometheus when installed on the cluster should provide named network metrics

is failing frequently in CI; see the search results:

https://search.ci.openshift.org/?search=Prometheus+when+installed+on+the+cluster+should+provide+named+network+metrics&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

For example:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=Prometheus+when+installed+on+the+cluster+should+provide+named+network+metrics' | grep 'failures match' | sort
promote-release-openshift-machine-os-content-e2e-aws-4.6 - 123 runs, 100% failed, 2% of failures match
pull-ci-cri-o-cri-o-master-e2e-aws - 69 runs, 75% failed, 67% of failures match
pull-ci-cri-o-cri-o-release-1.19-e2e-aws - 3 runs, 67% failed, 100% of failures match
pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 4 runs, 75% failed, 33% of failures match
pull-ci-openshift-cluster-api-provider-aws-master-e2e-aws - 5 runs, 100% failed, 40% of failures match
...
pull-ci-operator-framework-operator-registry-master-e2e-aws - 9 runs, 78% failed, 57% of failures match
rehearse-10454-pull-ci-cri-o-cri-o-master-e2e-aws - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cloud-credential-operator-master-e2e-azure - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cloud-credential-operator-master-e2e-gcp - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-single - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-azure - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-ovn-hybrid-step-registry - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-installer-master-e2e-aws-shared-vpc - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-installer-master-e2e-gcp-shared-vpc - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-origin-master-e2e-gcp - 1 runs, 100% failed, 100% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6 - 8 runs, 88% failed, 57% of failures match
release-openshift-ocp-installer-e2e-aws-4.6 - 26 runs, 100% failed, 54% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 17 runs, 94% failed, 31% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 8 runs, 100% failed, 13% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 10 runs, 100% failed, 50% of failures match
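Each summary line gives the total runs, the share of runs that failed, and the share of those failures that matched this test, so a rough count of affected runs per job can be recovered by multiplying through. For example, 69 runs × 75% ≈ 52 failed runs, and 52 × 67% ≈ 35 runs that hit this failure. The sketch below is a hypothetical Python helper (not part of the CI search tool); because the page prints rounded percentages, its counts are estimates only.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one summary line from the CI search output, e.g.
# "pull-ci-cri-o-cri-o-master-e2e-aws - 69 runs, 75% failed, 67% of failures match"
LINE_RE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match"
)


class JobSummary(NamedTuple):
    job: str
    runs: int
    failed: int   # estimated number of failed runs
    matched: int  # estimated number of failed runs that hit this test failure


def parse_line(line: str) -> Optional[JobSummary]:
    """Parse one summary line; return None for anything else (e.g. '...')."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    runs = int(m["runs"])
    # The search page reports rounded percentages, so these are estimates.
    failed = round(runs * int(m["failed"]) / 100)
    matched = round(failed * int(m["match"]) / 100)
    return JobSummary(m["job"], runs, failed, matched)


def summarize(text: str) -> List[JobSummary]:
    """Parse every recognizable line, most affected jobs first."""
    rows = [s for s in map(parse_line, text.splitlines()) if s is not None]
    return sorted(rows, key=lambda s: s.matched, reverse=True)
```

Feeding the `w3m ... | grep 'failures match'` output into `summarize` ranks jobs by estimated matching failures, which separates jobs where this test dominates the failures from jobs that fail mostly for other reasons.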


Picking a specific release job [1]:

Jul 28 09:55:22.799: INFO: execpodz45p5[e2e-test-prometheus-8vgbz].container[agnhost-pause].log
unable to retrieve container logs for cri-o://f448f5b810748bc007461d38940fd913ff9c0e48cfcc52e4d56cfb80d90ad849
Jul 28 09:55:22.885: INFO: skipping dumping cluster info - cluster too large
...
fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
    <map[string]error | len:1>: {
        "pod_network_name_info{pod=\"execpodz45p5\",namespace=\"e2e-test-prometheus-8vgbz\",network_name=\"secondary\"} == 0": {
            s: "promQL query: pod_network_name_info{pod=\"execpodz45p5\",namespace=\"e2e-test-prometheus-8vgbz\",network_name=\"secondary\"} == 0 had reported incorrect results:\n[]",
        },
    }
to be empty


[1] https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1288039990483226624
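The failing assertion expects the PromQL query to return at least one series. Because `== 0` without the `bool` modifier is a filter, PromQL drops every series that does not satisfy it, so the empty result (`[]`) means no `pod_network_name_info` series with those labels had the value 0 at query time: either the series was absent or its value differed. The sketch below reproduces that check against the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The helper names and the base URL are hypothetical, authentication is omitted, and this is not the code in origin's `helpers.go`.

```python
import json
from urllib.parse import urlencode


def build_query(pod: str, namespace: str, network_name: str) -> str:
    """Build the same PromQL expression that appears in the failure message."""
    return (
        'pod_network_name_info{pod="%s",namespace="%s",network_name="%s"} == 0'
        % (pod, namespace, network_name)
    )


def query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API (base_url is hypothetical)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def has_results(response_body: str) -> bool:
    """True if an instant-query response returned at least one series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    # An empty list here corresponds to the "reported incorrect results: []" failure.
    return len(payload["data"]["result"]) > 0
```

Separating URL construction from response checking keeps the logic testable without a live cluster; in a real run the body would come from an authenticated HTTP GET against the cluster's Prometheus endpoint.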

Comment 5 Federico Paolinelli 2020-07-29 07:26:53 UTC


*** This bug has been marked as a duplicate of bug 1860837 ***