Bug 1861391

Summary: [sig-instrumentation] Prometheus when installed on the cluster should provide named network metrics
Product: OpenShift Container Platform
Reporter: Micah Abbott <miabbott>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Severity: urgent
Priority: urgent
CC: alegrand, anpicker, erooth, fpaoline, kakkoyun, lcosic, mloibl, pkrupa, surbania, wking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-07-29 07:26:53 UTC
Type: Bug

Description Micah Abbott 2020-07-28 13:43:10 UTC
The test

[sig-instrumentation] Prometheus when installed on the cluster should provide named network metrics

is failing frequently in CI; see the search results:

https://search.ci.openshift.org/?search=Prometheus+when+installed+on+the+cluster+should+provide+named+network+metrics&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

For example:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=Prometheus+when+installed+on+the+cluster+should+provide+named+network+metrics' | grep 'failures match' | sort
promote-release-openshift-machine-os-content-e2e-aws-4.6 - 123 runs, 100% failed, 2% of failures match
pull-ci-cri-o-cri-o-master-e2e-aws - 69 runs, 75% failed, 67% of failures match
pull-ci-cri-o-cri-o-release-1.19-e2e-aws - 3 runs, 67% failed, 100% of failures match
pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 4 runs, 75% failed, 33% of failures match
pull-ci-openshift-cluster-api-provider-aws-master-e2e-aws - 5 runs, 100% failed, 40% of failures match
...
pull-ci-operator-framework-operator-registry-master-e2e-aws - 9 runs, 78% failed, 57% of failures match
rehearse-10454-pull-ci-cri-o-cri-o-master-e2e-aws - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cloud-credential-operator-master-e2e-azure - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cloud-credential-operator-master-e2e-gcp - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-single - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-azure - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-ovn-hybrid-step-registry - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-installer-master-e2e-aws-shared-vpc - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-installer-master-e2e-gcp-shared-vpc - 1 runs, 100% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-origin-master-e2e-gcp - 1 runs, 100% failed, 100% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6 - 8 runs, 88% failed, 57% of failures match
release-openshift-ocp-installer-e2e-aws-4.6 - 26 runs, 100% failed, 54% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 17 runs, 94% failed, 31% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 8 runs, 100% failed, 13% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 10 runs, 100% failed, 50% of failures match
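
To read the summary lines above more easily, here is a minimal Python sketch that parses one line and estimates what share of all runs of a job hit this test failure (failed % times matching %). The names `JobStats`, `parse_line`, and `impact_pct` are hypothetical and not part of any CI tooling. The percentages in the search output are already rounded, so the product is only an approximation.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as:
#   <job> - 69 runs, 75% failed, 67% of failures match
LINE_RE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match$"
)


class JobStats(NamedTuple):
    job: str
    runs: int
    failed_pct: int
    match_pct: int

    @property
    def impact_pct(self) -> float:
        """Approximate percentage of all runs that failed on this test."""
        return self.failed_pct * self.match_pct / 100


def parse_line(line: str) -> Optional[JobStats]:
    """Parse one 'failures match' summary line; return None for other lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return JobStats(m["job"], int(m["runs"]), int(m["failed"]), int(m["match"]))


if __name__ == "__main__":
    sample = "pull-ci-cri-o-cri-o-master-e2e-aws - 69 runs, 75% failed, 67% of failures match"
    stats = parse_line(sample)
    print(stats.job, stats.impact_pct)
```

Under this estimate, roughly half of the pull-ci-cri-o-cri-o-master-e2e-aws runs in the search window failed on this test.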


Picking a specific release job [1]:

Jul 28 09:55:22.799: INFO: execpodz45p5[e2e-test-prometheus-8vgbz].container[agnhost-pause].log
unable to retrieve container logs for cri-o://f448f5b810748bc007461d38940fd913ff9c0e48cfcc52e4d56cfb80d90ad849
Jul 28 09:55:22.885: INFO: skipping dumping cluster info - cluster too large
...
fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
    <map[string]error | len:1>: {
        "pod_network_name_info{pod=\"execpodz45p5\",namespace=\"e2e-test-prometheus-8vgbz\",network_name=\"secondary\"} == 0": {
            s: "promQL query: pod_network_name_info{pod=\"execpodz45p5\",namespace=\"e2e-test-prometheus-8vgbz\",network_name=\"secondary\"} == 0 had reported incorrect results:\n[]",
        },
    }
to be empty
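
The failure message above shows the promQL expression coming back with an empty result (`[]`). As a rough aid for reproducing that by hand, the sketch below rebuilds the expression, forms an instant-query URL for the Prometheus HTTP API (`/api/v1/query`), and checks whether a response body contains any samples. This is not the helper from openshift/origin. The function names and the `localhost:9090` base URL are assumptions for illustration, and the canned response only mimics the shape of an empty instant-query result.

```python
import json
from urllib.parse import urlencode


def build_query(pod: str, namespace: str, network_name: str) -> str:
    """Rebuild the promQL expression quoted in the failure message."""
    return (
        f'pod_network_name_info{{pod="{pod}",namespace="{namespace}",'
        f'network_name="{network_name}"}} == 0'
    )


def query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def has_samples(response_body: str) -> bool:
    """Return True if a successful instant-query response holds any sample."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"]) > 0


if __name__ == "__main__":
    query = build_query("execpodz45p5", "e2e-test-prometheus-8vgbz", "secondary")
    # Assumed base URL; a real cluster needs its own route and credentials.
    print(query_url("http://localhost:9090", query))
    # Canned body shaped like an empty instant-query result.
    empty = '{"status":"success","data":{"resultType":"vector","result":[]}}'
    print(has_samples(empty))
```

An empty result vector here corresponds to the `[]` in the log: no `pod_network_name_info` series with `network_name="secondary"` was found for the exec pod.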


[1] https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1288039990483226624

Comment 5 Federico Paolinelli 2020-07-29 07:26:53 UTC

*** This bug has been marked as a duplicate of bug 1860837 ***