Bug 1973075
Summary: | Prometheus when installed on the cluster should have non-Pod host cAdvisor metrics test failing often | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
Component: | Node | Assignee: | Harshal Patil <harpatil> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED WONTFIX | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | akrzos, alegrand, anpicker, aos-bugs, dgrisonn, erooth, harpatil, jhusta, kakkoyun, lcosic, pkrupa, rfreiman, rphillips, spasquie, surbania, wking |
Version: | 4.8 | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-01 11:06:48 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1950993 | ||
Bug Blocks: |
Comment 2
W. Trevor King
2021-06-29 17:09:52 UTC
Hi Trevor,

My apologies, I should have been more thorough in my comment when I closed that bug. Let me clarify a few things before I explain why I decided to close this bug for 4.8.

I found and fixed an issue [1] in upstream cadvisor, which modified how cadvisor reports errors. cadvisor is imported into OpenShift in two places: the openshift/kubernetes repo and the openshift/origin repo. PR [2] brought my upstream changes [1] into openshift/kubernetes, while PR [3] brought them into openshift/origin. This reduced the failures in this test significantly, down to occasional flakes.

Since the PR for openshift/kubernetes [2] was merged before the 4.8 window closed, the changes made it into the 4.8 branch of openshift/kubernetes. However, when I raised PR [4], Seth Jennings pointed out that the code in openshift/origin is no longer used to build the kubelet (even though it imports cadvisor). This means we do not need to merge [4] to fix this issue; we only need the changes in openshift/kubernetes, not in openshift/origin. This was confirmed by the SNO CI for 4.8 [6], which had the changes in openshift/kubernetes but not in openshift/origin.

So, since we don't need the changes in openshift/origin, we already have the required changes in openshift/kubernetes, and the test has gone from failing to occasional flakes [6], I decided to close this bug.

The search [8] you mentioned in your comment is slightly misleading, IMO. I opened a few random results from that search [9], [10]. Although those jobs failed, they did not fail because of this test specifically; rather, pretty much all the tests in those jobs failed. So I am not sure I would link those runs to this BZ. A good example of a job failure caused by the issue in this BZ is [11], where the job clearly fails because of the test linked to this BZ.
[1] https://github.com/google/cadvisor/pull/2868
[2] https://github.com/openshift/kubernetes/pull/802
[3] https://github.com/openshift/origin/pull/26232
[4] https://github.com/openshift/origin/pull/26243
[5] https://coreos.slack.com/archives/GK6BJJ1J5/p1623919675075400
[6] https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node&include-filter-by-regex=cAdvisor
[7] https://bugzilla.redhat.com/show_bug.cgi?id=1973075#c2
[8] https://search.ci.openshift.org/?search=Prometheus+when+installed+on+the+cluster+should+have+non-Pod+host+cAdvisor+metrics&maxAge=336h&type=junit
[9] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_multus-cni/90/pull-ci-openshift-multus-cni-release-4.7-e2e-aws/1410116803887108096
[10] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_multus-cni/90/pull-ci-openshift-multus-cni-release-4.7-e2e-aws/1409203294936502272
[11] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/1121/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws-single-node/1383816160142692352
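
The triage distinction drawn in the comment above (a job in which nearly every test failed versus a job that failed because of this one test) could be sketched roughly as follows. This is only an illustration: `JobRun`, `classify`, and the 50% `mass_failure_ratio` threshold are hypothetical names and values chosen for the example, not part of any OpenShift CI tooling, and the per-run data would have to be gathered by hand or from the junit results.

```python
from dataclasses import dataclass
from typing import List

# Test name taken from the bug summary.
TARGET_TEST = (
    "Prometheus when installed on the cluster "
    "should have non-Pod host cAdvisor metrics"
)


@dataclass
class JobRun:
    """Hypothetical summary of one CI job run."""
    url: str
    failed_tests: List[str]
    total_tests: int


def classify(run: JobRun, mass_failure_ratio: float = 0.5) -> str:
    """Label a run as 'unrelated', 'mass-failure', or 'attributable'.

    The threshold is an assumption made for this sketch: if at least
    half of all tests failed, the run is treated as a general breakage
    rather than evidence for this specific bug.
    """
    target_failed = any(TARGET_TEST in name for name in run.failed_tests)
    if not target_failed:
        return "unrelated"
    if run.total_tests > 0:
        ratio = len(run.failed_tests) / run.total_tests
        if ratio >= mass_failure_ratio:
            return "mass-failure"
    return "attributable"
```

Under this sketch, runs like [9] and [10] would be labelled `mass-failure` and left out of the count for this BZ, while a run like [11] would be labelled `attributable`.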