Bug 1848081 - [k8s.io] [sig-node] Events should be sent by kubelets and the scheduler about pods scheduling and running
Summary: [k8s.io] [sig-node] Events should be sent by kubelets and the scheduler about...
Keywords:
Status: CLOSED DUPLICATE of bug 1846529
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ryan Phillips
QA Contact: MinLi
URL:
Whiteboard:
Duplicates: 1849396
Depends On:
Blocks:
Reported: 2020-06-17 16:15 UTC by Ben Parees
Modified: 2023-09-14 06:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-22 10:44:04 UTC
Target Upstream Version:
Embargoed:


Description Ben Parees 2020-06-17 16:15:37 UTC
Description of problem:

Test fails about half the time in ovirt jobs.

[k8s.io] [sig-node] Events should be sent by kubelets and the scheduler about pods scheduling and running [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] 1m8s
fail [k8s.io/kubernetes/test/e2e/node/events.go:116]: Unexpected error:
    <*errors.errorString | 0xc000202990>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

as seen in:
https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.5/1272937679863943168

fails 57% of the time in ovirt:
https://search.apps.build01.ci.devcluster.openshift.com/?search=%5C%5Bk8s%5C.io%5C%5D+%5C%5Bsig-node%5C%5D+Events+should+be+sent+by+kubelets+and+the+scheduler+about+pods+scheduling+and+running++%5C%5BConformance%5C%5D&maxAge=168h&context=1&type=junit&name=.*ovirt.*4.5.*&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 3 Gal Zaidman 2020-06-21 14:17:01 UTC
Today I took a closer look at this. It wasn't caused by the infrastructure upgrade; it is actually an old problem.
In the past, it was caused by very low CPU on the CI test pod, which was resolved [1].
Now it has come back from the dead...
I saw there was a related fix for runc in [2]. Can you explain why it is related to this error?
I'm working to reproduce and investigate it in the meantime.

[1] https://github.com/openshift/release/pull/9299
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1846529

Comment 4 Gal Zaidman 2020-06-21 14:17:24 UTC
*** Bug 1849396 has been marked as a duplicate of this bug. ***

Comment 5 Ryan Phillips 2020-07-09 20:13:23 UTC
Peter knows more about that fix... I'll ask him to comment.

Comment 6 Peter Hunt 2020-07-14 17:22:05 UTC
Ryan and I talked offline; neither of us remembers why we thought the aforementioned runc fix should fix this problem.

Comment 7 Gal Zaidman 2020-07-15 06:59:42 UTC
OK, so I'm seeing this test fail in 50% of our CI runs on 4.5.
I can provide you with an OCP on RHV cluster to debug for a limited time if you want.
Can you also increase the severity/priority? This is the main cause of 4.5 failures for us.

Comment 9 Gal Zaidman 2020-07-22 10:44:04 UTC

*** This bug has been marked as a duplicate of bug 1846529 ***

Comment 10 Red Hat Bugzilla 2023-09-14 06:02:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

