1848081 – [k8s.io] [sig-node] Events should be sent by kubelets and the scheduler about pods scheduling and running


Keywords:
Status:	CLOSED DUPLICATE of bug 1846529
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Ryan Phillips
QA Contact:	MinLi
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1849396
Depends On:
Blocks:

Reported:	2020-06-17 16:15 UTC by Ben Parees
Modified:	2023-09-14 06:02 UTC
CC List:	7 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-22 10:44:04 UTC
Target Upstream Version:
Embargoed:

Description Ben Parees 2020-06-17 16:15:37 UTC

Description of problem:

Test fails about half the time in ovirt jobs.

[k8s.io] [sig-node] Events should be sent by kubelets and the scheduler about pods scheduling and running [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] 1m8s
fail [k8s.io/kubernetes/test/e2e/node/events.go:116]: Unexpected error:
    <*errors.errorString | 0xc000202990>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

as seen in:
https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.5/1272937679863943168

fails 57% of the time in ovirt:
https://search.apps.build01.ci.devcluster.openshift.com/?search=%5C%5Bk8s%5C.io%5C%5D+%5C%5Bsig-node%5C%5D+Events+should+be+sent+by+kubelets+and+the+scheduler+about+pods+scheduling+and+running++%5C%5BConformance%5C%5D&maxAge=168h&context=1&type=junit&name=.*ovirt.*4.5.*&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 3 Gal Zaidman 2020-06-21 14:17:01 UTC

Today I took a closer look at this. It wasn't caused by the infrastructure upgrade; it is actually an old problem.
In the past, it was caused by very low CPU on the CI test pod, which was resolved [1].
Now it has come back from the dead.
I saw that the runc fix in [2] is considered related. Can you explain how it relates to this error?
I'm working to reproduce and investigate it in the meantime.

[1] https://github.com/openshift/release/pull/9299
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1846529

Comment 4 Gal Zaidman 2020-06-21 14:17:24 UTC

*** Bug 1849396 has been marked as a duplicate of this bug. ***

Comment 5 Ryan Phillips 2020-07-09 20:13:23 UTC

Peter knows more about that fix... I'll ask him to comment.

Comment 6 Peter Hunt 2020-07-14 17:22:05 UTC

Ryan and I talked offline; neither of us remembers why we thought the aforementioned runc fix would fix this problem.

Comment 7 Gal Zaidman 2020-07-15 06:59:42 UTC

OK, so I'm seeing this test fail in 50% of our CI runs on 4.5.
I can provide you with an OCP on RHV cluster to debug for a limited period if you want.
Can you also increase the severity/priority? This is the main cause of 4.5 failures for us.

Comment 9 Gal Zaidman 2020-07-22 10:44:04 UTC


*** This bug has been marked as a duplicate of bug 1846529 ***

Comment 10 Red Hat Bugzilla 2023-09-14 06:02:20 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
