2077167 – periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial is permfailing - unidling tests

Keywords:
Status:	CLOSED DUPLICATE of bug 2003228
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	4.9.z
Assignee:	jamo luhrsen
QA Contact:	Anurag saxena
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-04-20 19:44 UTC by Ben Parees
Modified:	2022-05-27 16:51 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:	job=periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial=all job=periodic-ci-openshift-release-master-nightly-4.9-e2e-azure-fips=all
Last Closed:	2022-05-27 16:51:18 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 27172	0	None	open	Bug 2077167: disable unidling test failing under ovn-k	2022-05-25 20:31:58 UTC

Description Ben Parees 2022-04-20 19:44:48 UTC

job:
periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial 

is permfailing frequently in CI; see the testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial

The consistent failure is in this test:


[sig-network-edge][Feature:Idling] Unidling should handle many TCP connections by possibly dropping those over a certain bound [Serial] [Suite:openshift/conformance/serial] (38s)
{  fail [github.com/openshift/origin/test/extended/idling/idling.go:346]: Expected
    <int>: 0
to be >=
    <int>: 16}


So we are starting with network edge to investigate. Note that this is the third or fourth bug reported for this test being stuck in a permfailing state, so we don't seem to be fixing it correctly or making the fix stick.


sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1516527466993160192

Comment 1 Miciah Dashiel Butler Masters 2022-04-21 16:14:58 UTC

Setting low priority and blocker- as no severity was specified.

Comment 2 Ben Parees 2022-04-21 16:43:51 UTC

Given that this is permfailing one of our CI jobs, I'd like to see the priority raised to at least medium.

We see no failures for this test in 4.8 or in 4.10+, but it accounts for 1.4% of all failures in 4.9, so something seems to be broken in 4.9 specifically.

https://search.ci.openshift.org/?search=Unidling+should+handle+many+TCP+connections+by+possibly+dropping+those+over+a+certain+bound&maxAge=48h&context=1&type=bug%2Bjunit&name=4.8&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

https://search.ci.openshift.org/?search=Unidling+should+handle+many+TCP+connections+by+possibly+dropping+those+over+a+certain+bound&maxAge=48h&context=1&type=bug%2Bjunit&name=4.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

https://search.ci.openshift.org/?search=Unidling+should+handle+many+TCP+connections+by+possibly+dropping+those+over+a+certain+bound&maxAge=48h&context=1&type=bug%2Bjunit&name=4.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

https://search.ci.openshift.org/?search=Unidling+should+handle+many+TCP+connections+by+possibly+dropping+those+over+a+certain+bound&maxAge=48h&context=1&type=bug%2Bjunit&name=4.11&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
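The four search links above differ only in the `name` parameter. As a minimal sketch (not an official client for search.ci.openshift.org), they can be rebuilt with Python's standard library. The function name `search_url` and the constants are hypothetical and introduced only for illustration; the query parameters and their values are copied from the links above.

```python
from urllib.parse import urlencode

# Base URL and test name are taken from the links in this comment.
SEARCH_BASE = "https://search.ci.openshift.org/"
TEST_NAME = (
    "Unidling should handle many TCP connections "
    "by possibly dropping those over a certain bound"
)


def search_url(name, max_age="48h", result_type="bug+junit"):
    """Build a search URL restricted to jobs whose name matches `name`.

    Hypothetical helper: it only reproduces the query string seen in
    the links above and makes no request to the service.
    """
    params = {
        "search": TEST_NAME,
        "maxAge": max_age,
        "context": 1,
        "type": result_type,  # "+" is percent-encoded as %2B
        "name": name,
        "excludeName": "",
        "maxMatches": 5,
        "maxBytes": 20971520,
        "groupBy": "job",
    }
    # urlencode keeps dict insertion order and encodes spaces as "+".
    return SEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    for release in ("4.8", "4.9", "4.10", "4.11"):
        print(search_url(release))
```

Passing a narrower job-name pattern as `name` limits the results to matching jobs, which makes it easy to compare one release or one job against another.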

Comment 4 Ben Parees 2022-04-25 14:14:51 UTC

Also, is this possibly a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1989169? That bug seems to have gotten stuck after reaching POST.

Comment 7 Ben Parees 2022-04-28 19:08:09 UTC

This is a pretty consistent failure in the 4.9 serial jobs, which is preventing people from backporting changes to the 4.9 release branches:

https://search.ci.openshift.org/?search=+Unidling+should+handle+many+TCP+connections+by+possibly+dropping+those&maxAge=168h&context=1&type=junit&name=4.9.*serial&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Can we please raise the priority on this?

Comment 10 Ben Parees 2022-05-19 13:39:08 UTC

Candace, it's consistently failing in this job:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial

https://search.ci.openshift.org/?search=Unidling+should+handle+many+TCP+connections+by+possibly+dropping+those+over+a+certain+bound&maxAge=48h&context=1&type=bug%2Bjunit&name=4.9-e2e-aws-single-node-serial&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

So it may be specific to single node, but it's the reason that job never passes for us, and why it's important to address (so that we can start getting signal from that job again).

Comment 19 jamo luhrsen 2022-05-27 16:51:18 UTC


*** This bug has been marked as a duplicate of bug 2003228 ***
