Bug 1695807 - Unit test flake post rebase: k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Joel Smith
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-03 18:43 UTC by Clayton Coleman
Modified: 2019-06-04 10:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:03 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:47:11 UTC

Description Clayton Coleman 2019-04-03 18:43:21 UTC
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22447/pull-ci-openshift-origin-master-unit/4610#githubcomopenshiftoriginvendork8siokubernetespkgcontrollerpodautoscaler-testeventnotcreated

goroutine 1200 [chan receive, 2 minutes]:
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).Run(0xc420138f00, 0xc420048d20)
	/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:166 +0x29f
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler.(*testCase).runTestWithController
	/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal_test.go:701 +0xbf

Severity set to high because this test is flaking fairly often.

Comment 2 Frederic Branczyk 2019-04-08 14:21:24 UTC
Joel from the Pod team is taking care of pod autoscaler things. Re-assigning.

Comment 3 Seth Jennings 2019-04-10 14:47:36 UTC
Removing as a blocker, but we still need to stay on this.

15 occurrences in the last 48h.

https://search.svc.ci.openshift.org/?search=TestEventNotCreated&maxAge=48h&context=2&type=all

Comment 4 Clayton Coleman 2019-04-10 19:07:30 UTC
This is absolutely still a blocker.  50% of origin merges / PR jobs were failing on this.

I am disabling the test here: https://github.com/openshift/origin/pull/22527

Please ensure you follow up and fix.

Comment 6 Joel Smith 2019-04-10 22:42:42 UTC
I think I have figured out the cause of the deadlock in the test. I've got a WIP PR that should address the deadlock and fix another potential flake. I'm going to test it over and over to see if I can get it to hit the flake again with my change. If it looks good, I'll work on getting it merged upstream and in Origin.

https://github.com/openshift/origin/pull/22531

Comment 7 Joel Smith 2019-04-11 14:22:16 UTC
No flakes on my tests that ran overnight. Hopefully that's a good sign.

Comment 9 Joel Smith 2019-04-17 14:46:05 UTC
Hi Sunil, 
Here's a reproducer for the flake. From the origin source directory run this:

GOMAXPROCS=1 go test ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1

Before the fix, it should time out after 60 seconds with a deadlock backtrace.

After the fix, it should give an "ok" result.

Unfortunately, there is a second flake, which we're fixing in https://github.com/openshift/origin/pull/22591. That fix needs to merge too, or the reproducer will still fail on the second flake. Once both fixes are in, you should get an "ok" result.
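
Because a flake may not show up on a single run, it can help to rerun the reproducer several times. The following is a minimal sketch, not something from the fix itself: repeat_until_fail is a hypothetical bash helper name, and the run count of 20 in the usage comment is an arbitrary assumption. It reruns a command and stops at the first failure so the backtrace from that run stays at the end of the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of the fix): rerun a command
# up to N times and stop at the first failure, returning its exit status.
repeat_until_fail() {
  local runs="$1"
  shift
  local i status
  for ((i = 1; i <= runs; i++)); do
    if "$@"; then
      :  # this run passed; try again
    else
      status=$?
      echo "run $i of $runs failed with exit status $status" >&2
      return "$status"
    fi
  done
  echo "all $runs runs passed" >&2
  return 0
}

# Example usage from the origin source directory (20 runs is an assumption):
# repeat_until_fail 20 env GOMAXPROCS=1 go test \
#   ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1
```

Passing the command through env keeps GOMAXPROCS=1 applied to every run without relying on how the shell scopes variable assignments placed in front of a function call.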

Comment 12 errata-xmlrpc 2019-06-04 10:47:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

