Bug 1695807

Summary:	Unit test flake post rebase: k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Node	Assignee:	Joel Smith <joelsmith>
Status:	CLOSED ERRATA	QA Contact:	Sunil Choudhary <schoudha>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.1.0	CC:	anpicker, aos-bugs, erooth, fbranczy, gblomqui, jokerman, mloibl, mmccomas, pkrupa, sjenning, surbania
Target Milestone:	---
Target Release:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-06-04 10:47:03 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Clayton Coleman 2019-04-03 18:43:21 UTC

github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22447/pull-ci-openshift-origin-master-unit/4610#githubcomopenshiftoriginvendork8siokubernetespkgcontrollerpodautoscaler-testeventnotcreated

goroutine 1200 [chan receive, 2 minutes]:
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).Run(0xc420138f00, 0xc420048d20)
	/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:166 +0x29f
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler.(*testCase).runTestWithController
	/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal_test.go:701 +0xbf

Set to high because it flakes fairly often.

Comment 1 Clayton Coleman 2019-04-05 04:34:18 UTC

Also https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22488/pull-ci-openshift-origin-master-unit/4694#githubcomopenshiftoriginvendork8siokubernetespkgcontrollerpodautoscaler-testlegacysuperfluousmetrics

Comment 2 Frederic Branczyk 2019-04-08 14:21:24 UTC

Joel from the Pod team is handling pod autoscaler issues. Reassigning.

Comment 3 Seth Jennings 2019-04-10 14:47:36 UTC

Removing the blocker flag, but we still need to stay on this.

15 occurrences in the last 48h.

https://search.svc.ci.openshift.org/?search=TestEventNotCreated&maxAge=48h&context=2&type=all

Comment 4 Clayton Coleman 2019-04-10 19:07:30 UTC

This is absolutely still a blocker. 50% of Origin merge and PR jobs were failing on this.

I am disabling the test here: https://github.com/openshift/origin/pull/22527

Please ensure you follow up and fix.

Comment 6 Joel Smith 2019-04-10 22:42:42 UTC

I think I have figured out the cause of the deadlock in the test. I've got a WIP PR that should address the deadlock and fix another potential flake. I'm going to test it over and over to see if I can get it to hit the flake again with my change. If it looks good, I'll work on getting it merged upstream and in Origin.

https://github.com/openshift/origin/pull/22531

Comment 7 Joel Smith 2019-04-11 14:22:16 UTC

No flakes on my tests that ran overnight. Hopefully that's a good sign.

Comment 9 Joel Smith 2019-04-17 14:46:05 UTC

Hi Sunil,
Here's a reproducer for the flake. From the origin source directory, run:

GOMAXPROCS=1 go test ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1

Before the fix, it should timeout after 60 seconds with a deadlock backtrace.

After the fix, it should give an "ok" result.

Unfortunately, there is another flake, which we're fixing in https://github.com/openshift/origin/pull/22591, and that fix needs to merge as well; otherwise the reproducer will also fail on that one. Once both fixes are in, you should get an "ok" result.

Comment 12 errata-xmlrpc 2019-06-04 10:47:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758