|Summary:||Unit test flake post rebase: k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s|
|Product:||OpenShift Container Platform||Reporter:||Clayton Coleman <ccoleman>|
|Component:||Node||Assignee:||Joel Smith <joelsmith>|
|Status:||CLOSED ERRATA||QA Contact:||Sunil Choudhary <schoudha>|
|Version:||4.1.0||CC:||anpicker, aos-bugs, erooth, fbranczy, gblomqui, jokerman, mloibl, mmccomas, pkrupa, sjenning, surbania|
|Last Closed:||2019-06-04 10:47:03 UTC||Type:||Bug|
Description Clayton Coleman 2019-04-03 18:43:21 UTC
Comment 1 Clayton Coleman 2019-04-05 04:34:18 UTC
Comment 2 Frederic Branczyk 2019-04-08 14:21:24 UTC
Joel from the Pod team is taking care of pod autoscaler things. Re-assigning.
Comment 3 Seth Jennings 2019-04-10 14:47:36 UTC
Removing as blocker but still need to stay on this. 15 occurrences in the last 48h. https://search.svc.ci.openshift.org/?search=TestEventNotCreated&maxAge=48h&context=2&type=all
Comment 4 Clayton Coleman 2019-04-10 19:07:30 UTC
This is absolutely still a blocker. 50% of origin merges / PR jobs were failing on this. I am disabling the test here: https://github.com/openshift/origin/pull/22527 Please ensure you follow up and fix.
Comment 6 Joel Smith 2019-04-10 22:42:42 UTC
I think I have figured out the cause of the deadlock in the test. I've got a WIP PR that should address the deadlock and fix another potential flake. I'm going to test it over and over to see if I can get it to hit the flake again with my change. If it looks good, I'll work on getting it merged upstream and in Origin. https://github.com/openshift/origin/pull/22531
Comment 7 Joel Smith 2019-04-11 14:22:16 UTC
No flakes on my tests that ran overnight. Hopefully that's a good sign.
Comment 9 Joel Smith 2019-04-17 14:46:05 UTC
Hi Sunil,

Here's a reproducer for the flake. From the origin source directory, run:

    GOMAXPROCS=1 go test ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1

Before the fix, it should time out after 60 seconds with a deadlock backtrace. After the fix, it should give an "ok" result.

Unfortunately, there is another flake, which we're fixing in https://github.com/openshift/origin/pull/22591, and that fix needs to merge too; otherwise the reproducer will still fail on that second flake. Once both fixes are in, you should get an "ok" result.
Comment 12 errata-xmlrpc 2019-06-04 10:47:03 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758