github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s
goroutine 1200 [chan receive, 2 minutes]:
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/podautoscaler.(*testCase).runTestWithController
Set to high because this test is flaking frequently.
Joel from the Pod team is taking care of pod autoscaler things. Re-assigning.
Removing as blocker but still need to stay on this.
15 occurrences in the last 48h.
This is absolutely still a blocker. 50% of origin merges / PR jobs were failing on this.
I am disabling the test here: https://github.com/openshift/origin/pull/22527
Please ensure you follow up and fix.
I think I have figured out the cause of the deadlock in the test. I've got a WIP PR that should address the deadlock and fix another potential flake. I'm going to test it over and over to see if I can get it to hit the flake again with my change. If it looks good, I'll work on getting it merged upstream and in Origin.
No flakes on my tests that ran overnight. Hopefully that's a good sign.
Here's a reproducer for the flake. From the origin source directory run this:
GOMAXPROCS=1 go test ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1
Before the fix, it should time out after 60 seconds with a deadlock backtrace.
After the fix, it should give an "ok" result.
Unfortunately, there is another flake that we're fixing in https://github.com/openshift/origin/pull/22591. That fix needs to merge as well, or the reproducer will still fail on the second flake. Once both fixes are in, you should get an "ok" result.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.