Bug 1695807
Summary: | Unit test flake post rebase: k8s.io/kubernetes/pkg/controller/podautoscaler TestEventNotCreated 3m0s | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
Component: | Node | Assignee: | Joel Smith <joelsmith> |
Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.0 | CC: | anpicker, aos-bugs, erooth, fbranczy, gblomqui, jokerman, mloibl, mmccomas, pkrupa, sjenning, surbania |
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-06-04 10:47:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: |
Description
Clayton Coleman
2019-04-03 18:43:21 UTC
Joel from the Pod team is taking care of pod autoscaler things. Re-assigning.

Removing as blocker but still need to stay on this. 15 occurrences in the last 48h.
https://search.svc.ci.openshift.org/?search=TestEventNotCreated&maxAge=48h&context=2&type=all

This is absolutely still a blocker. 50% of origin merges / PR jobs were failing on this. I am disabling the test here:
https://github.com/openshift/origin/pull/22527
Please ensure you follow up and fix.

I think I have figured out the cause of the deadlock in the test. I've got a WIP PR that should address the deadlock and fix another potential flake. I'm going to test it over and over to see if I can get it to hit the flake again with my change. If it looks good, I'll work on getting it merged upstream and in Origin.
https://github.com/openshift/origin/pull/22531

No flakes on my tests that ran overnight. Hopefully that's a good sign.

Hi Sunil,
Here's a reproducer for the flake. From the origin source directory, run this:

GOMAXPROCS=1 go test ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1

Before the fix, it should time out after 60 seconds with a deadlock backtrace. After the fix, it should give an "ok" result. Unfortunately, there is another flake that we're fixing in https://github.com/openshift/origin/pull/22591 that needs to merge too, or the reproducer will fail on that one as well. Once both fixes are in, you should get an "ok" result.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0758
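
For anyone verifying the fix, one way to "test it over and over" is to wrap the reproducer in a small retry loop that stops at the first failure. The following is only a sketch: the helper name `repeat_until_fail` and the run count of 50 are illustrative assumptions and are not taken from the bug or from either pull request.

```shell
#!/bin/sh
# Hypothetical helper (not part of origin): rerun a command up to N times,
# stopping at the first non-zero exit status.
# Usage: repeat_until_fail N command [args...]
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on run $i of $max"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed $max of $max runs"
    return 0
}

# Example invocation from the origin source directory (50 is an arbitrary count):
# repeat_until_fail 50 env GOMAXPROCS=1 go test \
#     ./vendor/k8s.io/kubernetes/pkg/controller/podautoscaler/... -timeout 60s -count 1
```

`env GOMAXPROCS=1` is used instead of a bare variable prefix so that the setting reaches `go test` regardless of how the shell treats assignments placed before a function call. `-count 1` is kept from the original reproducer so that each iteration actually reruns the tests instead of reusing a cached result.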