Bug 1429823
Summary: | [3.5.x] Observed a panic: "Invalid state transition: DELETED -> ADDED" (Invalid state transition: DELETED -> ADDED) - default router | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Vladislav Walek <vwalek> |
Component: | Networking | Assignee: | Jacob Tanenbaum <jtanenba> |
Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | akokshar, aloughla, ameade, aos-bugs, asherkho, bbennett, bmeng, bmorriso, dakini, eparis, erich, esauer, jgoulding, jtanenba, maschmid, mmasters, mwhittin, pportant, rromerom, twiest, wgordon, xtian |
Version: | 3.4.1 | Keywords: | OpsBlocker |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: Quickly and repeatedly adding and deleting a route with the same name in a namespace.
Consequence: The router pod panics with "Invalid state transition: DELETED -> ADDED".
Fix: The object's UID was added to the event queue key generation function.
Result: The router no longer panics when routes are quickly added and deleted.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-04-26 05:36:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1303130 |
Description
Vladislav Walek
2017-03-07 08:45:15 UTC
Can you get the router configuration please? Was this an F5-backed router or haproxy?

Will get the configuration from the customer. This is haproxy.

Talked to eparis, and since this is not a regression (it's been in all router releases; it is just really hard to tickle), we won't block 3.5.0 for it, but will get a fix ASAP.

This is currently biting customers on OpenShift Dedicated as well. They are running v3.3.1.13. This will need to be backported far enough that we can fix this for those customers.

ha-router config is in the attachment.

*** Bug 1434164 has been marked as a duplicate of this bug. ***

Can the customer provide any details on the setup (masters/nodes) and workload that causes this panic? We have not been able to reproduce the exact situation locally.

A customer Eric is referring to on OpenShift Dedicated seems to hit this when doing the reproduction steps defined in Bug #1419771.

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/fd723fde7fc30aeabe8f511b509c1307f6b146fe
change the router eventqueue key function

Changing the router eventqueue key function so that there is a higher chance that each item will have a unique key, so the router does not panic. Originally the thought was to add the creation timestamp because it is not user editable, but the accessor function meta.CreationTimestamp() only gives the timestamp to the second, and since these actions need to occur quickly, a second is too coarse. With only the creation timestamp added, I was still able to observe the panic with the test script. I decided to use the UID because it is much more likely to be unique.

Bug: 1429823
changelog: added a note explaining why routerKeyFn was added

Can we get this backported to 3.4?
(In reply to Thomas Wiest from comment #19)
> Can we get this backported to 3.4?

A backport should already be in flight as part of https://bugzilla.redhat.com/show_bug.cgi?id=1419771
It will be in the next 3.4 release.

The backport made the cut-off for the next fix release.

The 3.5.X PR will land as soon as 3.5.0 cuts: https://github.com/openshift/origin/issues/13494

*** Bug 1430541 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:1129

*** Bug 1462675 has been marked as a duplicate of this bug. ***