Bug 1437441

Summary: Observed a panic: "Pop() of key not in store: u1p1/builder-dockercfg-8t1np" (Pop() of key not in store: u1p1/builder-dockercfg-8t1np) in router pod logs
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Version: 3.5.0
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Hongan Li <hongli>
Assignee: Phil Cameron <pcameron>
QA Contact: zhaozhanqi <zzhao>
CC: aos-bugs, bbennett, eparis, zzhao
Last Closed: 2017-05-10 19:41:45 UTC
Type: Bug

Description Hongan Li 2017-03-30 10:29:37 UTC
Description of problem:
Observed a panic in the F5 router pod logs after deleting a project and then re-creating the project, pod, service, and routes. The F5 router pod nevertheless appears to keep working.

Version-Release number of selected component (if applicable):
openshift v3.5.5
kubernetes v1.5.2+43a9be4
etcd 3.1.0

router image: ose-f5-router  v3.5.5  718c5ae0acda

How reproducible:
cannot reproduce

Steps to Reproduce:
1.
2.
3.

Actual results:
E0330 09:25:46.962559       1 runtime.go:66] Observed a panic: "Pop() of key not in store: u1p1/builder-dockercfg-8t1np" (Pop() of key not in store: u1p1/builder-dockercfg-8t1np)
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:72
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:65
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:51
/usr/lib/golang/src/runtime/asm_amd64.s:479
/usr/lib/golang/src/runtime/panic.go:458
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/pkg/client/cache/eventqueue.go:308
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/pkg/router/controller/factory/factory.go:139
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/pkg/router/controller/controller.go:254
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/pkg/router/controller/controller.go:82
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:96
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:97
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:52
/builddir/build/BUILD/atomic-openshift-git-0.3f53382/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:43
/usr/lib/golang/src/runtime/asm_amd64.s:2086



Expected results:
No panic in the router pod logs.

Additional info:

Comment 1 Phil Cameron 2017-04-07 21:08:22 UTC
Some notes so far: This has to do with quickly adding and deleting routes. The router controller receives the route events and queues them until it can process them. The queue is effectively a flow buffer. Investigating the sequencing of operations and what is happening. More later...

This can be reproduced in the lab.

Comment 2 Phil Cameron 2017-04-13 14:17:37 UTC
*** Bug 1435433 has been marked as a duplicate of this bug. ***

Comment 3 Phil Cameron 2017-04-13 14:21:28 UTC
Bug reproduces on cluster in the lab. Working on finding the root cause.

Comment 4 Phil Cameron 2017-04-13 18:55:55 UTC
Reproduced the Pop() panic 6 times over 5 hours.
The test runs intensive add/delete route sequences concurrently from 4 separate scripts.

Pop() should always find a valid item. Pop() locks the queue, so the queue is consistent while it runs. It looks like an add followed by a delete is creating an invalid record. Looking there for the root cause.
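
For readers unfamiliar with this class of failure, here is a minimal Go sketch of how a locked queue can still panic in Pop(). It is not the code in eventqueue.go: the type `sketchQueue`, the method `DeleteFromStoreOnly`, and the key `u1p1/example` are all hypothetical, and the buggy path shown is only one assumed way a key could remain queued after its object is gone. The root cause had not been identified at the time of this comment.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType is a simplified, hypothetical stand-in for a watch event type.
type EventType string

const Added EventType = "ADDED"

// sketchQueue is a hypothetical event queue: a FIFO of keys, the pending
// event per key, and a store holding the objects the keys refer to.
type sketchQueue struct {
	mu     sync.Mutex
	queue  []string
	events map[string]EventType
	store  map[string]string
}

func newSketchQueue() *sketchQueue {
	return &sketchQueue{
		events: map[string]EventType{},
		store:  map[string]string{},
	}
}

// Add stores the object and queues its key, all under the lock.
func (q *sketchQueue) Add(key, obj string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.store[key] = obj
	if _, pending := q.events[key]; !pending {
		q.queue = append(q.queue, key)
	}
	q.events[key] = Added
}

// DeleteFromStoreOnly models an ASSUMED buggy path: the object is removed
// from the store, but its key is left in the queue.
func (q *sketchQueue) DeleteFromStoreOnly(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	delete(q.store, key)
}

// Pop returns the oldest queued event, or ok=false if the queue is empty.
// It holds the lock throughout, yet still panics if an earlier operation
// left a queued key without a matching object in the store.
func (q *sketchQueue) Pop() (ev EventType, obj string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.queue) == 0 {
		return "", "", false
	}
	key := q.queue[0]
	q.queue = q.queue[1:]
	ev = q.events[key]
	delete(q.events, key)
	obj, exists := q.store[key]
	if !exists {
		panic(fmt.Sprintf("Pop() of key not in store: %s", key))
	}
	return ev, obj, true
}

// popPanicMessage calls Pop and returns the panic message, or "" if none.
func popPanicMessage(q *sketchQueue) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	q.Pop()
	return ""
}

func main() {
	q := newSketchQueue()
	q.Add("u1p1/example", "object")
	q.DeleteFromStoreOnly("u1p1/example")
	fmt.Println(popPanicMessage(q))
}
```

The point of the sketch is that locking inside Pop() only guarantees a consistent view during the call. It cannot repair an inconsistency introduced by an earlier add/delete sequence.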

Comment 5 Ben Bennett 2017-04-24 15:16:00 UTC
Marking this UpcomingRelease because the thread is restarted and recovers successfully, but we still need to track down and fix the panic.

Comment 6 Ben Bennett 2017-05-10 19:41:45 UTC

*** This bug has been marked as a duplicate of bug 1447928 ***

Comment 7 Ben Bennett 2017-06-22 16:02:06 UTC
*** Bug 1442904 has been marked as a duplicate of this bug. ***