Bug 1447928

Summary: Observed a panic: Pop() of key not in store:
Product: OpenShift Container Platform Reporter: Vladislav Walek <vwalek>
Component: Networking Assignee: Phil Cameron <pcameron>
Networking sub component: router QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: high CC: aos-bugs, bbennett, hongli, pcameron, pdwyer, smunilla, tojek.m
Version: 3.4.1   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: coding error Consequence: Pop() panics and stops router Fix: origin PR 14232 Result:
Story Points: ---
Clone Of:
: 1464563 1464567 1477358 Environment:
Last Closed: 2017-08-10 05:21:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1464563, 1464567, 1477358    

Description Vladislav Walek 2017-05-04 09:12:28 UTC
Description of problem:

Hello, after errata https://access.redhat.com/errata/RHBA-2017:1129 the customer upgraded to the 3.4.1.18 router image. However, they are now hitting an issue similar to https://bugzilla.redhat.com/show_bug.cgi?id=1429823

E0504 07:53:54.020124       1 runtime.go:64] Observed a panic: "Pop() of key not in store: namespace/pod_name/d2399987-309d-11e7-829b-0211d686705d" (Pop() of key not in store: namespace/pod_name/d2399987-309d-11e7-829b-0211d686705d)
/builddir/build/BUILD/atomic-openshift-git-0.0f9d380/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70
/builddir/build/BUILD/atomic-openshift-git-0.0f9d380/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63
/builddir/build/BUILD/atomic-openshift-git-0.0f9d380/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49
/usr/lib/golang/src/runtime/asm_amd64.s:479
/usr/lib/golang/src/runtime/panic.go:458

I will attach the full log.
The workaround is to delete the pod and let it be recreated automatically.
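
The workaround can be sketched roughly as below. This is a minimal sketch and not part of the original report: the helper name `recreate_pod` and the `OC` override variable are hypothetical, and it assumes the pod is managed by a controller that recreates it after deletion.

```shell
#!/bin/bash

# Hypothetical helper: delete the named pod so that its controller
# recreates it. Set OC=echo to preview the command instead of running it.
recreate_pod() {
    local pod="$1"
    "${OC:-oc}" delete pod "${pod}"
}

# Example (the pod name is a placeholder, not taken from the report):
#   recreate_pod router-1-abcde
```

Running with `OC=echo` first is a cheap way to confirm which pod would be deleted.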

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.4.1
Router image 3.4.1.18

Comment 2 Ben Bennett 2017-05-10 19:41:03 UTC
More detail on the duplicate bug: https://bugzilla.redhat.com/show_bug.cgi?id=1437441

Comment 3 Ben Bennett 2017-05-10 19:41:45 UTC
*** Bug 1437441 has been marked as a duplicate of this bug. ***

Comment 6 Phil Cameron 2017-05-17 20:44:15 UTC
See https://github.com/openshift/origin/pull/14232 for the proposed fix.

Comment 7 Vladislav Walek 2017-05-23 08:09:03 UTC
Hello Phil,
thank you for the reply. In which OpenShift Container Platform version will it be merged? Thank you.

Comment 8 Phil Cameron 2017-05-23 12:49:19 UTC
No decision has been made yet. I expect it to ultimately be in 3.5, 3.4, and 3.3. Releases 3.6 and beyond use a new implementation that does not include the code being fixed, so the fix won't be there.

Comment 10 zhaozhanqi 2017-06-05 07:39:21 UTC
Verified this bug on OpenShift v3.6.94.

No panic logs were found when running the following script:

#!/bin/bash


function _simulate_eq_panic() {
    sleep 0.0$((RANDOM%3))
    echo "  - worker name: $1 ... "

    case "$((RANDOM%3))" in
      0)  oc create  -f "$2"  ;;
      1)  oc replace -f "$2"  ;;
      2)  oc delete  -f "$2"  ;;
      *)  oc replace -f "$2"  ;;
    esac

}  #  End of function  _simulate_eq_panic.


#
#  main():
#
routefile=${1:-"https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/unsecure/route_unsecure.json"}
ntimes=${2:-50}

for i in `seq ${ntimes}`; do
  _simulate_eq_panic "worker_${i}" "${routefile}" &
done

_simulate_eq_panic "main" "${routefile}"
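
After a run like the one above, the router log can be checked for the panic message. The following is a minimal sketch and not part of the original verification: the function name `count_pop_panics` is hypothetical, and it assumes the router log has already been saved to a file (for example with `oc logs`).

```shell
#!/bin/bash

# Hypothetical helper: print how many lines of the given log file contain
# the panic message quoted in this report.
count_pop_panics() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function's status at success.
    grep -cF 'Pop() of key not in store' -- "$1" || true
}

# Example (router.log is a placeholder file name):
#   count_pop_panics router.log
```

A count of 0 matches the verification result reported in this comment.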

Comment 11 Ben Bennett 2017-06-22 15:10:30 UTC
*** Bug 1458587 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2017-08-10 05:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716