Bug 1506375 - API server panics while running conformance: APIServer panic'd on GET /api/v1/namespaces/extended-test-cli-deployment-59v3j-tb8s9: multiple NewLogged calls!
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Jordan Liggitt
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-25 19:55 UTC by Mike Fiedler
Modified: 2017-11-28 22:19 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 22:19:38 UTC
Target Upstream Version:
Embargoed:


Attachments
master logs (800.46 KB, application/x-gzip)
2017-10-25 19:55 UTC, Mike Fiedler


Links
System: Red Hat Product Errata
ID: RHSA-2017:3188
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-29 02:34:54 UTC

Description Mike Fiedler 2017-10-25 19:55:05 UTC
Created attachment 1343394
master logs

Description of problem:

 APIServer panic'd on GET /api/v1/namespaces/extended-test-cli-deployment-59v3j-tb8s9: multiple NewLogged calls!

While I was running a subset of conformance (https://github.com/openshift/svt/tree/master/conformance) against 3.7.0-0.178.0, the master-api logs started showing panics like the one above.

Eventually the conformance tests hung, taking a long time to terminate. There were also lots of watch failures in the log.


Version-Release number of selected component (if applicable): 3.7.0-0.178.0


How reproducible: Unknown - I will clean the cluster up and try again


Steps to Reproduce:
1.  1 master, 1 infra, 2 compute - all m4.xlarge on AWS
2.  Run the svt_conformance.sh script linked above. There were failures due to a missing wildfly imagestream, so I added that and ran the script a second time.


Actual results:

master-api panics and lots of broken watches. Search for "APIServer panic'd" in the logs.

Many conformance tests failed as well due to this.




Additional info:

Comment 1 Michal Fojtik 2017-10-26 13:11:09 UTC
Clayton FYI since according to git blame you touched this code in kube ;-)

Comment 2 Jordan Liggitt 2017-10-26 13:25:22 UTC
WithPanicRecovery wraps the response writer with an HTTP logger.

When WithMaxInFlightLimit reaches its limit and calls tooManyRequests, that function also tries to wrap the writer, which fails with the "multiple NewLogged calls!" panic.

Comment 3 Jordan Liggitt 2017-10-26 13:26:28 UTC
fixed in https://github.com/kubernetes/kubernetes/pull/48813 upstream

Comment 4 Jordan Liggitt 2017-10-26 13:30:02 UTC
picked in https://github.com/openshift/origin/pull/17048

Comment 5 Mike Fiedler 2017-10-31 19:01:08 UTC
Verification blocked by bug 1508061

Comment 7 Mike Fiedler 2017-11-07 23:07:36 UTC
Verified on 3.7.0-0.190.0.   Ran conformance multiple times without seeing this error.

Comment 10 errata-xmlrpc 2017-11-28 22:19:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

