Bug 1467416 - panic in net/http/server.go starting daemonset with 1200 pods
Summary: panic in net/http/server.go starting daemonset with 1200 pods
Keywords:
Status: CLOSED DUPLICATE of bug 1439324
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard: aos-scalability-36
Depends On:
Blocks:
 
Reported: 2017-07-03 17:51 UTC by Mike Fiedler
Modified: 2017-07-03 18:35 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-03 18:35:44 UTC
Target Upstream Version:
Embargoed:



Description Mike Fiedler 2017-07-03 17:51:12 UTC
Description of problem:

This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1439324, but I hit it via a different route, and the line numbers and packages in the stack are somewhat different, so I am opening a new bug for initial triage.

The scale lab is currently at 1200 nodes (on the way to 2000). I deployed logging in the cluster, and when the logging-fluentd daemonset was created I started seeing panics in the master API server logs. The stack is:

Jul  3 13:35:22 172 atomic-openshift-master-api: I0703 13:35:22.987317   97941 logs.go:41] http: panic serving 172.16.0.20:41518: kill connection/stream
Jul  3 13:35:22 172 atomic-openshift-master-api: goroutine 3576844 [running]:
Jul  3 13:35:22 172 atomic-openshift-master-api: net/http.(*conn).serve.func1(0xc54ec97580)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1491 +0x12a
Jul  3 13:35:22 172 atomic-openshift-master-api: panic(0x4777520, 0xc4204e4fd0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/runtime/panic.go:458 +0x243
Jul  3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc980ef01c0, 0xc9801a8a00)
Jul  3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:214 +0x187
Jul  3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc421365100, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:98 +0x28c
Jul  3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:95 +0x197
Jul  3 13:35:22 172 atomic-openshift-master-api: net/http.HandlerFunc.ServeHTTP(0xc42132bfc0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1726 +0x44
Jul  3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:45 +0x212
Jul  3 13:35:22 172 atomic-openshift-master-api: net/http.HandlerFunc.ServeHTTP(0xc4213667b0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1726 +0x44
Jul  3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request.WithRequestContext.func1(0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:107 +0xef
Jul  3 13:35:22 172 atomic-openshift-master-api: net/http.HandlerFunc.ServeHTTP(0xc4213651c0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1726 +0x44
Jul  3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc4213667e0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:193 +0x51
Jul  3 13:35:22 172 atomic-openshift-master-api: net/http.serverHandler.ServeHTTP(0xc4247d2a80, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:2202 +0x7d
Jul  3 13:35:22 172 atomic-openshift-master-api: net/http.(*conn).serve(0xc54ec97580, 0xa3fa120, 0xc9761c2440)
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1579 +0x4b7
Jul  3 13:35:22 172 atomic-openshift-master-api: created by net/http.(*Server).Serve
Jul  3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:2293 +0x44d




Version-Release number of selected component (if applicable): 

openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7



How reproducible: Once so far. I will update this BZ after trying to delete and recreate the daemonset.


Steps to Reproduce:
1. Deployed an HA cluster with 1200 compute nodes
2. Deployed logging (creating the logging-fluentd daemonset)


Actual results:

Panics in the API server logs

Expected results:

No panics in the API server logs.
Additional info:

Comment 2 Derek Carr 2017-07-03 18:19:32 UTC
From the logs, the panic is coming from an actual timeout on serving the request:

see: https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go#L214
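For context, here is a minimal sketch of the general pattern that filter uses. This is my own illustration under assumed names (errConnKilled, timeoutWriter, withTimeout), NOT the actual vendored k8s.io/apiserver/pkg/server/filters code: the real handler runs in a goroutine, and if the deadline fires after the handler has already started writing the response, the filter can no longer send a clean 504, so it panics and lets net/http tear down the connection, which is the "http: panic serving ... kill connection/stream" line in the journal above.

// Sketch only; assumes nothing beyond the Go standard library.
package main

import (
	"errors"
	"log"
	"net/http"
	"sync"
	"time"
)

// Sentinel matching the "kill connection/stream" message in the master log.
var errConnKilled = errors.New("kill connection/stream")

// timeoutWriter records whether the wrapped handler has started writing.
type timeoutWriter struct {
	http.ResponseWriter
	mu          sync.Mutex
	wroteHeader bool
	timedOut    bool
}

func (tw *timeoutWriter) WriteHeader(code int) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return
	}
	tw.wroteHeader = true
	tw.ResponseWriter.WriteHeader(code)
}

func (tw *timeoutWriter) Write(b []byte) (int, error) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	tw.wroteHeader = true
	return tw.ResponseWriter.Write(b)
}

// timeout is the analogue of (*baseTimeoutWriter).timeout in the stack trace.
func (tw *timeoutWriter) timeout() {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	tw.timedOut = true
	if !tw.wroteHeader {
		// Nothing sent yet: a clean 504 is still possible.
		tw.ResponseWriter.WriteHeader(http.StatusGatewayTimeout)
		return
	}
	// Response already partially written: abort the connection. net/http
	// recovers this panic in (*conn).serve and logs "http: panic serving
	// ...", which is the shape of the log lines seen above.
	panic(errConnKilled)
}

func withTimeout(h http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tw := &timeoutWriter{ResponseWriter: w}
		done := make(chan struct{})
		go func() {
			defer close(done)
			h.ServeHTTP(tw, r)
		}()
		select {
		case <-done:
		case <-time.After(d):
			tw.timeout()
		}
	})
}

func main() {
	// A handler that starts streaming a response and then stalls, so the
	// 1s timeout fires mid-write and triggers the panic path.
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"items":[`))
		time.Sleep(2 * time.Second)
	})
	log.Fatal(http.ListenAndServe(":8080", withTimeout(slow, time.Second)))
}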

Comment 3 Derek Carr 2017-07-03 18:35:44 UTC
Duplicating onto the original bug. The line numbers do not align because the code moved around between 1.5 and 1.6, but it is the same basic problem of the request timing out.

https://github.com/openshift/origin/blob/release-1.5/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/timeout.go#L205

is the same as 1.6:

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go#L214

It appears that the LIST call starts writing the response but is unable to finish serializing and writing it out before hitting the timeout.
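To illustrate why the filter cannot simply turn an in-flight LIST response into a 504 and instead has to kill the connection: once the handler has written any of the body, the status line and headers are already committed, and a later WriteHeader call is ignored by net/http. A small standalone demonstration, my own example using only the standard library (not OpenShift code):

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

func main() {
	// A handler standing in for a large LIST call: it starts streaming the
	// body and only then "discovers" it should report a timeout.
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		io.WriteString(w, `{"items":[`)
		// Too late: the 200 is already committed, so this second call is
		// ignored (a real server would log a superfluous WriteHeader warning).
		w.WriteHeader(http.StatusGatewayTimeout)
	})

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/api/v1/pods", nil))
	fmt.Println(rec.Code) // prints 200, not 504
}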

*** This bug has been marked as a duplicate of bug 1439324 ***

