Description of problem:

This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1439324, but I hit it via a different route, and the line numbers and packages in the stack are somewhat different, so opening a new bug for initial triage.

The scale lab is currently at 1200 nodes (on the way to 2000). I deployed logging in the cluster, and when the logging-fluentd daemonset was created, I started seeing panics in the master API server logs. Stack is:

Jul 3 13:35:22 172 atomic-openshift-master-api: I0703 13:35:22.987317 97941 logs.go:41] http: panic serving 172.16.0.20:41518: kill connection/stream
Jul 3 13:35:22 172 atomic-openshift-master-api: goroutine 3576844 [running]:
Jul 3 13:35:22 172 atomic-openshift-master-api: net/http.(*conn).serve.func1(0xc54ec97580)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1491 +0x12a
Jul 3 13:35:22 172 atomic-openshift-master-api: panic(0x4777520, 0xc4204e4fd0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/runtime/panic.go:458 +0x243
Jul 3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc980ef01c0, 0xc9801a8a00)
Jul 3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:214 +0x187
Jul 3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc421365100, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:98 +0x28c
Jul 3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:95 +0x197
Jul 3 13:35:22 172 atomic-openshift-master-api: net/http.HandlerFunc.ServeHTTP(0xc42132bfc0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1726 +0x44
Jul 3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:45 +0x212
Jul 3 13:35:22 172 atomic-openshift-master-api: net/http.HandlerFunc.ServeHTTP(0xc4213667b0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1726 +0x44
Jul 3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request.WithRequestContext.func1(0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:107 +0xef
Jul 3 13:35:22 172 atomic-openshift-master-api: net/http.HandlerFunc.ServeHTTP(0xc4213651c0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1726 +0x44
Jul 3 13:35:22 172 atomic-openshift-master-api: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc4213667e0, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /builddir/build/BUILD/atomic-openshift-git-0.c91cc09/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:193 +0x51
Jul 3 13:35:22 172 atomic-openshift-master-api: net/http.serverHandler.ServeHTTP(0xc4247d2a80, 0xa3f59a0, 0xc6931d32b0, 0xc975fbf0e0)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:2202 +0x7d
Jul 3 13:35:22 172 atomic-openshift-master-api: net/http.(*conn).serve(0xc54ec97580, 0xa3fa120, 0xc9761c2440)
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:1579 +0x4b7
Jul 3 13:35:22 172 atomic-openshift-master-api: created by net/http.(*Server).Serve
Jul 3 13:35:22 172 atomic-openshift-master-api: /usr/lib/golang/src/net/http/server.go:2293 +0x44d

Version-Release number of selected component (if applicable):
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7

How reproducible:
Once so far. I will update the bz after trying to delete/recreate the ds.

Steps to Reproduce:
1. Deployed HA cluster with 1200 compute nodes
2. Deployed logging

Actual results:
Panics in the API server logs

Expected results:
No error.

Additional info:
From the logs, the panic is coming from an actual timeout while serving the request; see: https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go#L214
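For triage context, below is a minimal Go sketch of the timeout-filter pattern the trace points at. It is not the vendored apiserver code; the withTimeout wrapper, the slowList handler, and the durations are hypothetical, chosen only to illustrate why the filter panics with "kill connection/stream" once the inner handler has already started writing the response body, which is the panic that net/http.(*conn).serve recovers and logs above.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// errConnKilled mirrors the "kill connection/stream" value seen in the panic.
var errConnKilled = fmt.Errorf("kill connection/stream")

// timeoutWriter tracks whether the wrapped handler has started writing.
type timeoutWriter struct {
	http.ResponseWriter
	mu          sync.Mutex
	wroteHeader bool
	timedOut    bool
}

func (tw *timeoutWriter) WriteHeader(code int) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return
	}
	tw.wroteHeader = true
	tw.ResponseWriter.WriteHeader(code)
}

func (tw *timeoutWriter) Write(b []byte) (int, error) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return 0, fmt.Errorf("request timed out")
	}
	tw.wroteHeader = true
	return tw.ResponseWriter.Write(b)
}

// timeout is called when the deadline fires. If nothing has been written yet,
// the filter can still send a clean 504; if the response body is already in
// flight, the only option left is to panic and let net/http tear down the
// connection.
func (tw *timeoutWriter) timeout() {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	tw.timedOut = true
	if !tw.wroteHeader {
		tw.ResponseWriter.WriteHeader(http.StatusGatewayTimeout)
		return
	}
	panic(errConnKilled)
}

// withTimeout runs the handler in a goroutine and enforces a deadline,
// loosely following the shape of the apiserver timeout filter.
func withTimeout(h http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tw := &timeoutWriter{ResponseWriter: w}
		done := make(chan struct{})
		go func() {
			defer close(done)
			h.ServeHTTP(tw, r)
		}()
		select {
		case <-done:
		case <-time.After(d):
			tw.timeout() // panics if the handler already started writing
		}
	})
}

func main() {
	// Hypothetical stand-in for a huge LIST response: it starts writing,
	// then takes longer than the filter's deadline to finish serializing.
	slowList := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"kind":"PodList","items":[`))
		time.Sleep(200 * time.Millisecond)
		w.Write([]byte(`]}`))
	})

	srv := httptest.NewServer(withTimeout(slowList, 50*time.Millisecond))
	defer srv.Close()

	// The client typically sees a torn connection (EOF) rather than a 504,
	// and the server log shows "http: panic serving ...: kill connection/stream".
	_, err := http.Get(srv.URL)
	fmt.Println("client error:", err)
}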
Marking this as a duplicate of the original bug. The line numbers do not align because the code was moved around between 1.5 and 1.6, but it is the same basic request-timeout problem. https://github.com/openshift/origin/blob/release-1.5/vendor/k8s.io/kubernetes/pkg/genericapiserver/filters/timeout.go#L205 is the same as 1.6: https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go#L214. It appears that the LIST call starts writing, but is unable to finish serializing in time to write out the response before hitting the timeout.

*** This bug has been marked as a duplicate of bug 1439324 ***