Description of problem:

The cluster runs fine for a couple of days, and then `oc login` fails with:

~~~
F0914 13:59:14.253797    6075 helpers.go:115] Error from server (InternalError): Internal error occurred: unexpected response: 504
~~~

* In the masters' API container logs, the following panic is observed:

~~~
killing connection/stream because serving request timed out and response had been started"} (killing connection/stream because serving request timed out and response had been started)
2020-09-14T19:15:53.931589517Z goroutine 18743417 [running]:
2020-09-14T19:15:53.931589517Z k8s.io/apimachinery/pkg/util/runtime.logPanic(0x27ebb80, 0xc000324fe0)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
2020-09-14T19:15:53.931589517Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0038cfbd8, 0x1, 0x1)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
2020-09-14T19:15:53.931589517Z panic(0x27ebb80, 0xc000324fe0)
2020-09-14T19:15:53.931589517Z 	runtime/panic.go:679 +0x1b2
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc0016f2c60, 0xc004689ae0)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/server/filters/timeout.go:257 +0x1cf
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc0000318a0, 0x34deb00, 0xc000a4b730, 0xc001719e00)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/server/filters/timeout.go:141 +0x310
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1(0x34deb00, 0xc000a4b730, 0xc001719d00)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/server/filters/waitgroup.go:59 +0x121
2020-09-14T19:15:53.931589517Z net/http.HandlerFunc.ServeHTTP(0xc0033fefc0, 0x34deb00, 0xc000a4b730, 0xc001719d00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:2007 +0x44
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x34deb00, 0xc000a4b730, 0xc001719c00)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x274
2020-09-14T19:15:53.931589517Z net/http.HandlerFunc.ServeHTTP(0xc0033feff0, 0x34deb00, 0xc000a4b730, 0xc001719c00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:2007 +0x44
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/server/httplog/httplog.go:89 +0x2ca
2020-09-14T19:15:53.931589517Z net/http.HandlerFunc.ServeHTTP(0xc0000318c0, 0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:2007 +0x44
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/server/filters/wrap.go:51 +0x13e
2020-09-14T19:15:53.931589517Z net/http.HandlerFunc.ServeHTTP(0xc0000318e0, 0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:2007 +0x44
2020-09-14T19:15:53.931589517Z github.com/openshift/library-go/pkg/apiserver/apiserverconfig.WithCacheControl.func1(0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	github.com/openshift/library-go.0-20200402123743-4015ba624cae/pkg/apiserver/apiserverconfig/cachecontrol.go:27 +0x268
2020-09-14T19:15:53.931589517Z net/http.HandlerFunc.ServeHTTP(0xc0033ff020, 0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:2007 +0x44
2020-09-14T19:15:53.931589517Z k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc0033ff050, 0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	@/k8s.io/apiserver/pkg/server/handler.go:189 +0x51
2020-09-14T19:15:53.931589517Z net/http.serverHandler.ServeHTTP(0xc00091c620, 0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:2802 +0xa4
2020-09-14T19:15:53.931589517Z net/http.initNPNRequest.ServeHTTP(0x34ea1c0, 0xc0055bea80, 0xc0020a3880, 0xc00091c620, 0x34d1b00, 0xc0004898b8, 0xc001719b00)
2020-09-14T19:15:53.931589517Z 	net/http/server.go:3365 +0x8d
2020-09-14T19:15:53.931589517Z golang.org/x/net/http2.(*serverConn).runHandler(0xc004399500, 0xc0004898b8, 0xc001719b00, 0xc0016f2b40)
2020-09-14T19:15:53.931589517Z 	golang.org/x/net.0-20200202094626-16171245cfb2/http2/server.go:2149 +0x9f
2020-09-14T19:15:53.931589517Z created by golang.org/x/net/http2.(*serverConn).processHeaders
2020-09-14T19:15:53.931589517Z 	golang.org/x/net.0-20200202094626-16171245cfb2/http2/server.go:1883 +0x4eb
2020-09-14T19:15:53.931618032Z E0914 19:15:53.931588 1 wrap.go:39] apiserver panic'd on GET /openapi/v2
2020-09-14T19:16:12.128554426Z E0914 19:16:12.128479 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]
2020-09-14T19:16:22.886801329Z E0914 19:16:22.886741 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]
~~~

* The only way to resolve this is to restart the masters, but after a few days the problem is back.

Version-Release number of selected component (if applicable):
OCP 4.5.6
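For context: the panic itself is intentional. Once a handler has already started writing a response, the apiserver's timeout filter (filters/timeout.go in the trace) can no longer return a clean 504, so it panics to tear down the HTTP/2 stream; the 504 is what callers such as `oc login` see when the timeout fires before anything was written. Below is a minimal Go sketch of that pattern — an illustration only, not the actual k8s.io/apiserver code; the handler names and the 4-second deadline are made up:

~~~
// Sketch of a timeout filter in the style of k8s.io/apiserver's
// filters/timeout.go. NOT the real code: the real filter also blocks
// late writes from the inner handler, which is omitted here for brevity.
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// timeoutWriter records whether the inner handler already started the
// response; that decides between a clean 504 and killing the stream.
type timeoutWriter struct {
	http.ResponseWriter
	mu          sync.Mutex
	wroteHeader bool
}

func (t *timeoutWriter) WriteHeader(code int) {
	t.mu.Lock()
	t.wroteHeader = true
	t.mu.Unlock()
	t.ResponseWriter.WriteHeader(code)
}

func (t *timeoutWriter) Write(b []byte) (int, error) {
	t.mu.Lock()
	t.wroteHeader = true
	t.mu.Unlock()
	return t.ResponseWriter.Write(b)
}

// withTimeout aborts requests that run longer than d.
func withTimeout(d time.Duration, inner http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tw := &timeoutWriter{ResponseWriter: w}
		done := make(chan struct{})
		go func() {
			defer close(done)
			inner.ServeHTTP(tw, r)
		}()
		select {
		case <-done:
		case <-time.After(d):
			tw.mu.Lock()
			started := tw.wroteHeader
			tw.mu.Unlock()
			if started {
				// Headers are already on the wire, so no 504 can be
				// sent. Panicking here is what net/http turns into the
				// "http2: panic serving ..." lines in the master logs.
				panic(fmt.Errorf("killing connection/stream because serving request timed out and response had been started"))
			}
			// Nothing written yet: this is the 504 that oc login reports.
			http.Error(w, "request timed out", http.StatusGatewayTimeout)
		}
	})
}

func main() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // response started...
		time.Sleep(10 * time.Second) // ...then the handler stalls
	})
	http.ListenAndServe(":8080", withTimeout(4*time.Second, slow))
}
~~~

In this sketch, a request to the slow handler starts the response, stalls past the deadline, and gets its stream killed, mirroring the two symptoms in the report (504 on `oc login`, panic stack in the API container logs).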
We need the must-gather logs shortly after the incident.
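For reference, the data can be collected with the standard must-gather tooling as soon as the issue reproduces; the destination directory below is just an example:

~~~
oc adm must-gather --dest-dir=/tmp/must-gather
~~~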
Looking into the customer case, I see a crash-looping pod, and I see failing SubjectAccessReviews that time out or show messages like:

~~~
2020-09-14T19:16:04.140061553Z E0914 19:16:04.140000 1 upgradeaware.go:327] Error proxying data from backend to client: write tcp 10.26.17.45:6443->10.26.17.55:33292: write: broken pipe
~~~

suggesting that the network connection is being cut off. Without further logs, this looks very much like a problem with the service network (SDN) on the masters.

Please provide (example gathering commands below):
- kubelet logs
- logs of the failing pods
- logs of everything around SDN on the masters

Moving to SDN to investigate network health on the masters.
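One way to pull these; pod and namespace names are placeholders:

~~~
# kubelet logs from all masters
oc adm node-logs --role=master -u kubelet

# SDN pods run one-per-node in the openshift-sdn namespace
oc get pods -n openshift-sdn -o wide
oc logs -n openshift-sdn <sdn-pod-on-a-master> --previous

# logs of a failing (crash-looping) pod
oc logs -n <namespace> <failing-pod> --previous
~~~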
Hi team, workaround 2 from KCS https://access.redhat.com/solutions/5448851 solved the issue. My case is closed now.
Issue still affecting v4.5.16 and crippling master nodes after hours/days:

~~~
155360 I1112 02:04:12.571661 1 log.go:172] http2: panic serving 10.6.14.84:51786: killing connection/stream because serving request timed out and response had been started
~~~
(In reply to Brendan Shirren from comment #49)
> Issue still affecting v4.5.16 and crippling master nodes after hours/days
>
> 155360 I1112 02:04:12.571661 1 log.go:172] http2: panic serving
> 10.6.14.84:51786: killing connection/stream because serving request timed
> out and response had been started

Hi, could you gather a must-gather for analysis?
(In reply to Lukasz Szaszkiewicz from comment #50)
> (In reply to Brendan Shirren from comment #49)
>
> > Issue still affecting v4.5.16 and crippling master nodes after hours/days
> >
> > 155360 I1112 02:04:12.571661 1 log.go:172] http2: panic serving
> > 10.6.14.84:51786: killing connection/stream because serving request timed
> > out and response had been started
>
> Hi, could you gather a must-gather for analysis?

The customer just provided the must-gather in the case; it can be pulled from supportshell now and is at the following Hydra link: https://attachments.access.redhat.com/hydra/rest/cases/02794589/attachments/8612c8b6-22cb-48f5-8029-e2551d196b3d?usePresignedUrl=true