Bug 1881225

Summary: address apiserver panic on "close of nil channel"
Product: OpenShift Container Platform Reporter: Abu Kashem <akashem>
Component: kube-apiserverAssignee: Abu Kashem <akashem>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.6CC: aos-bugs, mfojtik, sttts, xxia
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:43:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Abu Kashem 2020-09-21 21:07:58 UTC
Description of problem:
We are seeing the following panic: 

> E0921 17:27:50.817978      17 wrap.go:39] apiserver panic'd on GET /readyz?verbose=1
> http2: panic serving 35.243.152.199:2279: close of nil channel

hasBeenReadyCh is always nil, I don't see it being created anywhere. We did not see this issue before when the patch generic_apiserver filter merged. Maybe this is an example of a bad merge or something? 

This is where the panic happens
https://github.com/openshift/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/healthz.go#L106

Since the channel is nil, the following never executes
> 		if hasBeenReadyCh != nil {
>			select {
>			case <-hasBeenReadyCh:
>				handler.ServeHTTP(w, r)
>				return
>			default:
>			}
>		}
>

https://github.com/openshift/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/patch_genericapiserver.go#L105


See the ci job: 
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-authentication-operator/351/pull-ci-openshift-cluster-authentication-operator-master-e2e/1308080817292972032

- kube-apiserver log: 
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-authentication-operator/351/pull-ci-openshift-cluster-authentication-operator-master-e2e/1308080817292972032/artifacts/e2e/gather-extra/pods/openshift-kube-apiserver_kube-apiserver-ci-op-snx6g8bq-da271-rtfbw-master-1_kube-apiserver.log

Comment 1 Abu Kashem 2020-09-21 21:09:17 UTC
The Stack trace of the panic:

E0921 17:27:50.817978      17 wrap.go:39] apiserver panic'd on GET /readyz?verbose=1
http2: panic serving 35.243.152.199:2279: close of nil channel
goroutine 913836 [running]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc036276720)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:108 +0x113
panic(0x42a19c0, 0x52ce8d0)
	/usr/lib/golang/src/runtime/panic.go:969 +0x175
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.(*GenericAPIServer).installReadyz.func1()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/healthz.go:106 +0x31
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/healthz.handleRootHealth.func1.1()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/healthz/healthz.go:276 +0x24
sync.(*Once).doSlow(0xc011a1c3d0, 0xc016d2c7d0)
	/usr/lib/golang/src/sync/once.go:66 +0xec
sync.(*Once).Do(...)
	/usr/lib/golang/src/sync/once.go:57
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/healthz.handleRootHealth.func1(0x53f40c0, 0xc05eb937f8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/healthz/healthz.go:275 +0xaaa
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/metrics.InstrumentHandlerFunc.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:406 +0x25e
net/http.HandlerFunc.ServeHTTP(0xc0111b0aa0, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc042e89940, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:241 +0x5cd
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc01152cee0, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234 +0x8c
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x4b5fed9, 0xf, 0xc00064cea0, 0xc01152cee0, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/handler.go:154 +0x74d
k8s.io/kubernetes/openshift-kube-apiserver/openshiftkubeapiserver.withOAuthInfo.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/openshift-kube-apiserver/openshiftkubeapiserver/patch_handlerchain.go:54 +0x6f
net/http.HandlerFunc.ServeHTTP(0xc012ad4ed0, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:64 +0x563
net/http.HandlerFunc.ServeHTTP(0xc00bb2f740, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b100)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func2.4()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go:136 +0x3c6
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle.func1()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:122 +0x15e
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol.immediateRequest.Finish(...)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol/apf_controller.go:645
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle(0xc000682160, 0x53f9f40, 0xc02aaef440, 0xc033019760, 0x53fac80, 0xc02519a500, 0xc02cc7a3c0, 0xc02cc7a3d0, 0xc014e8a900)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:115 +0x6ea
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func2(0x7f35c73c0470, 0xc05eb937e8, 0xc02446b000)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go:139 +0x52d
net/http.HandlerFunc.ServeHTTP(0xc012ad4f00, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b000)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446b000)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:50 +0x2306
net/http.HandlerFunc.ServeHTTP(0xc00bb2f780, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b000)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446b000)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:54 +0x750
net/http.HandlerFunc.ServeHTTP(0xc00bb2f800, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446b000)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446ae00)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:70 +0x672
net/http.HandlerFunc.ServeHTTP(0xc00a785950, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446ae00)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithCORS.func1(0x7f35c73c0470, 0xc05eb937e8, 0xc02446ae00)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/cors.go:75 +0x1d9
net/http.HandlerFunc.ServeHTTP(0xc00a7c9500, 0x7f35c73c0470, 0xc05eb937e8, 0xc02446ae00)
	/usr/lib/golang/src/net/http/server.go:2042 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc036276720, 0xc0132a5240, 0x53fad40, 0xc05eb937e8, 0xc02446ae00)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:113 +0xb8
created by k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:99 +0x1cc

Comment 3 Ke Wang 2020-09-25 15:46:24 UTC
Verification:
PR has been loaded in 4.6 latest build,
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-25-070943  | grep hyperkube
  hyperkube         https://github.com/openshift/kubernetes       359dd790a64aa35b626a6a081abddd7db9e2dd37
  
$  git log --date local --pretty="%h %an %cd - %s" 359dd790 | grep ' #368 '
62eb2d8bf76 David Eads Wed Sep 23 04:59:10 2020 - Merge pull request #368 from sttts/sttts-notready

Searching https://search.ci.openshift.org/?search=a+panic%3A+close+of+nil+channel&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job, after PR was merged date of Sep 23, there is no error ‘a panic: close of nil channel’ can be found, fix works well, so move the bug verified.

Comment 6 errata-xmlrpc 2020-10-27 16:43:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196