Bug 1912564

Summary: p&f: carry 97206: clean up executing request on panic
Product: OpenShift Container Platform Reporter: Abu Kashem <akashem>
Component: kube-apiserverAssignee: Abu Kashem <akashem>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.5CC: aos-bugs, kewang, mfojtik, xxia
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1912563
: 1912566 (view as bug list) Environment:
Last Closed: 2021-01-25 20:02:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1912563    
Bug Blocks: 1912566    

Description Abu Kashem 2021-01-04 19:09:07 UTC
+++ This bug was initially created as a clone of Bug #1912563 +++

Description of problem:
p&f: clean up executing request on panic

carry upstream PR: https://github.com/kubernetes/kubernetes/pull/97206

Comment 1 Abu Kashem 2021-01-12 16:08:51 UTC
I have another PR that depends on it - https://github.com/openshift/kubernetes/pull/501

Comment 3 Ke Wang 2021-01-18 04:59:21 UTC
There is no an e2e test to reproduce this issue, so need to rely on the unit tests here,

$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.6.0-0.nightly-2021-01-17-214551 | grep 'hyperkube'
  hyperkube       https://github.com/openshift/kubernetes            3b012051f912a6e29337c4087fcfc8161651690b

$  git log --date=local --pretty="%h %an %cd - %s" 3b01205 | grep  ' #501.*tka'
1917b20307b OpenShift Merge Robot Sat Jan 16 12:17:29 2021 - Merge pull request #501 from tkashem/carry-97323-4.6

$  git log --date=local --pretty="%h %an %cd - %s" 3b01205 | grep  ' #502.*tka'
d8fbfc8cdd3 OpenShift Merge Robot Sat Jan 16 04:58:00 2021 - Merge pull request #502 from tkashem/carry-97206-4.6

$ export KUBECONFIG=<OCP 4.6 kubeconifg>

$ cd <$GOPATH/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol

$ go test -v -run TestAPFControllerWithGracefulShutdown
=== RUN   TestAPFControllerWithGracefulShutdown
I0118 12:22:54.454469 3035888 apf_controller.go:261] Starting API Priority and Fairness config controller
I0118 12:22:54.454592 3035888 apf_controller.go:266] Running API Priority and Fairness config worker
    controller_test.go:392: waiting for the controller Run function to shutdown gracefully
I0118 12:22:54.554620 3035888 apf_controller.go:270] Shutting down API Priority and Fairness config worker
--- PASS: TestAPFControllerWithGracefulShutdown (0.20s)
PASS
ok  	k8s.io/apiserver/pkg/util/flowcontrol	0.218s

$ go test -v -run TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter   
=== RUN   TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter
=== RUN   TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter/priority_level_concurrency_is_set_to_1,_request_handler_panics,_next_request_should_not_be_rejected
I0118 12:33:53.140165 3036636 apf_controller.go:261] Starting API Priority and Fairness config controller
I0118 12:33:53.140561 3036636 apf_controller.go:266] Running API Priority and Fairness config worker
    priority-and-fairness_test.go:386: polling retry - error: expected to find metrics with matching labels: map[string]string{"priority_level":"test-pl"}
E0118 12:33:53.248202 3036636 runtime.go:76] Observed a panic: request handler panic'd as designed - "/request/panic"
goroutine 133 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc000546c00)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:106 +0x113
panic(0x1616560, 0xc000089540)
	/usr/local/go/src/runtime/panic.go:969 +0x175
k8s.io/apiserver/pkg/server/filters.TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter.func1.3(0x7f19114d5310, 0xc0000100c8, 0xc0002e4a00)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness_test.go:403 +0x166
net/http.HandlerFunc.ServeHTTP(0xc0000bdf50, 0x7f19114d5310, 0xc0000100c8, 0xc0002e4a00)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1.4()
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go:127 +0x3c6
k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle.func2()
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:133 +0x1aa
k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset.(*request).Finish.func1(0xc00013f970, 0xc0004e5b6f, 0xc0003753e0)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset/queueset.go:324 +0x62
k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset.(*request).Finish(0xc00013f970, 0xc0003753e0, 0xc000089330)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset/queueset.go:325 +0x5d
k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle(0xc00013e370, 0x1a12160, 0xc000371170, 0xc00013f8c0, 0x1a12660, 0xc00003c9c0, 0xc000089170, 0xc000089180, 0xc000375320)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:123 +0x86a
k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1(0x7f19114d5310, 0xc0000100c8, 0xc0002e4900)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go:130 +0x5c3
net/http.HandlerFunc.ServeHTTP(0xc000345b90, 0x7f19114d5310, 0xc0000100c8, 0xc0002e4900)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.newHandlerChain.func1(0x7f19114d5310, 0xc0000100c8, 0xc0002e4800)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness_test.go:509 +0x24a
net/http.HandlerFunc.ServeHTTP(0xc000370000, 0x7f19114d5310, 0xc0000100c8, 0xc0002e4800)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc000546c00, 0xc000135d80, 0x1a126e0, 0xc0000100c8, 0xc0002e4800)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:111 +0xb8
created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:97 +0x1cc

goroutine 29 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x15d3c80, 0xc0000895c0)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0004c3cc8, 0x1, 0x1)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x15d3c80, 0xc0000895c0)
	/usr/local/go/src/runtime/panic.go:969 +0x175
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc000135d80, 0x1a0e2e0, 0xc00028c460, 0xc0002e4800)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:117 +0x448
k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x1a0e2e0, 0xc00028c460, 0xc0002e4600)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x269
net/http.HandlerFunc.ServeHTTP(0xc000370030, 0x1a0e2e0, 0xc00028c460, 0xc0002e4600)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go:96 +0x33f
net/http.HandlerFunc.ServeHTTP(0xc000370060, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/wrap.go:70 +0xe6
net/http.HandlerFunc.ServeHTTP(0xc000135de0, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:2042 +0x44
net/http.serverHandler.ServeHTTP(0xc0000dc460, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:2843 +0xa3
net/http.initALPNRequest.ServeHTTP(0x1a12160, 0xc000407b60, 0xc00011a380, 0xc0000dc460, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:3415 +0x8d
net/http.(*http2serverConn).runHandler(0xc0001c2900, 0xc0003ba278, 0xc0000c6500, 0xc0003e65a0)
	/usr/local/go/src/net/http/h2_bundle.go:5718 +0x8b
created by net/http.(*http2serverConn).processHeaders
	/usr/local/go/src/net/http/h2_bundle.go:5452 +0x505
E0118 12:33:53.248370 3036636 wrap.go:58] apiserver panic'd on GET /request/panic
2021/01/18 12:33:53 http2: panic serving 127.0.0.1:41816: request handler panic'd as designed - "/request/panic"
goroutine 133 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc000546c00)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:106 +0x113
panic(0x1616560, 0xc000089540)
	/usr/local/go/src/runtime/panic.go:969 +0x175
k8s.io/apiserver/pkg/server/filters.TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter.func1.3(0x7f19114d5310, 0xc0000100c8, 0xc0002e4a00)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness_test.go:403 +0x166
net/http.HandlerFunc.ServeHTTP(0xc0000bdf50, 0x7f19114d5310, 0xc0000100c8, 0xc0002e4a00)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1.4()
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go:127 +0x3c6
k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle.func2()
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:133 +0x1aa
k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset.(*request).Finish.func1(0xc00013f970, 0xc0004e5b6f, 0xc0003753e0)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset/queueset.go:324 +0x62
k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset.(*request).Finish(0xc00013f970, 0xc0003753e0, 0xc000089330)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset/queueset.go:325 +0x5d
k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle(0xc00013e370, 0x1a12160, 0xc000371170, 0xc00013f8c0, 0x1a12660, 0xc00003c9c0, 0xc000089170, 0xc000089180, 0xc000375320)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:123 +0x86a
k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1(0x7f19114d5310, 0xc0000100c8, 0xc0002e4900)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go:130 +0x5c3
net/http.HandlerFunc.ServeHTTP(0xc000345b90, 0x7f19114d5310, 0xc0000100c8, 0xc0002e4900)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.newHandlerChain.func1(0x7f19114d5310, 0xc0000100c8, 0xc0002e4800)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness_test.go:509 +0x24a
net/http.HandlerFunc.ServeHTTP(0xc000370000, 0x7f19114d5310, 0xc0000100c8, 0xc0002e4800)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc000546c00, 0xc000135d80, 0x1a126e0, 0xc0000100c8, 0xc0002e4800)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:111 +0xb8
created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:97 +0x1cc

goroutine 29 [running]:
net/http.(*http2serverConn).runHandler.func1(0xc0003ba278, 0xc0004c3f8e, 0xc0001c2900)
	/usr/local/go/src/net/http/h2_bundle.go:5711 +0x16f
panic(0x15d3c80, 0xc0000895c0)
	/usr/local/go/src/runtime/panic.go:969 +0x175
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0004c3cc8, 0x1, 0x1)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x15d3c80, 0xc0000895c0)
	/usr/local/go/src/runtime/panic.go:969 +0x175
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc000135d80, 0x1a0e2e0, 0xc00028c460, 0xc0002e4800)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go:117 +0x448
k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x1a0e2e0, 0xc00028c460, 0xc0002e4600)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x269
net/http.HandlerFunc.ServeHTTP(0xc000370030, 0x1a0e2e0, 0xc00028c460, 0xc0002e4600)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go:96 +0x33f
net/http.HandlerFunc.ServeHTTP(0xc000370060, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:2042 +0x44
k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/home/kewang/work/gows/src/github.com/openshift/kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/wrap.go:70 +0xe6
net/http.HandlerFunc.ServeHTTP(0xc000135de0, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:2042 +0x44
net/http.serverHandler.ServeHTTP(0xc0000dc460, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:2843 +0xa3
net/http.initALPNRequest.ServeHTTP(0x1a12160, 0xc000407b60, 0xc00011a380, 0xc0000dc460, 0x1a0ece0, 0xc0003ba278, 0xc0000c6500)
	/usr/local/go/src/net/http/server.go:3415 +0x8d
net/http.(*http2serverConn).runHandler(0xc0001c2900, 0xc0003ba278, 0xc0000c6500, 0xc0003e65a0)
	/usr/local/go/src/net/http/h2_bundle.go:5718 +0x8b
created by net/http.(*http2serverConn).processHeaders
	/usr/local/go/src/net/http/h2_bundle.go:5452 +0x505
    priority-and-fairness_test.go:433: waiting for the controller to shutdown
I0118 12:33:53.249482 3036636 apf_controller.go:270] Shutting down API Priority and Fairness config worker
--- PASS: TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter (0.21s)
    --- PASS: TestPriorityAndFairnessWithPanicRecoverAndTimeoutFilter/priority_level_concurrency_is_set_to_1,_request_handler_panics,_next_request_should_not_be_rejected (0.21s)
PASS
ok  	k8s.io/apiserver/pkg/server/filters	0.231s

All is well, so move the bug VERIFIED.

Comment 6 errata-xmlrpc 2021-01-25 20:02:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.13 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0171

Comment 7 Ke Wang 2021-02-24 16:10:08 UTC
Tried to do one e2e test for this bug, 

Using clusterbuster(https://github.com/RobertKrawitz/OpenShift4-tools) to stress the cluster,

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.9     True        False         6h26m   Cluster version is 4.6.9

$ ./clusterbuster -P server -b 5 -p 10 -D .01 -M 1 -N 6 -r 4 -d 2 -c 10 -m 1000 -v -s 5 -x                                          
2021-02-24T09:51:00.075029 namespace/clusterbuster-0 created                                                                                  ...                                                                                                                                
E0224 17:51:01.862779 3447628 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)            
2021-02-24T09:51:02.672688 namespace/clusterbuster-4 created                                                                                  ...                                                                                                                         
E0224 17:51:36.301838 3447850 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
...
E0224 17:54:35.248472 3449128 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
...
Created  6 namespace(s)
Created 60 secret(s)
Created 12 service(s)
Created 78 object(s) total

Per bug 1912564 related PRs https://github.com/openshift/kubernetes/pull/501 and https://github.com/openshift/kubernetes/pull/502, using oc logs command line to check all kube-apiservers,
$ oc logs -n openshift-kube-apiserver <kube-apiserver pod name> | grep -iE 'apf_controller.go|apf_filter.go' | grep 'no route'
...
2021-02-24T05:00:22.900222528Z I0224 05:00:22.900145      16 httplog.go:94] "HTTP" verb="GET" URI="/apis/build.openshift.io/v1?timeout=32s" latency="3.073522413s" userAgent="kube-controller-manager/v1.19.0+7070803...
t/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/util/flowcontrol/apf_filter.go:122...
...addedInfo="\nlogging error output: \"error trying to reach service: dial tcp 10.129.0.38:8443: connect: no route to host\\n\"\n"

You will get many error logs like the one above, that indicates there is something wrong with priority and fairness controller.

---------------------
Do the same test on 4.6.19 payload with PR 501 and 502 loaded,

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.19    True        False         17m     Cluster version is 4.6.19

$ ./clusterbuster -P server -b 5 -p 10 -D .01 -M 1 -N 6 -r 4 -d 2 -c 10 -m 1000 -v -s 5 -x
...
Created  6 namespace(s)
Created 60 secret(s)
Created 12 service(s)
Created 78 object(s) total

No any errors in results.

Using oc logs command line to check all kube-apiservers,
$ oc logs -n openshift-kube-apiserver <kube-apiserver pod name>  | grep -iE 'apf_controller.go|apf_filter.go' | grep 'no route'
No any errors in results.

priority and fairness controller works fine.