Bug 2001763

Summary: [OCP on RHV] panic detected in openshift-oauth-apiserver pods
Product: OpenShift Container Platform Reporter: Ke Wang <kewang>
Component: EtcdAssignee: Nobody <nobody>
Status: CLOSED DEFERRED QA Contact: ge liu <geliu>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.9CC: aos-bugs, llopezmo, mfojtik, nobody, pnarkhed, slaskawi, sttts, surbania, tjungblu
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-10-10 09:13:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ke Wang 2021-09-07 06:57:01 UTC
Description of problem:
panic detected in openshift-oauth-apiserver pods

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-31-115131   True        False         5d18h   Cluster version is 4.9.0-0.nightly-2021-08-31-115131

$ oc get infrastructures.config.openshift.io -o json | jq .items[].status.platform
"oVirt"

How reproducible:
once

Reproduced steps:
The steps of reproduction are not yet known

Actual results:
$ oc get pod -n openshift-oauth-apiserver
NAME                        READY   STATUS    RESTARTS        AGE
apiserver-d6f9b59c4-4pb5p   1/1     Running   0               5d18h
apiserver-d6f9b59c4-558mx   1/1     Running   1 (3d20h ago)   5d18h
apiserver-d6f9b59c4-sddtc   1/1     Running   0               5h

masters=$(oc get no -l node-role.kubernetes.io/master | sed '1d' | awk '{print $1}')
$ for node in $masters; do echo $node;oc debug no/$node -- chroot /host bash -c "grep -ir 'panic' /var/log/pods/openshift-oauth-apiserver_apiserver* ";done
ge2n1-xp525-master-0
...
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-4pb5p_800ae3bb-c103-4071-9d49-905f6d87326f/oauth-apiserver/0.log:2021-09-03T10:38:21.610927014+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-4pb5p_800ae3bb-c103-4071-9d49-905f6d87326f/oauth-apiserver/0.log:2021-09-03T10:38:21.610927014+00:00 stderr F 	runtime/panic.go:965 +0x1b9
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-4pb5p_800ae3bb-c103-4071-9d49-905f6d87326f/oauth-apiserver/0.log:2021-09-03T10:38:21.768588128+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-4pb5p_800ae3bb-c103-4071-9d49-905f6d87326f/oauth-apiserver/0.log:2021-09-03T10:38:21.768588128+00:00 stderr F 	runtime/panic.go:965 +0x1b9

Removing debug pod ...
ge2n1-xp525-master-1
...
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:19.181907449+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:19.181907449+00:00 stderr F 	runtime/panic.go:965 +0x1b9
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:19.197628195+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:19.197628195+00:00 stderr F 	runtime/panic.go:965 +0x1b9
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:28.619506068+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:28.619506068+00:00 stderr F 	runtime/panic.go:965 +0x1b9
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:28.634713177+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:28.634713177+00:00 stderr F 	runtime/panic.go:965 +0x1b9
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:32.476346457+00:00 stderr F panic(0x220fa60, 0x2908620)
/var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-558mx_647614f1-5aaf-4423-902c-ee22b1f3b957/oauth-apiserver/0.log:2021-09-03T10:38:32.476346457+00:00 stderr F 	runtime/panic.go:965 +0x1b9

Removing debug pod ...
ge2n1-xp525-master-2

$ oc debug node/ge2n1-xp525-master-0

sh-4.4# grep -n panic /var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-4pb5p_800ae3bb-c103-4071-9d49-905f6d87326f/oauth-apiserver/0.log
81366:2021-09-03T10:38:21.610927014+00:00 stderr F panic(0x220fa60, 0x2908620)
81367:2021-09-03T10:38:21.610927014+00:00 stderr F 	runtime/panic.go:965 +0x1b9
81462:2021-09-03T10:38:21.768588128+00:00 stderr F panic(0x220fa60, 0x2908620)
81463:2021-09-03T10:38:21.768588128+00:00 stderr F 	runtime/panic.go:965 +0x1b9

sh-4.4# sed -n '81366,81461p' /var/log/pods/openshift-oauth-apiserver_apiserver-d6f9b59c4-4pb5p_800ae3bb-c103-4071-9d49-905f6d87326f/oauth-apiserver/0.log
2021-09-03T10:38:21.610927014+00:00 stderr F panic(0x220fa60, 0x2908620)
2021-09-03T10:38:21.610927014+00:00 stderr F 	runtime/panic.go:965 +0x1b9
2021-09-03T10:38:21.610927014+00:00 stderr F golang.org/x/net/http2.(*responseWriter).Header(0xc000833810, 0x3941100)
2021-09-03T10:38:21.610927014+00:00 stderr F 	golang.org/x/net.0-20210224082022-3d97a244fca7/http2/server.go:2601 +0x99
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.Error(0x2969fc0, 0xc000833810, 0xc00130c1a0, 0x18b, 0x1f4)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2059 +0x3b
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server/healthz.handleRootHealth.func1(0x29747f0, 0xc0002bbe28, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/healthz/healthz.go:245 +0x6c5
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/metrics.InstrumentHandlerFunc.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/metrics/metrics.go:453 +0x2be
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc000563e00, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc0018fe8c0, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/mux/pathrecorder.go:241 +0x77a
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc000211180, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/mux/pathrecorder.go:234 +0x8c
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server.director.ServeHTTP(0x261c5f1, 0xf, 0xc000a66ab0, 0xc000211180, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/handler.go:154 +0x914
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00102ca50, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filters/authorization.go:64 +0x603
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9bc0, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9c00, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00102ca80, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfc00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1.4()
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/filters/priority-and-fairness.go:127 +0x1ba
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle.func2()
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/util/flowcontrol/apf_filter.go:176 +0x222
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/util/flowcontrol.immediateRequest.Finish(...)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/util/flowcontrol/apf_controller.go:752
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle(0xc000758000, 0x297a2f8, 0xc001a9edb0, 0xc001b45d90, 0x297a8e0, 0xc00137f840, 0xc0011ecc40, 0xc0011ecc50, 0xc001843980)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/util/flowcontrol/apf_filter.go:166 +0x907
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/filters/priority-and-fairness.go:130 +0x606
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00102cab0, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9c80, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00102cae0, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filters/impersonation.go:50 +0x240d
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9d00, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9d40, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00102cb10, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filters/audit.go:55 +0x7d5
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9d80, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9dc0, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00102cb70, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filters.withAuthentication.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cfb00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filters/authentication.go:80 +0x75c
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc00067be00, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cfa00)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f98ab162950, 0xc0002bbe18, 0xc0017cf900)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/endpoints/filterlatency/filterlatency.go:80 +0x38c
2021-09-03T10:38:21.610927014+00:00 stderr F net/http.HandlerFunc.ServeHTTP(0xc0008a9e40, 0x7f98ab162950, 0xc0002bbe18, 0xc0017cf900)
2021-09-03T10:38:21.610927014+00:00 stderr F 	net/http/server.go:2049 +0x44
2021-09-03T10:38:21.610927014+00:00 stderr F k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc001ad5aa0, 0xc00089d7e8, 0x297a988, 0xc0002bbe18, 0xc0017cf900)
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/filters/timeout.go:107 +0xb8
2021-09-03T10:38:21.610927014+00:00 stderr F created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
2021-09-03T10:38:21.610927014+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/filters/timeout.go:93 +0x1f4
2021-09-03T10:38:21.767177170+00:00 stderr F I0903 10:38:21.767142       1 healthz.go:244] etcd check failed: readyz
2021-09-03T10:38:21.767177170+00:00 stderr F [-]etcd failed: error getting data from etcd: etcdserver: request timed out
2021-09-03T10:38:21.768588128+00:00 stderr F E0903 10:38:21.767445       1 timeout.go:128] Header called after Handler finished
2021-09-03T10:38:21.768588128+00:00 stderr F goroutine 1524708 [running]:
2021-09-03T10:38:21.768588128+00:00 stderr F k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc002a2b500)
2021-09-03T10:38:21.768588128+00:00 stderr F 	k8s.io/apiserver.0/pkg/server/filters/timeout.go:102 +0x125


Expected results:
Should not see panic error in oauth-apiserver pod logs.

Additional info:

Comment 2 Sebastian Łaskawiec 2021-09-07 08:57:57 UTC
It's worth to mention that the API Server recovered from the panic. In case of `openshift-oauth-apiserver/pods/apiserver-d6f9b59c4-4pb5p/oauth-apiserver/oauth-apiserver/logs/current.log`, you can clearly see that it it keeps serving the requests.

Nevertheless, all panics happen after the Etcd failure:

2021-09-03T10:38:21.610550225Z I0903 10:38:21.610498       1 healthz.go:244] etcd check failed: healthz
2021-09-03T10:38:21.610550225Z [-]etcd failed: error getting data from etcd: etcdserver: request timed out
2021-09-03T10:38:21.610927014Z E0903 10:38:21.610872       1 timeout.go:128] Header called after Handler finished
2021-09-03T10:38:21.610927014Z goroutine 1524727 [running]:
2021-09-03T10:38:21.610927014Z k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc001ad5aa0)
...


Looking at this particular 2021-09-03T10:38:21 and Etcd health logs (namespaces/openshift-etcd/pods/etcd-ge2n1-xp525-master-0/etcd-health-monitor/etcd-health-monitor/logs/current.log), I can see that something was wrong:


2021-09-03T10:38:10.136161345Z {"level":"warn","ts":"2021-09-03T10:38:10.136Z","logger":"etcd-client","caller":"v3.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000641c00/#initially=[https://10.37.142.152:2379;https://10.37.142.161:2379;https://10.37.142.228:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
2021-09-03T10:38:10.136186499Z {"level":"info","ts":"2021-09-03T10:38:10.136Z","caller":"health/log.go:29","msg":"health check","addresses":["https://10.37.142.228:2379"],"check":"QuorumReadSingleTarget","health":false,"start":"2021-09-03T10:38:07.134Z","error":"client timeout exceeded: 1s"}
2021-09-03T10:38:10.383664827Z {"level":"warn","ts":"2021-09-03T10:38:10.383Z","logger":"etcd-client","caller":"v3.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000641c00/#initially=[https://10.37.142.152:2379;https://10.37.142.161:2379;https://10.37.142.228:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
2021-09-03T10:38:10.383706461Z {"level":"info","ts":"2021-09-03T10:38:10.383Z","caller":"health/log.go:29","msg":"health check","addresses":["https://10.37.142.228:2379"],"check":"SerializedReadSingleTarget","health":false,"start":"2021-09-03T10:38:07.381Z","error":"client timeout exceeded: 1s"}

The Etcd logs also seems wrong (namespaces/openshift-etcd/pods/etcd-ge2n1-xp525-master-0/etcd/etcd/logs/current.log):

2021-09-03T10:38:32.492057632Z {"level":"warn","ts":"2021-09-03T10:38:32.483Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"11.297318ms","request":"header:<ID:9968853707752541488 > lease_revoke:<id:0a587ba1446d0bfd>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.492355421Z {"level":"warn","ts":"2021-09-03T10:38:32.492Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"35.115µs","request":"header:<ID:9968853707752541493 > lease_revoke:<id:267b7ba1464878a2>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.492534359Z {"level":"warn","ts":"2021-09-03T10:38:32.492Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"22.689µs","request":"header:<ID:9968853707752541498 > lease_revoke:<id:0a587ba1446d0b5e>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.492668020Z {"level":"warn","ts":"2021-09-03T10:38:32.492Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"45.336µs","request":"header:<ID:9968853707752541500 > lease_revoke:<id:0a587ba1446d0cc4>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.493464003Z {"level":"warn","ts":"2021-09-03T10:38:32.493Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"13.585µs","request":"header:<ID:9968853707752541501 > lease_revoke:<id:0a587ba1446d0bfd>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.493506491Z {"level":"warn","ts":"2021-09-03T10:38:32.493Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"4.822µs","request":"header:<ID:9968853707752541503 > lease_revoke:<id:267b7ba1464878a2>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.494580758Z {"level":"warn","ts":"2021-09-03T10:38:32.494Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"40.204µs","request":"header:<ID:9968853707752541505 > lease_revoke:<id:0a587ba1446d0b5e>","response":"size:29","error":"lease not found"}
2021-09-03T10:38:32.496841162Z {"level":"warn","ts":"2021-09-03T10:38:32.495Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"0a587ba1446d0cc4","error":"lease not found"}
2021-09-03T10:38:32.496921944Z {"level":"warn","ts":"2021-09-03T10:38:32.496Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"0a587ba1446d0bfd","error":"lease not found"}
2021-09-03T10:38:32.496948975Z {"level":"warn","ts":"2021-09-03T10:38:32.496Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"267b7ba1464878a2","error":"lease not found"}
2021-09-03T10:38:32.496997062Z {"level":"info","ts":"2021-09-03T10:38:32.496Z","caller":"traceutil/trace.go:171","msg":"trace[735177488] transaction","detail":"{read_only:false; number_of_response:1; response_revision:878625; }","duration":"3.7077888s","start":"2021-09-03T10:38:28.789Z","end":"2021-09-03T10:38:32.496Z","steps":["trace[735177488] 'process raft request'  (duration: 3.705272134s)"],"step_count":1}

Sending to the Etcd Team to assess what's going on. I'm also requesting feedback from @sttts to check if we can do something about the panic (but I don't think it's necessary, the output message was quite obvious: "etcd check failed: healthz").

Comment 8 Lucas López Montero 2022-10-10 08:34:11 UTC
Cloning bug on the following Jira ticket as agreed with Thomas:

https://issues.redhat.com/browse/OCPBUGS-2136

This ticket can be closed again.

Comment 10 Red Hat Bugzilla 2023-09-18 04:25:47 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days