Bug 1922646 - Panic in authentication-operator invoking webhook authorization
Summary: Panic in authentication-operator invoking webhook authorization
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks: 1956797
Reported: 2021-01-30 18:25 UTC by Clayton Coleman
Modified: 2021-06-15 15:57 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1956797
Environment:
Last Closed:
Target Upstream Version:



Description Clayton Coleman 2021-01-30 18:25:09 UTC
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1355501970990829568

E0130 13:57:28.518155       1 runtime.go:76] Observed a panic: runtime error: invalid memory address or nil pointer dereference
goroutine 31873 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc004315140)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:106 +0x113
panic(0x2088fe0, 0x3726080)
	/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
k8s.io/apiserver/plugin/pkg/authorizer/webhook.(*WebhookAuthorizer).Authorize(0xc000adf3e0, 0x271d280, 0xc005f57530, 0x273cb40, 0xc0022bd2c0, 0x0, 0xc003e50d00, 0x16f80cf, 0x271d880, 0xc00275b940)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/plugin/pkg/authorizer/webhook/webhook.go:208 +0x8b9
k8s.io/apiserver/pkg/authorization/union.unionAuthzHandler.Authorize(0xc0007eb910, 0x1, 0x1, 0x271d280, 0xc005f57530, 0x273cb40, 0xc0022bd2c0, 0x1, 0x1, 0x23ad1e2, ...)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/authorization/union/union.go:52 +0xfe
k8s.io/apiserver/pkg/authorization/union.unionAuthzHandler.Authorize(0xc0005c8ea0, 0x2, 0x2, 0x271d280, 0xc005f57530, 0x273cb40, 0xc0022bd2c0, 0x268f820, 0x1f4dd20, 0xc0044f9340, ...)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/authorization/union/union.go:52 +0xfe
k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:59 +0x165
net/http.HandlerFunc.ServeHTTP(0xc000533d00, 0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/usr/lib/golang/src/net/http/server.go:2054 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
net/http.HandlerFunc.ServeHTTP(0xc000533d40, 0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/usr/lib/golang/src/net/http/server.go:2054 +0x44
k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:184 +0x4cf
net/http.HandlerFunc.ServeHTTP(0xc00098dbc0, 0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/usr/lib/golang/src/net/http/server.go:2054 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/pkg/endpoints/filterlatency/filterlatency.go:95 +0x165
net/http.HandlerFunc.ServeHTTP(0xc00098dbf0, 0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)
	/usr/lib/golang/src/net/http/server.go:2054 +0x44
k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x7f93d5cf1d00, 0xc0060b8068, 0xc004283900)

Same reason as https://bugzilla.redhat.com/show_bug.cgi?id=1913525, same fix likely (vendor bump)

Comment 1 Standa Laznicka 2021-02-01 11:24:55 UTC
Rather than linking a commit from k/apiserver, as the fix in the referenced BZ does, I'll wait for an official kube release containing the fix.

Comment 2 Stefan Schimanski 2021-02-02 11:02:45 UTC
This is an issue in the upstream k8s.io/apiserver library and applies to all components that do delegated authn/authz.

The fix will be part of 1.20.3, which is not released yet (expected in about a week). It also applies to 1.19 and hence to many components in 4.6.

Upstream fixes:

1.19: https://github.com/kubernetes/kubernetes/pull/98233
1.20: https://github.com/kubernetes/kubernetes/pull/97862

Moving to 4.8 while we wait for upstream fixes to be released.

Comment 3 Michal Fojtik 2021-03-04 11:46:45 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 5 Michal Fojtik 2021-03-19 12:20:50 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 6 Michal Fojtik 2021-04-18 13:00:17 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 8 Michal Fojtik 2021-04-30 14:06:15 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 9 Standa Laznicka 2021-05-04 12:32:06 UTC
This was fixed in https://github.com/openshift/cluster-authentication-operator/pull/436

Comment 11 Xingxing Xia 2021-05-11 11:52:54 UTC
First, I read https://github.com/kubernetes/kubernetes/pull/97820/files to understand why the panic happened: the code only checked the error returned from the retriable function and did not handle the case where the caller had timed out. That PR adds the missing check, so the failure is caught before execution reaches the line shown in the stack above, `/go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apiserver/plugin/pkg/authorizer/webhook/webhook.go:208`.

Then I checked the auth-o code in the latest 4.8 payload; it already includes https://github.com/kubernetes/kubernetes/pull/97820/files.

The PR from comment 9 merged 21 days ago, so I searched CI for `authentication-operator.*Observed a panic: runtime error: invalid memory address or nil pointer dereference` over the last 14 days via https://search.ci.openshift.org/?search=authentication-operator.*Observed+a+panic%3A+runtime+error%3A+invalid+memory+address+or+nil+pointer+dereference&maxAge=336h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job and found many such authentication-operator panics in 4.7 CI jobs. Narrowing the search to 4.8 via https://search.ci.openshift.org/?search=authentication-operator.*Observed+a+panic%3A+runtime+error%3A+invalid+memory+address+or+nil+pointer+dereference&maxAge=336h&context=1&type=junit&name=4%5C.8&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job returned no results, so the panic is fixed for auth-o in 4.8.
