Bug 2049156

Summary: 'oc get project' caused 'Observed a panic: cannot deep copy core.NamespacePhase' when AllRequestBodies is used
Product: OpenShift Container Platform Reporter: Sergiusz Urbaniak <surbania>
Component: oauth-apiserver Assignee: Standa Laznicka <slaznick>
Status: CLOSED NOTABUG QA Contact: Xingxing Xia <xxia>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.10 CC: aos-bugs, mfojtik, slaznick, surbania, wlewis, xxia, ytripath
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Clone Of: 2049155 Environment:
Last Closed: 2022-02-15 15:24:18 UTC Type: ---
Bug Depends On: 2049155    
Bug Blocks:    

Description Sergiusz Urbaniak 2022-02-01 16:37:30 UTC
+++ This bug was initially created as a clone of Bug #2049155 +++

This bug was initially created as a copy of Bug #2047335

Description of problem:
'oc get project' caused 'Observed a panic: cannot deep copy core.NamespacePhase' when AllRequestBodies is used

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-26-234447

How reproducible:
Always

Steps to Reproduce:
1. Run oc edit apiserver cluster and change the audit profile to AllRequestBodies.
2. Wait for pods to finish rotation
3. $ oc get project
Error from server (InternalError): an error on the server ("This request caused apiserver to panic. Look in the logs for details.") has prevented the request from succeeding (get projects.project.openshift.io)

$ oc get project default
Error from server (InternalError): an error on the server ("This request caused apiserver to panic. Look in the logs for details.") has prevented the request from succeeding (get projects.project.openshift.io default)

Actual results:
3. As above

Expected results:
3. Should succeed

Additional info:
The panic stack from the OAS pod logs is in the next comment.
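
Step 2 above ("Wait for pods to finish rotation") can be checked mechanically. The following is only a minimal sketch: the helper name all_same_revision is hypothetical, and it assumes the pods carry a revision=N label like the one shown by `oc get pod ... --show-labels`. It reads that command's output on stdin and succeeds only when every pod reports the same revision.

```shell
# Hypothetical helper (not part of oc): reads `oc get pod ... --show-labels`
# output on stdin and succeeds when all pods carry one identical revision=N label.
all_same_revision() {
  # Collect the distinct revision labels; tolerate "no match" from grep.
  revisions=$(grep -o 'revision=[0-9][0-9]*' | sort -u || true)
  # Converged means: at least one label was seen, and exactly one distinct value.
  [ -n "$revisions" ] && [ "$(printf '%s\n' "$revisions" | grep -c .)" -eq 1 ]
}
```

A possible use is `oc get pod -n openshift-kube-apiserver -l apiserver --show-labels | all_same_revision`, repeated until it succeeds. Note that it also succeeds before the rollout has started (all pods still on the old revision), so the revision number should be compared against the one seen before the edit.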

Comment 6 Yash Tripathi 2022-02-04 03:31:34 UTC
Verified on 4.10.0-0.nightly-2022-02-02-000921 

$  oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-02-000921   True        False         35h     Cluster version is 4.10.0-0.nightly-2022-02-02-000921

$ oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
NAME                                                                 READY   STATUS    RESTARTS   AGE   LABELS
kube-apiserver-ytripath-feb-7ntfh-master-0.c.openshift-qe.internal   5/5     Running   0          12h   apiserver=true,app=openshift-kube-apiserver,revision=12
kube-apiserver-ytripath-feb-7ntfh-master-1.c.openshift-qe.internal   5/5     Running   0          12h   apiserver=true,app=openshift-kube-apiserver,revision=12
kube-apiserver-ytripath-feb-7ntfh-master-2.c.openshift-qe.internal   5/5     Running   0          11h   apiserver=true,app=openshift-kube-apiserver,revision=12

$ oc edit apiserver/cluster
apiserver.config.openshift.io/cluster edited

$ oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
NAME                                                                 READY   STATUS    RESTARTS   AGE     LABELS
kube-apiserver-ytripath-feb-7ntfh-master-0.c.openshift-qe.internal   5/5     Running   0          7m16s   apiserver=true,app=openshift-kube-apiserver,revision=14
kube-apiserver-ytripath-feb-7ntfh-master-1.c.openshift-qe.internal   5/5     Running   0          4m48s   apiserver=true,app=openshift-kube-apiserver,revision=14
kube-apiserver-ytripath-feb-7ntfh-master-2.c.openshift-qe.internal   5/5     Running   0          2m23s   apiserver=true,app=openshift-kube-apiserver,revision=14

$ oc get apiserver/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    oauth-apiserver.openshift.io/secure-token-storage: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-02-02T15:16:00Z"
  generation: 2
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 07e5f205-6b78-4252-97fc-71faec5d5eb6
  resourceVersion: "742167"
  uid: c64d4741-7fe5-49e9-a71e-a6d718638d85
spec:
  audit:
    profile: AllRequestBodies

$ oc get project default
NAME      DISPLAY NAME   STATUS
default                  Active

$ oc get projects
NAME                                               DISPLAY NAME   STATUS
default                                                           Active
kube-node-lease                                                   Active
/**snipped**/

Actual results:
3. `oc get projects` runs successfully

Expected results:
3. `oc get projects` should succeed

Comment 9 Yash Tripathi 2022-02-07 12:48:23 UTC
As discussed, this bz needs further comprehensive verification

Comment 10 Xingxing Xia 2022-02-15 15:24:18 UTC
The PR https://github.com/openshift/oauth-apiserver/pull/73 does two things. First, it makes oc get oauthaccesstoken (and useroauthaccesstoken, etc.) display "CREATED" as the time elapsed since creation instead of a timestamp. Result on the latest 4.10.0-0.nightly-2022-02-15-041303:
$ oc get useroauthaccesstoken
NAME                                                 CLIENT NAME                    CREATED ...
sha256~8Mp0...                                       openshift-challenging-client   53m     ...

Result on the older 4.10.0-rc.0 (which does not include the PR):
$ oc get useroauthaccesstoken
NAME                                                 CLIENT NAME                    CREATED              ...
sha256~0A0bf1_vgMr...                                openshift-challenging-client   2022-02-15T08:40:29Z ...

This is a cosmetic change and has nothing to do with the bug.

Second, for the panic issue, the PR changes code for resources such as users, identities, etc. On the older 4.10.0-rc.0 (which was created from 4.10.0-0.nightly-2022-02-02-000921, before the PR merged):
$ oc patch apiserver cluster --type=merge -p='{"spec":{"audit":{"profile":"AllRequestBodies"}}}'
Wait for the pod rotation to complete.

$ oc get apiservice | grep oauth
v1.oauth.openshift.io                         openshift-oauth-apiserver/api                                True        6h59m
v1.user.openshift.io                          openshift-oauth-apiserver/api                                True        6h59m
$ oc api-resources | grep -e oauth.openshift.io -e user.openshift.io # Find all oauth-apiserver resources
oauthaccesstokens                                         oauth.openshift.io/v1                         false        OAuthAccessToken
oauthauthorizetokens                                      oauth.openshift.io/v1                         false        OAuthAuthorizeToken
oauthclientauthorizations                                 oauth.openshift.io/v1                         false        OAuthClientAuthorization
oauthclients                                              oauth.openshift.io/v1                         false        OAuthClient
tokenreviews                                              oauth.openshift.io/v1                         false        TokenReview
useroauthaccesstokens                                     oauth.openshift.io/v1                         false        UserOAuthAccessToken
groups                                                    user.openshift.io/v1                          false        Group
identities                                                user.openshift.io/v1                          false        Identity
useridentitymappings                                      user.openshift.io/v1                          false        UserIdentityMapping
users

$ for RESOURCE in oauthaccesstokens oauthauthorizetokens oauthclientauthorizations oauthclients useroauthaccesstokens groups identities users
do
  oc get $RESOURCE # I also tried oc get TYPE NAME for them, also succeeded
  echo
done
$ oc create -f tokenreview.json # tokenreview and the type below do not accept GET requests
$ oc create useridentitymapping acme_ldap:adamjones ajones
The above commands succeeded for all oauth-apiserver resources, and the pod logs showed no panic.
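
The pod log check can be made repeatable with a small filter. This is a sketch under assumptions: the helper name count_panics is hypothetical, and the two patterns are taken from the panic message in this bug's summary rather than from any complete list of panic signatures. It reads log text on stdin and prints the number of matching lines.

```shell
# Hypothetical helper: counts log lines that look like the panic from this bug.
# Patterns come from the bug summary; a line matching both is counted once.
count_panics() {
  # grep -c prints the match count but exits non-zero when it is 0, so mask that.
  grep -c -e 'Observed a panic' -e 'cannot deep copy' || true
}
```

For example, `oc logs -n openshift-oauth-apiserver POD_NAME | count_panics` (POD_NAME is a placeholder) would be expected to print 0 for each pod on a cluster where the panic does not occur.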

******** So on the older 4.10.0-rc.0, before the PR, this bug did not occur. Per the Dev discussion in the above Slack: 'did you find a type where it could panic? if we don’t find anything, we’ll close out as "not a bug" ' So closing as NOTABUG. ********

(I agree with the PR's conversions such as string(identity.User.UID). They are there to satisfy https://github.com/kubernetes/kubernetes/blob/9968b0e/staging/src/k8s.io/apimachinery/pkg/runtime/converter.go#L614-L641 , which requires that the type not be a string alias (https://github.com/openshift/oauth-apiserver/blob/master/vendor/k8s.io/apimachinery/pkg/types/uid.go defines the alias "type UID string"). However, the above tests did not find any place that calls deep copy for the above resources.)