Created attachment 1875014 [details]
Detailed panic logs snippet from kube-apiserver pods.

Description of problem:

Once the kube-apiserver audit profile is set to "AllRequestBodies", "oc get ValidatingWebhookConfiguration" and "oc get MutatingWebhookConfiguration" calls fail with the error "Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR; received from peer". The kube-apiserver logs show "http2: panic serving xxx.xx.xxx.21:49748: cannot deep copy int".

Version-Release number of selected component (if applicable):

4.11.0-0.nightly-2022-04-24-135651 or latest

How reproducible:

Always

Steps to Reproduce:
1. Patch the audit profile to "AllRequestBodies" for the kube-apiserver:
   $ oc patch apiserver cluster -p '{"spec": {"audit": {"profile": "AllRequestBodies"}}}' --type merge
2. Wait for the kube-apiserver pods to roll out to a new revision.
3. Once the kube-apiserver pods are on the new revision, run:
   $ oc get ValidatingWebhookConfiguration
   $ oc get MutatingWebhookConfiguration
   and observe the behavior.

Actual results:

$ oc get MutatingWebhookConfiguration
Unable to connect to the server: stream error: stream ID 165; INTERNAL_ERROR; received from peer

$ oc get ValidatingWebhookConfiguration
Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR; received from peer

Expected results:

Both commands should return the respective MutatingWebhookConfiguration and ValidatingWebhookConfiguration objects from the cluster without any error.

Additional info:

The following logs are seen on the kube-apiserver side:

$ oc logs kube-apiserver-<ONE_OF_THE_POD> -n openshift-kube-apiserver
...
E0425 11:24:10.986109      17 wrap.go:58] "apiserver panic'd" method="GET" URI="/apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations?limit=500" audit-ID="6fdf6ada-7b54-4e21-ab31-884c66b8e0c0"
http2: panic serving xxx.xx.xxx.21:54610: cannot deep copy int
goroutine 3350622 [running]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:105 +0xb0
panic({0x47c7da0, 0xc029c529a0})
	/usr/lib/golang/src/runtime/panic.go:1038 +0x215
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1·dwrap·1()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:86 +0x2a
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1()
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:95 +0x250
panic({0x47c7da0, 0xc029c529a0})
	/usr/lib/golang/src/runtime/panic.go:1038 +0x215
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime.DeepCopyJSONValue({0x45287e0, 0x8cf6ad0})
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/converter.go:639 +0x273
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*TableRow).DeepCopy(0xc039a2a480)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/deepcopy.go:33 +0xe8
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*TableRow).DeepCopyInto(...)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/zz_generated.deepcopy.go:1068
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*Table).DeepCopyInto(0xc039d737a0, 0xc039d738c0)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/zz_generated.deepcopy.go:1001 +0x27c
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*Table).DeepCopy(...)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/zz_generated.deepcopy.go:1013
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*Table).DeepCopyObject(0xc039d737a0)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/zz_generated.deepcopy.go:1019 +0x45
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/audit.copyWithoutManagedFields({0x5c457e0, 0xc039d737a0})
<SNIPPED>

Attached the full kube-apiserver panic logs (kube-apiserver_pod_panic_logs.txt) to this BZ.
Must-gather logs (from 4.11.0-0.nightly-2022-04-25-220649) can be found here: https://drive.google.com/file/d/1zohQsn_Mk9ZVVtcmNCEQJsC4EBZtprR9/view?usp=sharing
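Root-cause reading of the stack trace (inferred from the trace above, to be confirmed by the apiserver team): "oc get" requests the server-side Table rendering (meta.k8s.io/v1 Table), and with the AllRequestBodies profile the audit filter deep-copies that response object (audit.copyWithoutManagedFields -> Table.DeepCopyObject -> TableRow.DeepCopy). TableRow cells are plain interface{} values, and runtime.DeepCopyJSONValue only accepts JSON-compatible types (string, bool, int64, float64, nil, nested maps/slices), so a cell holding a native Go int hits its default branch and panics with "cannot deep copy int". Below is a minimal, self-contained sketch of that behavior; the cell values are made up for illustration and this is not the server code itself.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

func main() {
	// A row whose cells are JSON-compatible (string, int64) deep-copies fine.
	ok := metav1.TableRow{Cells: []interface{}{"machine-api", int64(2)}}
	fmt.Printf("copied cells: %v\n", ok.DeepCopy().Cells)

	// A native Go int is not handled by runtime.DeepCopyJSONValue and falls into
	// its default branch, which is the "cannot deep copy int" panic in the log above.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // prints: recovered: cannot deep copy int
		}
	}()
	bad := metav1.TableRow{Cells: []interface{}{"machine-api", 2}}
	_ = runtime.DeepCopyJSONValue(bad.Cells[1]) // panics
}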
Setting this to blocker+; the panic must not happen, and the request should succeed.
Upstream fix in progress: https://github.com/kubernetes/kubernetes/pull/110408. Once it merges, we will pick it up in o/kubernetes.
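For illustration only, and not the contents of that PR (see the PR itself for the actual change): the general class of fix is to keep values that runtime.DeepCopyJSONValue cannot handle out of the object being copied, for example by normalizing table cells to JSON-compatible types before any deep copy happens. A hypothetical helper along those lines:

// Package and helper names here are made up for illustration.
package tablecopy

// sanitizeCells converts cell values that runtime.DeepCopyJSONValue cannot handle,
// such as a native Go int, into JSON-compatible equivalents so a later deep copy
// of the Table cannot panic.
func sanitizeCells(cells []interface{}) []interface{} {
	out := make([]interface{}, len(cells))
	for i, c := range cells {
		switch v := c.(type) {
		case int:
			out[i] = int64(v) // int64 is accepted by DeepCopyJSONValue
		case int32:
			out[i] = int64(v)
		case float32:
			out[i] = float64(v)
		default:
			out[i] = c // strings, bools, int64, float64, nil, nested maps/slices pass through
		}
	}
	return out
}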
Installed the latest OCP version, which includes the fix.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-15-222801   True        False         6m59s   Cluster version is 4.11.0-0.nightly-2022-06-15-222801

Patched the audit profile to "AllRequestBodies".

$ oc patch apiserver cluster -p '{"spec": {"audit": {"profile": "AllRequestBodies"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get apiserver/cluster -ojson | jq .spec.audit
{
  "profile": "AllRequestBodies"
}

Once the new revisions rolled out, tried to get both ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects. Both commands succeeded.

$ oc get ValidatingWebhookConfiguration
NAME                               WEBHOOKS   AGE
alertmanagerconfigs.openshift.io   1          36m
autoscaling.openshift.io           2          44m
machine-api                        2          45m
multus.openshift.io                1          47m
performance-addon-operator         1          47m
prometheusrules.openshift.io       1          36m
snapshot.storage.k8s.io            1          46m
test-validating-cfg2               1          21s

$ oc get MutatingWebhookConfiguration
NAME                 WEBHOOKS   AGE
machine-api          2          45m
test-mutating-cfg2   1          35s

Checked the kube-apiserver logs for any panic messages and found none.

$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-1.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"
$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-0.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"
$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-2.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"

Repeated the same steps with the kube-apiserver audit profile patched to "WriteRequestBodies" and "None". Everything works as expected.

$ oc patch apiserver cluster -p '{"spec": {"audit": {"profile": "WriteRequestBodies"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get apiserver/cluster -ojson | jq .spec.audit
{
  "profile": "WriteRequestBodies"
}

$ oc get ValidatingWebhookConfiguration
NAME                               WEBHOOKS   AGE
alertmanagerconfigs.openshift.io   1          77m
autoscaling.openshift.io           2          85m
machine-api                        2          85m
multus.openshift.io                1          87m
performance-addon-operator         1          88m
prometheusrules.openshift.io       1          77m
snapshot.storage.k8s.io            1          86m

$ oc get MutatingWebhookConfiguration
NAME          WEBHOOKS   AGE
machine-api   2          85m

$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-1.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"
$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-0.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"
$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-2.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"

$ oc patch apiserver cluster -p '{"spec": {"audit": {"profile": "None"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get apiserver/cluster -ojson | jq .spec.audit
{
  "profile": "None"
}

$ oc get ValidatingWebhookConfiguration
NAME                               WEBHOOKS   AGE
alertmanagerconfigs.openshift.io   1          103m
autoscaling.openshift.io           2          111m
machine-api                        2          112m
multus.openshift.io                1          114m
performance-addon-operator         1          114m
prometheusrules.openshift.io       1          103m
snapshot.storage.k8s.io            1          113m

$ oc get MutatingWebhookConfiguration
NAME          WEBHOOKS   AGE
machine-api   2          112m

$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-1.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"
$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-0.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"
$ oc logs kube-apiserver-jmekkatt-mpo-jzr7v-master-2.c.openshift-qe.internal -n openshift-kube-apiserver | grep -i "panic"

Hence the issue is fixed in the tested version, and the ticket is moved to Verified.
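In addition to the oc checks above, the same verification can be scripted. A minimal client-go sketch (the kubeconfig handling here is an assumption, adjust for your environment) that requests the server-side Table rendering for both resources, i.e. the request shape that previously panicked:

package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a reachable kubeconfig at the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for _, resource := range []string{"validatingwebhookconfigurations", "mutatingwebhookconfigurations"} {
		// Ask for the meta.k8s.io/v1 Table representation, like oc/kubectl does,
		// with the same ?limit=500 seen in the panicking request.
		raw, err := client.AdmissionregistrationV1().RESTClient().
			Get().
			Resource(resource).
			SetHeader("Accept", "application/json;as=Table;v=v1;g=meta.k8s.io").
			Param("limit", "500").
			Do(context.TODO()).
			Raw()
		if err != nil {
			fmt.Printf("%s: request failed: %v\n", resource, err)
			continue
		}
		fmt.Printf("%s: got %d bytes of Table output\n", resource, len(raw))
	}
}

On a fixed cluster with the AllRequestBodies profile, both requests should return Table output; on an affected build, they fail with the stream INTERNAL_ERROR shown in the original report.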
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069