Bug 2051985
Summary: | An APIRequestCount without dots in the name can cause a panic | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Pablo Alonso Rodriguez <palonsor> |
Component: | kube-apiserver | Assignee: | Luis Sanchez <sanchezl> |
Status: | CLOSED ERRATA | QA Contact: | jmekkatt |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.8 | CC: | akashem, andbartl, aos-bugs, jmekkatt, mfojtik, rsandu, sanchezl, snetting, xxia |
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-08-10 10:47:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2074094 |
Description
Pablo Alonso Rodriguez
2022-02-08 13:34:47 UTC
------------------------------Steps to reproduce in UNFIXED build------------------------------

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-07-053433   True        False         4h25m   Cluster version is 4.11.0-0.nightly-2022-04-07-053433

Create an APIRequestCount object with an unsupported name (no dots):

$ cat wrongapirequestcountyaml
apiVersion: apiserver.openshift.io/v1
kind: APIRequestCount
metadata:
  name: test-alert
spec:
  numberOfUsersToReport: 10
  groups:
  - name: test-alert-rules
    rules:
    # Alert for any instance that is unreachable for >5 minutes.
    - alert: InstanceDown
      expr: up == 0
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
    # Alert for any instance that has a median request latency >1s.
    - alert: APIHighRequestLatency
      expr: api_http_request_latencies_second{quantile="0.5"} > 1
      for: 10m
      annotations:
        summary: "High request latency on {{ $labels.instance }}"
        description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

$ oc create -f wrongapirequestcountyaml
apirequestcount.apiserver.openshift.io/test-alert created

$ oc get apirequestcount | grep "test-alert"
test-alert

Let kube-apiserver roll out a new revision. Here the rollout is triggered by patching the audit profile:

$ oc patch apiserver cluster -p '{"spec": {"audit": {"profile": "AllRequestBodies"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get pods -n openshift-kube-apiserver | grep 'apiserver' | grep -v 'guard'
kube-apiserver-ip-10-0-xxx-xxx.us-east-2.compute.internal   5/5   Running            0               133m
kube-apiserver-ip-10-0-xxx-xxx.us-east-2.compute.internal   5/5   Running            0               137m
kube-apiserver-ip-10-0-xxx-xxx.us-east-2.compute.internal   4/5   CrashLoopBackOff   27 (4m8s ago)   120m

$ oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-xxx-xxx.us-east-2.compute.internal | grep -i panic
E0412 09:29:21.979868 16 runtime.go:78] Observed a panic: runtime.boundsError{x:1, y:1, signed:true, code:0x0} (runtime error: index out of range [1] with length 1)
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic({0x4f10b20, 0xc006437968})
panic({0x4f10b20, 0xc006437968})
    /usr/lib/golang/src/runtime/panic.go:1038 +0x215
panic: runtime error: index out of range [1] with length 1 [recovered]
    panic: runtime error: index out of range [1] with length 1
panic({0x4f10b20, 0xc006437968})
    /usr/lib/golang/src/runtime/panic.go:1038 +0x215

After the new kube-apiserver revision rolled out, kube-apiserver went into a crash loop. It showed the same panic that the customer reported.

------------------------------Steps to reproduce in FIXED (latest 4.11) build------------------------------

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-12-000004   True        False         4h41m   Cluster version is 4.11.0-0.nightly-2022-04-12-000004

$ oc create -f wrongapirequestcountyaml
The APIRequestCount "test-alert" is invalid: metadata.name: Invalid value: "test-alert": apirequestcount test-alert: name must be of the form 'resource.version.group'

$ oc get apirequestcount | grep "test-alert"
$

The APIRequestCount object could not be created, because its name violates the 'resource.version.group' form. The issue therefore does not occur in the fixed (latest) build.

I also created an APIRequestCount object with a valid name, as below, and it worked as expected:

$ cat apirequestcount.yaml
apiVersion: apiserver.openshift.io/v1
kind: APIRequestCount
metadata:
  name: test-alert.api.v2
spec:
  numberOfUsersToReport: 10
  groups:
  - name: test-alert-rules
    rules:
    # Alert for any instance that is unreachable for >5 minutes.
    - alert: InstanceDown
      expr: up == 0
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
    # Alert for any instance that has a median request latency >1s.
    - alert: APIHighRequestLatency
      expr: api_http_request_latencies_second{quantile="0.5"} > 1
      for: 10m
      annotations:
        summary: "High request latency on {{ $labels.instance }}"
        description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

$ oc create -f apirequestcount.yaml
apirequestcount.apiserver.openshift.io/test-alert.api.v2 created

$ oc get apirequestcount | grep test-alert.api
test-alert.api.v2

I then rolled out a new kube-apiserver revision to check whether this causes any issues:

$ oc patch apiserver cluster -p '{"spec": {"audit": {"profile": "AllRequestBodies"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get pods -n openshift-kube-apiserver | grep 'apiserver' | grep -v 'guard'
kube-apiserver-xxx-njk-4c7px-master-0.c.openshift-qe.internal   5/5   Running   0   2m29s
kube-apiserver-xxx-njk-4c7px-master-1.c.openshift-qe.internal   5/5   Running   0   8m11s
kube-apiserver-xxx-njk-4c7px-master-2.c.openshift-qe.internal   5/5   Running   0   5m24s

$ oc logs -n openshift-kube-apiserver kube-apiserver-xxx-njk-4c7px-master-0.c.openshift-qe.internal | grep -i panic
$

The issue was not seen with the fixed (latest 4.11) build, so I moved the ticket to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
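The panic in the unfixed build reads `index out of range [1] with length 1`. Go raises exactly this error when code splits a dotless name such as `test-alert` on `.` and then reads the second element without checking the slice length. The Go sketch below shows that failure mode next to the kind of name check the fixed build applies at create time. It assumes a `strings.SplitN`-based parser. `resourceName`, `parseNameUnsafe`, `parseName` and `panics` are hypothetical helpers for illustration, not the actual kube-apiserver code.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName is a hypothetical holder for the resource/version/group
// triple encoded in an APIRequestCount name ('resource.version.group').
type resourceName struct {
	Resource, Version, Group string
}

// parseNameUnsafe is a hypothetical parser that assumes the name contains
// at least one dot. For "test-alert", SplitN returns a 1-element slice, so
// parts[1] panics with "index out of range [1] with length 1".
func parseNameUnsafe(name string) resourceName {
	parts := strings.SplitN(name, ".", 3)
	r := resourceName{Resource: parts[0], Version: parts[1]}
	if len(parts) == 3 {
		r.Group = parts[2] // group may itself contain dots, e.g. apps.openshift.io
	}
	return r
}

// parseName is a hypothetical checked variant: it validates the shape
// before indexing and returns an error instead of panicking.
func parseName(name string) (resourceName, error) {
	parts := strings.SplitN(name, ".", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return resourceName{}, fmt.Errorf("apirequestcount %s: name must be of the form 'resource.version.group'", name)
	}
	r := resourceName{Resource: parts[0], Version: parts[1]}
	if len(parts) == 3 {
		r.Group = parts[2]
	}
	return r, nil
}

// panics runs fn and reports whether it panicked, recovering so the demo
// can continue with the next name.
func panics(fn func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	fn()
	return false
}

func main() {
	for _, name := range []string{"test-alert", "test-alert.api.v2", "pods.v1", "deployments.v1.apps"} {
		p := panics(func() { parseNameUnsafe(name) })
		r, err := parseName(name)
		fmt.Printf("%-20s unsafe panics=%-5v parsed=%+v err=%v\n", name, p, r, err)
	}
}
```

Only `test-alert` makes the unchecked parser panic. `parseName` returns an error for that name and accepts the dotted names. Rejecting a malformed name up front turns a crash of the whole kube-apiserver process into an error on a single object. This is consistent with the `name must be of the form 'resource.version.group'` rejection seen in the fixed build.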