Bug 1902018
| Summary: | Many HTTP 429 error reported by kube-apiserver - problem disappeared after disabling APIPriorityAndFairness | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Reber <sreber> |
| Component: | kube-apiserver | Assignee: | Abu Kashem <akashem> |
| Status: | CLOSED NOTABUG | QA Contact: | Ke Wang <kewang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.5 | CC: | akashem, anowak, aos-bugs, mfojtik, oarribas, ocasalsa, sttts, wlewis, xxia |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-02-25 18:10:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Simon Reber
2020-11-26 15:24:11 UTC
sreber, I downloaded the Prometheus dump, but the tarball seems to be corrupt: I get an `unexpected EOF` error when I try to extract it.
sreber, looking at the metrics, it looks like the cluster is hitting the p&f panic bug.
To work around the issue, you can apply the following YAML, which exempts cluster workload requests made by service accounts from p&f. This should stabilize the cluster while p&f is enabled.
We also need to pinpoint the underlying root cause (the panic we are seeing). I will check the must-gather logs to pinpoint the panic.
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: FlowSchema
metadata:
  name: exempt-service-accounts
spec:
  distinguisherMethod:
    type: ByUser
  matchingPrecedence: 10
  priorityLevelConfiguration:
    name: exempt
  rules:
  - nonResourceRules:
    - nonResourceURLs:
      - '*'
      verbs:
      - '*'
    resourceRules:
    - apiGroups:
      - '*'
      clusterScope: true
      namespaces:
      - '*'
      resources:
      - '*'
      verbs:
      - '*'
    subjects:
    - group:
        name: system:serviceaccounts
      kind: Group
Please delete this flowschema once you upgrade to the version with the fix.
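For reference, applying and later removing the workaround could look roughly like the sketch below. It assumes the manifest above was saved as `exempt-service-accounts.yaml` (a hypothetical filename) and that `oc` is logged in with sufficient privileges; the function names are only illustrative.

```shell
#!/bin/sh
# Sketch only. Assumes the FlowSchema manifest above was saved as
# exempt-service-accounts.yaml (hypothetical filename) and that the
# CLI is already logged in to the affected cluster.
OC="${OC:-oc}"   # CLI to use; can be overridden, e.g. OC=kubectl

# Create (or update) the workaround FlowSchema from the saved manifest.
apply_workaround() {
    "$OC" apply -f "${1:-exempt-service-accounts.yaml}"
}

# Remove the workaround after upgrading to a release that contains the fix.
remove_workaround() {
    "$OC" delete flowschema exempt-service-accounts
}
```

Run `apply_workaround` now and `remove_workaround` after the upgrade. `oc get flowschema exempt-service-accounts` should show whether the object is currently present.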
Update:
- The PR that fixes the p&f panic issue has merged upstream: https://github.com/kubernetes/kubernetes/pull/97206
We are backporting the fix to 4.5, 4.6 and 4.7:
> 4.5: https://github.com/openshift/origin/pull/25777
> 4.6: https://github.com/openshift/kubernetes/pull/502 and https://github.com/openshift/kubernetes/pull/501
> 4.7: https://github.com/openshift/kubernetes/pull/509 and https://github.com/openshift/kubernetes/pull/508
The PRs for master/4.7 have merged, and I have asked QE to expedite testing. The corresponding BZ for this is: https://bugzilla.redhat.com/show_bug.cgi?id=1912564