Bug 1965368
| Summary: | ClusterQuotaAdmission received non-meta object - message constantly reported in OpenShift Container Platform 4.7 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Reber <sreber> |
| Component: | kube-apiserver | Assignee: | Filip Krepinsky <fkrepins> |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
| Severity: | medium | Priority: | medium |
| Version: | 4.7 | CC: | adsoni, aos-bugs, david.karlsen, dhellmann, dyocum, fkrepins, hjaiswal, jkaur, lszaszki, lukas.bauer, maszulik, mfojtik, mmarkand, musman, ocohen, palonsor, pmagotra, rbaumgar, skrenger, skudupud, sprashar, sttts, wlewis, xxia |
| Target Release: | 4.10.0 | Hardware: | x86_64 |
| OS: | Linux | Doc Type: | No Doc Update |
| Last Closed: | 2022-03-12 04:35:20 UTC | Type: | Bug |
| Bug Blocks: | 2104482 | | |
Description
Simon Reber, 2021-05-27 14:52:01 UTC
I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

We have a customer facing the same issue: the kube-apiserver is consuming the maximum of its CPU resources, and the kube-apiserver pod logs show this output:

~~~
ccResolverWrapper: sending update to cc: {[{https://localhost:2379 <nil> 0 <nil>}] <nil> <nil>}
2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885 17 clientconn.go:948] ClientConn switching balancer to "pick_first"
2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980 17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-07-09T06:36:52.532112913Z W0709 06:36:52.531817 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.582434039Z W0709 06:36:52.582254 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.731282776Z W0709 06:36:52.731187 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.931948178Z W0709 06:36:52.931846 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.131199569Z W0709 06:36:53.131091 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.330927116Z W0709 06:36:53.330835 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.531801993Z W0709 06:36:53.531689 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
~~~

Can you suggest a workaround for this? If more information is required, please let me know and I'll update the bug.

Hello Team, any update on this? Thanks.

Hi, I am really sorry, but I haven't had time to dig into this issue and I cannot give you any ETA. BTW: how do you know that entries like "ClusterQuotaAdmission received non-meta object *unstructured.Unstructured" spike the CPU?

As per the screenshot of the console provided by the customer, the kube-apiserver was consuming the maximum amount of CPU, and the kube-apiserver pods were overloaded with these logs:

~~~
2021-07-09T06:36:52.532112913Z W0709 06:36:52.531817 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
~~~

Is there any way to confirm that the increased load on the kube-apiserver is due to these spamming logs, or is it caused by something else? Shall I ask for Prometheus metrics?

I have the issue on my cluster (4.7.22). Interestingly, the pod on master-1 only consumes 80-100 mCores, the pod on master-0 consumes 230-150 mCores, and the pod on master-2 consumes 380-400 mCores; only the log on master-2 contains "ClusterQuotaAdmission received non-meta object" messages.

+1:
```
for pod in kube-apiserver-ilp-dts-g-c01ocp01 kube-apiserver-ilp-dts-g-c01ocp02 kube-apiserver-ilp-dts-g-c01ocp03; do k logs $pod | grep ClusterQuotaAdmission | wc -l; done
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-ilp-dts-g-c01ocp01. Please use `kubectl.kubernetes.io/default-container` instead
995
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-ilp-dts-g-c01ocp02. Please use `kubectl.kubernetes.io/default-container` instead
54025
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-ilp-dts-g-c01ocp03. Please use `kubectl.kubernetes.io/default-container` instead
6076
```

```
k top pods
NAME                                CPU(cores)   MEMORY(bytes)
kube-apiserver-ilp-dts-g-c01ocp01   198m         4068Mi
kube-apiserver-ilp-dts-g-c01ocp02   480m         4164Mi
kube-apiserver-ilp-dts-g-c01ocp03   424m         4475Mi
```

4.7.23:
```
k get pods
NAME                                READY   STATUS    RESTARTS   AGE
kube-apiserver-ilp-dts-g-c01ocp01   5/5     Running   0          2d8h
kube-apiserver-ilp-dts-g-c01ocp02   5/5     Running   0          2d8h
kube-apiserver-ilp-dts-g-c01ocp03   5/5     Running   0          2d8h
```
so alive for about the same time...

This BZ was initially created to resolve only the log entries like "ClusterQuotaAdmission received non-meta object *unstructured.Unstructured". I haven't had time to look into it yet, but it is very unlikely that this would cause increased CPU usage. Has anyone done an analysis that would explain the increased CPU?

The reason the messages appear is superfluous logging when parsing some types (mainly CRDs like monitoring.coreos.com/v1 ServiceMonitor or PrometheusRule) in the ClusterResourceQuota admission controller.
This should not be the reason for the high CPU usage. The high number of objects processed by the API server causes the logging to appear more often, but the logging itself does not indicate any errors.
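The failure mode described above can be illustrated with a minimal, self-contained Go sketch. All names here (`metaObject`, `admit`, `crdObject`) are illustrative stand-ins, not the actual OpenShift code; the real plugin uses the meta helpers from k8s.io/apimachinery, and the point is only that the warning path still admits the request unchanged, so the log line carries no signal:

```go
package main

import "fmt"

// metaObject mimics the small subset of object metadata the quota plugin
// needs. The real code resolves this via apimachinery's meta helpers.
type metaObject interface {
	GetNamespace() string
}

// pod stands in for a typed, meta-aware API object.
type pod struct{ namespace string }

func (p pod) GetNamespace() string { return p.namespace }

// crdObject stands in for an object the plugin could not treat as a typed
// meta object (logged as *unstructured.Unstructured in this bug).
type crdObject struct{}

// admit returns the namespace to evaluate quota in, plus a warning string
// when the object is not meta-aware. Note the admission outcome is the same
// either way, which is why the fix could simply drop the log line.
func admit(obj interface{}) (namespace, warning string) {
	m, ok := obj.(metaObject)
	if !ok {
		warning = fmt.Sprintf("ClusterQuotaAdmission received non-meta object %T", obj)
		return "", warning
	}
	return m.GetNamespace(), ""
}

func main() {
	fmt.Println(admit(pod{namespace: "demo"}))
	fmt.Println(admit(crdObject{}))
}
```

A ServiceMonitor or PrometheusRule arriving as unstructured data hits the warning branch on every write, which is why busy clusters saw the message repeated hundreds of times per second while the requests themselves succeeded.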
I have posted a fix to disable the logging.
Regarding the high CPU usage:
> must-gather from customer Cluster where this message is reported can be found on https://drive.google.com/file/d/1a6BPJdvCzQR77nwXa1AHwtzT8DRi1RDW/view?usp=sharing
I did not find anything unusual in this must-gather (focused on kube-apiserver).
@jkaur I could not access your must-gather logs, and the other evidence uploaded to this BZ is not very useful.
@fkrepins I have another must-gather https://drive.google.com/drive/folders/1_QO0_4HTmpur0mLY0ynvMwKUXpwJ3urY?usp=sharing and it happens on my newly created 4.8.11 cluster as well.

Thanks, I looked at these, but couldn't find anything abnormal with regard to this issue. There is probably some other reason why the cluster became unhealthy.

Verification steps:

```
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-09-29-021903   True        False         58m     Cluster version is 4.10.0-0.nightly-2021-09-29-021903
```

- There is no warning message 'ClusterQuotaAdmission received non-meta object *unstructured.Unstructured' in the openshift-kube-apiserver log files:

```
$ for kas_pod in $(oc get pods -n openshift-kube-apiserver -l apiserver | sed '1d' | awk '{print $1}'); do oc logs $kas_pod -n openshift-kube-apiserver | grep ClusterQuotaAdmission | wc -l; done
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-kewang-29410g1-7flqf-master-0.c.openshift-qe.internal. Please use `kubectl.kubernetes.io/default-container` instead
0
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-kewang-29410g1-7flqf-master-1.c.openshift-qe.internal. Please use `kubectl.kubernetes.io/default-container` instead
0
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-kewang-29410g1-7flqf-master-2.c.openshift-qe.internal. Please use `kubectl.kubernetes.io/default-container` instead
0
```

- Check the node workloads; they are normal:

```
$ for node in $(oc get no | sed '1d' | awk '{print $1}'); do echo "$node --->"; oc describe node/$node | sed -n '/^Allocated resources/,/hugepages-2Mi/p'; echo; done
kewang-29410g1-7flqf-master-0.c.openshift-qe.internal --->
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2020m (57%)   0 (0%)
  memory             6837Mi (49%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

kewang-29410g1-7flqf-master-1.c.openshift-qe.internal --->
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1755m (50%)   0 (0%)
  memory             5512Mi (39%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

kewang-29410g1-7flqf-master-2.c.openshift-qe.internal --->
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1809m (51%)   0 (0%)
  memory             5607Mi (40%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

kewang-29410g1-7flqf-worker-a-95kb7.c.openshift-qe.internal --->
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                727m (20%)    0 (0%)
  memory             3333Mi (23%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

kewang-29410g1-7flqf-worker-b-wwkbp.c.openshift-qe.internal --->
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                670m (19%)    0 (0%)
  memory             2980Mi (21%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
```

I think there is no plan to backport this, unless @maszulik or @sttts have a different opinion.

(In reply to Filip Krepinsky from comment #34)
> I think there is no plan to backport this, unless @maszulik or @sttts have a different opinion.

My default answer is not to backport, unless there is high demand and the impact is pretty bad, which is not the case with this fix.

@sprashar The error should be inconsequential and should not cause the higher CPU usage. So no workaround is needed, apart from not looking at the logs. Nevertheless, if you really feel you have a case of unusual CPU utilization, I advise you to open a new bug and attach measurements/profiling of the CPU utilization, metrics, an sosreport (SOS), and a must-gather (MG).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Hi, there has not been any change in direction and there are no plans to backport this.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.