Created attachment 1738143
prometheus container logs

Description of problem:
After an upgrade from 4.6.7 to 4.7.0-0.nightly-2020-12-04-013308, the prometheus container logs show "cannot populate chunk **" errors.

# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
...
level=warn ts=2020-12-09T19:00:10.224Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate6h\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h])) - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\nlabels:\n verb: write\n" err="cannot populate chunk 626243141636: not found"
level=warn ts=2020-12-09T23:00:09.457Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate3d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[3d])) - ((sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.1\",scope=~\"resource|\",verb=~\"LIST|GET\"}[3d])) or vector(0)) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.5\",scope=\"namespace\",verb=~\"LIST|GET\"}[3d])) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[3d])))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\nlabels:\n verb: read\n" err="cannot populate chunk 14561113538563: not found"
level=warn ts=2020-12-10T05:00:10.138Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate1d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])) - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\nlabels:\n verb: write\n" err="cannot populate chunk 14698049175561: not found"

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- ls -alR /prometheus/
/prometheus/:
total 40
drwxrwsr-x. 6 root   nobody  4096 Dec 10 07:00 .
drwxr-xr-x. 1 root   root      17 Dec  9 13:17 ..
drwxr-sr-x. 3 nobody nobody  4096 Dec 10 05:00 01ES5GAC644TTBS1E9QPHZM9FK
drwxr-sr-x. 3 nobody nobody  4096 Dec 10 07:00 01ES5Q63E5317DHP4R43X0651Q
drwxrwsr-x. 2 nobody nobody  4096 Dec 10 07:00 chunks_head
-rw-rw-r--. 1 nobody nobody 20001 Dec 10 08:19 queries.active
drwxrwsr-x. 3 nobody nobody  4096 Dec 10 07:50 wal

/prometheus/01ES5GAC644TTBS1E9QPHZM9FK:
total 37296
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 05:00 .
drwxrwsr-x. 6 root   nobody     4096 Dec 10 07:00 ..
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 05:00 chunks
-rw-r--r--. 1 nobody nobody 38166650 Dec 10 05:00 index
-rw-r--r--. 1 nobody nobody      283 Dec 10 05:00 meta.json
-rw-r--r--. 1 nobody nobody        9 Dec 10 05:00 tombstones

/prometheus/01ES5GAC644TTBS1E9QPHZM9FK/chunks:
total 67532
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 05:00 .
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 05:00 ..
-rw-r--r--. 1 nobody nobody 69139866 Dec 10 05:00 000001

/prometheus/01ES5Q63E5317DHP4R43X0651Q:
total 40856
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 07:00 .
drwxrwsr-x. 6 root   nobody     4096 Dec 10 07:00 ..
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 07:00 chunks
-rw-r--r--. 1 nobody nobody 41815901 Dec 10 07:00 index
-rw-r--r--. 1 nobody nobody      283 Dec 10 07:00 meta.json
-rw-r--r--. 1 nobody nobody        9 Dec 10 07:00 tombstones

/prometheus/01ES5Q63E5317DHP4R43X0651Q/chunks:
total 86840
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 07:00 .
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 07:00 ..
-rw-r--r--. 1 nobody nobody 88909041 Dec 10 07:00 000001

/prometheus/chunks_head:
total 158712
drwxrwsr-x. 2 nobody nobody      4096 Dec 10 07:00 .
drwxrwsr-x. 6 root   nobody      4096 Dec 10 07:00 ..
-rw-r--r--. 1 nobody nobody 103782253 Dec 10 07:00 000014
-rw-r--r--. 1 nobody nobody  58718433 Dec 10 08:00 000015

/prometheus/wal:
total 523564
drwxrwsr-x. 3 nobody nobody      4096 Dec 10 07:50 .
drwxrwsr-x. 6 root   nobody      4096 Dec 10 07:00 ..
-rw-r--r--. 1 nobody nobody 134053888 Dec 10 05:50 00000034
-rw-r--r--. 1 nobody nobody 133988352 Dec 10 06:40 00000035
-rw-r--r--. 1 nobody nobody  53084160 Dec 10 07:00 00000036
-rw-r--r--. 1 nobody nobody 134119424 Dec 10 07:50 00000037
-rw-r--r--. 1 nobody nobody  80862549 Dec 10 08:19 00000038
drwxr-sr-x. 2 nobody nobody      4096 Dec 10 07:00 checkpoint.00000033

/prometheus/wal/checkpoint.00000033:
total 14472
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 07:00 .
drwxrwsr-x. 3 nobody nobody     4096 Dec 10 07:50 ..
-rw-r--r--. 1 nobody nobody 14811136 Dec 10 07:00 00000000

Version-Release number of selected component (if applicable):
upgrade from 4.6.7 to 4.7.0-0.nightly-2020-12-04-013308

How reproducible:
not sure

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
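The chunk numbers in the errors can be split into smaller parts for triage. The sketch below is a hedged reading only: it assumes these are head chunk references packed as a series reference in the upper bits and a chunk index in the low 24 bits, which is how Prometheus TSDB is understood to have encoded head chunk references around these releases. Nothing in this report confirms that layout, and the names `CHUNK_INDEX_BITS` and `unpack_head_chunk_ref` are hypothetical.

```python
# Assumed layout (not confirmed by this report): ref = (series_ref << 24) | chunk_index
CHUNK_INDEX_BITS = 24


def unpack_head_chunk_ref(ref):
    """Split a packed chunk reference into (series_ref, chunk_index)."""
    series_ref = ref >> CHUNK_INDEX_BITS
    chunk_index = ref & ((1 << CHUNK_INDEX_BITS) - 1)
    return series_ref, chunk_index


# Chunk numbers copied from the log lines in the description.
for ref in (626243141636, 14561113538563, 14698049175561):
    series_ref, chunk_index = unpack_head_chunk_ref(ref)
    print(f"{ref}: series_ref={series_ref} chunk_index={chunk_index}")
```

Under that assumption, 626243141636 decodes to series reference 37327 and chunk index 4. The small chunk indexes are consistent with the assumed layout but do not prove it.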
Reproduced with 4.7.0-0.nightly-2020-12-20-031835:
level=warn ts=2020-12-21T09:00:09.070Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate6h\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[6h])) - ((sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.1\",scope=~\"resource|\",verb=~\"LIST|GET\"}[6h])) or vector(0)) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.5\",scope=\"namespace\",verb=~\"LIST|GET\"}[6h])) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[6h])))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\nlabels:\n verb: read\n" err="cannot populate chunk 2870363553792: not found"
Also seen in 4.8.0-0.nightly-2021-06-12-174011:
# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "cannot populate chunk"
level=warn ts=2021-06-14T21:00:07.166Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate6h\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h])))\n + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h])))\n / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\nlabels:\n verb: write\n" err="cannot populate chunk 7875292299265: not found"
level=warn ts=2021-06-15T01:00:06.810Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate1d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\nlabels:\n verb: write\n" err="cannot populate chunk 7733071839237: not found"
level=warn ts=2021-06-15T03:00:07.300Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate1d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\nlabels:\n verb: write\n" err="cannot populate chunk 6800124411917: not found"
level=warn ts=2021-06-15T07:00:09.090Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate3d\nexpr: label_replace(sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))),\n \"type\", \"error\", \"_none_\", \"\") or label_replace((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",scope=\"resource\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))\n - (sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.1\",scope=\"resource\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))\n or vector(0))) / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))),\n \"type\", \"slow-resource\", \"_none_\", \"\") or label_replace((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",scope=\"namespace\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))\n - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.5\",scope=\"namespace\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d])))\n / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))),\n \"type\", \"slow-namespace\", \"_none_\", \"\") or label_replace((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",scope=\"cluster\",verb=~\"LIST|GET\"}[3d]))\n - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[3d])))\n / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))),\n \"type\", \"slow-cluster\", \"_none_\", \"\")\nlabels:\n verb: read\n" err="cannot populate chunk 7027355025423: not found"
...
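With several failures spread over hours, a per-rule tally makes the pattern easier to see than the raw lines. The following is a minimal sketch, not part of the original report: it pulls the timestamp, recording rule name, and chunk number out of each "cannot populate chunk" line and counts failures per rule. The names `LINE_RE`, `parse_failures`, `count_by_rule`, and `SAMPLE` are hypothetical, and the `expr` in the sample line is shortened to `up` for readability.

```python
import re
from collections import Counter

# Matches one rule-evaluation failure as it appears in the prometheus logs above.
# The rule name ends at the literal backslash of the escaped "\n" in the log text.
LINE_RE = re.compile(
    r'ts=(?P<ts>\S+).*?'
    r'rule="record: (?P<rule>[^\\\s"]+).*?'
    r'err="cannot populate chunk (?P<ref>\d+): not found"'
)


def parse_failures(lines):
    """Return (timestamp, rule name, chunk number) for every failure found."""
    failures = []
    for line in lines:
        # finditer copes with several log entries joined on one physical line.
        for m in LINE_RE.finditer(line):
            failures.append((m.group("ts"), m.group("rule"), int(m.group("ref"))))
    return failures


def count_by_rule(failures):
    """Count failures per recording rule."""
    return Counter(rule for _, rule, _ in failures)


# One log line from this report, with the expr shortened for the example.
SAMPLE = (
    r'level=warn ts=2021-06-14T21:00:07.166Z caller=manager.go:601 '
    r'component="rule manager" group=kube-apiserver.rules '
    r'msg="Evaluating rule failed" '
    r'rule="record: apiserver_request:burnrate6h\nexpr: up\nlabels:\n verb: write\n" '
    r'err="cannot populate chunk 7875292299265: not found"'
)

for rule, n in count_by_rule(parse_failures([SAMPLE])).most_common():
    print(n, rule)
```

To apply it to a real cluster, the lines produced by the `oc ... logs` command shown above could be passed to `parse_failures` in place of `SAMPLE`.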
Tested with payload 4.9.0-0.nightly-2021-09-05-192114; neither message appears in the logs:
$ oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "cannot populate chunk"
no result
$ oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "Evaluating rule failed"
no result
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759