Bug 1906315 - "cannot populate chunk **" error in prometheus container logs
Summary: "cannot populate chunk **" error in prometheus container logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.9.0
Assignee: Arunprasad Rajkumar
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-10 09:11 UTC by Junqi Zhao
Modified: 2022-03-16 18:59 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:28:58 UTC
Target Upstream Version:
Embargoed:


Attachments
prometheus container logs (58.73 KB, text/plain)
2020-12-10 09:11 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1353 | 0 | None | None | None | 2021-09-03 10:40:20 UTC
Github | prometheus prometheus issues 8221 | 0 | None | open | cannot populate chunk N: not found | 2021-02-21 13:38:01 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:29:10 UTC

Description Junqi Zhao 2020-12-10 09:11:08 UTC
Created attachment 1738143
prometheus container logs

Description of problem:
After upgrading from 4.6.7 to 4.7.0-0.nightly-2020-12-04-013308, "cannot populate chunk **" errors appear in the prometheus container logs:
# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
...
level=warn ts=2020-12-09T19:00:10.224Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate6h\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h])) - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\nlabels:\n  verb: write\n" err="cannot populate chunk 626243141636: not found"
level=warn ts=2020-12-09T23:00:09.457Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate3d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[3d])) - ((sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.1\",scope=~\"resource|\",verb=~\"LIST|GET\"}[3d])) or vector(0)) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.5\",scope=\"namespace\",verb=~\"LIST|GET\"}[3d])) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[3d])))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\nlabels:\n  verb: read\n" err="cannot populate chunk 14561113538563: not found"
level=warn ts=2020-12-10T05:00:10.138Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate1d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])) - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\nlabels:\n  verb: write\n" err="cannot populate chunk 14698049175561: not found"
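
For triage it can help to reduce each of these long warnings to the three fields that matter: timestamp, recording rule name, and chunk ID. Below is a minimal sketch, assuming the log lines keep the logfmt shape shown above. The function name parse_failure and the regular expressions are illustrative, not part of Prometheus or oc, and the sample line is the first warning above with its expression shortened to "...".

```python
import re

# Patterns for the fields of interest in a rule-manager warning.
# The rule text is logfmt-escaped, so "\n" appears as a literal backslash + n.
TS_RE = re.compile(r'\bts=(\S+)')
RULE_RE = re.compile(r'rule="record: (\S+?)\\n')
CHUNK_RE = re.compile(r'err="cannot populate chunk (\d+): not found"')


def parse_failure(line):
    """Return (timestamp, rule_name, chunk_id), or None if the line does not match."""
    ts = TS_RE.search(line)
    rule = RULE_RE.search(line)
    chunk = CHUNK_RE.search(line)
    if not (ts and rule and chunk):
        return None
    return ts.group(1), rule.group(1), int(chunk.group(1))


# First warning from the log above, with the expression shortened to "...".
sample = (
    r'level=warn ts=2020-12-09T19:00:10.224Z caller=manager.go:598 '
    r'component="rule manager" group=kube-apiserver.rules '
    r'msg="Evaluating rule failed" '
    r'rule="record: apiserver_request:burnrate6h\nexpr: ...\nlabels:\n  verb: write\n" '
    r'err="cannot populate chunk 626243141636: not found"'
)

if __name__ == "__main__":
    print(parse_failure(sample))
```

Feeding it the saved output of the oc logs command line by line and skipping None results gives a compact list of failures.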


# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- ls -alR /prometheus/
/prometheus/:
total 40
drwxrwsr-x. 6 root   nobody  4096 Dec 10 07:00 .
drwxr-xr-x. 1 root   root      17 Dec  9 13:17 ..
drwxr-sr-x. 3 nobody nobody  4096 Dec 10 05:00 01ES5GAC644TTBS1E9QPHZM9FK
drwxr-sr-x. 3 nobody nobody  4096 Dec 10 07:00 01ES5Q63E5317DHP4R43X0651Q
drwxrwsr-x. 2 nobody nobody  4096 Dec 10 07:00 chunks_head
-rw-rw-r--. 1 nobody nobody 20001 Dec 10 08:19 queries.active
drwxrwsr-x. 3 nobody nobody  4096 Dec 10 07:50 wal

/prometheus/01ES5GAC644TTBS1E9QPHZM9FK:
total 37296
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 05:00 .
drwxrwsr-x. 6 root   nobody     4096 Dec 10 07:00 ..
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 05:00 chunks
-rw-r--r--. 1 nobody nobody 38166650 Dec 10 05:00 index
-rw-r--r--. 1 nobody nobody      283 Dec 10 05:00 meta.json
-rw-r--r--. 1 nobody nobody        9 Dec 10 05:00 tombstones

/prometheus/01ES5GAC644TTBS1E9QPHZM9FK/chunks:
total 67532
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 05:00 .
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 05:00 ..
-rw-r--r--. 1 nobody nobody 69139866 Dec 10 05:00 000001

/prometheus/01ES5Q63E5317DHP4R43X0651Q:
total 40856
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 07:00 .
drwxrwsr-x. 6 root   nobody     4096 Dec 10 07:00 ..
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 07:00 chunks
-rw-r--r--. 1 nobody nobody 41815901 Dec 10 07:00 index
-rw-r--r--. 1 nobody nobody      283 Dec 10 07:00 meta.json
-rw-r--r--. 1 nobody nobody        9 Dec 10 07:00 tombstones

/prometheus/01ES5Q63E5317DHP4R43X0651Q/chunks:
total 86840
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 07:00 .
drwxr-sr-x. 3 nobody nobody     4096 Dec 10 07:00 ..
-rw-r--r--. 1 nobody nobody 88909041 Dec 10 07:00 000001

/prometheus/chunks_head:
total 158712
drwxrwsr-x. 2 nobody nobody      4096 Dec 10 07:00 .
drwxrwsr-x. 6 root   nobody      4096 Dec 10 07:00 ..
-rw-r--r--. 1 nobody nobody 103782253 Dec 10 07:00 000014
-rw-r--r--. 1 nobody nobody  58718433 Dec 10 08:00 000015

/prometheus/wal:
total 523564
drwxrwsr-x. 3 nobody nobody      4096 Dec 10 07:50 .
drwxrwsr-x. 6 root   nobody      4096 Dec 10 07:00 ..
-rw-r--r--. 1 nobody nobody 134053888 Dec 10 05:50 00000034
-rw-r--r--. 1 nobody nobody 133988352 Dec 10 06:40 00000035
-rw-r--r--. 1 nobody nobody  53084160 Dec 10 07:00 00000036
-rw-r--r--. 1 nobody nobody 134119424 Dec 10 07:50 00000037
-rw-r--r--. 1 nobody nobody  80862549 Dec 10 08:19 00000038
drwxr-sr-x. 2 nobody nobody      4096 Dec 10 07:00 checkpoint.00000033

/prometheus/wal/checkpoint.00000033:
total 14472
drwxr-sr-x. 2 nobody nobody     4096 Dec 10 07:00 .
drwxrwsr-x. 3 nobody nobody     4096 Dec 10 07:50 ..
-rw-r--r--. 1 nobody nobody 14811136 Dec 10 07:00 00000000


Version-Release number of selected component (if applicable):
Upgrade from 4.6.7 to 4.7.0-0.nightly-2020-12-04-013308

How reproducible:
Not sure

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Junqi Zhao 2020-12-21 09:42:40 UTC
Hit this again with 4.7.0-0.nightly-2020-12-20-031835:
level=warn ts=2020-12-21T09:00:09.070Z caller=manager.go:598 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate6h\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[6h])) - ((sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.1\",scope=~\"resource|\",verb=~\"LIST|GET\"}[6h])) or vector(0)) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.5\",scope=\"namespace\",verb=~\"LIST|GET\"}[6h])) + sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[6h])))) + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))) / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\nlabels:\n  verb: read\n" err="cannot populate chunk 2870363553792: not found"
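
The chunk IDs in these errors look arbitrary, but they may be packed values. The sketch below assumes they are in-memory head chunk references laid out as (series ref << 24) | chunk index, which is my understanding of how the Prometheus TSDB head packed them in releases of this era. Nothing in this report confirms that layout, so treat the decoded values as a hint only. The name split_head_chunk_id is made up for illustration.

```python
CHUNK_INDEX_BITS = 24  # assumption: the low 24 bits hold the chunk index


def split_head_chunk_id(chunk_id):
    """Split a packed ID into (series_ref, chunk_index) under the assumed layout."""
    series_ref = chunk_id >> CHUNK_INDEX_BITS
    chunk_index = chunk_id & ((1 << CHUNK_INDEX_BITS) - 1)
    return series_ref, chunk_index


if __name__ == "__main__":
    # IDs copied from the errors reported in this bug.
    for cid in (626243141636, 14561113538563, 14698049175561, 2870363553792):
        print(cid, split_head_chunk_id(cid))
```

Under that assumption every ID reported here decodes to a small chunk index (0 to 9) within a single series, for example 2870363553792 becomes series 171087, chunk 0. That would fit chunks that live in the head rather than in a persisted block.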

Comment 9 Junqi Zhao 2021-06-16 13:26:22 UTC
Also found in 4.8.0-0.nightly-2021-06-12-174011:
# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "cannot populate chunk"
level=warn ts=2021-06-14T21:00:07.166Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate6h\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n  - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h])))\n  + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h])))\n  / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\nlabels:\n  verb: write\n" err="cannot populate chunk 7875292299265: not found"
level=warn ts=2021-06-15T01:00:06.810Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate1d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n  - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n  + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n  / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\nlabels:\n  verb: write\n" err="cannot populate chunk 7733071839237: not found"
level=warn ts=2021-06-15T03:00:07.300Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate1d\nexpr: ((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n  - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n  + sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d])))\n  / sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\nlabels:\n  verb: write\n" err="cannot populate chunk 6800124411917: not found"
level=warn ts=2021-06-15T07:00:09.090Z caller=manager.go:601 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: apiserver_request:burnrate3d\nexpr: label_replace(sum(rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n  / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))),\n  \"type\", \"error\", \"_none_\", \"\") or label_replace((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",scope=\"resource\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))\n  - (sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.1\",scope=\"resource\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))\n  or vector(0))) / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))),\n  \"type\", \"slow-resource\", \"_none_\", \"\") or label_replace((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",scope=\"namespace\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))\n  - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"0.5\",scope=\"namespace\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d])))\n  / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",subresource!~\"proxy|log|exec\",verb=~\"LIST|GET\"}[3d]))),\n  \"type\", \"slow-namespace\", \"_none_\", \"\") or label_replace((sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",scope=\"cluster\",verb=~\"LIST|GET\"}[3d]))\n  - sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[3d])))\n  / scalar(sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))),\n  \"type\", \"slow-cluster\", \"_none_\", \"\")\nlabels:\n  verb: read\n" err="cannot populate chunk 7027355025423: not found"
...
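
One pattern worth checking across all the failures quoted in this bug is when they happen. The sketch below only measures how far each logged timestamp is past the top of its hour. The timestamps are copied from the description and comments, and the helper name seconds_past_hour is illustrative.

```python
from datetime import datetime

# Timestamps (UTC) of the failures quoted in this bug's description and comments.
FAILURE_TIMES = [
    "2020-12-09T19:00:10.224Z",
    "2020-12-09T23:00:09.457Z",
    "2020-12-10T05:00:10.138Z",
    "2020-12-21T09:00:09.070Z",
    "2021-06-14T21:00:07.166Z",
    "2021-06-15T01:00:06.810Z",
    "2021-06-15T03:00:07.300Z",
    "2021-06-15T07:00:09.090Z",
]


def seconds_past_hour(ts):
    """Return (hour, seconds elapsed since the top of that hour) for a log timestamp."""
    t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return t.hour, t.minute * 60 + t.second + t.microsecond / 1e6


if __name__ == "__main__":
    for ts in FAILURE_TIMES:
        hour, offset = seconds_past_hour(ts)
        print(f"{ts}  hour={hour:02d}  +{offset:.3f}s")
```

All eight failures land within about 11 seconds after an odd UTC hour, and the two block directories in the description were written at 05:00 and 07:00. One possible reading is that the rule queries race with periodic head compaction. That is speculation on my part and is not confirmed anywhere in this report.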

Comment 17 hongyan li 2021-09-06 05:14:16 UTC
Tested with payload 4.9.0-0.nightly-2021-09-05-192114:
$ oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "cannot populate chunk"
no result
$ oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "Evaluating rule failed"
no result

Comment 22 errata-xmlrpc 2021-10-18 17:28:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

