Bug 2179071 - [RHOSP 17.1] Libpodstats doesn't report cpu and memory usage for ceph services
Summary: [RHOSP 17.1] Libpodstats doesn't report cpu and memory usage for ceph services
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: collectd-libpod-stats
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: Yadnesh Kulkarni
QA Contact: Leonid Natapov
Docs Contact: mgeary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-16 15:02 UTC by Yadnesh Kulkarni
Modified: 2023-08-16 01:15 UTC (History)

Fixed In Version: collectd-libpod-stats-1.0.5-4.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, the collectd plugin libpodstats could not gather metrics because the Cgroup path to Ceph containers changed in RHEL 9 from `/sys/fs/cgroup/machine.slice` to `/sys/fs/cgroup/system.slice/system-ceph<FSID>`. With this update, libpodstats can now parse CPU and memory metrics from cgroups under the new path.
Clone Of:
Environment:
Last Closed: 2023-08-16 01:14:22 UTC
Target Upstream Version:
Embargoed:
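
The Doc Text above says the fix parses CPU and memory metrics from cgroups under the new path. A minimal sketch of that kind of read, assuming cgroup v2 (the RHEL 9 default) with its standard `memory.current` and `cpu.stat` files; the function name `read_cgroup_v2_stats` is hypothetical, and this is not the plugin's actual code:

```python
from pathlib import Path


def read_cgroup_v2_stats(cgroup_dir):
    """Return memory bytes and cumulative CPU microseconds for one cgroup v2 directory.

    Hypothetical helper for illustration; the caller supplies the directory.
    """
    base = Path(cgroup_dir)
    # memory.current holds a single integer: bytes currently charged to the cgroup.
    memory_bytes = int((base / "memory.current").read_text().strip())
    # cpu.stat holds "key value" lines; usage_usec is the total CPU time consumed.
    cpu_stat = {}
    for line in (base / "cpu.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            cpu_stat[key] = int(value)
    return {"memory_bytes": memory_bytes, "cpu_usage_usec": cpu_stat["usage_usec"]}
```

A CPU percentage would need two samples of `usage_usec` divided by the elapsed wall-clock time between them. The exact per-container directory under `system.slice` is not shown in this report, so the caller passes it in.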


Links
System ID Private Priority Status Summary Last Updated
Github infrawatch collectd-libpod-stats pull 6 0 None open Parse records of containers using volatile overlay mounts 2023-03-21 15:35:33 UTC
Red Hat Issue Tracker OSP-23181 0 None None None 2023-03-16 15:03:38 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:15:01 UTC

Description Yadnesh Kulkarni 2023-03-16 15:02:24 UTC
Description of problem:

The Ceph memory and CPU usage panels in the cloud view dashboard remain empty. Restarting the collectd and metrics_qdr containers did not help.

Collectd libpodstats is not reporting these metrics for the Ceph OSDs, MONs, and MGR.
~~~
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-crash-ceph-0", service="default-osp-coll-meter", type_instance="base"} 10022912
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-0", service="default-osp-coll-meter", type_instance="base"} 367702016
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-1", service="default-osp-coll-meter", type_instance="base"} 325013504
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-2", service="default-osp-coll-meter", type_instance="base"} 415793152
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-3", service="default-osp-coll-meter", type_instance="base"} 401096704
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-4", service="default-osp-coll-meter", type_instance="base"} 467902464
~~~

This was also observed with "collectd_libpodstats_pod_cpu_percent".

Version-Release number of selected component (if applicable):
collectd-libpod-stats-1.0.4-4.el9ost.x86_64

How reproducible:
Deploy STF 1.5.1 with OSP 17.1

Comment 1 Martin Magr 2023-03-20 14:02:53 UTC
collectd-libpodstats depends on the container records in the containers.json file. For some reason, none of the Ceph containers has a record in the file:

[root@ceph-0 ~]# podman ps
CONTAINER ID  IMAGE                                                                                                                           COMMAND               CREATED     STATUS               PORTS       NAMES
f908a497d264  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n client.crash.c...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-crash-ceph-0
d9d1fbbe15ca  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.0 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-0
19aa3f83f69e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.1 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-1
6565775a4c50  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.2 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-2
f4bf2b73938c  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.3 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-3
e56ccf568fd0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.4 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-4
3f61d78a6ddd  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rsyslog:17.1_20230314.1                                       kolla_start           2 days ago  Up 2 days                        rsyslog
813668fbcf6f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1                                      kolla_start           2 days ago  Up 2 days (healthy)              collectd
207760d59d07  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron:17.1_20230314.1                                          kolla_start           2 days ago  Up 2 days (healthy)              logrotate_crond
21600f8d2927  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1                                     kolla_start           2 days ago  Up 2 days (healthy)              metrics_qdr
[root@ceph-0 ~]# cat /var/lib/containers/storage/overlay-containers/containers.json | jq -c '.[] | select(.names[])'
{"id":"a611f3fe7c1174f5634c4a9dbc61f0c3c6b67e6f1a3f6035fd2c740aa679cbc4","names":["container-puppet-rsyslog"],"image":"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161","layer":"b033a484b92ecd2be921b8617598d30d7cae8083f60379c5526fea7fe067542e","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rsyslog:17.1_20230314.1\",\"image-id\":\"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161\",\"name\":\"container-puppet-rsyslog\",\"created-at\":1679138662}","created":"2023-03-18T11:24:22.591904997Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"0eb4ef7bc0a7e0a3fd05242aa710ffbca930d2bbd02da6f9eeddc57005535404","names":["container-puppet-metrics_qdr"],"image":"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d","layer":"2aad193e4c60706e2d2bdf23c951cbcc371dac685ed342341baef24eb1fd0604","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1\",\"image-id\":\"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d\",\"name\":\"container-puppet-metrics_qdr\",\"created-at\":1679138662}","created":"2023-03-18T11:24:22.948726884Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"1b7b6f6f0da3383f2c239d9ac713344965e743450cc09c5dc585c864b59d8206","names":["container-puppet-crond"],"image":"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6","layer":"3baf1c29e4d8651e376aa676c6feb7e9474fdfcbd3f67b4a9aff0663da097e6d","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron:17.1_20230314.1\",\"image-id\":\"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6\",\"name\":\"container-puppet-crond\",\"created-at\":1679138662}","created":"2023-03-18T11:24:22.957957376Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"2de8ac0a26badc9b449f3f1ba43dcef0e4d84ca33029b51bb9d5e7397cb0de7c","names":["container-puppet-collectd"],"image":"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c","layer":"0277a686f7036ceddc25a79dad1da36adfacb5312c337c1d9db5e17fb114ca9d","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1\",\"image-id\":\"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c\",\"name\":\"container-puppet-collectd\",\"created-at\":1679138663}","created":"2023-03-18T11:24:23.008527557Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"d48f5ca9b34ae86b516b8fddfde26623767dbabb257c7ce9b7ea3b57beb0d6fa","names":["metrics_qdr_init_logs"],"image":"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d","layer":"d54aaa272693a90ab0a6f97983cfd0d603b1097f97ca3a2f224ccfb7e8a05bbd","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1\",\"image-id\":\"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d\",\"name\":\"metrics_qdr_init_logs\",\"created-at\":1679138689}","created":"2023-03-18T11:24:49.280717372Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c779,c831","ProcessLabel":"system_u:system_r:container_t:s0:c779,c831"}}
{"id":"3a0f3e8e3310db249ba2026f9494c177f5770ca16954cebc3fac71c0924df8cb","names":["collectd_init_perm"],"image":"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c","layer":"73aa6065ff786529cb2ae3c45e1db12fb0244b5a6f3f2ef9df8012bd0bc06bbe","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1\",\"image-id\":\"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c\",\"name\":\"collectd_init_perm\",\"created-at\":1679139087}","created":"2023-03-18T11:31:27.165449235Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c267,c591","ProcessLabel":"system_u:system_r:container_t:s0:c267,c591"}}
{"id":"3f61d78a6dddd985e81c3e4f840dc5d5de4a8a5d13cb27585d9560b863d92b1b","names":["rsyslog"],"image":"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161","layer":"9012b1ba5385809732291c0293b33ed36752882c11f1857ce0e3895cf058fa29","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rsyslog:17.1_20230314.1\",\"image-id\":\"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161\",\"name\":\"rsyslog\",\"created-at\":1679139514}","created":"2023-03-18T11:38:34.606010419Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","MountOpts":["metacopy=on"],"ProcessLabel":""}}
{"id":"813668fbcf6f1a92a499465bfcd5823858f852af0adcdaf4c2e85c655247bc31","names":["collectd"],"image":"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c","layer":"068c2a1928b86460f5cbf7d95cd2ad7213df52bc62c91ed04fc86e2273fb0396","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1\",\"image-id\":\"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c\",\"name\":\"collectd\",\"created-at\":1679139514}","created":"2023-03-18T11:38:34.674544176Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"207760d59d078680b017927aff1cb4d8594a98c80e4d23690a7ed503b57d4957","names":["logrotate_crond"],"image":"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6","layer":"ac4017bd7f9307b2ff52afe8feac0b68aadb9a6dbfb191b5988e35cd97e657c2","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron:17.1_20230314.1\",\"image-id\":\"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6\",\"name\":\"logrotate_crond\",\"created-at\":1679139907}","created":"2023-03-18T11:45:07.037630905Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","MountOpts":["metacopy=on"],"ProcessLabel":""}}
{"id":"21600f8d292776962daef51032bc3dc683431ce826c57c625b9407dd80d06e8d","names":["metrics_qdr"],"image":"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d","layer":"912e5fd65f8b104112aae0320dbff4131a02fd793e6f2b000ce250fe5ca93acb","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1\",\"image-id\":\"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d\",\"name\":\"metrics_qdr\",\"created-at\":1679146521}","created":"2023-03-18T13:35:21.595933021Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c78,c531","ProcessLabel":"system_u:system_r:container_t:s0:c78,c531"}}
[root@ceph-0 ~]#

We will have to investigate the reason and whether the records for all Ceph containers can be restored in that file. If that is not possible, we will have to adapt collectd-libpodstats to get the data some other way.
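
The manual comparison above (the `podman ps` names against the `names` fields in containers.json) can be sketched as a small check. The function name `names_without_record` is hypothetical, and the caller is assumed to supply the list of running container names:

```python
import json


def names_without_record(running_names, containers_json_text):
    """Return running container names that have no entry in the storage records."""
    records = json.loads(containers_json_text) or []
    # Each record carries a "names" list; collect every recorded name.
    recorded = {name for record in records for name in (record.get("names") or [])}
    return sorted(set(running_names) - recorded)
```

Given the output above, such a check would flag every `ceph-aef1676b-...` container, because only the rsyslog, collectd, logrotate_crond, and metrics_qdr containers have records.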

Comment 3 Yadnesh Kulkarni 2023-03-21 05:54:11 UTC
So there's another file under the same directory that holds the records for the Ceph services:
~~~
[root@controller-2 ~]# cat /var/lib/containers/storage/overlay-containers/volatile-containers.json | jq -c '.[] | select(.names[])'
{"id":"d453aff45980224d51a5a3472ed2837e027995a08080c1c68cd68055f10e4f41","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mon-controller-2"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"cfb35d17839905918c1a5c432e4d369cf00f8fd718a3e9c67f4418270e0f427c","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mon-controller-2\",\"created-at\":1678181207}","created":"2023-03-07T09:26:47.375894872Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","MountOpts":["metacopy=on"],"ProcessLabel":"","Volatile":true}}
{"id":"61231c353d4cac1b4b90774fd6e34ea2cd108a697e97027ca7abf20fc8241f8f","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mgr-controller-2-cyfrwp"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"9ee90047e514c0135d29aee1ec19f807fd08d26e3203ea662a408d705d4f8cc9","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mgr-controller-2-cyfrwp\",\"created-at\":1678181218}","created":"2023-03-07T09:26:58.115355378Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":"","Volatile":true}}
{"id":"f517fdb3d610e41bf732101cfb9dbb3a5eabbf4d02a391d30258d5daad19ecdd","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-crash-controller-2"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"56064a1840b17ff42e3f0cc29321faffa236967b712c38e6d2fe2ea88d684cbd","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-crash-controller-2\",\"created-at\":1678181222}","created":"2023-03-07T09:27:02.824753764Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":"","Volatile":true}}
{"id":"c832de9901617590cd409455d7279e8064ccd69fc87831f43d5a9014c9d2cead","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-rgw-rgw-controller-2-fwygiv"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"42893240a81d07e6fc842e23b852146158aceeb7df8080778fce0ce75abcdd59","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-rgw-rgw-controller-2-fwygiv\",\"created-at\":1678184543}","created":"2023-03-07T10:22:23.580194978Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":"","Volatile":true}}
~~~
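
One way to cover both files is to read containers.json and volatile-containers.json and merge the records by container ID. This is a minimal sketch under that assumption, not the actual change from the linked pull request; `load_container_records` and the constants are hypothetical names:

```python
import json
from pathlib import Path

# Directory and file names as they appear in the comments above.
STORAGE_DIR = Path("/var/lib/containers/storage/overlay-containers")
RECORD_FILES = ("containers.json", "volatile-containers.json")


def load_container_records(storage_dir=STORAGE_DIR):
    """Map container ID to its record, reading both record files when present."""
    records = {}
    for file_name in RECORD_FILES:
        path = Path(storage_dir) / file_name
        if not path.exists():
            continue  # a host may have only one of the two files
        for record in json.loads(path.read_text()) or []:
            records[record["id"]] = record
    return records
```

Keying on `id` means a container listed in both files appears only once, with the later file taking precedence.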

Comment 7 Leonid Natapov 2023-05-07 15:21:42 UTC
The Ceph memory and CPU usage panels in the cloud view Grafana dashboard now show metrics.

Tested with collectd-libpod-stats-1.0.5-4.el9ost.x86_64.

Comment 16 errata-xmlrpc 2023-08-16 01:14:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

