Bug 2179071

Summary: [RHOSP 17.1] Libpodstats doesn't report cpu and memory usage for ceph services
Product: Red Hat OpenStack
Reporter: Yadnesh Kulkarni <ykulkarn>
Component: collectd-libpod-stats
Assignee: Yadnesh Kulkarni <ykulkarn>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact: mgeary <mgeary>
Priority: medium
Version: 17.1 (Wallaby)
CC: jelynch, joflynn, lmadsen, mmagr
Target Milestone: beta
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: collectd-libpod-stats-1.0.5-4.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, the collectd plugin libpodstats could not gather CPU and memory metrics for Ceph containers because the cgroup path to those containers changed in RHEL 9 from `/sys/fs/cgroup/machine.slice` to `/sys/fs/cgroup/system.slice/system-ceph<FSID>`. With this update, libpodstats parses CPU and memory metrics from cgroups under the new path.
Last Closed: 2023-08-16 01:14:22 UTC
Type: Bug

Description Yadnesh Kulkarni 2023-03-16 15:02:24 UTC
Description of problem:

The Ceph memory and CPU usage panels in the cloud view dashboard remain empty. Restarting the collectd and metrics_qdr containers didn't help.

The collectd libpodstats plugin is not reporting these metrics for the Ceph OSDs, MONs, and MGR.
~~~
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-crash-ceph-0", service="default-osp-coll-meter", type_instance="base"} 10022912
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-0", service="default-osp-coll-meter", type_instance="base"} 367702016
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-1", service="default-osp-coll-meter", type_instance="base"} 325013504
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-2", service="default-osp-coll-meter", type_instance="base"} 415793152
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-3", service="default-osp-coll-meter", type_instance="base"} 401096704
collectd_libpodstats_pod_memory{container="sg-core", endpoint="prom-https", host="ceph-0.redhat.local", plugin_instance="ceph-osd-4", service="default-osp-coll-meter", type_instance="base"} 467902464
~~~

This was also observed with "collectd_libpodstats_pod_cpu_percent".

Version-Release number of selected component (if applicable):
collectd-libpod-stats-1.0.4-4.el9ost.x86_64

How reproducible:
Deploy STF 1.5.1 with OSP 17.1

Comment 1 Martin Magr 2023-03-20 14:02:53 UTC
collectd-libpodstats depends on the container records in the containers.json file. For some reason, none of the Ceph containers has a record in that file:

[root@ceph-0 ~]# podman ps
CONTAINER ID  IMAGE                                                                                                                           COMMAND               CREATED     STATUS               PORTS       NAMES
f908a497d264  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n client.crash.c...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-crash-ceph-0
d9d1fbbe15ca  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.0 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-0
19aa3f83f69e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.1 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-1
6565775a4c50  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.2 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-2
f4bf2b73938c  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.3 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-3
e56ccf568fd0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:f165a015a577bc7aeebb22355f01d471722cee7873fa0f7cd23e1c210e45f30e  -n osd.4 -f --set...  2 days ago  Up 2 days                        ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-4
3f61d78a6ddd  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rsyslog:17.1_20230314.1                                       kolla_start           2 days ago  Up 2 days                        rsyslog
813668fbcf6f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1                                      kolla_start           2 days ago  Up 2 days (healthy)              collectd
207760d59d07  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron:17.1_20230314.1                                          kolla_start           2 days ago  Up 2 days (healthy)              logrotate_crond
21600f8d2927  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1                                     kolla_start           2 days ago  Up 2 days (healthy)              metrics_qdr
[root@ceph-0 ~]# cat /var/lib/containers/storage/overlay-containers/containers.json | jq -c '.[] | select(.names[])'
{"id":"a611f3fe7c1174f5634c4a9dbc61f0c3c6b67e6f1a3f6035fd2c740aa679cbc4","names":["container-puppet-rsyslog"],"image":"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161","layer":"b033a484b92ecd2be921b8617598d30d7cae8083f60379c5526fea7fe067542e","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rsyslog:17.1_20230314.1\",\"image-id\":\"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161\",\"name\":\"container-puppet-rsyslog\",\"created-at\":1679138662}","created":"2023-03-18T11:24:22.591904997Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"0eb4ef7bc0a7e0a3fd05242aa710ffbca930d2bbd02da6f9eeddc57005535404","names":["container-puppet-metrics_qdr"],"image":"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d","layer":"2aad193e4c60706e2d2bdf23c951cbcc371dac685ed342341baef24eb1fd0604","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1\",\"image-id\":\"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d\",\"name\":\"container-puppet-metrics_qdr\",\"created-at\":1679138662}","created":"2023-03-18T11:24:22.948726884Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"1b7b6f6f0da3383f2c239d9ac713344965e743450cc09c5dc585c864b59d8206","names":["container-puppet-crond"],"image":"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6","layer":"3baf1c29e4d8651e376aa676c6feb7e9474fdfcbd3f67b4a9aff0663da097e6d","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron:17.1_20230314.1\",\"image-id\":\"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6\",\"name\":\"container-puppet-crond\",\"created-at\":1679138662}","created":"2023-03-18T11:24:22.957957376Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"2de8ac0a26badc9b449f3f1ba43dcef0e4d84ca33029b51bb9d5e7397cb0de7c","names":["container-puppet-collectd"],"image":"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c","layer":"0277a686f7036ceddc25a79dad1da36adfacb5312c337c1d9db5e17fb114ca9d","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1\",\"image-id\":\"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c\",\"name\":\"container-puppet-collectd\",\"created-at\":1679138663}","created":"2023-03-18T11:24:23.008527557Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"d48f5ca9b34ae86b516b8fddfde26623767dbabb257c7ce9b7ea3b57beb0d6fa","names":["metrics_qdr_init_logs"],"image":"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d","layer":"d54aaa272693a90ab0a6f97983cfd0d603b1097f97ca3a2f224ccfb7e8a05bbd","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1\",\"image-id\":\"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d\",\"name\":\"metrics_qdr_init_logs\",\"created-at\":1679138689}","created":"2023-03-18T11:24:49.280717372Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c779,c831","ProcessLabel":"system_u:system_r:container_t:s0:c779,c831"}}
{"id":"3a0f3e8e3310db249ba2026f9494c177f5770ca16954cebc3fac71c0924df8cb","names":["collectd_init_perm"],"image":"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c","layer":"73aa6065ff786529cb2ae3c45e1db12fb0244b5a6f3f2ef9df8012bd0bc06bbe","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1\",\"image-id\":\"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c\",\"name\":\"collectd_init_perm\",\"created-at\":1679139087}","created":"2023-03-18T11:31:27.165449235Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c267,c591","ProcessLabel":"system_u:system_r:container_t:s0:c267,c591"}}
{"id":"3f61d78a6dddd985e81c3e4f840dc5d5de4a8a5d13cb27585d9560b863d92b1b","names":["rsyslog"],"image":"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161","layer":"9012b1ba5385809732291c0293b33ed36752882c11f1857ce0e3895cf058fa29","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-rsyslog:17.1_20230314.1\",\"image-id\":\"93c7e6650887e88d8f8608a81b720cc454b69dadce87173802d5b20040e0d161\",\"name\":\"rsyslog\",\"created-at\":1679139514}","created":"2023-03-18T11:38:34.606010419Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","MountOpts":["metacopy=on"],"ProcessLabel":""}}
{"id":"813668fbcf6f1a92a499465bfcd5823858f852af0adcdaf4c2e85c655247bc31","names":["collectd"],"image":"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c","layer":"068c2a1928b86460f5cbf7d95cd2ad7213df52bc62c91ed04fc86e2273fb0396","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-collectd:17.1_20230314.1\",\"image-id\":\"bc28cfc7858c29e083a1fff1afc18f9ad276316462edc48d809473031d816f3c\",\"name\":\"collectd\",\"created-at\":1679139514}","created":"2023-03-18T11:38:34.674544176Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":""}}
{"id":"207760d59d078680b017927aff1cb4d8594a98c80e4d23690a7ed503b57d4957","names":["logrotate_crond"],"image":"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6","layer":"ac4017bd7f9307b2ff52afe8feac0b68aadb9a6dbfb191b5988e35cd97e657c2","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-cron:17.1_20230314.1\",\"image-id\":\"101a64a8a73ce2c853b8eb3bd2b4932ad0e831bde499c9575b9111cffa7476c6\",\"name\":\"logrotate_crond\",\"created-at\":1679139907}","created":"2023-03-18T11:45:07.037630905Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","MountOpts":["metacopy=on"],"ProcessLabel":""}}
{"id":"21600f8d292776962daef51032bc3dc683431ce826c57c625b9407dd80d06e8d","names":["metrics_qdr"],"image":"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d","layer":"912e5fd65f8b104112aae0320dbff4131a02fd793e6f2b000ce250fe5ca93acb","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-qdrouterd:17.1_20230314.1\",\"image-id\":\"6a9951d5ebd11bf996467dcb0238018e0df37c1958bdbc9cb9774a2426b5209d\",\"name\":\"metrics_qdr\",\"created-at\":1679146521}","created":"2023-03-18T13:35:21.595933021Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c78,c531","ProcessLabel":"system_u:system_r:container_t:s0:c78,c531"}}
[root@ceph-0 ~]#

We will have to investigate the reason and whether it is possible to have records for all Ceph containers in the file again. If that is not possible, we will have to adapt collectd-libpodstats to get the data some other way.
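
The comparison done by eye above (names from `podman ps` against the records in containers.json) can be scripted. The following is only a minimal sketch, not code from collectd-libpodstats: the helper names `recorded_names` and `missing_records` are hypothetical, and it assumes the file is a JSON array of objects with a `names` list, as in the jq output above. Reading the file typically requires root.

```python
import json
from pathlib import Path


def recorded_names(store_file):
    """Return the set of container names recorded in a storage JSON file."""
    path = Path(store_file)
    if not path.is_file():
        return set()
    records = json.loads(path.read_text())
    # Each record carries a "names" list; tolerate a missing or null value.
    return {name for record in records for name in (record.get("names") or [])}


def missing_records(running_names, store_file):
    """Return, sorted, the running container names with no record in store_file."""
    return sorted(set(running_names) - recorded_names(store_file))


if __name__ == "__main__":
    # Two of the names reported by `podman ps` on ceph-0 in this comment.
    running = [
        "ceph-aef1676b-ed41-5bf7-abbd-0138af40edb5-osd-0",
        "collectd",
    ]
    store = "/var/lib/containers/storage/overlay-containers/containers.json"
    print(missing_records(running, store))
```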

Comment 3 Yadnesh Kulkarni 2023-03-21 05:54:11 UTC
There is another file in the same directory, volatile-containers.json, that holds the records for the Ceph services:
~~~
[root@controller-2 ~]# cat /var/lib/containers/storage/overlay-containers/volatile-containers.json | jq -c '.[] | select(.names[])'
{"id":"d453aff45980224d51a5a3472ed2837e027995a08080c1c68cd68055f10e4f41","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mon-controller-2"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"cfb35d17839905918c1a5c432e4d369cf00f8fd718a3e9c67f4418270e0f427c","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mon-controller-2\",\"created-at\":1678181207}","created":"2023-03-07T09:26:47.375894872Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","MountOpts":["metacopy=on"],"ProcessLabel":"","Volatile":true}}
{"id":"61231c353d4cac1b4b90774fd6e34ea2cd108a697e97027ca7abf20fc8241f8f","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mgr-controller-2-cyfrwp"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"9ee90047e514c0135d29aee1ec19f807fd08d26e3203ea662a408d705d4f8cc9","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-mgr-controller-2-cyfrwp\",\"created-at\":1678181218}","created":"2023-03-07T09:26:58.115355378Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":"","Volatile":true}}
{"id":"f517fdb3d610e41bf732101cfb9dbb3a5eabbf4d02a391d30258d5daad19ecdd","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-crash-controller-2"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"56064a1840b17ff42e3f0cc29321faffa236967b712c38e6d2fe2ea88d684cbd","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-crash-controller-2\",\"created-at\":1678181222}","created":"2023-03-07T09:27:02.824753764Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":"","Volatile":true}}
{"id":"c832de9901617590cd409455d7279e8064ccd69fc87831f43d5a9014c9d2cead","names":["ceph-71168c27-fd76-5750-9462-4f236bedb0ec-rgw-rgw-controller-2-fwygiv"],"image":"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320","layer":"42893240a81d07e6fc842e23b852146158aceeb7df8080778fce0ce75abcdd59","metadata":"{\"image-name\":\"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph@sha256:62bb7f2797522866948caf50fb0e517e12a0e3db9113319bcf32ed3a25c1f1b0\",\"image-id\":\"34880245f74a1270bb43a8cd9a76f7799b1644a4784f1d7bcf7a144e8ad08320\",\"name\":\"ceph-71168c27-fd76-5750-9462-4f236bedb0ec-rgw-rgw-controller-2-fwygiv\",\"created-at\":1678184543}","created":"2023-03-07T10:22:23.580194978Z","flags":{"MountLabel":"system_u:object_r:container_file_t:s0:c1022,c1023","ProcessLabel":"","Volatile":true}}
~~~
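
One way a consumer of these records could cope with the split is to read both files and merge them into a single ID-to-name lookup. The sketch below only illustrates that idea; it is not the change that shipped in collectd-libpod-stats, and the function name `load_container_names` is hypothetical. The directory and the two file names are the ones shown in this report.

```python
import json
from pathlib import Path

# Directory and file names as they appear in this bug report.
STORE_DIR = Path("/var/lib/containers/storage/overlay-containers")
RECORD_FILES = ("containers.json", "volatile-containers.json")


def load_container_names(store_dir=STORE_DIR, record_files=RECORD_FILES):
    """Merge the records of all given files into a {container_id: name} dict."""
    names_by_id = {}
    for file_name in record_files:
        path = Path(store_dir) / file_name
        if not path.is_file():
            # A host may have only one of the two files; skip the missing one.
            continue
        for record in json.loads(path.read_text()):
            names = record.get("names") or []
            if names:
                # Use the first name as the display name for the container.
                names_by_id[record["id"]] = names[0]
    return names_by_id


if __name__ == "__main__":
    for container_id, name in load_container_names().items():
        print(container_id[:12], name)
```

Keeping the file list as a parameter makes it easy to point the function at a copy of the files taken from an affected node.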

Comment 7 Leonid Natapov 2023-05-07 15:21:42 UTC
The Ceph memory and CPU usage panels in the cloud view Grafana dashboard now show metrics.

Tested with collectd-libpod-stats-1.0.5-4.el9ost.x86_64
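
To spot-check the reported values against the kernel's own accounting, the cgroup files can be read directly. The sketch below is a rough illustration and not the plugin's implementation: it assumes cgroup v2 (the RHEL 9 default), where each cgroup directory exposes `memory.current` and a `cpu.stat` file containing a `usage_usec` line. The function names are hypothetical, and the exact directory of a given Ceph container beneath the new `system.slice` path is not spelled out in this report, so it has to be passed in by the caller.

```python
from pathlib import Path


def read_cgroup_usage(cgroup_dir):
    """Read memory use and cumulative CPU time from a cgroup v2 directory."""
    cgroup = Path(cgroup_dir)
    # memory.current holds the current memory use of the cgroup in bytes.
    memory_bytes = int((cgroup / "memory.current").read_text().strip())
    # cpu.stat holds "key value" lines; usage_usec is total CPU time consumed.
    cpu_usage_usec = None
    for line in (cgroup / "cpu.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            cpu_usage_usec = int(value)
            break
    return {"memory_bytes": memory_bytes, "cpu_usage_usec": cpu_usage_usec}


def cpu_percent(prev_usec, curr_usec, interval_seconds):
    """CPU use between two samples, as a percentage of one CPU."""
    return (curr_usec - prev_usec) / (interval_seconds * 1_000_000) * 100.0
```

Taking two `read_cgroup_usage` samples a known number of seconds apart and passing their `cpu_usage_usec` values to `cpu_percent` gives a figure that can be compared with `collectd_libpodstats_pod_cpu_percent`, on the assumption that the plugin derives its percentage in a similar way.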

Comment 16 errata-xmlrpc 2023-08-16 01:14:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577