Description of problem:

At this point Red Hat still supports legacy telemetry services in RHOSP 16.1. healthcheck_gnocchi_statsd fails, but the corresponding check commands return exit code 0 when executed manually.

The logs [1] show that the following command fails inside the gnocchi_statsd container:

ss -lnp | grep -qE ":8125.*,pid=7,"

However, I get exit code 0 when I run the same command manually. I kindly ask engineering to help me isolate the problem.

[1]
Oct 28 07:26:22 controller-1 systemd[1]: Starting gnocchi_statsd healthcheck...
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : 10
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : curl-healthcheck
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : '\n%{http_code}' '%{remote_ip}:%{remote_port}' '%{time_total}' 'seconds\n'
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : /dev/null
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + process=gnocchi-statsd
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ get_config_val /etc/gnocchi/gnocchi.conf statsd port 8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ crudini --get /etc/gnocchi/gnocchi.conf statsd port
Oct 28 07:26:22 controller-1 podman[981161]: 2020-10-28 07:26:22.883372853 +0000 UTC m=+0.236581960 container exec c3383a200100259e5a77d018105a1798272b4648f973d8ab688ba5021dfe7e8b (image=redhat_osp_rhel_8-gnocchi-statsd:16.1-51, name=gnocchi_statsd)
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ echo 8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + bind_port=8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + healthcheck_listen gnocchi-statsd 8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + process=gnocchi-statsd
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + shift 1
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + args=8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + ports=8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ pgrep -d '|' -f gnocchi-statsd
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + pids=7
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + ss -lnp
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + grep -qE ':(8125).*,pid=(7),'
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + echo 'There is no gnocchi-statsd process listening on ports 8125 in the container.'
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: There is no gnocchi-statsd process listening on ports 8125 in the container.
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + exit 1
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: Error: non zero exit code: 1: OCI runtime error
Oct 28 07:26:22 controller-1 systemd[1]: tripleo_gnocchi_statsd_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Oct 28 07:26:22 controller-1 systemd[1]: tripleo_gnocchi_statsd_healthcheck.service: Failed with result 'exit-code'.
Oct 28 07:26:22 controller-1 systemd[1]: Failed to start gnocchi_statsd healthcheck.
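
For readers following the trace, the way healthcheck_listen appears to assemble its grep pattern can be sketched roughly as below. This is a reconstruction from the xtrace output only, not the shipped script: build_listen_regex is a hypothetical helper name, and joining several ports with '|' is an assumption, since the trace shows a single port.

```shell
#!/usr/bin/env bash
# Sketch reconstructed from the trace above; not the actual tripleo-common code.

# Hypothetical helper: build the extended regex matched against "ss -lnp" output.
#   $1 - space-separated port list (joining with '|' is an assumption)
#   $2 - '|'-separated PID list, as produced by: pgrep -d '|' -f <process>
build_listen_regex() {
    local ports=${1// /|}
    local pids=$2
    printf ':(%s).*,pid=(%s),' "$ports" "$pids"
}

# Values taken from the trace: port 8125, PID 7.
regex=$(build_listen_regex "8125" "7")
echo "$regex"
```

With the values from the trace this yields the same pattern that the log shows being passed to grep -qE. The pattern can only match when the ss output includes the users:(...) column carrying the PID.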
This happens because the healthcheck is executed as the root user:

[root@controller-0 ~]# ps -ef | grep gnocchi
root 339870 1 0 16:21 ? 00:00:00 /usr/bin/podman exec --user root gnocchi_metricd /openstack/healthcheck
root 339874 1 1 16:21 ? 00:00:00 /usr/bin/podman exec --user root gnocchi_statsd /openstack/healthcheck
root 339876 1 1 16:21 ? 00:00:00 /usr/bin/podman exec --user root gnocchi_api /openstack/healthcheck
<snip>

As can be seen below, the output of ss differs depending on whether it is executed as root or as the user that owns the process (gnocchi). As root, the users:(...) column with the PID is missing, so the healthcheck pattern cannot match:

[root@controller-0 ~]# podman exec -it gnocchi_statsd bash
()[gnocchi@controller-0 /]$ ss -lnp | grep 8125
udp UNCONN 0 0 0.0.0.0:8125 0.0.0.0:* users:(("gnocchi-statsd",pid=6,fd=8))
()[gnocchi@controller-0 /]$ exit
exit
[root@controller-0 ~]# podman exec -uroot -it gnocchi_statsd bash
()[root@controller-0 /]# ss -lnp | grep 8125
udp UNCONN 0 0 0.0.0.0:8125 0.0.0.0:*

Unfortunately, the sudo usage introduced in patch [1] was not applied to healthcheck_listen, so that needs to be fixed as well.

[1] https://github.com/openstack/tripleo-common/commit/d03401438c22e59d4f51cedfd0af6d7d48328d45
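
The effect can be reproduced outside the container by feeding the two ss outputs quoted in this comment through the healthcheck pattern. The sketch below is illustrative only: matches_listener and check are hypothetical helpers, and the "merged" case is merely an assumed shape for a fix, in which the check also sees the output obtained as the process owner (for example via sudo -u gnocchi ss -lnp). It is not the actual patch.

```shell
#!/usr/bin/env bash
# Illustration only: matches_listener and check are hypothetical helpers,
# not tripleo-common code.

# Succeeds if the ss output on stdin shows one of the PIDs ($2) bound to one of the ports ($1).
matches_listener() {
    grep -E ":($1).*,pid=($2)," >/dev/null
}

# "ss -lnp | grep 8125" output copied from the two sessions in this comment.
as_user='udp UNCONN 0 0 0.0.0.0:8125 0.0.0.0:* users:(("gnocchi-statsd",pid=6,fd=8))'
as_root='udp UNCONN 0 0 0.0.0.0:8125 0.0.0.0:*'

# Print "match" or "no-match" for the given ss output lines (port 8125, PID 6).
check() {
    if printf '%s\n' "$@" | matches_listener 8125 6; then
        echo match
    else
        echo no-match
    fi
}

user_result=$(check "$as_user")
root_result=$(check "$as_root")
# Assumed shape of a fix: give the check both the root output and the output
# collected as the process owner, so either one can satisfy the pattern.
merged_result=$(check "$as_root" "$as_user")

echo "gnocchi: $user_result, root: $root_result, merged: $merged_result"
```

Only the root-only input fails to match, which is consistent with the healthcheck failing under podman exec --user root while the same command succeeds in an interactive session as gnocchi.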
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483