Bug 1902679 - healthcheck_gnocchi_statsd fails while active gnocchi-statsd process is running and listening on correct port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Martin Magr
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-30 11:53 UTC by Alex Stupnikov
Modified: 2024-12-20 19:25 UTC
CC List: 6 users

Fixed In Version: openstack-tripleo-common-11.5.1-2.20210213010022.36ad9a1.el8ost.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-15 07:10:23 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1907485 | 0 | None | None | None | 2020-12-17 11:29:41 UTC
OpenStack gerrit | 767293 | 0 | None | MERGED | Switch gnocchi-statsd HC to healthcheck_port | 2021-01-29 15:28:06 UTC
OpenStack gerrit | 771554 | 0 | None | NEW | Switch gnocchi-statsd HC to healthcheck_port | 2021-01-29 15:28:06 UTC
Red Hat Bugzilla | 1778881 | 0 | high | CLOSED | Sorry, user {cinder,nova,heat} is not allowed to execute '/usr/sbin/ss -ntuap' as ... on controller-0. | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1878191 | 0 | medium | CLOSED | [RFE] Add PluginInstanceFormat to puppet-collectd | 2024-06-13 23:03:50 UTC
Red Hat Issue Tracker | OSP-662 | 0 | None | None | None | 2022-01-20 18:36:32 UTC
Red Hat Issue Tracker | OSP-663 | 0 | None | None | None | 2022-01-20 18:38:49 UTC
Red Hat Issue Tracker | OSP-664 | 0 | None | None | None | 2022-01-20 18:38:51 UTC
Red Hat Issue Tracker | OSP-670 | 0 | None | None | None | 2022-01-20 18:38:46 UTC
Red Hat Product Errata | RHEA-2021:3483 | 0 | None | None | None | 2021-09-15 07:10:51 UTC

Description Alex Stupnikov 2020-11-30 11:53:10 UTC
Description of problem:

At this point Red Hat still supports legacy telemetry services in RHOSP 16.1.

healthcheck_gnocchi_statsd fails, but the corresponding check commands return exit code 0 when executed manually. The logs [1] show that the following command fails inside the gnocchi_statsd container:
ss -lnp | grep -qE ":8125.*,pid=7,"

However, the same command returns exit code 0 when I execute it manually. I kindly ask engineering to help me isolate the problem.

[1]
Oct 28 07:26:22 controller-1 systemd[1]: Starting gnocchi_statsd healthcheck...
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : 10
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : curl-healthcheck
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : '\n%{http_code}' '%{remote_ip}:%{remote_port}' '%{time_total}' 'seconds\n'
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ : /dev/null
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + process=gnocchi-statsd
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ get_config_val /etc/gnocchi/gnocchi.conf statsd port 8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ crudini --get /etc/gnocchi/gnocchi.conf statsd port
Oct 28 07:26:22 controller-1 podman[981161]: 2020-10-28 07:26:22.883372853 +0000 UTC m=+0.236581960 container exec c3383a200100259e5a77d018105a1798272b4648f973d8ab688ba5021dfe7e8b (image=redhat_osp_rhel_8-gnocchi-statsd:16.1-51, name=gnocchi_statsd)
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ echo 8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + bind_port=8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + healthcheck_listen gnocchi-statsd 8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + process=gnocchi-statsd
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + shift 1
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + args=8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + ports=8125
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: ++ pgrep -d '|' -f gnocchi-statsd
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + pids=7
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + ss -lnp
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + grep -qE ':(8125).*,pid=(7),'
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + echo 'There is no gnocchi-statsd process listening on ports 8125 in the container.'
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: There is no gnocchi-statsd process listening on ports 8125 in the container.
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: + exit 1
Oct 28 07:26:22 controller-1 healthcheck_gnocchi_statsd[981161]: Error: non zero exit code: 1: OCI runtime error
Oct 28 07:26:22 controller-1 systemd[1]: tripleo_gnocchi_statsd_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Oct 28 07:26:22 controller-1 systemd[1]: tripleo_gnocchi_statsd_healthcheck.service: Failed with result 'exit-code'.
Oct 28 07:26:22 controller-1 systemd[1]: Failed to start gnocchi_statsd healthcheck.
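A minimal sketch of the check traced above, reconstructed from the xtrace output rather than copied from the tripleo-common source; the function names `ss_has_listener` and `healthcheck_listen_sketch` are hypothetical. It shows why the check is sensitive to how `ss` is run: the grep only succeeds when `ss -lnp` prints the owning PID (in its `users:((...))` column) on the same line as the listening port. If that column is missing, the check fails even though the socket is open.

```bash
# Hypothetical sketch reconstructed from the xtrace in the log above;
# NOT the actual tripleo-common healthcheck_listen implementation.

# Reads `ss -lnp` output on stdin. Succeeds only if one line contains both
# the port and ",pid=<PID>," as printed in ss's users:((...)) column, e.g.
#   0.0.0.0:8125 ... users:(("gnocchi-statsd",pid=6,fd=8))
ss_has_listener() {
    local ports=$1 pids=$2
    grep -qE ":(${ports}).*,pid=(${pids}),"
}

healthcheck_listen_sketch() {
    local process=$1
    shift 1
    local ports pids
    # Join the remaining port arguments with '|' for the regex alternation.
    ports=$(IFS='|'; echo "$*")
    # '|'-separated PIDs whose full command line matches the process name.
    pids=$(pgrep -d '|' -f "$process")
    if ! ss -lnp | ss_has_listener "$ports" "$pids"; then
        echo "There is no $process process listening on ports $ports in the container."
        return 1
    fi
}
```

For example, `healthcheck_listen_sketch gnocchi-statsd 8125` mirrors the `healthcheck_listen gnocchi-statsd 8125` call in the trace.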

Comment 4 Martin Magr 2020-12-15 09:23:25 UTC
This happens because the healthcheck is executed as the root user:

[root@controller-0 ~]# ps -ef | grep gnocchi
root      339870       1  0 16:21 ?        00:00:00 /usr/bin/podman exec --user root gnocchi_metricd /openstack/healthcheck
root      339874       1  1 16:21 ?        00:00:00 /usr/bin/podman exec --user root gnocchi_statsd /openstack/healthcheck
root      339876       1  1 16:21 ?        00:00:00 /usr/bin/podman exec --user root gnocchi_api /openstack/healthcheck
<snip>

As shown below, the output of ss differs when it is executed as root and as the service user (gnocchi): only the latter includes the owning process and PID.

[root@controller-0 ~]# podman exec -it gnocchi_statsd bash
()[gnocchi@controller-0 /]$ ss -lnp | grep 8125
udp                UNCONN              0                    0                                                                                           0.0.0.0:8125                 0.0.0.0:*          users:(("gnocchi-statsd",pid=6,fd=8))   
()[gnocchi@controller-0 /]$ exit
exit
[root@controller-0 ~]# podman exec -uroot -it gnocchi_statsd bash
()[root@controller-0 /]# ss -lnp | grep 8125
udp                UNCONN              0                    0                                                                                           0.0.0.0:8125                                            0.0.0.0:*

Unfortunately, the sudo usage introduced in patch [1] was not applied to healthcheck_listen, so we need to fix that as well.
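A minimal sketch of the sudo approach, assuming the healthcheck runs as root inside the container and that sudo is installed there: look up the UID that owns the target PID and run ss as that user, so the `users:((...))` column is populated. `run_as_pid_owner` is a hypothetical helper, not the code from patch [1] or from the later healthcheck_port change.

```bash
# Hypothetical helper (not from tripleo-common): run a command as the user
# that owns a given PID, e.g. so `ss -lnp` can report that process's sockets.
run_as_pid_owner() {
    local pid=$1
    shift
    local owner_uid
    # On Linux, /proc/<pid> is normally owned by the process's effective UID.
    owner_uid=$(stat -c %u "/proc/$pid") || return 1
    if [ "$owner_uid" = "$(id -u)" ]; then
        # Already running as the owner: no privilege switch needed.
        "$@"
    else
        # sudo accepts a numeric UID when it is prefixed with '#'.
        sudo -u "#$owner_uid" "$@"
    fi
}
```

In the failing case above this would be used as `run_as_pid_owner 7 ss -lnp | grep -qE ':(8125).*,pid=(7),'`. Using the numeric UID avoids depending on how `ps` renders or truncates user names.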


[1] https://github.com/openstack/tripleo-common/commit/d03401438c22e59d4f51cedfd0af6d7d48328d45

Comment 8 errata-xmlrpc 2021-09-15 07:10:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483

