Bug 1725900

Summary: healthcheck for nova-api-metadata checks with `pgrep -f nova-metadata` but should pgrep for `nova-api-metadata`
Product: Red Hat OpenStack
Component: openstack-nova
Version: 14.0 (Rocky)
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Reporter: Andreas Karis <akaris>
Assignee: OSP DFG:Compute <osp-dfg-compute>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
CC: dasmith, eglynn, jhakimra, kchamart, mschuppe, sbauza, sgordon, vromanso
Target Milestone: ---
Target Release: ---
Last Closed: 2019-07-02 07:14:10 UTC
Type: Bug

Description Andreas Karis 2019-07-01 17:11:38 UTC
Description of problem:
The healthcheck for nova-api-metadata tests for the process with `pgrep -f nova-metadata`, but it should pgrep for `nova-api-metadata`.

Additional info:

Upstream, the healthcheck script looks like this:
https://github.com/openstack/tripleo-common/blob/master/healthcheck/nova-metadata
~~~
#!/bin/bash

. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh

check_url=$(get_url_from_vhost /etc/httpd/conf.d/10-nova_metadata_wsgi.conf)
healthcheck_curl ${check_url}
~~~

In OSP 13 and 14, nova_metadata and nova_api seem to run from the same image:
~~~
bad44d99e919        172.19.3.254:8787/rhosp14/openstack-nova-api:14.0-109.1560457048                    "kolla_start"            About an hour ago   Up About an hour (unhealthy)                        nova_metadata
bd00770fca9d        172.19.3.254:8787/rhosp14/openstack-nova-api:14.0-109.1560457048                    "kolla_start"            About an hour ago   Up About an hour (healthy)                          nova_api
~~~

In OSP 13, the healthcheck script inside the nova_metadata container looks like this:
~~~
[root@overcloud-controller-0 ~]# docker exec -it nova_metadata cat /openstack/healthcheck
#!/bin/sh

. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh

check_url=$(get_url_from_vhost /etc/httpd/conf.d/10-nova_api_wsgi.conf)
healthcheck_curl ${check_url}
~~~

Within the OSP 14 container, I see:
~~~
()[nova@overcloud-controller-0 /]$ cat /openstack/healthcheck 
#!/bin/sh

. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh


if pgrep -f nova-metadata; then
    check_url=$(get_url_from_vhost /etc/httpd/conf.d/10-nova_metadata_wsgi.conf)
else
    check_url=$(get_url_from_vhost /etc/httpd/conf.d/10-nova_api_wsgi.conf)
fi

healthcheck_curl ${check_url}
~~~

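However, the test in the if condition never succeeds, because the running process is named nova-api-metadata, which the pattern `nova-metadata` does not match: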
~~~
()[nova@overcloud-controller-0 /]$ pgrep -f nova-metadata
()[nova@overcloud-controller-0 /]$ ps aux | grep nova-metadat
nova         287  0.0  0.0   9088   668 ?        S+   16:49   0:00 grep nova-metadat
()[nova@overcloud-controller-0 /]$ ps aux | grep nova
nova           1  1.6  1.1 341896 94836 ?        Ss   16:42   0:06 /usr/bin/python2 /usr/bin/nova-api-metadata
nova         240  0.1  0.0  11816  1780 ?        Ss   16:48   0:00 /bin/bash
nova         294  0.0  0.0  51740  1740 ?        R+   16:49   0:00 ps aux
nova         295  0.0  0.0   9088   672 ?        S+   16:49   0:00 grep nova
()[nova@overcloud-controller-0 /]$ 
~~~
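
The mismatch can be reproduced without a running container. This is a minimal sketch, assuming `grep -E` is an acceptable stand-in for the extended-regex match that `pgrep -f` applies to the full command line. The `matches` helper is hypothetical; the command line is copied from the `ps aux` output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when pattern $1 occurs in command line $2.
# grep -E stands in for pgrep -f (assumption: both treat $1 as an extended regex).
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Command line of PID 1 as shown by ps aux in the container.
cmdline='/usr/bin/python2 /usr/bin/nova-api-metadata'

for pattern in 'nova-metadata' 'nova-api-metadata'; do
    if matches "$pattern" "$cmdline"; then
        echo "$pattern: match"
    else
        echo "$pattern: no match"
    fi
done
```

Only the second pattern matches, because `nova-metadata` is not a substring of `nova-api-metadata`.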

However, within the nova-metadata container on my system, the API vhost file exists as well, so the else branch still finds a URL and the health check reports as healthy:
~~~
()[nova@overcloud-controller-0 /]$ ls /etc/httpd/conf.d/10-nova_api_wsgi.conf
/etc/httpd/conf.d/10-nova_api_wsgi.conf
~~~
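
To make the fall-through explicit, here is a small sketch of the branch selection in the OSP 14 script. The `choose_vhost` helper is hypothetical and only mirrors the if/else shown earlier; it takes the exit status of the pgrep test as its argument instead of running pgrep itself.

```shell
#!/bin/sh
# Hypothetical helper mirroring the if/else in the OSP 14 healthcheck.
# $1 is the exit status of the pgrep test (0 means a process was found).
choose_vhost() {
    if [ "$1" -eq 0 ]; then
        echo /etc/httpd/conf.d/10-nova_metadata_wsgi.conf
    else
        echo /etc/httpd/conf.d/10-nova_api_wsgi.conf
    fi
}

# pgrep -f nova-metadata printed nothing in the session above,
# which corresponds to a non-zero exit status.
choose_vhost 1
```

With a failing pgrep the script reads the nova_api vhost file, so the result depends on whether that file is present in the metadata container.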

But in a customer environment, this fails on the healthcheck ...

Comment 1 Andreas Karis 2019-07-01 17:51:25 UTC
Hi,

Sorry, my test above ran an OSP 14 container within OSP 13. I just asked the customer to provide the process list from their nova_metadata container:

[root@overcloud-controller-0 ~]# docker exec -it nova_metadata /usr/bin/bash
()[root@overcloud-controller-0 /]# ps aux | grep nova
nova          17  0.0  0.0 636696 100416 ?       Sl   Jun28   0:21 nova_metadata_w -DFOREGROUND
nova          18  0.0  0.0 636696 100408 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          19  0.0  0.0 636696 100384 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          20  0.0  0.0 636696 100380 ?       Sl   Jun28   0:21 nova_metadata_w -DFOREGROUND
nova          21  0.0  0.0 636696 100380 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          22  0.0  0.0 636696 100404 ?       Sl   Jun28   0:21 nova_metadata_w -DFOREGROUND
nova          23  0.0  0.0 636696 100380 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          24  0.0  0.0 636696 100380 ?       Sl   Jun28   0:21 nova_metadata_w -DFOREGROUND
nova          25  0.0  0.0 636696 100380 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          26  0.0  0.0 636696 100380 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          27  0.0  0.0 636696 100384 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
nova          28  0.0  0.0 636696 100380 ?       Sl   Jun28   0:22 nova_metadata_w -DFOREGROUND
root      179258  0.0  0.0 112708   976 ?        S+   17:46   0:00 grep --color=auto nova

Last week the customer modified the healthcheck script, replacing the `-` in `nova-metadata` with `_` on the pgrep line, and the health check script now returns the expected result.

So in either case the healthcheck is wrong and needs to be modified; we just have to work out which pattern it should use ;-)
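
One possible direction, purely as a sketch and not necessarily what the eventual fix does: a single extended regex that accepts both process names seen in this report, `nova-api-metadata` and `nova_metadata_w`. The pattern and the `is_metadata_cmdline` helper are assumptions for illustration, and `grep -E` stands in for `pgrep -f`. The sample command lines are copied from the `ps aux` outputs in this bug.

```shell
#!/bin/sh
# Assumed pattern (not taken from any shipped script): "nova", then "-" or "_",
# an optional "api-", then "metadata".
METADATA_PATTERN='nova[-_](api-)?metadata'

# Hypothetical helper: succeeds when command line $1 matches the pattern.
is_metadata_cmdline() {
    printf '%s\n' "$1" | grep -Eq -- "$METADATA_PATTERN"
}

for cmdline in \
    '/usr/bin/python2 /usr/bin/nova-api-metadata' \
    'nova_metadata_w -DFOREGROUND' \
    '/bin/bash'; do
    if is_metadata_cmdline "$cmdline"; then
        echo "match:    $cmdline"
    else
        echo "no match: $cmdline"
    fi
done
```

In the real script the same pattern would be passed to `pgrep -f`, which would need to be verified inside both container variants before relying on it.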

Comment 2 Andreas Karis 2019-07-01 17:53:26 UTC
And one more clarification: the customer runs OSP 14 with an OSP 14 container ;-)

Comment 3 Martin Schuppert 2019-07-02 07:14:10 UTC
Hi Andreas, this is a duplicate of 1700760 and is waiting to be released.

*** This bug has been marked as a duplicate of bug 1700760 ***