
Bug 2010466

Summary: swift_rsync healthcheck failing intermittently
Product: Red Hat OpenStack
Reporter: Jean-Francois Beaudoin <jbeaudoi>
Component: openstack-tripleo-common
Assignee: Adriano Petrich <apetrich>
Status: CLOSED DUPLICATE
QA Contact: David Rosenfeld <drosenfe>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.1 (Train)
CC: cschwede, derekh, hyunpark, mburns, slinaber, zaitcev
Target Milestone: z8
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-05 06:07:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jean-Francois Beaudoin 2021-10-04 17:52:37 UTC
Description of problem:
The swift_rsync healthcheck fails intermittently with the message "There is no rsync process, listening on port(s) 873, running in the container."

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.6 GA (Train)
puppet-swift-15.4.1-1.20201113214014.cc79e4a.el8ost.noarch  Mon Jul 26 05:41:22 2021
python3-swiftclient-3.8.1-1.20201116095744.72b90fe.el8ost.noarch Mon Jul 26 05:41:41 2021
puppet-rsync-1.1.1-1.20201113174003.a7d4f84.el8ost.noarch   Mon Jul 26 05:41:11 2021
rsync-3.1.3-7.el8.x86_64                                    Mon Jul 26 05:40:32 2021


How reproducible:
Intermittent. It can be reproduced by running the healthcheck a few times.

Steps to Reproduce:
1. Run /usr/bin/podman exec --user root swift_rsync /openstack/healthcheck several times until it fails.
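
Since the failure is intermittent, a small loop makes it easier to see how often it occurs. The following is only a sketch: count_failures is a hypothetical helper (not part of tripleo-common), and the run count of 50 in the usage comment is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# count_failures RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints how many of those runs exited non-zero.
# Hypothetical helper for illustration; not shipped with tripleo-common.
count_failures() {
    local runs=$1
    shift
    local failures=0 i
    for ((i = 0; i < runs; i++)); do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures"
}

# Example, to be run on the affected node (50 is an arbitrary run count):
# count_failures 50 sudo /usr/bin/podman exec --user root swift_rsync /openstack/healthcheck
```

A non-zero result confirms the healthcheck failed at least once during the runs, which matches the behaviour described in this report.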

Actual results:
The swift_rsync healthcheck fails intermittently.

Expected results:
The swift_rsync healthcheck passes consistently.

Additional info:
Oct  1 15:10:13 director_lab healthcheck_swift_rsync[139438]: Error: non zero exit code: 1: OCI runtime error
Oct  1 15:38:28 director_lab healthcheck_swift_rsync[264766]: Error: non zero exit code: 1: OCI runtime error

Sep 10 09:55:25 director_lab healthcheck_swift_rsync[479105]: There is no rsync process, listening on port(s) 873, running in the container.
Sep 10 09:55:25 director_lab podman[479104]: 2021-09-10 09:55:25.285049838 -0500 CDT m=+0.641654817 container exec dffd68fe5438157114378d03791a33b92d0aac5cfbac4b2f13cda071315d808b (image=xxxxx:5000/xxxxxx_rhosp_16_1-osp16_1-nova-api:16.1, name=nova_api)
Sep 10 09:55:25 director_lab systemd[1]: Starting glance_api healthcheck...
Sep 10 09:55:25 director_lab healthcheck_swift_rsync[479105]: Error: non zero exit code: 1: OCI runtime error
Sep 10 09:55:25 director_lab systemd[1]: tripleo_swift_rsync_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 09:55:25 director_lab systemd[1]: tripleo_swift_rsync_healthcheck.service: Failed with result 'exit-code'.
Sep 10 09:55:25 director_lab systemd[1]: Failed to start swift_rsync healthcheck.


[stack@director_lab ~]$ sudo /usr/bin/podman exec --user root -it swift_rsync bash
()[root@director_lab /]# ss -tulpn | grep -i 873
tcp     LISTEN   0        5          10.150.155.13:873            0.0.0.0:*      users:(("rsync",pid=7,fd=4))

This looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1979784
We tried applying the workaround (shown below), but it did not resolve the issue.
~~~
[root@director_lab ~]# grep -A10 healthcheck_listen /usr/share/openstack-tripleo-common/healthcheck/common.sh
healthcheck_listen () {
    process=$1

    shift 1
    args=$@
    ports=${args// /|}
    pids=$(pgrep -d '|' -f $process)
#    ss -lnp | grep -qE ":($ports).*,pid=($pids),"
    ss -lnp | grep -E ":($ports).*,pid=($pids),"
}
~~~
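
To illustrate what healthcheck_listen is matching, the sketch below rebuilds the same grep pattern and tests it against the ss line captured above. It is an assumption-laden illustration, not the shipped code: build_listen_pattern is a hypothetical helper, and the pid list is passed in by hand (7, taken from the ss output) instead of coming from pgrep.

```shell
#!/usr/bin/env bash
# build_listen_pattern PIDS PORT [PORT...]
# Assembles the same extended regex that healthcheck_listen passes to grep -E.
# Hypothetical helper: PIDS is a '|'-separated list supplied by the caller,
# whereas the real function obtains it from pgrep -d '|' -f <process>.
build_listen_pattern() {
    local pids=$1
    shift 1
    local args=$*
    local ports=${args// /|}    # "873 874" becomes "873|874"
    printf ':(%s).*,pid=(%s),' "$ports" "$pids"
}

# The ss line from this report, copied verbatim.
sample='tcp     LISTEN   0        5          10.150.155.13:873            0.0.0.0:*      users:(("rsync",pid=7,fd=4))'

pattern=$(build_listen_pattern '7' 873)
if grep -qE "$pattern" <<<"$sample"; then
    echo "match: $pattern"
else
    echo "no match: $pattern"
fi
```

With the pid and port taken from the ss output, the pattern matches the captured line. That suggests the check can succeed when the pid list and the ss output agree, so a failing run presumably differs in one of those two inputs.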