Bug 2010466
| Summary: | swift_rsync healthcheck failing intermittently | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jean-Francois Beaudoin <jbeaudoi> |
| Component: | openstack-tripleo-common | Assignee: | Adriano Petrich <apetrich> |
| Status: | CLOSED DUPLICATE | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.1 (Train) | CC: | cschwede, derekh, hyunpark, mburns, slinaber, zaitcev |
| Target Milestone: | z8 | ||
| Target Release: | 16.1 (Train on RHEL 8.2) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-05 06:07:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description of problem:
The swift_rsync healthcheck fails intermittently with "There is no rsync process, listening on port(s) 873, running in the container."

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.6 GA (Train)
puppet-swift-15.4.1-1.20201113214014.cc79e4a.el8ost.noarch Mon Jul 26 05:41:22 2021
python3-swiftclient-3.8.1-1.20201116095744.72b90fe.el8ost.noarch Mon Jul 26 05:41:41 2021
puppet-rsync-1.1.1-1.20201113174003.a7d4f84.el8ost.noarch Mon Jul 26 05:41:11 2021
rsync-3.1.3-7.el8.x86_64 Mon Jul 26 05:40:32 2021

How reproducible:
It happens intermittently; it can be reproduced by running the healthcheck a few times.

Steps to Reproduce:
1. Run /usr/bin/podman exec --user root swift_rsync /openstack/healthcheck

Actual results:
The swift_rsync healthcheck fails intermittently.

Expected results:
The swift_rsync healthcheck does not fail.

Additional info:
Oct 1 15:10:13 director_lab healthcheck_swift_rsync[139438]: Error: non zero exit code: 1: OCI runtime error
Oct 1 15:38:28 director_lab healthcheck_swift_rsync[264766]: Error: non zero exit code: 1: OCI runtime error

Sep 10 09:55:25 director_lab healthcheck_swift_rsync[479105]: There is no rsync process, listening on port(s) 873, running in the container.
Sep 10 09:55:25 director_lab podman[479104]: 2021-09-10 09:55:25.285049838 -0500 CDT m=+0.641654817 container exec dffd68fe5438157114378d03791a33b92d0aac5cfbac4b2f13cda071315d808b (image=xxxxx:5000/xxxxxx_rhosp_16_1-osp16_1-nova-api:16.1, name=nova_api)
Sep 10 09:55:25 director_lab systemd[1]: Starting glance_api healthcheck...
Sep 10 09:55:25 director_lab healthcheck_swift_rsync[479105]: Error: non zero exit code: 1: OCI runtime error
Sep 10 09:55:25 director_lab systemd[1]: tripleo_swift_rsync_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 09:55:25 director_lab systemd[1]: tripleo_swift_rsync_healthcheck.service: Failed with result 'exit-code'.
Sep 10 09:55:25 director_lab systemd[1]: Failed to start swift_rsync healthcheck.

The rsync process is nevertheless listening on port 873 inside the container:

[stack@director_lab ~]$ sudo /usr/bin/podman exec --user root -it swift_rsync bash
()[root@director_lab /]# ss -tulpn | grep -i 873
tcp LISTEN 0 5 10.150.155.13:873 0.0.0.0:* users:(("rsync",pid=7,fd=4))

This looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1979784
We tried to apply the workaround from that bug, but it did not resolve the issue. The modified function currently reads:

~~~
[root@director_lab ~]# grep -A10 healthcheck_listen /usr/share/openstack-tripleo-common/healthcheck/common.sh
healthcheck_listen () {
    process=$1
    shift 1
    args=$@
    ports=${args// /|}
    pids=$(pgrep -d '|' -f $process)
#    ss -lnp | grep -qE ":($ports).*,pid=($pids),"
    ss -lnp | grep -E ":($ports).*,pid=($pids),"
}
~~~
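
Since the failure is intermittent, it may help to quantify how often the healthcheck fails rather than running it by hand. The following is a minimal sketch of a loop that runs a command repeatedly and counts the non-zero exits. `count_failures` is a hypothetical helper written for this report and is not part of openstack-tripleo-common; the run count of 50 is an arbitrary choice.

```shell
# Hypothetical helper (not part of tripleo-common): run a command N times
# and print how many of those runs exited with a non-zero status.
count_failures() {
    local runs=$1
    shift
    local failures=0
    local i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example invocation on the affected node (run as root):
#   count_failures 50 /usr/bin/podman exec --user root swift_rsync /openstack/healthcheck
```

The printed number is the count of failed runs out of the requested total, which gives a rough failure rate to compare before and after applying a workaround.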