Bug 1965233 - [FFU 13 -> 16.x] xinetd is running after upgrade, blocking swift_rsync container
Summary: [FFU 13 -> 16.x] xinetd is running after upgrade, blocking swift_rsync container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Jesse Pretorius
QA Contact: Jason Grosso
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-27 09:04 UTC by Jose Luis Franco
Modified: 2022-03-23 22:29 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20210625224810.1be1855.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-23 22:28:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 796824 0 None MERGED Remove xinetd when leapp upgrading the Undercloud. 2021-06-24 10:44:14 UTC
OpenStack gerrit 797768 0 None MERGED Do not run Swift rsync container in single replica mode 2021-09-21 15:45:57 UTC
Red Hat Issue Tracker OSP-4206 0 None None None 2022-01-28 15:41:01 UTC
Red Hat Issue Tracker UPG-3095 0 None None None 2021-09-21 15:48:34 UTC
Red Hat Product Errata RHSA-2022:0995 0 None None None 2022-03-23 22:29:04 UTC

Description Jose Luis Franco 2021-05-27 09:04:16 UTC
Description of problem:

After performing the Undercloud upgrade from 13 to 16.1, we observe in all our CI jobs that the swift_rsync container has exited with exit code 10:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/127/undercloud-0/var/log/extra/podman/podman_allinfo.log.gz

eccf1f9f4dd6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20210430.1               kolla_start           17 minutes ago  Exited (10) 17 minutes ago         swift_rsync

The exit code seems to come from the rsync command itself: https://lxadm.com/Rsync_exit_codes

10     Error in socket I/O

And when checking the journal messages:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/127/undercloud-0/var/log/messages.gz

May 26 19:15:04 undercloud-0 systemd[1]: Started swift_rsync container.
May 26 19:15:04 undercloud-0 systemd[1]: Reloading.
May 26 19:15:04 undercloud-0 rsyncd[8]: rsyncd version 3.1.3 starting, listening on port 873
May 26 19:15:04 undercloud-0 rsyncd[8]: bind() failed: Address already in use (address-family 2)
May 26 19:15:04 undercloud-0 rsyncd[8]: unable to bind any inbound sockets on port 873
May 26 19:15:04 undercloud-0 rsyncd[8]: rsync error: error in socket IO (code 10) at socket.c(555) [Receiver=3.1.3]
May 26 19:15:04 undercloud-0 systemd[1]: libpod-eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4.scope: Consumed 335ms CPU time
May 26 19:15:04 undercloud-0 systemd[1]: Started swift_rsync container healthcheck.
May 26 19:15:04 undercloud-0 systemd[1]: Reloading.
May 26 19:15:04 undercloud-0 podman[65691]: 2021-05-26 19:15:04.892379584 +0000 UTC m=+0.085187997 container died eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20210430.1, name=swift_rsync)
May 26 19:15:05 undercloud-0 podman[65691]: 2021-05-26 19:15:05.002947385 +0000 UTC m=+0.195755580 container cleanup eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20210430.1, name=swift_rsync)
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Main process exited, code=exited, status=10/n/a
May 26 19:15:05 undercloud-0 systemd[1]: Reloading.
May 26 19:15:05 undercloud-0 podman[65738]: eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Failed with result 'exit-code'.
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Service RestartSec=100ms expired, scheduling restart.
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Scheduled restart job, restart counter is at 1.
May 26 19:15:05 undercloud-0 systemd[1]: Stopped swift_rsync container.
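
The bind() failure in the log above can be checked for in isolation. The following is a minimal sketch, not part of the reported automation: it tries to bind a TCP socket the way a daemon would and reports whether the address is already taken. The function name port_is_free is an assumption made for illustration. The log shows address-family 2 (AF_INET), so the sketch uses IPv4 only.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind host:port, False if it is in use.

    Hypothetical helper for illustration; it mirrors the bind() step that
    rsyncd fails on in the journal output.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # EADDRINUSE is the "Address already in use" error from the log.
            if exc.errno == errno.EADDRINUSE:
                return False
            # Anything else (for example EACCES on a privileged port) is
            # a different problem and is passed on to the caller.
            raise
        return True
```

Calling port_is_free(873) needs root privileges, because 873 is a privileged port. On an affected undercloud it would be expected to return False while xinetd holds the port.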


And the container keeps restarting over and over:


61187 still running (3555)
May 26 19:15:21 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Service RestartSec=100ms expired, scheduling restart.
May 26 19:15:21 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Scheduled restart job, restart counter is at 15.
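
To spot this restart loop in a collected messages file, the restart counter can be extracted from the systemd lines. This is a rough sketch that assumes the message format shown above; the function name max_restart_count is hypothetical.

```python
import re

# Matches systemd lines such as:
#   tripleo_swift_rsync.service: Scheduled restart job, restart counter is at 15.
RESTART_RE = re.compile(
    r"(?P<unit>\S+\.service): Scheduled restart job, "
    r"restart counter is at (?P<count>\d+)\."
)


def max_restart_count(lines, unit="tripleo_swift_rsync.service"):
    """Return the highest restart counter logged for unit, or 0 if none."""
    highest = 0
    for line in lines:
        match = RESTART_RE.search(line)
        if match and match.group("unit") == unit:
            highest = max(highest, int(match.group("count")))
    return highest
```

A value that keeps growing between two log collections indicates that the service never stays up.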

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Run any FFU CI job and check the containers running on the Undercloud (UC)

Actual results:

swift_rsync container keeps restarting

Expected results:

swift_rsync container runs successfully

Additional info:

Comment 2 Christian Schwede (cschwede) 2021-06-17 09:23:46 UTC
xinetd is still running on the undercloud after the upgrade and listens on port 873. This conflicts with the rsync container, which also binds to port 873.
However, the Swift rsync container does not need to run on the undercloud, because there is only a single replica.

I proposed a fix upstream to not run Swift rsync in single replica mode.

We might still need to check why xinetd is running after the upgrade.
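
The single replica reasoning can be written down as a small predicate. This is only an illustrative sketch in Python; the upstream change lives in the heat templates and is not reproduced here, and the function name should_run_swift_rsync is an assumption.

```python
def should_run_swift_rsync(replicas):
    """Decide whether the swift_rsync container is needed.

    With a single replica there is no second copy to replicate to,
    so the rsync daemon has nothing to do.
    """
    if replicas < 1:
        raise ValueError("replica count must be at least 1")
    return replicas > 1
```

On the undercloud this would evaluate to False, so the container would not be started and could not collide with anything on port 873.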

Comment 4 Jose Luis Franco 2021-06-17 10:40:41 UTC
So, after Jesse's comment I checked our automation code and it seems we are not running the same command we have in the docs:

- name: Remove old RHEL7 packages
  # Remove all el7ost packages except those which could imply the removal
  # (direct or indirect) of the leapp and subscription-manager packages.
  shell: >-
     yum -y remove
     *el7ost*
     galera*
     haproxy*
     httpd
     mysql*
     pacemaker*
     python-jsonpointer
     qemu-kvm-common-rhev
     qemu-img-rhev
     rabbit*
     redis*
     python3*
     --
     -*openvswitch*
     -python-docker
     -python-PyMySQL
     -python-pysocks
     -python2-asn1crypto
     -python2-babel
     -python2-cffi
     -python2-cryptography
     -python2-dateutil
     -python2-idna
     -python2-ipaddress
     -python2-jinja2
     -python2-jsonpatch
     -python2-markupsafe
     -python2-pyOpenSSL
     -python2-requests
     -python2-six
     -python2-urllib3
     -python-httplib2
     -python-passlib
     -python2-netaddr
     -ceph-ansible
     -python2-chardet

https://github.com/openstack/tripleo-upgrade/blob/stable/train/tasks/common/undercloud_os_upgrade.yaml#L23-L60
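
To see which packages a command of this shape would select, the glob handling can be approximated offline. The sketch below is a simplification, not yum's actual resolver: it assumes that specs after a leading "-" act as exclusions and that a spec may match either the package name or the name-version-release string. The package versions in the example are made up for illustration.

```python
from fnmatch import fnmatchcase


def matches(spec, name, nvr):
    """True if the glob matches the package name or its name-version-release."""
    return fnmatchcase(name, spec) or fnmatchcase(nvr, spec)


def select_for_removal(installed, specs):
    """Return the sorted names of packages selected by specs.

    installed maps package name to a name-version-release string.
    specs is the list of package arguments; "--" is ignored and specs
    starting with "-" are treated as exclusions (an assumption).
    """
    specs = [s for s in specs if s != "--"]
    includes = [s for s in specs if not s.startswith("-")]
    excludes = [s[1:] for s in specs if s.startswith("-")]
    selected = []
    for name, nvr in installed.items():
        wanted = any(matches(s, name, nvr) for s in includes)
        protected = any(matches(s, name, nvr) for s in excludes)
        if wanted and not protected:
            selected.append(name)
    return sorted(selected)


# Hypothetical package set; the version strings are invented.
installed = {
    "xinetd": "xinetd-2.3.15-14.el7",
    "openstack-swift-object": "openstack-swift-object-2.17.1-1.el7ost",
    "python-docker": "python-docker-2.4.2-2.el7ost",
    "httpd": "httpd-2.4.6-97.el7",
}
specs = ["*el7ost*", "httpd", "--", "-python-docker"]
print(select_for_removal(installed, specs))
# prints ['httpd', 'openstack-swift-object']
```

Under these assumptions xinetd is not selected, because its release string does not contain el7ost and no spec names it directly.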

I will update the tripleo-upgrade code and relaunch the job to see if it helps.

Thanks for having a look.

Comment 18 errata-xmlrpc 2022-03-23 22:28:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenStack Platform 16.2 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0995

