Description of problem:
After performing the Undercloud upgrade from 13 to 16.1, in all our CI jobs we can observe that the swift_rsync container is in the Exited state with exit code 10:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/127/undercloud-0/var/log/extra/podman/podman_allinfo.log.gz

eccf1f9f4dd6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20210430.1  kolla_start  17 minutes ago  Exited (10) 17 minutes ago  swift_rsync

The exit code seems to be coming from the rsync command itself:
https://lxadm.com/Rsync_exit_codes

10 Error in socket I/O

And when checking the journal messages:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/127/undercloud-0/var/log/messages.gz

May 26 19:15:04 undercloud-0 systemd[1]: Started swift_rsync container.
May 26 19:15:04 undercloud-0 systemd[1]: Reloading.
May 26 19:15:04 undercloud-0 rsyncd[8]: rsyncd version 3.1.3 starting, listening on port 873
May 26 19:15:04 undercloud-0 rsyncd[8]: bind() failed: Address already in use (address-family 2)
May 26 19:15:04 undercloud-0 rsyncd[8]: unable to bind any inbound sockets on port 873
May 26 19:15:04 undercloud-0 rsyncd[8]: rsync error: error in socket IO (code 10) at socket.c(555) [Receiver=3.1.3]
May 26 19:15:04 undercloud-0 systemd[1]: libpod-eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4.scope: Consumed 335ms CPU time
May 26 19:15:04 undercloud-0 systemd[1]: Started swift_rsync container healthcheck.
May 26 19:15:04 undercloud-0 systemd[1]: Reloading.
May 26 19:15:04 undercloud-0 podman[65691]: 2021-05-26 19:15:04.892379584 +0000 UTC m=+0.085187997 container died eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20210430.1, name=swift_rsync)
May 26 19:15:05 undercloud-0 podman[65691]: 2021-05-26 19:15:05.002947385 +0000 UTC m=+0.195755580 container cleanup eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20210430.1, name=swift_rsync)
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Main process exited, code=exited, status=10/n/a
May 26 19:15:05 undercloud-0 systemd[1]: Reloading.
May 26 19:15:05 undercloud-0 podman[65738]: eccf1f9f4dd661c2366656cad958131ac4627f5352ea6d1fde0d275840d170e4
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Failed with result 'exit-code'.
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Service RestartSec=100ms expired, scheduling restart.
May 26 19:15:05 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Scheduled restart job, restart counter is at 1.
May 26 19:15:05 undercloud-0 systemd[1]: Stopped swift_rsync container.

And the container keeps restarting over and over:

61187 still running (3555)
May 26 19:15:21 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Service RestartSec=100ms expired, scheduling restart.
May 26 19:15:21 undercloud-0 systemd[1]: tripleo_swift_rsync.service: Scheduled restart job, restart counter is at 15.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Run any FFU CI job and check the running containers on the UC

Actual results:
swift_rsync container keeps restarting

Expected results:
swift_rsync container runs successfully

Additional info:
After the upgrade, xinetd is running on the undercloud and listening on port 873. This conflicts with the Swift rsync container, which also tries to bind port 873. However, the Swift rsync container does not need to run on the undercloud at all, because there is only a single replica. I proposed a fix upstream to not run Swift rsync in single-replica mode. We might need a separate check on why xinetd is running after the upgrade.
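To confirm this kind of conflict on a node, a small check along the following lines could help. This is only a sketch, not part of the proposed fix: the helper name port_in_use is hypothetical, it probes IPv4 only (matching the "address-family 2" in the rsyncd log), and binding a privileged port such as 873 requires root, otherwise a PermissionError is raised rather than a result returned.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with EADDRINUSE.

    Hypothetical helper for illustration. It reproduces the same bind()
    call that rsyncd attempts, so a True result means another process
    (for example xinetd) already owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Any other failure (e.g. missing privileges) is not a port
            # conflict, so let the caller see it.
            raise
    return False


# Example (run as root on the undercloud):
#   port_in_use(873)
```

If this returns True while the swift_rsync container is stopped, something other than the container is holding the port.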
The docs say to remove it: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#removing-red-hat-openstack-platform-director-packages Perhaps something is adding it back after the install?
So, after Jesse's comment I checked our automation code and it seems we are not running the same command we have in the docs:

- name: Remove old RHEL7 packages
  # Remove all el7ost packages except those which could imply the removal
  # (direct or indirect) of the leapp and subscription-manager packages.
  shell: >-
    yum -y remove
    *el7ost* galera* haproxy* httpd mysql* pacemaker*
    python-jsonpointer qemu-kvm-common-rhev qemu-img-rhev
    rabbit* redis* python3*
    --
    -*openvswitch* -python-docker -python-PyMySQL -python-pysocks
    -python2-asn1crypto -python2-babel -python2-cffi
    -python2-cryptography -python2-dateutil -python2-idna
    -python2-ipaddress -python2-jinja2 -python2-jsonpatch
    -python2-markupsafe -python2-pyOpenSSL -python2-requests
    -python2-six -python2-urllib3 -python-httplib2 -python-passlib
    -python2-netaddr -ceph-ansible -python2-chardet

https://github.com/openstack/tripleo-upgrade/blob/stable/train/tasks/common/undercloud_os_upgrade.yaml#L23-L60

I will update the tripleo-upgrade code and relaunch the job to see if it helps. Thanks for having a look.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenStack Platform 16.2 (openstack-tripleo-heat-templates) security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0995