Bug 1493298

Summary: OSP11 -> OSP12 upgrade: swift_rsync container on controller nodes is in Restarting state post upgrade
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: Christian Schwede (cschwede) <cschwede>
Status: CLOSED ERRATA
QA Contact: Mike Abrams <mabrams>
Severity: urgent
Docs Contact:
Priority: high
Version: 12.0 (Pike)
CC: cschwede, dbecker, jschluet, mburns, mcornea, morazi, pgrist, rhel-osp-director-maint, scohen, thiago
Target Milestone: rc
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-7.0.3-11.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 22:10:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Marius Cornea 2017-09-19 20:51:54 UTC
Description of problem:
OSP11 -> OSP12 upgrade: swift_rsync container running on controller nodes is in Restarting state post upgrade

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy monolithic OSP11 with 3 controllers, 2 compute, 3 ceph nodes
2. Upgrade environment to OSP12
3. Log in to one of the controllers and check swift_rsync container status:

sudo docker inspect --format="{{.State.Status }}" swift_rsync

Actual results:

Expected results:

Additional info:

[root@controller-0 ~]# docker ps | grep swift_rsync
1a6e2ef615b2         "kolla_start"            3 hours ago         Restarting (10) About an hour ago                       swift_rsync

Output of docker logs swift_rsync:

Comment 3 Christian Schwede (cschwede) 2017-09-20 11:50:04 UTC
Thanks Marius, I was able to find the reason for this.

After upgrading from a non-containerized deployment to a containerized one, xinetd is still running and still holds port 873 (rsync). As a result, the swift_rsync container can't start, because the port is already in use.

Therefore the upgrade tasks need to stop the xinetd service as well.
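
For anyone who wants to confirm the port conflict by hand on a controller, here is a rough sketch. It is not the proposed patch, only a manual check. The helper name port_owner is made up for illustration, and the parsing assumes the column layout that `ss -tlnp` prints when only TCP listeners are requested (State in the first column, local address in the fourth).

```shell
# Hypothetical helper: print the name of the process listening on a TCP port.
# $1 = port number; expects the output of `ss -tlnp` on stdin.
port_owner() {
    awk -v port="$1" '
        # Only LISTEN rows whose local address ends in ":<port>"
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            # The process column looks like: users:(("xinetd",pid=1234,fd=5))
            if (match($0, /users:\(\("[^"]+"/)) {
                # Skip the 9-character prefix users:((" and the closing quote
                print substr($0, RSTART + 9, RLENGTH - 10)
            }
        }'
}

# Possible manual use on a controller (run as root so ss can show process names):
#   ss -tlnp | port_owner 873
# If that prints xinetd, stopping it frees the port for the container:
#   systemctl stop xinetd
#   docker restart swift_rsync
```

Stopping xinetd by hand is only a workaround for an already upgraded node; the fix tracked in this bug is for the upgrade tasks to stop the service.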

Proposed patch: https://review.openstack.org/#/c/505606/
Upstream bug report: https://bugs.launchpad.net/tripleo/+bug/1718403

Comment 4 Christian Schwede (cschwede) 2017-10-25 07:13:33 UTC
Upstream patch merged, moving to POST.

Comment 6 Jon Schlueter 2017-11-01 20:41:44 UTC
stable/pike cherry-pick is proposed but not yet landed

Comment 7 Christian Schwede (cschwede) 2017-11-15 12:34:30 UTC
Upstream backport just merged.

Comment 8 Jon Schlueter 2017-11-21 21:25:44 UTC

Comment 14 Marius Cornea 2017-11-26 13:16:04 UTC

The converge step basically does a stack update that adjusts the nova upgrade_levels. At this point the services have already been upgraded and migrated into containers. If the swift_rsync container gets into Restarting state at that point, I suspect the same issue would show up while doing a stack update of a fresh OSP12 deployment, so it's probably not related to the patch which addresses BZ#1493298. Checking the logs on your machine, we can see:

[root@controller-0 heat-admin]# docker logs --tail 5 swift_rsync
INFO:__main__:Deleting /etc/rsyncd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/rsyncd.conf to /etc/rsyncd.conf
INFO:__main__:Writing out command to execute
failed to create pid file /var/run/rsyncd.pid: File exists
Running command: '/usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf'

After removing the existing rsync pid file the container is able to start:
[root@controller-0 heat-admin]# mv /var/run/rsyncd.pid /var/run/rsyncd.pid.orig
[root@controller-0 heat-admin]# docker restart swift_rsync
[root@controller-0 heat-admin]# docker ps | grep swift_rsync
a2e2c07ef6e8        rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-swift-object-docker:20171122.1              "kolla_start"            18 minutes ago      Up About a minute (healthy)                       swift_rsync
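
The manual workaround above could be scripted roughly as follows. This is only a sketch that mirrors the two commands shown; the helper name stale_pidfile_from_logs is hypothetical, and it assumes the rsync error line has exactly the wording seen in the log excerpt above.

```shell
# Hypothetical helper: read `docker logs` output on stdin and print the path
# of the pid file rsync failed to create (empty output if no such error).
stale_pidfile_from_logs() {
    # Keep only the path from lines like:
    #   failed to create pid file /var/run/rsyncd.pid: File exists
    # and report the most recent occurrence.
    sed -n 's/^failed to create pid file \(.*\): File exists$/\1/p' | tail -n 1
}

# Possible manual use on a controller, as root:
#   pidfile=$(docker logs --tail 5 swift_rsync 2>&1 | stale_pidfile_from_logs)
#   [ -n "$pidfile" ] && mv "$pidfile" "$pidfile.orig" && docker restart swift_rsync
```

Moving the file aside instead of deleting it keeps the original around for later inspection, as in the commands above.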

Based on this data, I'd say this is a new issue that is not related to the initial report in 1493298, where the log output of the restarting container is different.

Comment 19 errata-xmlrpc 2017-12-13 22:10:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.