Description of problem:

I am running some tests on RHOS-16.1-RHEL-8-20200625.n.0 and found that swift_rsync is only running on controller-1. It has status "exited(11)" on controller-0 and controller-2.

One of the tests I run performs a hard reboot on the three controller nodes, executed in parallel. I reviewed the test logs and the issue seems to have started right after these reboots. I tried a soft reboot afterwards, but it did not fix the issue.

When restarting this container with podman restart, the following error is shown (tailing the /var/log/containers/stdouts/swift_rsync.log file):

2020-06-30T16:23:00.775833402+00:00 stderr F failed to create pid file /var/run/rsyncd.pid: File exists

I did not find that file either on the main file system or inside the swift_proxy container. I have not seen any other errors under /var/log/containers/swift/.

I am testing on two environments with the same OSP 16.1 version and the issue is only reproduced on one of them. The differences:

1) NO ISSUE
Virtualized environment (all OC nodes are VMs)
Installed with RHOS-16.1-RHEL-8-20200625.n.0 directly

2) ISSUE
Hybrid environment (two compute nodes are BM servers)
Installed with core_puddle: RHOS-16.1-RHEL-8-20200610.n.0 and then updated to RHOS-16.1-RHEL-8-20200625.n.0
(I did not test this with RHOS-16.1-RHEL-8-20200610.n.0)

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20200625.n.0
openstack-swift-proxy-2.23.2-0.20200505123431.2e50b58.el8ost.noarch
python3-swift-2.23.2-0.20200505123431.2e50b58.el8ost.noarch
puppet-swift-15.4.1-0.20200524163422.cc79e4a.el8ost.noarch

How reproducible:
1/2 (see notes about the two different environments above)

Steps to Reproduce:
1. Apparently it happened after the controller nodes were hard rebooted in parallel.
2.
3.

Actual results:
Only 1/3 swift_rsync running

Expected results:
3/3 swift_rsync running
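For anyone triaging this, a quick way to see the container state and the error on a controller (standard podman/shell commands; nothing here beyond what the report above describes):

# Show the swift_rsync container and its current status:
sudo podman ps -a --filter name=swift_rsync --format '{{.Names}} {{.Status}}'

# Tail the container's stdout log for the pid-file error:
sudo tail -n 20 /var/log/containers/stdouts/swift_rsync.log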
I suspect it's the same underlying problem as this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1517548 The daemon configuration needs to have the pid file disabled (because it's in a container anyway). Christian, do you mind looking at this?
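For context, the relevant rsyncd.conf setting looks like this (the path is taken from the error above; this is a sketch of the config shape, not the exact shipped file):

# In /etc/rsyncd.conf, this global setting makes rsyncd refuse to start
# when the file already exists, e.g. left over after a hard reboot:
pid file = /var/run/rsyncd.pid

# Removing the line avoids the stale-pid failure; inside a container the
# pid file serves no purpose anyway.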
Can't be merged until the puppetlabs-rsync module is updated to include https://github.com/puppetlabs/puppetlabs-rsync/pull/120
Looking at the sosreports, 2 of the 3 rsyncd.conf files still have the pid setting applied (nodes 0 and 2). Both files are also older than the one on node 1, which suggests they missed a config file update.

Node 1 does not have that setting and behaves as expected, because we added a workaround in t-h-t two years ago that removes the setting: https://review.opendev.org/#/c/577403

The question now is why step 3 hasn't been applied on nodes 0 and 2. Any ideas?
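To confirm which nodes still carry the setting (the path is the one used in the workaround below; the timestamp comparison follows the observation above):

# A hit here means the node still has the problematic setting, i.e. the
# t-h-t step 3 workaround never ran on it:
sudo grep 'pid file' /var/lib/config-data/puppet-generated/swift/etc/rsyncd.conf

# Compare modification times across nodes to spot the missed update:
sudo stat -c '%y %n' /var/lib/config-data/puppet-generated/swift/etc/rsyncd.conf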
Are nodes 0 and 2 pre-deployed nodes? In that case the workaround would look like this:

1. Execute the following command on all controller nodes that are pre-deployed:

for d in $(podman inspect swift_rsync | jq '.[].GraphDriver.Data.UpperDir' | xargs) /var/lib/config-data/puppet-generated/swift; do sed -i -e '/pid file/d' $d/etc/rsyncd.conf; done

That should fix the issue until we have a permanent fix merged.
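The same workaround expanded for readability (functionally identical to the one-liner above; the final podman restart is my assumption for picking up the change, not part of the original workaround):

# Two config locations need the fix:
#  1) the running container's overlay upper dir (from podman inspect), and
#  2) the puppet-generated config directory on the host.
dirs="$(podman inspect swift_rsync | jq '.[].GraphDriver.Data.UpperDir' | xargs) /var/lib/config-data/puppet-generated/swift"
for d in $dirs; do
    # Delete any "pid file" line from rsyncd.conf in each location.
    sed -i -e '/pid file/d' "$d/etc/rsyncd.conf"
done
# Assumed follow-up: restart the container so rsyncd starts without the setting.
podman restart swift_rsync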
Hi Christian (and all),

Can you review the doc text draft that I added for technical accuracy?

Thanks, Naomi

______________________________________________

There is currently a known issue with the Object Storage service (swift). If you are using pre-deployed nodes, you might encounter the following error message in /var/log/containers/stdouts/swift_rsync.log:

"failed to create pid file /var/run/rsyncd.pid: File exists"

Workaround: Enter the following command on all Controller nodes that are pre-deployed:

for d in $(podman inspect swift_rsync | jq '.[].GraphDriver.Data.UpperDir'| xargs) /var/lib/config-data/puppet-generated/swift; do sed -i -e '/pid file/d' $d/etc/rsyncd.conf; done

______________________________________________
(In reply to ndeevy from comment #24)
> Can you review the doc text draft that I added for technical accuracy?

Thanks Naomi; Pete just stumbled upon a minor issue - the "| xargs" is not needed in the command, so I removed that part in the doc entry.

> ______________________________________________
>
> There is currently a known issue with the Object Storage service (swift). If
> you are using pre-deployed nodes, you might encounter the following error
> message in /var/log/containers/stdouts/swift_rsync.log:
>
> "failed to create pid file /var/run/rsyncd.pid: File exists"
>
> Workaround: Enter the following command on all Controller nodes that are
> pre-deployed:
>
> for d in $(podman inspect swift_rsync | jq '.[].GraphDriver.Data.UpperDir'|
> xargs) /var/lib/config-data/puppet-generated/swift; do sed -i -e '/pid
> file/d' $d/etc/rsyncd.conf; done
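One side note on the xargs point (my own observation, not from the thread): jq prints strings JSON-quoted by default, and xargs was presumably there to strip the quotes; jq's -r (raw output) flag achieves the same without the extra pipe:

# Default jq output is JSON-encoded, i.e. quoted:
podman inspect swift_rsync | jq '.[].GraphDriver.Data.UpperDir'
# "/var/lib/containers/storage/overlay/<id>/diff"   <- example shape only

# With -r the value is printed raw, no quotes, no xargs needed:
podman inspect swift_rsync | jq -r '.[].GraphDriver.Data.UpperDir'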
Verified on: RHOS-16.1-RHEL-8-20201130.n.0

[heat-admin@controller-0 ~]$ rpm -qa | grep puppet-rsync
puppet-rsync-1.1.1-0.20200311051621.a7d4f84.el8ost.noarch

1. "grep 'pid file' /var/lib/config-data/puppet-generated/swift/etc/rsyncd.conf" returned nothing on all of the controllers.
2. swift_rsync is up and running after rebooting the controllers.
3. No errors in /var/log/containers/stdouts/swift_rsync.log.
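To repeat that check from the undercloud in one go, something like the following should work (a sketch; assumes the heat-admin user and controller hostnames seen above):

# No output from grep means the "pid file" setting is gone, as expected.
for host in controller-0 controller-1 controller-2; do
  echo "== $host =="
  ssh heat-admin@$host "sudo grep 'pid file' /var/lib/config-data/puppet-generated/swift/etc/rsyncd.conf"
done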
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413