Description of problem:

When executing an rsync command, I receive an error:

    *** buffer overflow detected ***: terminated
    rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [sender=3.1.3]

The command being executed is:

    /usr/bin/rsync --delay-updates -F --compress --delete-after --archive --no-owner --no-group --rsh=/usr/bin/ssh -S none -o Port=22 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null /tmp/toolbox zuul-worker:~/src/

Version-Release number of selected component (if applicable):

On the host where rsync is executed: rsync-3.1.3-19.el8_7.1.x86_64
Remote host (Fedora Rawhide): rsync-3.2.7-4.fc39.x86_64

Steps to Reproduce:

* By using shell:

1. git clone https://github.com/containers/toolbox /tmp/toolbox
2. /usr/bin/rsync --delay-updates -F --compress --delete-after --archive --no-owner --no-group --rsh=/usr/bin/ssh -S none -o Port=22 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null /tmp/toolbox zuul-worker:~/src/

* By using Ansible (2.9.27):

1. git clone https://github.com/containers/toolbox /tmp/toolbox
2. Create playbook + inventory:

    cat << EOF > test.yaml
    ---
    - name: test
      hosts: test.dev
      gather_facts: false
      tasks:
        - name: Synchronize src repos to workspace directory.
          synchronize:
            delete: true
            dest: "~/src/"
            recursive: true
            src: "/tmp/toolbox"
            owner: no
            group: no
    EOF

    cat << EOF > inventory.yaml
    ---
    all:
      hosts:
        test.dev:
          ansible_port: 22
          ansible_host: <Fedora Rawhide ip address>
          ansible_user: zuul-worker
    EOF

3. ansible-playbook -i inventory.yaml test.yaml

Actual results:

    rsync: connection unexpectedly closed (15 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.3]

Expected results:

Everything is synced.
The affected product has been set to RHEL 8; however, the issue was most likely introduced with the latest versions of rsync in Fedora Rawhide.
Hi,

You say that this started with version 3.2.7, but that version has been in Fedora for over 10 months. Was this really the first time this happened?

This is definitely something in the new version: it crashes even with 3.2.7 on both sides.

Can you compare the directory structures on both sides after the crash? It seems to me that everything is actually transferred and the crash happens after the transfer.

Regards,
Michal
Correction: not everything is sent; I was probably looking at the wrong output.

I would like to ask why you are using the -F option. Do you need it to filter something? Looking at the definition:

    -F  The -F option is a shorthand for adding two --filter rules to your command. The first time it is used is a shorthand for this rule:

            --filter='dir-merge /.rsync-filter'

        This tells rsync to look for per-directory .rsync-filter files that have been sprinkled through the hierarchy and use their rules to filter the files in the transfer.

I don't see any such files in the source location, and without -F there is no crash. There is definitely a bug here, but dropping -F might serve as a temporary workaround.
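Before dropping -F as a workaround, it may be worth confirming that the source tree really contains no per-directory filter files. The following is a minimal sketch, not part of the original report: the helper name has_rsync_filter and the SRC variable are made up for illustration, and /tmp/toolbox is simply the path from the reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeed when the tree rooted at $1
# contains at least one per-directory .rsync-filter file.
has_rsync_filter() {
    # grep -q . succeeds only if find printed at least one path
    find "$1" -name .rsync-filter 2>/dev/null | grep -q .
}

# SRC is an assumed variable; it defaults to the path used in the report.
src=${SRC:-/tmp/toolbox}
if has_rsync_filter "$src"; then
    echo "filter files present: dropping -F would change what is transferred"
else
    echo "no .rsync-filter files found: -F has nothing to merge here"
fi
```

If no such files exist, the first -F only adds a dir-merge rule that never matches anything, so removing it should not change which files are transferred.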
Hi,

So it seems that even when the rsync version is the same, the base system introduces a behavior change when using the "--delete-after" option on Rawhide.

Here is the log of a new investigation:

Sender node: rsync-3.1.2-12.el7_9.x86_64
Receiver node: rsync-3.2.7-4.fc39.x86_64

Running:

    git clone https://src.fedoraproject.org/rpms/python-gear
    /usr/bin/rsync --delay-updates -F --compress --delete-after --archive --no-owner --no-group python-gear zuul-worker.83.xxx:/tmp/test-1
    *** buffer overflow detected ***: terminated
    ^CKilled by signal 2.
    rsync error: unexplained error (code 255) at rsync.c(638) [sender=3.1.2]

On both sides, the output of "find python-gear | wc -l" matches (48), so it seems the transfer was complete.

Also note that:

    /usr/bin/rsync -v --delay-updates -F --compress --archive --no-owner --no-group python-gear zuul-worker.83.xxx:/tmp/test-4

Running the same command without the "--delete-after" option, rsync completes successfully.

Running the same rsync command from the same sender, but with the following receiver:

    $ cat /etc/fedora-release
    Fedora release 38 (Thirty Eight)
    $ rpm -qa | grep rsync
    rsync-3.2.7-2.fc38.x86_64

    /usr/bin/rsync --delay-updates -F --compress --delete-after --archive --no-owner --no-group python-gear zuul-worker.83.yyy:/tmp/test-1

The command runs successfully.
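A matching entry count does not prove that the same paths exist on both sides. A slightly stricter check compares the sorted path listings themselves. This is a rough sketch under stated assumptions: list_tree and trees_match are hypothetical helper names, and the version shown compares two local directories only.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only).

# list_tree DIR: print every path under DIR, relative to DIR, in a stable order.
list_tree() {
    (cd "$1" && find . | LC_ALL=C sort)
}

# trees_match A B: succeed when both trees contain exactly the same set of paths.
# Only names are compared, not file contents or metadata.
trees_match() {
    [ "$(list_tree "$1")" = "$(list_tree "$2")" ]
}
```

For the remote side, the same listing could presumably be captured over ssh (for example by running the "cd ... && find . | LC_ALL=C sort" pipeline on the receiver) and compared with the local output using diff.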
Yes, removing "-F" or "--delete-after" avoids the overflow. All of these options are in the test command because they are set by the Ansible synchronize module as used by our CI: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/prepare-workspace/tasks/main.yaml
This is quite interesting. There is absolutely no difference in code between rsync-3.2.7-2 and rsync-3.2.7-4, which makes this even more interesting.
This is most likely the same bug that has already been filed upstream: https://github.com/WayneD/rsync/issues/511
Hi,

Thanks, Michal, for checking. I guess Fabien added enough information to that bug. Let me know if you need more details.

Dan
FEDORA-2023-563d5c4a26 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-563d5c4a26
Sometimes it takes a while for fixes to be accepted in rsync upstream, so I went ahead and pushed this.
Thanks for all the hard work on this, Michal! :)
Proposing this as an FE for F39 Beta, as the Beta freeze is in effect. I think it makes sense to give this an FE to avoid problems in Fedora CI tests.
FEDORA-2023-563d5c4a26 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-563d5c4a26` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-563d5c4a26 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
This also affects upstream Toolbx CI running on Fedora 39 and Rawhide. These pull requests were where the problem first showed up: https://github.com/containers/toolbox/pull/1344 https://github.com/containers/toolbox/pull/1331
Since the patched package landed in the Rawhide repository, our CI jobs are working as expected [1]. Thanks for the fix!

[1] https://fedora.softwarefactory-project.io/zuul/builds?job_name=rpm-install-test&branch=rawhide&skip=0&limit=100
Yes, it works! See how the tests running on Fedora Rawhide nodes actually get run again, instead of hitting RETRY_LIMIT: https://github.com/containers/toolbox/pull/1344 Some of the tests still fail on Fedora Rawhide because of other changes in Rawhide, but that's not related to this bug.
+5 in https://pagure.io/fedora-qa/blocker-review/issue/1182 , marking accepted.
FEDORA-2023-563d5c4a26 has been pushed to the Fedora 39 stable repository. If the problem still persists, please make note of it in this bug report.