Bug 2021853 - rsync sometime becomes cpu-bound when rsyncing large qemu-img in sparse mode
Summary: rsync sometime becomes cpu-bound when rsyncing large qemu-img in sparse mode
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: rsync
Version: 37
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Michal Ruprich
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-11-10 10:23 UTC by Fabrice Bellet
Modified: 2023-12-05 21:02 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-05 21:02:33 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github WayneD rsync issues 217 0 None open rsync hangs with some specific files, using 100% of 1 CPU core but without disk nor network activity 2023-03-09 13:52:53 UTC

Description Fabrice Bellet 2021-11-10 10:23:50 UTC
Sometimes, rsync becomes completely stalled while rsyncing large qemu-img files with sparse file detection enabled:

# rsync -e "ssh -c aes256-gcm" -aSv --progress --numeric-ids server:/var/lib/libvirt/images/vm.qcow2 2021-11-09/

On the rsync client side, the transfer is almost completely stopped:

[...]
receiving incremental file list
vm.qcow2
 67,859,808,256  28%    1.41kB/s 34384:58:00   

And on the server side, the rsync process, and only it, is cpu bound:

top - 11:14:06 up 36 days, 21:53,  2 users,  load average: 1.85, 2.07, 2.10
Tasks: 1001 total,   2 running, 999 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.9 us,  0.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.1 hi,  0.0 si,  0.0 st
MiB Mem : 515579.2 total,   3623.8 free, 192430.4 used, 319525.0 buff/cache
MiB Swap:  32768.0 total,  17821.2 free,  14946.8 used. 319795.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                          
 399401 root      20   0  307872  85844   3836 R  99.7   0.0 160:16.14 rsync                                            
  27513 qemu      20   0 9322052   4.1g  21580 S  16.2   0.8   4045:26 qemu-kvm                                         
   6186 qemu      20   0 9178828   2.2g  21596 S   6.3   0.4   3459:07 qemu-kvm                                         
  35055 qemu      20   0   13.0g   7.8g  21472 S   6.3   1.5   4061:08 qemu-kvm                                         
   5641 qemu      20   0   12.9g   7.5g  21588 S   4.6   1.5   2771:54 qemu-kvm                                         

[root@server images]# strace -tt -p 399401
strace: Process 399401 attached
11:15:24.019398 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999996})
11:15:24.019675 write(1, "\4\200\0\7\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1"..., 32776) = 32776
11:16:19.416139 read(3, "\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1"..., 98304) = 98304
11:16:19.416470 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999994})
11:16:19.416567 write(1, "\4\200\0\7\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1"..., 32776) = 32776
[...]


When attaching a debugger on this running rsync process, I notice that the computation seems to be concentrated in function hash_search:

[root@server ~]# gdb  /usr/bin/rsync `pidof rsync`
[...]
0x000055682f19e9b0 in memcmp@plt ()
Missing separate debuginfos, use: dnf debuginfo-install libacl-2.2.53-9.fc33.x86_64 libattr-2.4.48-10.fc33.x86_64 libzstd-1.5.0-1.fc33.x86_64 lz4-libs-1.9.1-3.fc33.x86_64 openssl-libs-1.1.1l-1.fc33.x86_64 popt-1.18-2.fc33.x86_64 zlib-1.2.11-23.fc33.x86_64
(gdb) bt
#0  0x000055682f19e9b0 in memcmp@plt ()
#1  0x000055682f1c139e in hash_search (len=<optimized out>, buf=<optimized out>, s=<optimized out>, f=<optimized out>)
    at match.c:233
#2  match_sums (f=f@entry=1, s=s@entry=0x55682ffd40f0, buf=buf@entry=0x55682ffd4120, len=242028707840) at match.c:396
#3  0x000055682f1b58a1 in send_files (f_in=0, f_out=<optimized out>) at sender.c:392
#4  0x000055682f1bf77a in do_server_sender (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:953
#5  start_server (f_in=0, f_out=1, argc=<optimized out>, argv=0x55682ffd3c50) at main.c:1251
#6  0x000055682f1a007f in main (argc=<optimized out>, argv=<optimized out>) at main.c:1809
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f925af8c2a7 in md5_block_asm_data_order () from /lib64/libcrypto.so.1.1
(gdb) bt
#0  0x00007f925af8c2a7 in md5_block_asm_data_order () from /lib64/libcrypto.so.1.1
#1  0x00007f925aed054c in MD5_Update () from /lib64/libcrypto.so.1.1
#2  0x000055682f1c02ce in get_checksum2 (buf=0x7f92486679ac '\001' <repeats 200 times>..., len=len@entry=131072, 
    sum=sum@entry=0x7fff57377820 "mev\247\324\356\v\242\060\237\313NK*l\315\001") at checksum.c:233
#3  0x000055682f1c16f3 in hash_search (len=<optimized out>, buf=<optimized out>, s=<optimized out>, f=<optimized out>)
    at match.c:229
#4  match_sums (f=f@entry=1, s=s@entry=0x55682ffd40f0, buf=buf@entry=0x55682ffd4120, len=242028707840) at match.c:396
#5  0x000055682f1b58a1 in send_files (f_in=0, f_out=<optimized out>) at sender.c:392
#6  0x000055682f1bf77a in do_server_sender (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:953
#7  start_server (f_in=0, f_out=1, argc=<optimized out>, argv=0x55682ffd3c50) at main.c:1251
#8  0x000055682f1a007f in main (argc=<optimized out>, argv=<optimized out>) at main.c:1809
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000055682f1c13aa in hash_search (len=<optimized out>, buf=<optimized out>, s=<optimized out>, f=<optimized out>)
    at match.c:233
233				if (memcmp(sum2,s->sums[i].sum2,s->s2length) != 0) {
(gdb) bt
#0  0x000055682f1c13aa in hash_search (len=<optimized out>, buf=<optimized out>, s=<optimized out>, f=<optimized out>)
    at match.c:233
#1  match_sums (f=f@entry=1, s=s@entry=0x55682ffd40f0, buf=buf@entry=0x55682ffd4120, len=242028707840) at match.c:396
#2  0x000055682f1b58a1 in send_files (f_in=0, f_out=<optimized out>) at sender.c:392
#3  0x000055682f1bf77a in do_server_sender (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:953
#4  start_server (f_in=0, f_out=1, argc=<optimized out>, argv=0x55682ffd3c50) at main.c:1251
#5  0x000055682f1a007f in main (argc=<optimized out>, argv=<optimized out>) at main.c:1809
(gdb)

Comment 1 Fabrice Bellet 2021-11-10 12:22:09 UTC
It seems related to https://github.com/WayneD/rsync/issues/217

Comment 2 Ben Cotton 2021-11-30 16:11:36 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 3 Michal Ruprich 2022-02-09 12:23:21 UTC
Setting Version to 35 to prevent another EOL in the future. I am sort-of waiting on some resolution from the upstream since this is more likely a design flaw and the upstream seems to be looking into it.

Comment 4 Ben Cotton 2023-04-25 18:22:58 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 5 Aoife Moloney 2023-11-23 00:07:04 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 6 Aoife Moloney 2023-12-05 21:02:33 UTC
Fedora Linux 37 entered end-of-life (EOL) status on None.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.