From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4 Description of problem: When I'm backing up my server, I use rsync to copy the differences between the running file system and the image file system. I run rsync with the following command (where /tmp/backup_from/ is an LVM snapshot, and /tmp/target-backup is a loop mounted image file): rsync -aHEAXxv --stats --delete --ignore-errors /tmp/backup_from/ /tmp/target-backup rsync will occasionally fail with the message: rsync: connection unexpectedly closed (20444 bytes received so far) [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(453) [sender=2.6.9] ps showed two rsync processes were running. When I attached gdb to the parent process (as shown by pstree), I only received the message above on the terminal. I tried attaching gdb to the child process the second time, and this time gdb captured a SIGSEGV message. I copied the following output from a backtrace: #0 set_acl (fname=0x0, file=0x2aaaae92f780, sxp=0x7fff645b9a50) at acls.c:170 count = <value optimized out> ida = <value optimized out> errfun = <value optimized out> entry = (acl_entry_t) 0x0 cur_mode = 0 duo_item = (acl_duo *) 0x856580 ndx = <value optimized out> unchanged = 1 type = 32768 ndx_ptr = 0x2aaaae92f7cd "LC_COLLATE" #1 0x0000000000404f3d in itemize (file=0x2aaaae92f780, ndx=45469, statret=<value optimized out>, sxp=0x7fff645b9a50, iflags=20488, fnamecmp_type=0 '\0', xname=0x7fff645b9b90 "usr/lib/locale/tr_CY.utf8/LC_COLLATE") at generator.c:364 No locals. #2 0x000000000041fb8c in hard_link_one (file=0x2aaaae92f780, ndx=45469, fname=0x6589a0 "usr/lib/locale/tr_TR.utf8/LC_COLLATE", statret=0, sxp=0x7fff645b9a50, toname=0x7fff645b9b90 "usr/lib/locale/tr_CY.utf8/LC_COLLATE", terse=0, itemizing=1, code=<value optimized out>) at hlink.c:282 No locals. #3 0x000000000041fe25 in hard_link_cluster (file=0x2aaaae92f780, master=<value optimized out>, itemizing=1, code=FLOG) at hlink.c:319 hlink1 = "usr/lib/locale/tr_CY.utf8/LC_COLLATE\000p/binding\000\000\000ly_usage_200703.png\000\000sty\000c.css\000ml\000html\000\0005.txt\000xt\000m:2,S\000b.com\000\000\000,\000m:2,\000\000.com:2,Sa\000\000\000jlab.com:2,Sa\000\000\000jdjlab.com:2,Sb\000ml\000\000\000\0000111\000about.html\000\000xsd\000txt\000\000sd\000e"... hlink2 = 0x6589a0 "usr/lib/locale/tr_TR.utf8/LC_COLLATE" sx = {st = {st_dev = 1792, st_ino = 492834, st_nlink = 1, st_mode = 33188, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 105214, st_blksize = 4096, st_blocks = 224, st_atim = { tv_sec = 1168734642, tv_nsec = 0}, st_mtim = {tv_sec = 1168016861, tv_nsec = 0}, st_ctim = {tv_sec = 1184716900, tv_nsec = 0}, __unused = { 0, 0, 0}}, acc_acl = 0x6, def_acl = 0x2aaaae1b1848, xattr = 0x8544f0} st = {st_dev = 1792, st_ino = 492674, st_nlink = 1, st_mode = 33188, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 105214, st_blksize = 4096, st_blocks = 224, st_atim = {tv_sec = 1184716900, tv_nsec = 0}, st_mtim = {tv_sec = 1183583685, tv_nsec = 0}, st_ctim = { tv_sec = 1184716900, tv_nsec = 0}, __unused = {0, 0, 0}} statret = 1683724880 ndx = 45469 #4 0x0000000000404ba6 in check_for_finished_hlinks (itemizing=1, code=FLOG) at generator.c:611 file = (struct file_struct *) 0x0 ndx = -1366100096 #5 0x00000000004079b2 in generate_files (f_out=1, flist=0x8544f0, local_name=0x0) at generator.c:1531 file = (struct file_struct *) 0x2aaaae934110 i = 45882 fbuf = "usr/lib/locale/zh_TW.utf8/LC_PAPER\000C\000\000NT\000ON\000C_MESSAGES\000S\000\000\000nt-lpr.so\000\000o\000d}/install.rdf\000\000.txt\000\000d.h\000p\000\000\000RlcjpJb3dhOlVT.pem\000W93YTpVUw==.pem\000TpVUw==.pem\000\000\000ip", '\0' <repeats 1647 times>, "0º[dÿ\177\000\000ÿa\200á7\000\000\000ÿa\200á7\000\000\000\r\000\000\000\000\000\000\000ë5dÔ7"... itemizing = 1 maybe_ATTRS_REPORT = 1 code = FLOG need_retouch_dir_times = 1 need_retouch_dir_perms = 0 save_ignore_existing = 0 save_ignore_non_existing = 0 save_do_progress = 0 save_make_backups = 0 dir_tweaking = 1 #6 0x000000000041058d in do_recv (f_in=0, f_out=1, flist=0x8544f0, local_name=0x0) at main.c:764 pid = 4002 exit_code = 0 error_pipe = {3, 4} #7 0x0000000000410ce9 in start_server (f_in=0, f_out=1, argc=1, argv=0x7fff645bbdb8) at main.c:871 No locals. #8 0x0000000000411d55 in child_main (argc=33188, argv=0x7fff645b9a50) at main.c:878 No locals. #9 0x0000000000427ad5 in local_child (argc=2, argv=0x7fff645bbdb0, f_in=0x7fff645bdd9c, f_out=0x7fff645bdd98, child_main=0x411d40 <child_main>) at pipe.c:149 pid = 0 to_child_pipe = {3, 4} from_child_pipe = {5, 6} __PRETTY_FUNCTION__ = "local_child" #10 0x0000000000411663 in main (argc=2, argv=0x851230) at main.c:438 dummy1 = 0x436290 "L\211d$àL\211l$èL\215%_]!" dummy2 = 2 ret = <value optimized out> orig_argc = 7 orig_argv = <value optimized out> sigmask = {__val = {84483, 0 <repeats 15 times>}} #11 0x00000037d461d8a4 in __libc_start_main () from /lib64/libc.so.6 No symbol table info available. #12 0x0000000000403999 in _start () No symbol table info available. I also dumped a core file. If I could be told how to give a ~85MB core file, I'll be happy to do so. Version-Release number of selected component (if applicable): rsync-2.6.9-2 How reproducible: Sometimes Steps to Reproduce: 1. rsync large amounts of files Actual Results: I received: rsync: connection unexpectedly closed (20444 bytes received so far) [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(453) [sender=2.6.9] Expected Results: rsync should successfully copied all the files Additional info: I believe rsync is only failing on hard linked files. A few recent logs support this, and the gdb-traced session failed on a hard link. A stat of the file is below: File: `LC_NUMERIC' Size: 54 Blocks: 16 IO Block: 4096 regular file Device: fd00h/64768d Inode: 524910 Links: 39 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2007-07-13 14:56:06.000000000 -0500 Modify: 2007-07-04 16:11:55.000000000 -0500 Change: 2007-07-13 14:55:42.000000000 -0500 The assumption that large number of files need to be present for rsync to fail may only be because package updates change a large number of files, and certain packages use hard linking files extensively in order to save space.
Created attachment 159728 [details] GDB full backtrace
I downloaded the rsync package from koji and recompiled it without the optimization options (I noticed some values were optimized out, and it appeared some functions were inlined). I received another segfault and I've got a backtrace file submitted above and another core dump file which I can send to someone if I can be told where and how. A simple check shows that rsync again failed on a hard linked file. Here's the stat of the file: File: `/var/cache/yum/core/packages/NetworkManager-glib-0.6.4-5.fc6.i386.rpm' Size: 27791 Blocks: 64 IO Block: 4096 regular file Device: fd00h/64768d Inode: 9994978 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2007-07-20 14:50:43.000000000 -0500 Modify: 2006-10-05 16:04:13.000000000 -0500 Change: 2007-07-20 14:44:01.000000000 -0500
Can you try the patch provided in 244564 and update that bug if this fixes your problem? This report has exactly the same backtrace so I am making it a duplicate of 244564 *** This bug has been marked as a duplicate of 244564 ***