Bug 248663 - rsync segfaults on large transfers
Summary: rsync segfaults on large transfers
Keywords:
Status: CLOSED DUPLICATE of bug 244564
Alias: None
Product: Fedora
Classification: Fedora
Component: rsync
Version: 6
Hardware: x86_64
OS: Linux
low
high
Target Milestone: ---
Assignee: Simo Sorce
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-07-18 01:04 UTC by Justin Willmert
Modified: 2007-11-30 22:12 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-10 17:46:49 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
GDB full backtrace (11.34 KB, text/plain)
2007-07-21 18:36 UTC, Justin Willmert
no flags Details

Description Justin Willmert 2007-07-18 01:04:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4

Description of problem:
When I'm backing up my server, I use rsync to copy the differences between the running file system and the image file system. I run rsync with the following command (where /tmp/backup_from/ is an LVM snapshot, and /tmp/target-backup is a loop mounted image file):

rsync -aHEAXxv --stats --delete --ignore-errors /tmp/backup_from/ /tmp/target-backup

 rsync will occasionally fail with the message:

rsync: connection unexpectedly closed (20444 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(453) [sender=2.6.9]

ps showed two rsync processes were running. When I attached gdb to the parent process (as shown by pstree), I only received the message above on the terminal.  I tried attaching gdb to the child process the second time, and this time gdb captured a SIGSEGV message. I copied the following output from a backtrace:

#0  set_acl (fname=0x0, file=0x2aaaae92f780, sxp=0x7fff645b9a50) at acls.c:170
	count = <value optimized out>
	ida = <value optimized out>
	errfun = <value optimized out>
	entry = (acl_entry_t) 0x0
	cur_mode = 0
	duo_item = (acl_duo *) 0x856580
	ndx = <value optimized out>
	unchanged = 1
	type = 32768
	ndx_ptr = 0x2aaaae92f7cd "LC_COLLATE"
#1  0x0000000000404f3d in itemize (file=0x2aaaae92f780, ndx=45469, 
    statret=<value optimized out>, sxp=0x7fff645b9a50, iflags=20488, 
    fnamecmp_type=0 '\0', 
    xname=0x7fff645b9b90 "usr/lib/locale/tr_CY.utf8/LC_COLLATE")
    at generator.c:364
No locals.
#2  0x000000000041fb8c in hard_link_one (file=0x2aaaae92f780, ndx=45469, 
    fname=0x6589a0 "usr/lib/locale/tr_TR.utf8/LC_COLLATE", statret=0, 
    sxp=0x7fff645b9a50, 
    toname=0x7fff645b9b90 "usr/lib/locale/tr_CY.utf8/LC_COLLATE", terse=0, 
    itemizing=1, code=<value optimized out>) at hlink.c:282
No locals.
#3  0x000000000041fe25 in hard_link_cluster (file=0x2aaaae92f780, 
    master=<value optimized out>, itemizing=1, code=FLOG) at hlink.c:319
	hlink1 = "usr/lib/locale/tr_CY.utf8/LC_COLLATE\000p/binding\000\000\000ly_usage_200703.png\000\000sty\000c.css\000ml\000html\000\0005.txt\000xt\000m:2,S\000b.com\000\000\000,\000m:2,\000\000.com:2,Sa\000\000\000jlab.com:2,Sa\000\000\000jdjlab.com:2,Sb\000ml\000\000\000\0000111\000about.html\000\000xsd\000txt\000\000sd\000e"...
	hlink2 = 0x6589a0 "usr/lib/locale/tr_TR.utf8/LC_COLLATE"
	sx = {st = {st_dev = 1792, st_ino = 492834, st_nlink = 1, 
    st_mode = 33188, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, 
    st_size = 105214, st_blksize = 4096, st_blocks = 224, st_atim = {
      tv_sec = 1168734642, tv_nsec = 0}, st_mtim = {tv_sec = 1168016861, 
      tv_nsec = 0}, st_ctim = {tv_sec = 1184716900, tv_nsec = 0}, __unused = {
      0, 0, 0}}, acc_acl = 0x6, def_acl = 0x2aaaae1b1848, xattr = 0x8544f0}
	st = {st_dev = 1792, st_ino = 492674, st_nlink = 1, st_mode = 33188, 
  st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 105214, 
  st_blksize = 4096, st_blocks = 224, st_atim = {tv_sec = 1184716900, 
    tv_nsec = 0}, st_mtim = {tv_sec = 1183583685, tv_nsec = 0}, st_ctim = {
    tv_sec = 1184716900, tv_nsec = 0}, __unused = {0, 0, 0}}
	statret = 1683724880
	ndx = 45469
#4  0x0000000000404ba6 in check_for_finished_hlinks (itemizing=1, code=FLOG)
    at generator.c:611
	file = (struct file_struct *) 0x0
	ndx = -1366100096
#5  0x00000000004079b2 in generate_files (f_out=1, flist=0x8544f0, 
    local_name=0x0) at generator.c:1531
	file = (struct file_struct *) 0x2aaaae934110
	i = 45882
	fbuf = "usr/lib/locale/zh_TW.utf8/LC_PAPER\000C\000\000NT\000ON\000C_MESSAGES\000S\000\000\000nt-lpr.so\000\000o\000d}/install.rdf\000\000.txt\000\000d.h\000p\000\000\000RlcjpJb3dhOlVT.pem\000W93YTpVUw==.pem\000TpVUw==.pem\000\000\000ip", '\0' <repeats 1647 times>, "0º[dÿ\177\000\000ÿa\200á7\000\000\000ÿa\200á7\000\000\000\r\000\000\000\000\000\000\000ë5dÔ7"...
	itemizing = 1
	maybe_ATTRS_REPORT = 1
	code = FLOG
	need_retouch_dir_times = 1
	need_retouch_dir_perms = 0
	save_ignore_existing = 0
	save_ignore_non_existing = 0
	save_do_progress = 0
	save_make_backups = 0
	dir_tweaking = 1
#6  0x000000000041058d in do_recv (f_in=0, f_out=1, flist=0x8544f0, 
    local_name=0x0) at main.c:764
	pid = 4002
	exit_code = 0
	error_pipe = {3, 4}
#7  0x0000000000410ce9 in start_server (f_in=0, f_out=1, argc=1, 
    argv=0x7fff645bbdb8) at main.c:871
No locals.
#8  0x0000000000411d55 in child_main (argc=33188, argv=0x7fff645b9a50)
    at main.c:878
No locals.
#9  0x0000000000427ad5 in local_child (argc=2, argv=0x7fff645bbdb0, 
    f_in=0x7fff645bdd9c, f_out=0x7fff645bdd98, 
    child_main=0x411d40 <child_main>) at pipe.c:149
	pid = 0
	to_child_pipe = {3, 4}
	from_child_pipe = {5, 6}
	__PRETTY_FUNCTION__ = "local_child"
#10 0x0000000000411663 in main (argc=2, argv=0x851230) at main.c:438
	dummy1 = 0x436290 "L\211d$àL\211l$èL\215%_]!"
	dummy2 = 2
	ret = <value optimized out>
	orig_argc = 7
	orig_argv = <value optimized out>
	sigmask = {__val = {84483, 0 <repeats 15 times>}}
#11 0x00000037d461d8a4 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#12 0x0000000000403999 in _start ()
No symbol table info available.

I also dumped a core file. If I could be told how to give a ~85MB core file, I'll be happy to do so.

Version-Release number of selected component (if applicable):
rsync-2.6.9-2

How reproducible:
Sometimes


Steps to Reproduce:
1. rsync large amounts of files

Actual Results:
I received:
rsync: connection unexpectedly closed (20444 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(453) [sender=2.6.9]


Expected Results:
rsync should successfully copied all the files

Additional info:
I believe rsync is only failing on hard linked files. A few recent logs support this, and the gdb-traced session failed on a hard link.  A stat of the file is below:

  File: `LC_NUMERIC'
  Size: 54              Blocks: 16         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 524910      Links: 39
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2007-07-13 14:56:06.000000000 -0500
Modify: 2007-07-04 16:11:55.000000000 -0500
Change: 2007-07-13 14:55:42.000000000 -0500

The assumption that large number of files need to be present for rsync to fail may only be because package updates change a large number of files, and certain packages use hard linking files extensively in order to save space.

Comment 1 Justin Willmert 2007-07-21 18:36:05 UTC
Created attachment 159728 [details]
GDB full backtrace

Comment 2 Justin Willmert 2007-07-21 18:38:27 UTC
I downloaded the rsync package from koji and recompiled it without the
optimization options (I noticed some values were optimized out, and it appeared
some functions were inlined). I received another segfault and I've got a
backtrace file submitted above and another core dump file which I can send to
someone if I can be told where and how.

A simple check shows that rsync again failed on a hard linked file. Here's the
stat of the file:
  File: `/var/cache/yum/core/packages/NetworkManager-glib-0.6.4-5.fc6.i386.rpm'
  Size: 27791           Blocks: 64         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 9994978     Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2007-07-20 14:50:43.000000000 -0500
Modify: 2006-10-05 16:04:13.000000000 -0500
Change: 2007-07-20 14:44:01.000000000 -0500


Comment 3 Simo Sorce 2007-08-10 17:46:49 UTC
Can you try the patch provided in 244564 and update that bug if this fixes your
problem?
This report has exactly the same backtrace so I am making it a duplicate of 244564

*** This bug has been marked as a duplicate of 244564 ***


Note You need to log in before you can comment on or make changes to this bug.