Bug 160974

Summary: lengthy rsync copy through ssh from fc2 to fc4 produced Oops
Product: [Fedora] Fedora
Reporter: Lennart Rolland <chimenigfx>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4
CC: pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-05-04 13:54:45 UTC

Description Lennart Rolland 2005-06-19 12:04:31 UTC
Description of problem:
While copying 280 GB of data from a Fedora Core 2 based computer to a
Fedora Core 4 based computer with rsync, the transfer ended in a complete
crash after roughly 74 GB had been copied.


Version-Release number of selected component (if applicable):

FC4 computer (destination) versions:
------------------------------------

#uname -a
Linux tiger 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:16:33 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

#rsync --version
rsync  version 2.6.4  protocol version 29
Copyright (C) 1996-2005 by Andrew Tridgell and others
<http://rsync.samba.org/>
Capabilities: 64-bit files, socketpairs, hard links, ACLs, xattrs, symlinks, batchfiles,
              inplace, IPv6, 64-bit system inums, 64-bit internal inums

# ssh -V
OpenSSH_4.0p1, OpenSSL 0.9.7f 22 Mar 2005

FC2 computer (source) versions:
-------------------------------

#uname -a
Linux bingo 2.4.22-1.2115.nptl #1 Wed Oct 29 15:42:51 EST 2003 i686 i686 i386 GNU/Linux

#rsync --version
rsync  version 2.5.7  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
<http://rsync.samba.org/>
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
              IPv6, 64-bit system inums, 64-bit internal inums

#ssh -V
OpenSSH_3.6.1p2, SSH protocols 1.5/2.0, OpenSSL 0x0090701f


How reproducible:
Uncertain. I am not risking it, so from now on I have divided the copy
operation into chunks of less than 60 GB.
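
The chunking workaround described above could be planned with a short script. This is only a minimal sketch, not what the reporter actually ran: the function name plan_chunks, the example directory names, and the sizes are hypothetical, and the per-directory sizes are assumed to have been measured beforehand (for example with du). It groups directories first-fit, largest first, so that each batch stays under the chosen limit.

```python
from typing import Dict, List

GB = 10**9  # decimal gigabyte, used only for readability in this sketch


def plan_chunks(sizes: Dict[str, int], limit: int) -> List[List[str]]:
    """Group paths into batches whose total size stays strictly below limit.

    Uses first-fit on the paths sorted by decreasing size. This is a simple
    heuristic, not an optimal packing.
    """
    batches: List[List[str]] = []
    totals: List[int] = []
    for path, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size >= limit:
            # A single directory this large cannot fit in any batch;
            # it would have to be split further by hand.
            raise ValueError(f"{path} alone reaches the {limit}-byte limit")
        for i, total in enumerate(totals):
            if total + size < limit:
                batches[i].append(path)
                totals[i] += size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([path])
            totals.append(size)
    return batches


if __name__ == "__main__":
    # Hypothetical directory sizes, in bytes.
    example = {"dat/a": 50 * GB, "dat/b": 30 * GB, "dat/c": 20 * GB, "dat/d": 9 * GB}
    for number, batch in enumerate(plan_chunks(example, 60 * GB), start=1):
        print(number, batch)
```

Each resulting batch would then be copied with its own rsync run, so that no single transfer reaches the size at which the crash was observed.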

Steps to Reproduce:
1. Made sure sshd was running on the FC4 computer
2. Made sure there was a network path between the computers
3. Ran the following command:
rsync -Pav -e ssh -r <source dir> root.0.2:<dest dir>
(rsync -Pav -e ssh -r dat/ins root.0.2:/mnt/den/dat/)

  
Actual results:

File copying went well. I went to bed (it was late), and the next morning the
console was filled with the following error messages:

CODE 41 8B 45 48 FF C8 /E 45 48 C7 83 F0 01 00 00 00 00 00 00 48
RIP mm_release+86
Badness in do_unblank_screen at drivers/char/vt.c:2822(not tainted)
do_unblank_screen+100
bust_spin_lock+28

Expected results:
Normal copy.

Additional info:

I controlled both computers from another machine via ssh; neither of them had
a keyboard or mouse attached. I first noticed that all ssh connections had
timed out. I discovered the error messages after the Oops when I attached a
screen to the FC4 computer.

I used reiserfs 3.6 on the source drive and xfs on the destination drive.

System:
http://tyan.com/products/html/gx28b2881.html
1 x Tyan Transport GX28 1U Barebone, Dual Opteron
4 x Seagate 250 GB SATA disk
2 x AMD Opteron DP Server Model 242 (1.6GHz)
1 x 3ware 8506-4LP (4-channel hardware SATA RAID, RAID 5)
4 x Corsair 512MB PC3200 ECC REG Low profile

Latest BIOS for both the Tyan motherboard and the 3ware controller.
The Tyan motherboard is configured with the minimum number of devices enabled:
no onboard SATA controllers, serial ports, or LPT. The 3ware controller is
configured as a single RAID 5 array that spans all 4 disks completely.

Comment 1 Lennart Rolland 2005-06-19 15:08:19 UTC
Discovered some typos.

Replace
CODE 41 8B 45 48 FF C8 /E 45 48 C7 83 F0 01 00 00 00 00 00 00 48
With
CODE 41 8B 45 48 FF C8 7E 45 48 C7 83 F0 01 00 00 00 00 00 00 48
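
A hand-transcribed CODE line like the one above can be sanity-checked before it is posted. The following is a minimal sketch: the function name invalid_code_bytes is hypothetical, and it assumes the line is a sequence of two-digit hex bytes after a leading CODE label. It reports every token that is not a valid byte, which flags the "/E" in the original transcription.

```python
import re
from typing import List, Tuple

HEX_BYTE = re.compile(r"[0-9A-Fa-f]{2}")


def invalid_code_bytes(line: str) -> List[Tuple[int, str]]:
    """Return (position, token) for every token that is not a two-digit hex byte.

    A leading "CODE" label, if present, is skipped. Positions are 0-based
    and counted after the label.
    """
    tokens = line.split()
    if tokens and tokens[0].rstrip(":").upper() == "CODE":
        tokens = tokens[1:]
    return [(i, tok) for i, tok in enumerate(tokens) if not HEX_BYTE.fullmatch(tok)]


if __name__ == "__main__":
    original = "CODE 41 8B 45 48 FF C8 /E 45 48 C7 83 F0 01 00 00 00 00 00 00 48"
    corrected = "CODE 41 8B 45 48 FF C8 7E 45 48 C7 83 F0 01 00 00 00 00 00 00 48"
    print(invalid_code_bytes(original))   # the mistyped token and its position
    print(invalid_code_bytes(corrected))  # an empty list: every token is a hex byte
```

A check like this only catches tokens that are not hex at all. A mistyped digit that is still valid hex would pass unnoticed.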

Comment 2 Dave Jones 2005-07-07 23:46:54 UTC
Unfortunately, the useful part of the oops is above all this code.

And this part..

RIP mm_release+86
Badness in do_unblank_screen at drivers/char/vt.c:2822(not tainted)
do_unblank_screen+100
bust_spin_lock+28

Is very likely a symptom of the earlier oops rather than something useful.

If you have the chance to reproduce this, please make sure that the console
blanking is disabled (setterm -blank 0 -a)
It'll also be useful to get more lines of text on the console, so boot with
something like vga=791 (or vga=1 if your monitor doesn't like that mode).

You're also playing with fire a little with XFS, as it's known to have problems
with 4KB stacks.  If any subsequent oopses you capture show XFS or reiserfs in
the traces, I recommend filing bugs upstream at http://bugme.osdl.org as those
filesystems are unsupported by fedora, and bugs in them get little to no
attention from folks at Red Hat.

I'll leave this open for now, but if it hasn't been reproduced after a while,
there's not much else I can do but close this.


Comment 3 Dave Jones 2005-07-15 21:08:36 UTC
[This comment has been added as a mass update for all FC4 kernel bugs.
 If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update.

If your problem involved being unable to boot, or some hardware not being
detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE*
installing any kernel updates.
If in doubt, you can recreate this file using..

mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
kudzu


Thank you.


Comment 4 Dave Jones 2005-09-30 06:13:19 UTC
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.


Comment 5 Dave Jones 2005-11-10 19:10:57 UTC
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.


Comment 6 Dave Jones 2006-02-03 06:34:06 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 7 John Thacker 2006-05-04 13:54:45 UTC
Closing per previous comment.