Bug 26777 - Processes suspended when rsync large data to other host
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: rsync
Version: 7.1
Hardware: i386 Linux
Priority: high Severity: high
Assigned To: Bill Nottingham
Brock Organ
Florence Gold
Reported: 2001-02-08 22:28 EST by Bill Huang
Modified: 2014-03-16 22:18 EDT

Doc Type: Bug Fix
Last Closed: 2001-06-19 05:51:07 EDT
Attachments
hwinfo of the test machine (7.07 KB, text/plain)
2001-02-19 11:29 EST, Bill Huang

Description Bill Huang 2001-02-08 22:28:28 EST
I received a bug report from Keio University (a well-known university in Japan).

When they used rsync to upload a large amount of data (about 50GB) from an SMP
host (with kernel 2.4 installed) to another host, they found that almost all
processes became suspended. They ran top to check the CPU status, and it
reported that 99%-100% of the CPU was occupied by system processes. Strangely,
if they telnet from another host to the SMP host, processes work fine for a
while and then become suspended again.

They can provide further information if you need it.
Comment 1 Glen Foster 2001-02-09 18:18:28 EST
This defect is considered MUST-FIX for Florence Gold release
Comment 2 Michael K. Johnson 2001-02-14 16:28:03 EST
Exactly which kernel version are they using?
Comment 3 Michael K. Johnson 2001-02-14 16:31:52 EST
Oh, and what "system processes"?  rsync or something else?
pasting the output of top in here might help.
Comment 4 Bill Huang 2001-02-14 17:19:14 EST
I have asked the Keio project people and am now waiting for their response.
As for the kernel version, they told me that they used beta2, so I guess that
they used the kernel included in beta2:
kernel-2.4.0-0.43.12.i386.rpm.

As for "system processes", I guess that the CPU status indicated "99%-100% system".
In any case, they will send us the output of top.
Comment 5 Bill Huang 2001-02-19 11:29:43 EST
Created attachment 10380
hwinfo of the test machine
Comment 6 Bill Huang 2001-02-19 11:31:31 EST
(Response from Keio University)
They installed Fisher and found that the same error occurred.

The kernel version they are using is 2.4.0-0.99.11enterprise.

The test network environment:
client                      server
+--------+     rsync        +--------+
| hfs-24 |------------------| hfs-25 |
+--------+                  +--------+
     NIC: e1000           NIC: e1000

== script for testing rsync(rsync-test2.sh) ==
#!/bin/bash
j=0
while [ $j -lt 1000 ] ; do
  echo === COUNT $j START ===
  /usr/local2/bin/rsync-test1.sh
  echo === COUNT $j END ===
  j=`expr $j + 1`
done

==script for testing rsync (rsync-test1.sh) ==
#! /bin/bash
i=100001
while [ $i -lt 100070 ] ; do
  echo s${i} === `date` ===
  rsync -axvW hfs-25::rsync-test/s${i}/ /var/rsync-test/s${i}
  i=`expr $i + 1`
done


==script run on hfs-24==
# nohup /usr/local2/bin/rsync-test2.sh 2>&1 1>test1.log &
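A variant of the inner loop that might make the hang easier to pin down, offered only as a sketch: it passes rsync's `--timeout` option (an I/O timeout in seconds) and stops at the first transfer that exits non-zero, so the log ends at the directory that failed. The names `sync_range`, `RSYNC`, `SRC`, `DEST` and `IO_TIMEOUT` are illustrative and not part of the original scripts, the 600-second default is an arbitrary choice, and the destination is assumed to be meant as one directory per source directory.

```shell
#!/bin/sh
# Sketch only: RSYNC, SRC, DEST and IO_TIMEOUT are hypothetical knobs,
# not part of the scripts from the report.
RSYNC=${RSYNC:-rsync}
SRC=${SRC:-hfs-25::rsync-test}
DEST=${DEST:-/var/rsync-test}
IO_TIMEOUT=${IO_TIMEOUT:-600}

# sync_range FIRST LAST: sync s<FIRST> .. s<LAST-1>, stopping at the
# first transfer that fails or times out.
sync_range() {
  i=$1
  last=$2
  while [ "$i" -lt "$last" ]; do
    echo "s${i} === $(date) ==="
    # --timeout makes rsync give up when no data moves for IO_TIMEOUT seconds
    "$RSYNC" -axvW --timeout="$IO_TIMEOUT" "$SRC/s${i}/" "$DEST/s${i}"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "s${i}: rsync exited with status $status" >&2
      return "$status"
    fi
    i=$((i + 1))
  done
}

# Usage (same range as rsync-test1.sh):
#   sync_range 100001 100070
```

Current rsync documentation lists exit status 30 as "Timeout in data send/receive", so a stalled transfer would be expected to show up as that status in the log rather than as a silent stop.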

---------------------------------------------------------------

The system stopped responding about an hour after the test was started.

== test1.log(mainly) ==

s100055 === Fri Feb 16 22:13:44 JST 2001 ===
receiving file list ... done
wrote 88 bytes  read 150918 bytes  302012.00 bytes/sec
total size is 101023993  speedup is 669.01
s100056 === Fri Feb 16 22:13:45 JST 2001 ===
receiving file list ... done
wrote 88 bytes  read 150918 bytes  302012.00 bytes/sec
total size is 101023993  speedup is 669.01
s100057 === Fri Feb 16 22:13:45 JST 2001 ===
receiving file list ...

==========================

the output of top

# top
  9:10am  up 2 days, 18:50,  2 users,  load average: 1.00, 1.15, 1.15
78 processes: 76 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states:  0.0% user, 100.0% system,  0.0% nice,  0.0% idle
CPU1 states:  0.0% user,  0.3% system,  0.0% nice, 99.2% idle
Mem:  2059740K av, 1395248K used,  664492K free,       0K shrd,  136376K buff
Swap: 4192440K av,     396K used, 4192044K free                  499168K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
26723 root      20   0     8    8     0 R    99.9  0.0   2:19 find
26727 root      11   0  1052 1052   828 R     0.3  0.0   0:00 top
26575 root      10   0   836  836   696 S     0.1  0.0   0:00 in.telnetd
    1 root       8   0   544  544   480 S     0.0  0.0   0:06 init


The hwinfo file is attached.
Comment 7 Michael K. Johnson 2001-02-21 11:38:47 EST
OK, pasting the output of "ps auxwwwf" will show us the state of the rsync
processes.
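One possible way to gather that information in a compact form, as a sketch: the helper below keeps the header line of a ps listing plus every line that mentions rsync. The function name `rsync_only` is made up for this example, and the column selection in the usage comment assumes the procps version of ps.

```shell
#!/bin/sh
# Sketch only: rsync_only is a hypothetical helper name.
# Reads a ps listing on stdin and keeps the header line plus every
# line that mentions rsync.
rsync_only() {
  awk 'NR == 1 || /rsync/'
}

# Usage with procps ps (wchan shows where a sleeping process waits):
#   ps -eo pid,ppid,stat,wchan:20,args | rsync_only
#   ps auxwwwf | rsync_only
```

In the STAT column, S is an interruptible sleep, D an uninterruptible sleep (usually I/O) and R a running process, so the state of the rsync processes helps separate a process that is merely waiting for input from one stuck inside the kernel.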
Comment 8 Bill Huang 2001-02-21 12:07:35 EST
I have asked the Keio people and am waiting for their reply.
Comment 9 Arjan van de Ven 2001-02-21 13:18:00 EST
I bet both rsyncs are waiting in select(), waiting for each other,
timing out every 60 seconds but doing nothing.
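The pattern suspected here can be imitated with a toy script. This is not rsync's protocol code, only an illustration under stated assumptions: two peers each wait for the other to send first over a FIFO, every wait times out, and neither ever sends. The names `peer` and `mutual_wait_demo` are invented for the example, and it relies on timeout(1) from coreutils.

```shell
#!/bin/sh
# Illustration only: NOT rsync's protocol code. Two peers each wait for
# the other to send first; every wait times out and nothing is sent.

# peer NAME INBOX ROUNDS WAIT_SECONDS
peer() {
  n=0
  while [ "$n" -lt "$3" ]; do
    # Opening the FIFO for reading blocks until a writer appears;
    # timeout(1) gives up after WAIT_SECONDS.
    if line=$(timeout "$4" head -n 1 "$2"); then
      echo "$1: received '$line'"
      return 0
    fi
    echo "$1: timed out, nothing to read"
    n=$((n + 1))
  done
  return 1
}

# mutual_wait_demo ROUNDS WAIT_SECONDS
mutual_wait_demo() {
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/a_to_b" "$dir/b_to_a" || return 1
  peer A "$dir/b_to_a" "$1" "$2" &
  peer B "$dir/a_to_b" "$1" "$2" &
  wait
  rm -rf "$dir"
}

# Usage: mutual_wait_demo 3 60   (three rounds of 60-second waits)
```

Both peers wake up at each timeout, find nothing to read, and go back to waiting, which from the outside looks like a transfer that is alive but makes no progress.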
Comment 10 Michael K. Johnson 2001-02-21 14:17:28 EST
That has happened to me with large numbers of small files on
the 2.2 kernel as well.  I think it's an rsync protocol problem,
but I figured ps output would be useful before reassigning this
to rsync...
Comment 11 Michael K. Johnson 2001-02-28 22:34:29 EST
Not hearing any feedback yet, I'm assigning this to rsync because we
think the bug is there.  If it's proved to be a kernel problem, it
can always be re-assigned back.
Comment 12 Bill Nottingham 2001-06-12 15:52:13 EDT
closing, lack of input.

Please reopen with the requested info if this persists on the current release &
kernel.
Comment 13 Tim Waugh 2001-06-19 05:51:03 EDT
<Pine.LNX.4.33.0106191040230.9988-100000@xpc1.ast.cam.ac.uk> from the 
linux-kernel list points to a patch at 
http://www.clari.net/~wayne/rsync-nohang.patch which is reported to fix this 
problem.
Comment 14 Bill Nottingham 2001-06-19 17:06:32 EDT
patch added in 2.4.6-3.
