Description of problem: From at least 60GB into a 220GB rsync over ssh retrieve, one CPU core is being used at 99%-100%. This stays this way many hours after the big rsync has completed. Version-Release number of selected component (if applicable): # rpm -qa --queryformat "%{NAME}#%{ARCH}#%{VERSION}#%{RELEASE}\n"|grep -E 'kernel|gamin|nautilus|openssh|rsync'|sort gamin#i386#0.1.7#7.fc6 gamin#x86_64#0.1.7#7.fc6 kernel#x86_64#2.6.17#1.2630.fc6 kernel#x86_64#2.6.18#1.2798.fc6 nautilus-cd-burner#i386#2.16.0#3.fc6 nautilus-cd-burner#x86_64#2.16.0#3.fc6 nautilus-extensions#i386#2.16.0#5.fc6 nautilus-extensions#x86_64#2.16.0#5.fc6 nautilus-sendto-bluetooth#x86_64#0.7#5.fc6 nautilus-sendto#x86_64#0.7#5.fc6 nautilus#x86_64#2.16.0#5.fc6 openssh-askpass#x86_64#4.3p2#10 openssh-clients#x86_64#4.3p2#10 openssh-server#x86_64#4.3p2#10 openssh#x86_64#4.3p2#10 rsync#x86_64#2.6.8#3.1 # uname -a Linux mirror-offsite 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:39:22 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux How reproducible: Only seen once. Steps to Reproduce: 1. using vnc to the actual machine. 2. start a script: essentially rsync --bwlimit=10000000 --progress --stats -e 'ssh -p 30722 -i /root/sshkey-mirror-offsite' -az --exclude-from="/root/rsync-scripts/rsync-offsite-full-exclude.lst" server:/ /home/mirror/server/>>$logfilename 3. Wait for the bwlimited dl to complete {first time sync} 4. tail -f $logfilename 5. gkrellm open viewing cpu+disk+network activity 6. top running. 7. nautilus {always browse} using the Actual results: When checked 60GB into the copy, noted that one core of the AMD AM2 dual-core processor is consuming 100% CPU. The attachment contains a screenshot of the gkrellm + top many hours after the rsync has completed. Expected results: Since the process is bw limited, and the network speed is much slower than both the hard disk and CPU speeds, CPU should be rather unutilized during the process, and should be completetly idle after comletion. Additional info: a. While web searching for bug, noticed that occasionally the CPU use jumps to the other processor. b. The machine was a FC6T3 install the previous day onto a software raid 1 on 750GB sata drives. The raid build for some of the volumes was continuing when this rsync was started. As of writing this md2_raid1 process had completed, however, it had taken: {sorted by cumulative cpu use} top - 12:48:38 up 1 day, 3:18, 4 users, load average: 1.02, 1.06, 1.07 Tasks: 143 total, 2 running, 141 sleeping, 0 stopped, 0 zombie Cpu0 : 17.2%us, 20.3%sy, 0.1%ni, 53.8%id, 7.5%wa, 0.3%hi, 0.7%si, 0.0%st Cpu1 : 22.8%us, 26.6%sy, 0.1%ni, 44.7%id, 2.3%wa, 0.0%hi, 3.5%si, 0.0%st Mem: 1024916k total, 986216k used, 38700k free, 336432k buffers Swap: 8385880k total, 184k used, 8385696k free, 201180k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3013 root 25 0 420m 31m 20m R 100 3.2 1215:32 nautilus 3494 root 10 -5 0 0 0 S 0 0.0 6:10.44 kjournald 416 root 10 -5 0 0 0 S 0 0.0 4:27.62 md2_raid1 2950 root 15 0 54868 22m 4120 S 0 2.3 3:51.37 Xvnc 225 root 15 0 0 0 0 S 0 0.0 2:11.83 pdflush 226 root 10 -5 0 0 0 S 0 0.0 2:10.89 kswapd0 224 root 15 0 0 0 0 S 0 0.0 2:07.82 pdflush c. The machine has vmware-server and server-console installed, and configured for the running kernel, however, no virt machines have been created. d. Trying to start a File Browser from the menu does not start it. Possibly related: bug 207836 : Nautilus taking 70% processor time after a large copy however, nautilus was not used to initiate the copy (started the rsync script from a gnome-terminal). bug 187970 : gamin is crazy busy when idle. however, gamin is not showing in top, but is running in ps: root 3082 0.0 0.1 12988 1236 ? S Oct20 0:00 /usr/libexec/gam_server
Created attachment 139044 [details] screen capture of gkrellm and top while this bug is occuring
Created attachment 139046 [details] strace of nautilus while in this state. # ps aux|grep naut root 3013 75.5 3.1 430780 32616 ? Rs Oct20 1239:35 nautilus --no-default-window --sm-client-id default3 root 6566 0.0 0.6 254520 6192 ? S 12:18 0:00 nautilus --no-desktop --browser root 7182 0.0 0.6 254520 6192 ? S 12:57 0:00 nautilus --no-desktop --browser root 7446 0.0 0.0 60264 692 pts/3 R+ 13:12 0:00 grep naut ie: it is the desktop nautilus that is troublesome. attachment shows cut down: strace -p 3013 -o strace-nautilus-5secs.txt and others (ltrace etc). To me the traces are probably not of much use. I will leave the machine in thsis state for a while, so please request any "while faulting" information that might be needed.
Could you get a bunch of gdb backtraces to see what nautilus is doing. First install nautilus-debuginfo, eel-debuginfo, gnome-vfs2-debuginfo, gtk2-debuginfo, and glib2-debuginfo, then run "gdb attach <pid of nautilus>". In gdb, type "thread apply all bt" to get a backtrace, then "c" to continue, after a (short) while, press ctrl-c and get a new backtrace. Continue this a few times and collect the backtraces, that way we'll be able to get a feel for what the app is doing (because its not doing much i/o, so it doesn't show up in strace).
Created attachment 139146 [details] a coupla runs of gdb) bt Had errors with the thread bt version, so tried just bt. It seems the trace is missing some functions, so I tried to install some more guesses of debuginfo, but must not have been the correct ones.
Woa. Very strange that it didn't get a lot of symbols. However, all the backtraces are from the same place, you need to do "c" (continue executing) and ctrl-c to get a different backtrace. I wonder what all the warning: .dynamic section for "/usr/lib64/libgnomevfs-2.so.0" is not at the expected address lines are. Maybe prelink ran causing this or something. Could be the cause of the symbols missing in the backtrace, although some of the pointer values in the backtrace look weird too.
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I never saw this issue again, and I don't have access to the hardware where it presented itself, to try to confirm it with current F9. I changed the version back to F6, and was going to close it, unless someone wants it open for tracking purposes etc ?
David, thanks for helping us with this, but Fedora 6 is way after the End of Support (actually, Fedora 8 is now the last supported Fedora distro). Closing this bug as CANTFIX