155021 – uninterruptible sleep processes when copying from file-roller to nautilus

Summary: uninterruptible sleep processes when copying from file-roller to nautilus

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	nfs-utils
Sub Component:
Version:	3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-04-15 18:36 UTC by Erik Sjölund
Modified:	2008-02-08 03:07 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-02-08 03:07:01 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
the file "/tmp/r/a.tar.gz" I opened in file-roller (10.50 KB, application/x-gzip) 2005-04-15 18:44 UTC, Erik Sjölund
sysrq trace back (118.46 KB, text/plain) 2005-04-15 18:51 UTC, Erik Sjölund
screenshot taken just before the drag and drop of the files into the nautilus window (120.62 KB, image/png) 2005-04-15 18:56 UTC, Erik Sjölund
sysrq_traceback-2.txt (120.53 KB, text/plain) 2005-05-13 15:59 UTC, Erik Sjölund
sysrq_traceback-3.txt (119.42 KB, text/plain) 2005-05-13 16:00 UTC, Erik Sjölund
messages from syslog at the nfsclient (54.94 KB, application/x-bzip) 2005-09-23 16:52 UTC, Erik Sjölund
messages.nfsserver.bz2 (17.64 KB, application/x-bzip) 2005-09-23 16:55 UTC, Erik Sjölund
tethereal.nfsclient.2005-09-23.1.bz2 (123.23 KB, application/x-bzip) 2005-09-23 16:57 UTC, Erik Sjölund
tethereal.nfsserver.2005-09-23.1.bz2 (121.87 KB, application/x-bzip) 2005-09-23 16:58 UTC, Erik Sjölund

Description Erik Sjölund 2005-04-15 18:36:31 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
An up-to-date Fedora Core 3 machine mounts my home directory from an NFSv4 server. When drag-and-drop copying 100 files at once from the archive /tmp/r/a.tar.gz, opened in file-roller, to the nfs directory ~/nfscrash10, opened in nautilus, the copying stops after about half of the files. A pop-up window with a progress bar remains on screen.

window title:

"Copying files."

window content:

"Files copied: 43 of 100
Copying: a47
From: /tmp/fr-XzJiHU/a
To: /nfs/home/others/esjolund/nfscrash10
"
and there is a cancel push-button.

Pressing cancel only makes the window stop being repainted.
From then on, any process accessing my nfs home directory hangs. For instance,
typing "ls ~" on the command line just leaves the ls process hanging.

Executing the command "ps auxw", I found out that many of the processes with my uid are in the "D"-state ( uninterruptible sleep ).

The only way out of the situation tends to be rebooting the machine.
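
A minimal sketch of how the stuck processes could be listed, assuming a procps-style `ps`. The helper name `list_d_state` is hypothetical; it keeps the header line and every row whose STAT column (assumed to be the second column) starts with "D".

```shell
#!/bin/sh
# list_d_state: read `ps` output on stdin (header line first) and print the
# header plus every process whose STAT column starts with "D".
# Assumption: STAT is the second column, as in the `ps -eo` format below.
list_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use (wchan shows the kernel function the process sleeps in):
#   ps -eo pid,stat,user,wchan:32,args | list_d_state
```

The wchan column can hint at whether the sleepers are waiting in NFS/RPC code or elsewhere, which is the kind of detail a sysrq trace shows in full.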


Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-52

How reproducible:
Always

Steps to Reproduce:
1. login with kde
2. on the command line type: 
   mkdir ~/nfscrash10
3. on the command line type:
   nautilus ~/nfscrash10
4. on the command line type:
   file-roller /tmp/r/a.tar.gz 
 
  You now see that file-roller expands the a.tar.gz file. File-roller
  will show you the directory "a" which resides in a.tar.gz.
5. open the directory "a" in file-roller. Now the 100 files from a.tar.gz are visible in file-roller.
6. ctrl-a (select all)
7. press left mouse button and drag the files to the nautilus window.
8. release left mouse button
   A pop up window shows you the progress of the copying. After copying about 
   half of the files, the copying seems to stop. 
9. Because nothing seems to happen I press cancel in the pop up window.
  


Actual Results:  Processes accessing the nfs home directory now hang. Some processes are in the 
"D" state ( uninterruptible sleep ).


Expected Results:  Processes should not hang.



Additional info:

Oxygen is the hostname of my machine ( the nfs client ).

[esjolund@oxygen ~]$ grep nfs /etc/auto.nfs
home  -rw,fstype=nfs4,hard,intr,nosuid,tcp  nfs.sbc.su.se:/home

[esjolund@oxygen ~]$ uname -a
Linux oxygen 2.6.11-1.14_FC3 #1 Thu Apr 7 19:23:49 EDT 2005 i686 i686 i386 GNU/Linux

I also did 

echo t > /proc/sysrq-trigger

The resulting log from /var/log/messages will be attached to the bug report.

Comment 1 Erik Sjölund 2005-04-15 18:44:00 UTC

Created attachment 113245
the file "/tmp/r/a.tar.gz"  I opened in file-roller

Comment 2 Erik Sjölund 2005-04-15 18:51:34 UTC

Created attachment 113246
sysrq trace back

The results from the command

echo t > /proc/sysrq-trigger

found in /var/log/messages
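
Because /var/log/messages also contains unrelated entries, a small filter can isolate the most recent task dump before attaching it. This is a sketch under assumptions: the helper name `last_sysrq_dump` is hypothetical, and the default marker text "SysRq : Show State" is assumed to be what the kernel logs when the dump is triggered; other kernel versions may word it differently, so the marker can be passed as a second argument.

```shell
#!/bin/sh
# last_sysrq_dump LOGFILE [MARKER]
# Print LOGFILE from the last line containing MARKER through to the end.
# Assumption: MARKER defaults to the text logged when `echo t > /proc/sysrq-trigger`
# is run; verify the exact wording on the kernel in use.
last_sysrq_dump() {
    awk -v marker="${2:-SysRq : Show State}" '
        index($0, marker) { seen = 1; n = 0 }   # restart the buffer at every marker
        seen { buf[n++] = $0 }
        END { for (i = 0; i < n; i++) print buf[i] }
    ' "$1"
}

# Typical use:
#   last_sysrq_dump /var/log/messages > sysrq_traceback.txt
```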

Comment 3 Erik Sjölund 2005-04-15 18:56:43 UTC

Created attachment 113247
screenshot taken just before the drag and drop of the files into the nautilus window

Comment 4 Erik Sjölund 2005-04-15 19:56:20 UTC

I also did a test doing the drag and drop copying, but staying inside the /tmp
file system ( only local harddrive ). In other words: Instead of copying the 100 
files from /tmp/r/a.tar.gz to ~/nfscrash10 
I did the copying from
/tmp/r/a.tar.gz to /tmp/r2
The copying with drag and drop from file-roller to nautilus now succeeded
without any problems.

Comment 5 Steve Dickson 2005-04-16 13:49:08 UTC

What kernel version is this happening on, and what is the nfsv4 server?

Looking at the system trace it appears everybody is either
waiting for an nfsv4 state lock or a response from the
server. It's not clear, but it's definitely possible those two
conditions are causing the hang.

So all the nautilus processes are trying to open a.tar.gz?

Also note, that the nfsv4 code does get better in later 
kernel versions. So upgrading to the latest version (if
you have not already) might help...

Comment 6 Erik Sjölund 2005-04-16 18:01:31 UTC

the client is running Fedora Core 3  
 
kernel-2.6.11-1.14_FC3 
nfs-utils-1.0.6-52 
 
server is running Fedora Core 3  
 
kernel-2.6.10-1.770_FC3 
nfs-utils-1.0.6-52 
 
The client is fully up to date. But the server needs some 
updates ( it was fully up to date one month ago ). If it is possible  
to temporarily shut down the nfs server and do "yum update" and 
reboot into the new kernel without even considering what is happening 
at the client side, I could do it right away. But maybe the clients 
need to be rebooted at the same time as the reboot of the server? If that's  
the case I will have to plan that maneuver first. 
 
As I described earlier, I open a file-roller window 
( "file-roller /tmp/r/a.tar.gz" on the command line ). The file-roller program 
seems to automagically untar and unzip the a.tar.gz file because it shows the 
100 files which reside inside the a.tar.gz. I then mark all those 100 files 
and drag them into an open nautilus window representing a directory in nfs. 
Then the hanging starts.... 
 
On the nfs server side it looks like this: 
# cat /etc/sysconfig/nfs 
SECURE_NFS="no" 
MOUNTD_NFS_V2="no" 
MOUNTD_NFS_V3="no" 
 
# grep oxygen /etc/exports 
/big/export    oxygen.sbc.su.se(rw,no_subtree_check,secure,fsid=0,sync)

Comment 7 Erik Sjölund 2005-05-13 15:54:04 UTC

The bug still exists when both the client and server are running kernel 
2.6.11-1.14_FC3 
 
To reproduce the bug I followed the steps ( 1 - 8 ) mentioned before.  
At step 8, the copying this time succeeded. ( This is different to what I 
described in the first bug description ). 
 
To try it once more I did the steps 2,3,5,6,7,8 again but now with another 
directory name, ~/nfscrash15  
 
This time the copying stopped in the middle ( just like it happened when I 
first reported the bug ). 
I then did 
 
echo t > /proc/sysrq-trigger 
( I will attach the corresponding log as sysrq_traceback-2.txt ) 
I then did 
 
ls ~ 
 
about three times. They all succeeded. 
Then I did step 9 ( clicking the cancel push button ) 
and typed 
 
ps auxw | less 
 
That command was left hanging. 
I then typed 
 
ls ~ 
 
which also was left hanging. 
I then typed  
 
echo t > /proc/sysrq-trigger 
( I will attach the corresponding log as sysrq_traceback-3.txt ) 
 
A conclusion of this test is that the bug doesn't happen every time as I 
thought previously.
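
When repeating a test like this, it helps to check the mount without leaving the interactive shell stuck. The following is only a sketch: `probe_path` is a hypothetical helper that runs `ls` in the background and polls for its exit status for a limited number of seconds. It cannot unblock a process in uninterruptible sleep; it only avoids waiting on it, and on timeout the background `ls` and its temporary status file are left behind.

```shell
#!/bin/sh
# probe_path PATH [SECONDS]
# Returns the exit status of `ls PATH` if it finishes within SECONDS
# (default 5), or 124 if it is still blocked after that time.
probe_path() {
    path=$1
    limit=${2:-5}
    status_file=$(mktemp) || return 1
    # Run ls in the background; its exit status is written to the status file.
    ( ls "$path" >/dev/null 2>&1; echo "$?" >"$status_file" ) &
    waited=0
    while :; do
        if [ -s "$status_file" ]; then
            rc=$(cat "$status_file")
            rm -f "$status_file"
            return "$rc"
        fi
        [ "$waited" -ge "$limit" ] && break
        sleep 1
        waited=$((waited + 1))
    done
    echo "probe of $path still blocked after ${limit}s" >&2
    return 124
}

# Typical use:
#   probe_path ~ 5 || echo "home directory not responding"
```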

Comment 8 Erik Sjölund 2005-05-13 15:59:05 UTC

Created attachment 114340
sysrq_traceback-2.txt

referred to in comment #7

Comment 9 Erik Sjölund 2005-05-13 16:00:39 UTC

Created attachment 114341
sysrq_traceback-3.txt

referred to in comment #7

Comment 10 Steve Dickson 2005-05-19 12:17:09 UTC

Well both system traces show that the nautilus processes
*seem to be* hung in TCP code (tcp_write_xmit to be exact)
but that could be a red herring... 

Now, the tarfile has 100 files in a directory called 'a'?

Comment 11 Erik Sjölund 2005-05-19 15:08:31 UTC

The tar file a.tar.gz is attached to comment #1. 
The tar file consists of a directory "a" and in the directory there are 100 
files. 
 
$ tar tfvz a.tar.gz | wc -l 
101 
$ tar tfvz a.tar.gz | head -5 
drwxr-xr-x esjolund/others   0 2005-04-15 14:32:39 a/ 
-rw-r--r-- esjolund/others 40960 2005-04-15 14:32:39 a/a55 
-rw-r--r-- esjolund/others 40960 2005-04-15 14:32:39 a/a57 
-rw-r--r-- esjolund/others 40960 2005-04-15 14:32:39 a/a71 
-rw-r--r-- esjolund/others 40960 2005-04-15 14:32:39 a/a78
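
The 101 lines reported by `wc -l` are one directory entry plus the files. A small sketch that splits the count, assuming `tar` lists directory entries with a trailing slash (the helper name `count_tar_entries` is hypothetical):

```shell
#!/bin/sh
# count_tar_entries ARCHIVE
# Print "<directories> <files>" for a gzip-compressed tar archive.
# Assumption: directory entries are listed with a trailing "/".
count_tar_entries() {
    tar tzf "$1" | awk '/\/$/ { d++; next } { f++ } END { print d + 0, f + 0 }'
}

# Typical use:
#   count_tar_entries a.tar.gz
```

For an archive matching the listing above, this would be expected to report one directory and 100 files.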

Comment 12 Erik Sjölund 2005-05-19 15:28:55 UTC

About 40 nfs clients are connected to the nfs server. The average failure rate 
is about 1-2 clients per day. By failure, I mean that the client machine was 
hanging unable to access the nfs server and hence needed to be rebooted. The 
symptoms of the hanging client are usually similar to those described in 
this bug. The clients are used as desktops. I don't know if those failures are 
all due to this bug or if there are other causes.

Comment 13 Steve Dickson 2005-05-20 11:57:53 UTC

Hmm... What's still not clear is whether the client hung waiting for the server,
hung waiting for memory, or none of the above.... :)

Would it be possible to get a bzip2 binary tethereal trace
(i.e. tethereal -w) of this, making sure packets are captured
after the hang occurs?

Also does this hang only occur with v4? Does v3 over tcp work?

Comment 16 Erik Sjölund 2005-07-26 14:32:39 UTC

Tomorrow I'm going on vacation. I will look into this again when I come back  
in about a month.

Comment 18 Erik Sjölund 2005-09-23 16:47:17 UTC

I made a new test on some other pc hardware ( two Dell Dimension 5000 ).
Both computers are running Fedora Core 4 with the latest updates.
The test got hit by the same bug as before. The test was only NFSv4. I haven't
tested other NFS versions.

I found some debugging tips at
http://wiki.linux-nfs.org/index.php/General_troubleshooting_recommendations
that I used in this test.

----------

At the nfs server, hostname=laila, ip=10.0.0.1 :

[root@laila ~]# uname -r
2.6.12-1.1456_FC4smp
[root@laila ~]# cat /etc/exports
/export   10.0.0.2(rw,no_subtree_check,fsid=0,sync)
[root@laila ~]# cat /etc/sysconfig/nfs
SECURE_NFS="no"
MOUNTD_NFS_V2="no"
MOUNTD_NFS_V3="no"
[root@laila ~]# grep /mnt/tmpfs /etc/fstab
tmpfs     /mnt/tmpfs      tmpfs   size=300m,mode=1777     0 0
[root@laila ~]# grep /mnt/tmpfs /etc/syslog.conf
*.info;mail.none;authpriv.none;cron.none      /mnt/tmpfs/messages
[root@laila ~]# tethereal -w tethereal.nfsserver.2005-09-23.1


At the nfs client, hostname=kent, ip=10.0.0.2 :

[root@kent ~]# uname -r
2.6.12-1.1456_FC4smp
[root@kent ~]# grep mnt /etc/syslog.conf
*.info;mail.none;authpriv.none;cron.none       -/mnt/tmpfs/messages
[root@kent ~]# sysctl -w sunrpc.nfs_debug=3
[root@kent ~]# grep /mnt/tmpfs /etc/fstab
tmpfs      /mnt/tmpfs     tmpfs   size=300m,mode=1777     0 0
[root@kent ~]# tethereal -w /mnt/tmpfs/tethereal.nfsclient.2005-09-23.1
[root@kent ~]# mount -t nfs4 -o rw,intr,hard,nosuid 10.0.0.1:/ /mnt/nfs


[erik@kent ~]$ echo foo > /mnt/nfs/bar
[erik@kent ~]$ cat /mnt/nfs/bar
foo
[erik@kent ~]$ rm /mnt/nfs/bar

( Ok! basic nfs file operations seem to work )


[erik@kent ~]$ cat create_tar_ball.sh
#!/bin/sh
# Create 1000 zero-filled 64 KiB files in /tmp/b and pack them into /tmp/b.tar.gz.
i=1
mkdir -p /tmp/b
cd /tmp || exit 1
while [ "$i" -le 1000 ]; do
    dd if=/dev/zero of="/tmp/b/b$i" count=64 bs=1024
    i=$((i + 1))
done
tar cfz /tmp/b.tar.gz b
[erik@kent ~]$ sh create_tar_ball.sh
[erik@kent ~]$ mkdir /mnt/nfs/dir4
[erik@kent ~]$ nautilus /mnt/nfs/dir4
[erik@kent ~]$ file-roller /tmp/b.tar.gz

Then drag and drop all files at once from file-roller to the nautilus directory.
This time the progress window halted while copying the 15th file.

If I recall correctly, I now did
[root@kent ~]# echo t > /proc/sysrq-trigger
[root@laila ~]# echo t > /proc/sysrq-trigger
I made some sysrq tracebacks later too, but I don't remember exactly when.

[erik@kent ~]$ ls /mnt/nfs/dir4
b1   b100   b101  b103  b105  b107  b109  b110
b10  b1000  b102  b104  b106  b108  b11   b111

Ok, nfs still works, but after that I tried

[erik@kent ~]$ touch /mnt/nfs/dir4/just_testing

The "touch" command was left in an uninterruptible state.

After that, I tested once more

[erik@kent ~]$ ls /mnt/nfs/dir4

This time the "ls" command didn't return.

Then I typed this command on the nfs server:

[root@laila ~]# exportfs -v
/export         10.0.0.2(rw,wdelay,root_squash,no_subtree_check,fsid=0)

Comment 19 Erik Sjölund 2005-09-23 16:52:07 UTC

Created attachment 119191
messages from syslog at the nfsclient

referred to in comment #18

Comment 20 Erik Sjölund 2005-09-23 16:55:14 UTC

Created attachment 119192
messages.nfsserver.bz2

Messages from syslog at the nfs server.
Referred to in comment #18

Comment 21 Erik Sjölund 2005-09-23 16:57:24 UTC

Created attachment 119193
tethereal.nfsclient.2005-09-23.1.bz2

tethereal dump at the nfs client.
Referred to in comment #18

Comment 22 Erik Sjölund 2005-09-23 16:58:39 UTC

Created attachment 119194
tethereal.nfsserver.2005-09-23.1.bz2

tethereal dump at the nfs server.
Referred to in comment #18

Comment 23 Matthew Miller 2006-07-10 20:45:52 UTC

Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!

Comment 24 petrosyan 2008-02-08 03:07:01 UTC

Fedora Core 3 is not maintained anymore.

Setting status to "INSUFFICIENT_DATA". If you can reproduce this bug in the
current Fedora release, please reopen this bug and assign it to the
corresponding Fedora version.
