On an NFSv4 client system, if you have a directory with a few files, run "cp -a <dir1> <dir2>", and then in <dir2> copy one of the files to a new name, the file copy ("cp") will hang.
Doing a "cp -r" does not have this problem, so it probably has to do with NFSv4 file attributes. There are obviously other cases of file hangs.
I also see some GUI clients occasionally not being able to write a file, saying something like "resource temporarily unavailable".

This is with kernel-3.14.4-100.fc19.i686.PAE on both systems.
It happens all the time and on two different networked environments (home and work).
(In reply to Terry Barnaby from comment #0)
> On an NFSv4 client system, if you have a directory with a few files and then
> run "cp -a <dir1> <dir2>" and then in <dir2> copy one of the files to a new
> name the file copy "cp" will hang.
> Doing a "cp -r" does not have this problem, so it probably is to do with
> NFSv4 file attributes. There are obviously other cases of file hangs.
> I also see some GUI clients occasionally not being able to write a file
> saying something like "resource temporarily unavailable".
>
> This is with kernel-3.14.4-100.fc19.i686.PAE on both systems.
> It happens all the time and on two different networked environments (home
> and work).

Who is the server, and would it be possible to get a binary network trace using
tshark -w /tmp/data.pcap ; bzip2 /tmp/data.pcap?
The servers and clients are all Fedora 19 updated to current versions.
Kernel versions 3.13.x were fine as far as I was aware. I know that delegation code was added in 3.14.x ... (See Bug 1082586)

I attach a network trace. This is during:

cp -a tmp tmp2
cd tmp2
ls
cp -a t1.cpp t1.cpp.3
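The sequence above can be sketched as a small script, which may be handy for re-testing after a kernel update. This is only a sketch under assumptions: the function name repro, the directory argument, and the 30 second limit are all made up for illustration, and the directory should be a scratch directory on the NFSv4 mount. If cp is stuck in an uninterruptible wait, timeout may not be able to end it.

```shell
#!/bin/sh
# Sketch of the reported sequence. The directory argument is hypothetical:
# point it at a scratch directory on the NFSv4 mount.
# The body runs in a subshell so the caller's working directory is unchanged.
repro() (
    cd "$1" || exit 2
    mkdir -p tmp || exit 2
    echo 'int main() { return 0; }' > tmp/t1.cpp
    rm -rf tmp2
    cp -a tmp tmp2 || exit 2
    cd tmp2 || exit 2
    ls > /dev/null
    # This final copy is the one reported to hang; bound it with timeout(1)
    # (30 seconds is an arbitrary choice) so a hang is reported, not waited on.
    if timeout 30 cp -a t1.cpp t1.cpp.3; then
        echo "copy completed"
    else
        echo "copy hung or failed"
        exit 1
    fi
)
```

Usage would be something like "repro /path/to/nfs/scratch" on the client, where the path is whatever scratch directory is available on the mount.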
Created attachment 898873 [details] Network trace
(In reply to Terry Barnaby from comment #3)
> Created attachment 898873 [details]
> Network trace

Something is going on here... the server seems to be returning a "[Malformed Packet]" when the client sends a SETATTR setting an ACL. Then the server starts replying with an NFS4ERR_DELAY on the final open.

Interesting...
If I look at frame 188 with the wireshark in F20 (wireshark-gnome-1.10.7-1.fc20.x86_64), I see "[Malformed Packet]". With a version I built myself from development (wireshark-1.11.11-5455-g58bb472), it parses fine. Looking at the bytes, I'm pretty sure the latter is right, so this is just a bug in F20's wireshark.
So the hang probably starts with the write OPEN in frame 341, which gets an NFS4ERR_DELAY in frame 342.

It may be a delegation problem, in which case you can work around it with "echo 0 >/proc/sys/fs/leases-enable" on the server. It'd be interesting to know if that helps.

What's odd is it's an OPEN4_CREATE/EXCLUSIVE4 open for directory/filename 0x26c7788f/t1.cpp.3, following just a millisecond or so after a LOOKUP of the same thing in frame 334 which got an NFS4ERR_NOENT reply in frame 335.

So unless there's something else going on at the same time (e.g. a process on the server that just jumped in and created a file under that name), the OPEN that's returning an NFS4ERR_DELAY is an open of a newly created file.

It's not attempting to set any attributes here (an EXCLUSIVE4 open can't).

It could also be interesting to know whether the file does in fact exist or not at this point. I guess one way to check would be to watch for the hanging create with either wireshark or strace, then check on the server side to see if a file with that name already exists.
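A rough sketch of the server-side check suggested above, assuming the helper name check_created (made up here) and that you know where the export lives on the server's local filesystem. The strace invocation in the comment is just one way to watch the client side.

```shell
#!/bin/sh
# Client side (not run here): watch the copy until the create stalls, e.g.
#   strace -f -tt cp -a t1.cpp t1.cpp.3
#
# Server side: while the client is stuck, report whether the target exists.
# check_created is a hypothetical helper; pass it the file's path on the
# server's local filesystem (under the exported directory).
check_created() {
    if [ -e "$1" ]; then
        echo "exists: $(ls -l "$1")"
    else
        echo "missing: $1"
    fi
}
```

On the server this would be run as "check_created <export>/tmp2/t1.cpp.3", with <export> replaced by the actual export path.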
(In reply to J. Bruce Fields from comment #6)
> So the hang probably starts with the write OPEN in frame 341, which gets a
> NFS4ERR_DELAY in frame 342.
>
> It may be a delegation problem, in which case you can work around it with
> "echo 0 >/proc/sys/fs/leases-enable" on the server. It'd be interesting to
> know if that helps.
It did not, at least in my testing.

>
> What's odd is it's an OPEN4_CREATE/EXCLUSIVE4 open for directory/filename
> 0x26c7788f/t1.cpp.3, following just a millisecond or so after a LOOKUP of
> the same thing in frame 334 which got an NFS4ERR_NOENT reply in frame 335.
>
> So unless there's something else going on at the same time (e.g. a process
> on the server that just jumped in and created a file under that name), the
> OPEN that's returning a NFS4ERR_DELAY is an open of a newly created file.
>
> It's not attempting to set any attributes here (an EXCLUSIVE4 open can't).
>
> It could also be interesting to know whether the file does in fact exist or
> not at this point. I guess one way to check would be to watch for the
> hanging create with either wireshark or strace, then check on the server
> side to see if a file with that name already exists.
This is a bit bizarre... when doing the directories,

cp -a tmp1 tmp2

the NFS4ERR_DELAY will happen only when tmp2 exists.

It's just the opposite for the files:

cp -a t1.cpp t1.cpp.2

will only hang when t1.cpp.2 does not exist.

bizarro!! :-)
Is anything happening with this bug? It is pretty serious for any NFS network server ...
(In reply to Steve Dickson from comment #7)
> (In reply to J. Bruce Fields from comment #6)
> > So the hang probably starts with the write OPEN in frame 341, which gets a
> > NFS4ERR_DELAY in frame 342.
> >
> > It may be a delegation problem, in which case you can work around it with
> > "echo 0 >/proc/sys/fs/leases-enable" on the server. It'd be interesting to
> > know if that helps.
> It did not, at least in my testing.

Note you probably need to turn off leases before starting the nfs server.

> > What's odd is it's an OPEN4_CREATE/EXCLUSIVE4 open for directory/filename
> > 0x26c7788f/t1.cpp.3, following just a millisecond or so after a LOOKUP of
> > the same thing in frame 334 which got an NFS4ERR_NOENT reply in frame 335.
> >
> > So unless there's something else going on at the same time (e.g. a process
> > on the server that just jumped in and created a file under that name), the
> > OPEN that's returning a NFS4ERR_DELAY is an open of a newly created file.
> >
> > It's not attempting to set any attributes here (an EXCLUSIVE4 open can't).
> >
> > It could also be interesting to know whether the file does in fact exist or
> > not at this point. I guess one way to check would be to watch for the
> > hanging create with either wireshark or strace, then check on the server
> > side to see if a file with that name already exists.
> This is a bit bizarre.... when doing the directories
>
> cp -a tmp1 tmp2 the NFS4ERR_DELAY will happen only when tmp2 exists.
>
> Its just the opposite for the files
>
> cp -a t1.cpp t1.cpp.2 will only hang when t1.cpp.2 does not exist.
>
>
> bizarro!! :-)

Yeah, I don't have an explanation yet.

What filesystem are you exporting?

Do you see anything interesting in the logs when this happens?
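For the note about ordering, a sketch of turning leases off before the NFS server comes up. The assumptions here: the server is managed by systemd under the unit name nfs-server, and the helper names run and restart_without_leases are made up. fs.leases-enable is the sysctl name for /proc/sys/fs/leases-enable. Setting DRY_RUN=1 only prints the commands, so the order can be checked without root.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop nfsd, disable leases, then start nfsd again, so
# the server never runs with leases enabled. Assumes a systemd unit called
# nfs-server; needs root when not in dry-run mode.
restart_without_leases() {
    run systemctl stop nfs-server &&
    run sysctl -w fs.leases-enable=0 &&
    run systemctl start nfs-server
}
```

Running "DRY_RUN=1 restart_without_leases" first shows the three commands in order; dropping DRY_RUN (as root) actually applies them. The setting does not persist across a reboot unless it is also added to the sysctl configuration.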
This now appears to have been fixed by some update in the Fedora 19 kernel 3.14.7-100.fc19.i686.PAE.