Bug 150526

Summary: NFS write/chown fails, strange strace output
Product: [Fedora] Fedora
Component: nfs-utils
Version: 3
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Tim Niemueller <tim>
Assignee: Steve Dickson <steved>
QA Contact: Ben Levenson <benl>
CC: redhat
Doc Type: Bug Fix
Last Closed: 2005-04-06 09:08:28 EDT

Description Tim Niemueller 2005-03-07 17:53:00 EST
Description of problem:
I have set up an NFSv4 server and client, roughly following
http://www.vanemery.com/Linux/NFSv4/NFSv4-no-rpcsec.html. Writing to a file for
a second time fails. The strace output shows strange things, and I have no idea
where they come from.



Version-Release number of selected component (if applicable):
[root@lechuck home]# rpm -q nfs-utils
nfs-utils-1.0.6-52
[root@lechuck home]# uname -a
Linux lechuck.informatik.rwth-aachen.de 2.6.10-1.770_FC3smp #1 SMP Thu Feb 24 14:20:06 EST 2005 i686 i686 i386 GNU/Linux

Server and client are both up to date via yum.

How reproducible:
Always

Steps to Reproduce:
1. Mount via NFS4
2. Open a file for the first time with vim. Writing _may_ succeed if no
.viminfo file is involved.
3. Reopen the file in vim and save it. Vim will hang.
  
Actual results:
Strace shows something like:
[root@nemo ~]# strace -p 5313
Process 5313 attached - interrupt to quit
select(1, [0], NULL, [0], NULL)         = 1 (in [0])
read(0, ":", 250)                       = 1
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
write(1, "\33[?25l\33[24;63H\33[K\33[24;1H:\33[?12l\33"..., 37) = 37
select(1, [0], NULL, [0], {4, 0})       = 1 (in [0], left {3, 840000})
select(1, [0], NULL, [0], NULL)         = 1 (in [0])
read(0, "w", 250)                       = 1
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
write(1, "w\33[?25l\33[?12l\33[?25h", 19) = 19
select(1, [0], NULL, [0], {4, 0})       = 1 (in [0], left {3, 931000})
select(1, [0], NULL, [0], NULL)         = 1 (in [0])
read(0, "q", 250)                       = 1
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
write(1, "q\33[?25l\33[?12l\33[?25h", 19) = 19
select(1, [0], NULL, [0], {4, 0})       = 1 (in [0], left {2, 714000})
select(1, [0], NULL, [0], NULL)         = 1 (in [0])
read(0, "\r", 250)                      = 1
select(1, [0], NULL, [0], {0, 0})       = 0 (Timeout)
write(1, "\r", 1)                       = 1
write(1, "\33[?25l", 6)                 = 6
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
getcwd("/home/tmp", 1024)               = 10
write(1, "\"test8.txt\"", 11)           = 11
stat64("test8.txt", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
access("test8.txt", W_OK)               = 0
getxattr("test8.txt", "system.posix_acl_access", 0xbffe42a0, 132) = -1 EOPNOTSUPP (Operation not supported)
lstat64("test8.txt", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat64("4913", 0xbffe4670)              = -1 ENOENT (No such file or directory)
open("4913", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0100644) = 3
close(3)                                = 0
chown32("4913", 65534, 65534

Here it hangs. 65534 is nfsnobody, which is OK. When trying as a regular user, the
correct id is shown. But where does "4913" come from!? It always seems to be that
number. I have no clue why. There is no PID 4913 or anything like it.
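
Editorial note: the name "4913" appears to match Vim's own behaviour rather than anything in NFS. Before overwriting a file, Vim creates a scratch file with an arbitrary numeric name (starting at 4913 and stepping by 123 if that name is taken) to check whether it can create a file in the directory and give it the original file's owner and group. The sketch below mimics that probe in Python so the hang can be reproduced without Vim. The function names are hypothetical and the code is only an approximation of what Vim does, not a copy of it.

```python
import os


def find_probe_name(directory, start=4913, step=123):
    """Return the first numeric file name in `directory` that does not exist yet."""
    n = start
    while True:
        candidate = os.path.join(directory, str(n))
        if not os.path.lexists(candidate):
            return candidate
        n += step


def probe_ownership(original):
    """Create a scratch file next to `original`, try to give it the same
    owner and group, then remove it again.

    Returns True when the scratch file ends up with the original's uid/gid.
    """
    st = os.lstat(original)
    directory = os.path.dirname(os.path.abspath(original))
    probe = find_probe_name(directory)
    # Same flags as the open() call in the trace (minus O_LARGEFILE).
    fd = os.open(probe, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        try:
            # This corresponds to the chown32() call that hung in the trace.
            os.chown(probe, st.st_uid, st.st_gid)
        except OSError:
            return False
        after = os.stat(probe)
        return (after.st_uid, after.st_gid) == (st.st_uid, st.st_gid)
    finally:
        os.unlink(probe)
```

Running probe_ownership("test8.txt") from a directory on the NFSv4 mount should block at the chown if the problem is present, and return promptly on a local filesystem.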

Expected results:
Should save the file and return; chown should succeed.

Additional info:
Server side: FC3, all updates, 1.1 TB RAID via SCSI as one drive, single ext3
partition created with anaconda
Client side: FC3, all updates, basic install, using the server as NIS server and NFS
server. NIS works, NFS does not.
Comment 1 Tim Niemueller 2005-03-07 18:20:59 EST
Further investigation results: Disabling SELinux has no effect, and
setting the locale to de_DE@Euro (instead of de_DE.UTF-8) does not help. Reverting
to NFSv3 (by just changing nfs4 to nfs in the client fstab) cures the problem. I
would like to go with NFSv4 if possible. Any ideas?
Comment 2 Steve Dickson 2005-03-09 20:49:53 EST
It sounds like it might be an issue with the rpc.idmapd daemon,
which NFSv4 uses to map usernames into uids. Make sure
you have the latest nfs-utils (1.0.6-52) and kernel. If the problem
persists, turn on rpc.idmapd debugging by adding '-vvv' to the
/etc/init.d/rpcidmapd init script. Hopefully there will be some
error messages explaining why everything is getting mapped
to nfsnobody.
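
Editorial note: NFSv4 sends file owners over the wire as "name@domain" strings, and the mapping daemon turns them into local numeric ids, falling back to an unprivileged "nobody" id when it cannot resolve a name. The following is a simplified model of that idea only, not the daemon's actual code. The function name, the `lookup` parameter, and the domain handling are all assumptions made for illustration; 65534 is the nfsnobody id seen in the trace.

```python
import pwd

NOBODY_UID = 65534  # nfsnobody, as seen in the strace output in this report


def map_owner_to_uid(owner, local_domain, lookup=pwd.getpwnam, fallback=NOBODY_UID):
    """Map an NFSv4-style "name@domain" owner string to a local uid.

    Hypothetical model: anything that is malformed, belongs to another
    domain, or is unknown to the local account database maps to `fallback`.
    """
    name, sep, domain = owner.rpartition("@")
    if not sep or domain.lower() != local_domain.lower():
        return fallback
    try:
        return lookup(name).pw_uid
    except KeyError:
        # Unknown user: squash to the nobody id instead of failing.
        return fallback
```

If every owner comes back as the fallback id, the usual suspects are a domain mismatch between client and server or a name service that cannot resolve the user.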
Comment 3 Tim Niemueller 2005-03-10 05:01:43 EST
As I stated in the bug report, I am using the newest version of nfs-utils. The
user mapping also works just fine; only the root user is mapped to nfsnobody
(root squash), which is also fine, as I mentioned. What is strange is that the chown32
call is made on a file named "4913" and not on "test8.txt" as it should be. I guess you were
just in a hurry and didn't parse all the information given?
It hangs at that chown call. I could sit there forever waiting for this
chown to succeed; it never will...
Comment 4 Steve Dickson 2005-03-10 09:05:24 EST
Yeah, I guess I did skim this a bit too lightly....

Would it be possible to post a bzip2-ed
ethereal trace (tethereal -w /tmp/data.pcap)
of this?

Does mounting with the 'intr' flag allow
you to interrupt out of the chown?
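
Editorial note: one way to test for the hang without tying up a terminal is to run the chown in a child process and give up after a timeout. This is a minimal sketch under that assumption; the function name and the default timeout are hypothetical. Whether the child can actually be killed depends on the mount options, which is exactly what the 'intr' question is about, so the sketch does not wait for the child after killing it.

```python
import subprocess
import sys

# One-line child program that performs the chown and exits non-zero on error.
CHILD = "import os, sys; os.chown(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))"


def try_chown(path, uid, gid, timeout=10.0):
    """Run chown(path, uid, gid) in a child process.

    Returns "ok", "failed" (the call returned an error), or "hung"
    (the call did not return within `timeout` seconds).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, path, str(uid), str(gid)],
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in an uninterruptible call
        # may not go away, so do not block waiting for it.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "failed"
```

On an affected mount, try_chown("/home/tmp/4913", 65534, 65534) would be expected to report "hung", while the same call on a local file returns "ok" or "failed" right away.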
Comment 5 W. Michael Petullo 2005-03-23 13:24:15 EST
I encountered a similar problem when performing a mv over NFSv4.  I am using:

Server (x86):

kernel-2.6.10-1.760_FC3
nfs-utils-1.0.6-44

Client (PowerPC):

kernel-2.6.11-1.1191_FC4
nfs-utils-1.0.7-1

All of the data is copied to the NFS server.  However, the mv process hangs when
it tries to make a chown call:

[mike@imp mail]$ strace mv sent-mail /nfs/flyn_mind/docs/mail/sent-mail_2005_03_17
[...]
write(4, "SYuvmr+ejRvjLlC6R3ebYbq7\nID2WwSH"..., 30987) = 30987
read(3, "", 32768)                      = 0
close(4)                                = 0
close(3)                                = 0
utimes("/nfs/flyn_mind/docs/mail/sent-mail_2005_03_17", {1111503059, 0}) = 0
chown("/nfs/flyn_mind/docs/mail/sent-mail_2005_03_17", 500, 500 <unfinished ...>

The data has been copied at this point but the chown hangs.  Note that the path
"/nfs/flyn_mind/docs/mail/sent-mail_2005_03_17" IS correct.

Also:

[root@imp mail]# strace chown root:root sent-mail_2005_03_17.gz
[...]
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=692, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30244000
read(3, "root:x:0:root\nbin:x:1:root,bin,d"..., 4096) = 692
close(3)                                = 0
munmap(0x30244000, 4096)                = 0
lstat64("sent-mail_2005_03_17.gz", {st_mode=S_IFREG|0644, st_size=24281150, ...}) = 0
open(".", O_RDONLY|O_LARGEFILE)         = 3
fchdir(3)                               = 0
chown("sent-mail_2005_03_17.gz", 0, 0

Hangs here.  The file sent-mail_2005_03_17.gz is on the same NFS volume, mounted at
/nfs/flyn_mind.
Comment 6 Tim Niemueller 2005-03-24 08:24:36 EST
My mount flags (in fstab) are: rw,hard,intr,proto=tcp,port=2049
I can kill the process that hangs in the chown (I haven't tried whether I could do that
without the intr flag).
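
Editorial note: when comparing reports from several machines, it can help to check the option string mechanically instead of by eye. The sketch below splits an fstab-style option list such as the one quoted above into a dictionary. The function name is hypothetical, and the parsing is deliberately naive (no quoting or escaping).

```python
def parse_mount_options(options):
    """Split a comma-separated mount option string into a dict.

    Bare flags such as "intr" map to True; "key=value" pairs keep the
    value as a string.
    """
    parsed = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


opts = parse_mount_options("rw,hard,intr,proto=tcp,port=2049")
interruptible = bool(opts.get("intr"))
```

For the flags in this comment, `interruptible` is True, which is consistent with the hanging process being killable.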
Comment 7 Steve Dickson 2005-03-28 10:04:24 EST
Hmm... this does appear to be a problem with rpc.idmapd in nfs-utils-1.0.6-52. I
tried to chown a file on an NFS mount that was *not* exported with
no_root_squash, and the chown hung with the server logging the following error:

rpc.idmapd: nfsdcb: write(/proc/net/rpc/nfs4.nametoid/channel) failed: errno 22 (Invalid argument)

Unmounting the filesystem and restarting rpc.idmapd on the server seems
to stop the chown from hanging....

Note: with nfs-utils-1.0.7-1 I did not see this problem. I put the latest
nfs-utils in http://people.redhat.com/steved/bz150526/. Please give
this a try.
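
Editorial note: to check whether a server is hitting the same rpc.idmapd failure quoted in this comment, the server's log can be scanned for that message. The sketch below is a minimal example built only from the single log line shown above; the pattern and function name are assumptions, and other nfs-utils versions may word the message differently. Which log file the message lands in depends on the syslog configuration, so the function takes an iterable of lines rather than a path.

```python
import re

# Pattern derived from the one error line quoted in this report.
IDMAPD_WRITE_FAILURE = re.compile(
    r"rpc\.idmapd: nfsdcb: write\((?P<channel>[^)]*)\) failed: errno (?P<errno>\d+)"
)


def find_idmapd_failures(lines):
    """Return (channel, errno) tuples for each idmapd write failure found."""
    hits = []
    for line in lines:
        match = IDMAPD_WRITE_FAILURE.search(line)
        if match:
            hits.append((match.group("channel"), int(match.group("errno"))))
    return hits
```

A non-empty result that names the nfs4.nametoid channel would suggest the same failure as in this report, in which case upgrading nfs-utils as suggested above is the first thing to try.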
Comment 8 Orion Poplawski 2005-04-05 13:35:46 EDT
nfs-utils-1.0.7-1 appears to fix the problem for me.