Description of problem:
I have set up an NFSv4 server and client, roughly following http://www.vanemery.com/Linux/NFSv4/NFSv4-no-rpcsec.html. Writing to a file for a second time fails. The strace output shows calls whose origin I cannot explain.

Version-Release number of selected component (if applicable):
[root@lechuck home]# rpm -q nfs-utils
nfs-utils-1.0.6-52
[root@lechuck home]# uname -a
Linux lechuck.informatik.rwth-aachen.de 2.6.10-1.770_FC3smp #1 SMP Thu Feb 24 14:20:06 EST 2005 i686 i686 i386 GNU/Linux

Server and client are up to date via yum.

How reproducible:
Always

Steps to Reproduce:
1. Mount via NFS4.
2. Open a file with vim for the first time. Writing _may_ work if no existing .viminfo file is involved.
3. Reopen the file in vim and save it. Vim will hang.

Actual results:
Strace shows something like:

[root@nemo ~]# strace -p 5313
Process 5313 attached - interrupt to quit
select(1, [0], NULL, [0], NULL) = 1 (in [0])
read(0, ":", 250) = 1
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
write(1, "\33[?25l\33[24;63H\33[K\33[24;1H:\33[?12l\33"..., 37) = 37
select(1, [0], NULL, [0], {4, 0}) = 1 (in [0], left {3, 840000})
select(1, [0], NULL, [0], NULL) = 1 (in [0])
read(0, "w", 250) = 1
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
write(1, "w\33[?25l\33[?12l\33[?25h", 19) = 19
select(1, [0], NULL, [0], {4, 0}) = 1 (in [0], left {3, 931000})
select(1, [0], NULL, [0], NULL) = 1 (in [0])
read(0, "q", 250) = 1
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
write(1, "q\33[?25l\33[?12l\33[?25h", 19) = 19
select(1, [0], NULL, [0], {4, 0}) = 1 (in [0], left {2, 714000})
select(1, [0], NULL, [0], NULL) = 1 (in [0])
read(0, "\r", 250) = 1
select(1, [0], NULL, [0], {0, 0}) = 0 (Timeout)
write(1, "\r", 1) = 1
write(1, "\33[?25l", 6) = 6
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
getcwd("/home/tmp", 1024) = 10
write(1, "\"test8.txt\"", 11) = 11
stat64("test8.txt", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
access("test8.txt", W_OK) = 0
getxattr("test8.txt", "system.posix_acl_access", 0xbffe42a0, 132) = -1 EOPNOTSUPP (Operation not supported)
lstat64("test8.txt", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat64("4913", 0xbffe4670) = -1 ENOENT (No such file or directory)
open("4913", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0100644) = 3
close(3) = 0
chown32("4913", 65534, 65534

Here it hangs. 65534 is nfsnobody, which is OK. When trying as a regular user, the correct ID is shown. But where does "4913" come from!? It always seems to be that number, and I have no clue why. There is no PID 4913 or anything like that.

Expected results:
The file should be saved and vim should return; the chown should succeed.

Additional info:
Server side: FC3, all updates, 1.1 TB RAID via SCSI as one drive, single ext3 partition created with anaconda.
Client side: FC3, all updates, basic install, using the server as NIS server and NFS server. NIS works, NFS does not.
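
Note on the trace above: as far as I know, "4913" is the name of a scratch file that Vim creates before saving, to check whether it can create a file in the directory and set its owner. If that is right, the call sequence itself is expected and only the hang is abnormal. The last three calls in the trace (exclusive create, close, chown) can be replayed without Vim. The following is a minimal Python sketch under those assumptions; the function name `probe_chown`, the probe file name, and the 10-second timeout are all made up for illustration and are not part of any tool mentioned in this report.

```python
import os
import subprocess
import sys
import tempfile

# Child script: create a file exclusively, close it, then chown it.
# This mirrors the open(O_CREAT|O_EXCL) / close / chown32 calls in the trace.
_CHILD = """
import os, sys
path, uid, gid = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
os.close(fd)
os.chown(path, uid, gid)
"""


def probe_chown(directory, uid, gid, timeout=10.0):
    """Return 'ok', 'error' or 'hung' for a create-then-chown in directory.

    The work runs in a child process so that a chown that never returns
    does not block the caller. The file name is hypothetical.
    """
    path = os.path.join(directory, "chown_probe_%d" % os.getpid())
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path, str(uid), str(gid)]
    )
    try:
        rc = proc.wait(timeout=timeout)
        result = "ok" if rc == 0 else "error"
    except subprocess.TimeoutExpired:
        # kill() only helps if the mount lets the call be interrupted.
        proc.kill()
        result = "hung"
    # Do not touch the file after a hang; that could block as well.
    if result != "hung" and os.path.exists(path):
        os.remove(path)
    return result


if __name__ == "__main__":
    # Pass the NFS mount point as the first argument; the default is only
    # a local directory so that the script runs anywhere.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(probe_chown(target, os.getuid(), os.getgid()))
```

Run against a directory on the NFSv4 mount, a result of "hung" would match the behaviour reported here. On a local filesystem, chown to your own uid and gid should simply succeed.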
Further investigation results: Disabling SELinux has no effect, and setting the locale to de_DE@Euro (instead of de_DE.UTF-8) does not help. Reverting to NFSv3 (by just changing nfs4 to nfs in the client fstab) cures the problem. I would like to stay with NFSv4 if possible. Any ideas?
It sounds like it might be an issue with the rpc.idmapd daemon, which NFSv4 uses to map usernames to uids. Make sure you have the latest nfs-utils (1.0.6-52) and kernel. If the problem persists, turn on rpc.idmapd debugging by adding '-vvv' to the /etc/init.d/rpcidmapd init script. Hopefully there will be some error messages explaining why everything is getting mapped to nfsnobody.
As I stated in the bug report, I am using the newest version of nfs-utils. The user mapping also works fine; only the root user is mapped to nfsnobody (root squash), which is also fine, as I mentioned. What is strange is that the chown32 call targets a file named "4913" and not "test8.txt" as it should. I guess you were just in a hurry and didn't parse all the information given? It hangs at that chown call. I could sit here forever waiting for this chown to succeed; it never will.
Yeah, I guess I did skim this a bit too lightly... Would it be possible to post a bzip2-ed ethereal trace (tethereal -w /tmp/data.pcap) of this? Does mounting with the 'intr' flag allow you to interrupt the chown?
I encountered a similar problem when performing a mv over NFSv4. I am using:

Server (x86):
kernel-2.6.10-1.760_FC3
nfs-utils-1.0.6-44

Client (PowerPC):
kernel-2.6.11-1.1191_FC4
nfs-utils-1.0.7-1

All of the data is copied to the NFS server. However, the mv process hangs when it tries to make a chown call:

[mike@imp mail]$ strace mv sent-mail /nfs/flyn_mind/docs/mail/sent-mail_2005_03_17
[...]
write(4, "SYuvmr+ejRvjLlC6R3ebYbq7\nID2WwSH"..., 30987) = 30987
read(3, "", 32768) = 0
close(4) = 0
close(3) = 0
utimes("/nfs/flyn_mind/docs/mail/sent-mail_2005_03_17", {1111503059, 0}) = 0
chown("/nfs/flyn_mind/docs/mail/sent-mail_2005_03_17", 500, 500 <unfinished ...>

The data has been copied at this point, but the chown hangs. Note that the path "/nfs/flyn_mind/docs/mail/sent-mail_2005_03_17" IS correct.

Also:

[root@imp mail]# strace chown root:root sent-mail_2005_03_17.gz
[...]
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=692, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30244000
read(3, "root:x:0:root\nbin:x:1:root,bin,d"..., 4096) = 692
close(3) = 0
munmap(0x30244000, 4096) = 0
lstat64("sent-mail_2005_03_17.gz", {st_mode=S_IFREG|0644, st_size=24281150, ...}) = 0
open(".", O_RDONLY|O_LARGEFILE) = 3
fchdir(3) = 0
chown("sent-mail_2005_03_17.gz", 0, 0

Hangs here. sent-mail_2005_03_17.gz is on the same NFS volume, mounted at /nfs/flyn_mind.
My mount flags (in fstab) are: rw,hard,intr,proto=tcp,port=2049. I can kill the process that hangs in the chown (I haven't tried whether that works without the intr flag).
Hmm... this does appear to be a problem with rpc.idmapd in nfs-utils-1.0.6-52. I tried to chown a file on an NFS mount that was *not* exported with no_root_squash, and the chown hung, with the server logging the following error:

rpc.idmapd: nfsdcb: write(/proc/net/rpc/nfs4.nametoid/channel) failed: errno 22 (Invalid argument)

Unmounting the filesystem and restarting rpc.idmapd on the server seems to stop the chown from hanging. Note: with nfs-utils-1.0.7-1 I did not see this problem. I put the latest nfs-utils at http://people.redhat.com/steved/bz150526/; please give it a try.
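
For anyone who wants to check whether their server is logging the same rpc.idmapd failure, here is a rough Python sketch that picks such lines out of log text. The pattern is written only against the single message quoted above, so other wordings of the error would be missed. The function name `find_idmapd_failures` is invented for this example, and where the server actually writes these messages is an assumption you need to check for your own setup.

```python
import re

# Matches the message format quoted in the comment above:
#   rpc.idmapd: nfsdcb: write(<channel>) failed: errno <n> (<reason>)
IDMAPD_WRITE_FAILURE = re.compile(
    r"rpc\.idmapd: nfsdcb: write\((?P<channel>[^)]+)\) failed: "
    r"errno (?P<errno>\d+) \((?P<reason>[^)]*)\)"
)


def find_idmapd_failures(lines):
    """Return (channel, errno, reason) tuples for matching log lines."""
    hits = []
    for line in lines:
        match = IDMAPD_WRITE_FAILURE.search(line)
        if match:
            hits.append(
                (
                    match.group("channel"),
                    int(match.group("errno")),
                    match.group("reason"),
                )
            )
    return hits
```

It takes any iterable of strings, so an open log file can be passed directly; which file to open (for example the system log on the server) depends on the syslog configuration.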
nfs-utils-1.0.7-1 appears to fix the problem for me.