Bug 794780

Summary: Files on NFS4 become unwritable, but OK after explicit stat
Product: [Fedora] Fedora
Reporter: Rik Theys <rik.theys>
Component: kernel
Assignee: nfs-maint
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: gansalmon, itamar, jlayton, jonathan, kernel-maint, madhu.chinakonda, orion, steved
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Clone Of: 789298
Last Closed: 2012-03-23 10:55:10 UTC
Attachments: dmesg (flags: none)

Description Rik Theys 2012-02-17 16:02:41 UTC
+++ This bug was initially created as a clone of Bug #789298 +++

Description of problem:

Hi,

Our home directories here are mounted over NFS4. When I log in to machine A and run

vim
:q

and then log into machine B and do:

vim
:q

I get E137: Viminfo file is not writable: /users/system/rtheys/.viminfo

Every invocation of 'vim and :q' will trigger this.

Explicitly doing a stat of the file fixes this.

Doing the vim :q thing on machine A while it has been triggered on B works: A never shows this message, while B continues to show it until the file is stat'ed.

The file server is a RHEL 6.2 machine.

I can trigger this bug on RHEL 6 clients, but NOT on RHEL 5 clients.

This seems to have been discussed here:
http://comments.gmane.org/gmane.linux.nfs/37230

According to that thread, vim tries to do a stat on the ~/.viminfo file and receives incorrect st_uid/st_gid information (-2)? If it gets the wrong information I assume this is a kernel bug. Why is the information incorrect?
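
To see what owner fields a client is actually handing back, something like the following might help. It is only a sketch: the function name owner_report is made up for illustration, and the placeholder value 4294967294 (that is, -2 as an unsigned 32-bit number) is assumed from the uid/gid seen in this report, not from any documented constant.

```python
import os
import sys

# Assumption: unmapped owners show up as (uint32)-2, as seen in this report.
UNMAPPED_ID = 4294967294


def owner_report(path, expected_uid=None):
    """Stat `path` and summarise whether its owner fields look sane.

    `expected_uid` defaults to the effective uid of the caller, which is
    what one would expect for a file in one's own home directory.
    """
    if expected_uid is None:
        expected_uid = os.geteuid()
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        # True if either field carries the placeholder value.
        "unmapped": UNMAPPED_ID in (st.st_uid, st.st_gid),
        "owned_by_expected": st.st_uid == expected_uid,
    }


if __name__ == "__main__":
    # Usage: python owner_report.py ~/.viminfo ~/.ssh/authorized_keys
    for p in sys.argv[1:]:
        print(p, owner_report(p))
```

Note that an explicit stat is reported above to clear the symptom, so running a check like this may itself make the error go away on that client.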

The easiest way to reproduce this is with vim, but I've seen similar strange behaviour when modifying my ~/.ssh/authorized_keys file and then trying to ssh to other machines. They say they are ignoring the file because the permissions are not OK. This looks like the same bug to me, but is harder to reproduce here.

This bug was introduced between 2.6.31 and 2.6.32 by commit 80e52aced138bb41b045a8595a87510f27d8d8c5

commit 80e52aced138bb41b045a8595a87510f27d8d8c5
Author: Trond Myklebust <Trond.Myklebust>
Date:   Sun Aug 9 15:06:19 2009 -0400

    NFSv4: Don't do idmapper upcalls for asynchronous RPC calls

    We don't want to cause rpciod to hang...

    Signed-off-by: Trond Myklebust <Trond.Myklebust>


The 3.3-rc kernels have an upstream fix that works if applied to a pristine 3.2.5 kernel:

From: Trond Myklebust <Trond.Myklebust>
Date: Sat, 7 Jan 2012 13:22:46 -0500
Subject: NFSv4: Save the owner/group name string when doing open

commit 6926afd1925a54a13684ebe05987868890665e2b upstream.

...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.

Signed-off-by: Trond Myklebust <Trond.Myklebust>
Signed-off-by: Jonathan Nieder <jrnieder>

Please consider backporting this fix to a RHEL 6 update.

Regards, 

Rik


Version-Release number of selected component (if applicable):

all RHEL6 kernels

How reproducible:

always

Steps to Reproduce:
1. open a terminal and ssh to machine A on which your home directory is NFS4 mounted
2. open a terminal and ssh to machine B on which that same homedir is also NFS4 mounted
3. open vim and :q on machine A
4. open vim and :q on machine B
5. Machine B will keep on giving this error message unless the file is explicitly stat'ed. After opening vim on A again, the error message will return.

Actual results:

error about not being able to write

Expected results:

no error from vim

Additional info:

Comment 1 Orion Poplawski 2012-02-29 22:57:57 UTC
I seem to have at least one user here seeing the vim message.

Comment 2 Steve Dickson 2012-03-15 15:15:10 UTC
Here is a scratch build of a f16 kernel with the requested patch
   http://koji.fedoraproject.org/koji/taskinfo?taskID=3897772

Does anybody have any cycles to give this kernel a test run to ensure the problem is solved?

Comment 3 Orion Poplawski 2012-03-15 18:04:34 UTC
That seems to have broken some things.  I'm seeing:

Mar 15 11:59:15 zabbix sshd[1852]: Authentication refused: bad ownership or modes for file /home/orion/.ssh/authorized_keys
Mar 15 11:59:15 zabbix sshd[1852]: Authentication refused: bad ownership or modes for file /home/orion/.ssh/authorized_keys
Mar 15 11:59:44 zabbix request-key: Cannot find command to construct key 219132399
Mar 15 11:59:44 zabbix request-key: Cannot find command to construct key 292067578
Mar 15 12:00:43 zabbix request-key: Cannot find command to construct key 1025275063
Mar 15 12:00:43 zabbix request-key: Cannot find command to construct key 656536239
# ls -l /home/orion/.ssh/authorized_keys
-rw-r--r--. 1 4294967294 4294967294 1043 Aug 25  2008 /home/orion/.ssh/authorized_keys

All files in home dir are uid/gid 4294967294.
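
For reference, 4294967294 is the same value as the -2 mentioned in the description, just printed as an unsigned 32-bit number. A quick way to check the arithmetic (the helper names as_uint32 and as_int32 are made up for this illustration):

```python
import ctypes


def as_uint32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value."""
    return ctypes.c_int32(value).value


print(as_uint32(-2))         # 4294967294
print(as_int32(4294967294))  # -2
```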

Comment 4 Orion Poplawski 2012-03-15 20:11:45 UTC
Created attachment 570428 [details]
dmesg

Comment 5 Orion Poplawski 2012-03-15 22:52:58 UTC
Turns out CONFIG_NFS_USE_NEW_IDMAPPER=y got turned on in this kernel.  We are awaiting a new kernel with that turned off to test.
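
In case it helps anyone else testing scratch builds, here is a rough way to check how an option is set in a kernel config file. This is a sketch only: the function name config_option_state is invented here, and the /boot/config-<release> location is an assumption about where the distribution installs the config for the running kernel.

```python
import os
import platform
import re


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'n' if the config
    explicitly says it is not set, or None if it is not mentioned."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if line == unset_line:
            return "n"
    return None


if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as fh:
            print(config_option_state(fh.read(), "CONFIG_NFS_USE_NEW_IDMAPPER"))
    else:
        print("no config file found at", path)
```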

Comment 6 Steve Dickson 2012-03-19 13:59:48 UTC
Another scratch build: 
    http://koji.fedoraproject.org/koji/taskinfo?taskID=3909612
that does not have CONFIG_NFS_USE_NEW_IDMAPPER set...

tia!

Comment 7 Orion Poplawski 2012-03-19 17:05:38 UTC
I no longer get the error with vim with this kernel.  Thanks.

For the record, to test I did:

- open vim on machine A
- open vim on machine B
- :q on machine A
- :q on machine B

Comment 8 Steve Dickson 2012-03-19 17:42:42 UTC
Thanks for taking the time!!

Comment 9 Fedora Update System 2012-03-20 19:22:34 UTC
kernel-2.6.42.12-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/FEDORA-2012-3715/kernel-2.6.42.12-1.fc15

Comment 10 Dave Jones 2012-03-22 17:03:05 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 11 Dave Jones 2012-03-22 17:06:14 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 12 Dave Jones 2012-03-22 17:17:19 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 13 Rik Theys 2012-03-23 08:18:10 UTC
Hi,

I've tested it with the 3.3 kernel and it seems to resolve the issue.

Regards,

Rik

Comment 14 Jeff Layton 2012-03-23 10:55:10 UTC
Great, thanks for testing. Closing bug...

Comment 15 Fedora Update System 2012-03-26 18:02:39 UTC
kernel-2.6.42.12-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.