Bug 511278 - /proc/self/exe reports wrong path after fstat on NFSv4
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Layton
QA Contact: Red Hat Kernel QE team
Keywords: Regression
Depends On:
Blocks: 533192 525215 526775 526950
Reported: 2009-07-14 10:43 EDT by Rhu
Modified: 2011-09-26 06:41 EDT

Doc Type: Bug Fix
Last Closed: 2010-03-30 03:43:18 EDT


Attachments
patch -- fix regression in nfs_open_revalidate (1.39 KB, patch)
2009-08-21 08:48 EDT, Jeff Layton

Description Rhu 2009-07-14 10:43:22 EDT
Description of problem:
When an NFSv4 share is mounted on a RHEL5 client, the file system exhibits odd behaviour, most visibly through the /proc/self/exe symlink. After an fstat on a file from the NFSv4 share, the phrase ' (deleted)' is appended to what the file system thinks the file's name is. Thus, /proc/self/exe for '/path/to/process' points to '/path/to/process (deleted)' (which is an invalid symlink).

This is particularly problematic when running programs off of an NFSv4 share, as programs (such as the Java JDK) use /proc/self/exe to work out where they are, and fail when ' (deleted)' is appended. A temporary workaround is to create a copy of the file with ' (deleted)' appended to its filename, but this is far from ideal.

The problem is client side, not server side, and appears to be in the kernel (see additional info). A Fedora 10 client can mount the server without problems. Likewise, a Fedora 10 server (or any other setup running an NFSv4 server) mounted on a RHEL client produces the same problem.

How reproducible:
Always on a RHEL5 client, across various versions and configurations of Red Hat

Steps to Reproduce:
1. Set up an NFSv4 server and export, for example, /test
2. Mount /test on a RHEL5 client using NFSv4
3. Execute a program that demonstrates the broken behaviour (see below)

Demonstrating the broken behaviour: 

The broken behaviour can easily be demonstrated with a simple C program:
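#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(void)
{
        char buf[1024];
        /* readlink() does not NUL-terminate the buffer; reserve one byte
         * and terminate it explicitly so no stray bytes follow the path */
        ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
        if (len < 0) {
                perror("readlink");
                return 1;
        }
        buf[len] = '\0';
        printf("/proc/self/exe = %s\n", buf);
        return 0;
}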

Compiling and running the program produces the following output (note how it alternates between the correct response and the broken response):
test@tester /test $ ./a.out
/proc/self/exe = /test/a.out (deleted)·
test@tester /test $ ./a.out
/proc/self/exe = /test/a.out
test@tester /test $ ./a.out
/proc/self/exe = /test/a.out (deleted)·
test@tester /test $ ./a.out
/proc/self/exe = /test/a.out
test@tester /test $ ./a.out
/proc/self/exe = /test/a.out (deleted)·

The broken behaviour can also be seen with Java. For example, extract the Java JDK into /test and attempt to run 'java' or 'javac'. The first execution fails; subsequent executions work until the file is stat'ed, after which the next execution fails again.

test@tester test $ ./java -version
execve(): No such file or directory
Error trying to exec test/java (deleted).
Check if file exists and permissions are set correctly.

test@tester test $ ./java -version
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode)
test@tester test $ ./java -version
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode)
test@tester test $ ./java -version
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode)

test@tester test $ stat ./java > /dev/null

test@tester test $ ./java -version
execve(): No such file or directory
Error trying to exec test/java (deleted).
Check if file exists and permissions are set correctly.

test@tester test $ ./java -version
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode)

---
  
Actual results: /path/to/file (deleted)


Expected results: /path/to/file


Additional info: When the problem occurs (' (deleted)' is appended), the client appears to make no request to the NFSv4 server. When the program is run again, a request is made and it works (no ' (deleted)'). Nothing else obvious in the NFS server log or client log differs from a working system.

The problem appears to be in the kernel, not in the userspace nfs-utils. Using the same nfs-utils that work on a Fedora 10 system on the RHEL system still results in the same problem.

The problem does not occur on a Fedora 10 client mounting the same server.
Comment 1 Rhu 2009-07-14 10:46:58 EDT
As an additional comment, the problem only occurs when mounting as NFSv4. Mounting through a previous version of NFS works fine.
Comment 2 Jeff Layton 2009-07-27 09:45:01 EDT
Looks like this is probably being done by __d_path(). The "(deleted)" string means that the dentry has been unhashed. nfs4 does do some d_drops in the open codepath so that's probably where this needs to be resolved.
Comment 3 Jeff Layton 2009-08-06 08:54:01 EDT
This is also fixed in rawhide. Trawling through the changesets to see if anything stands out at me...
Comment 4 Jeff Layton 2009-08-06 08:55:48 EDT
...also not a problem in RHEL4.
Comment 5 Jeff Layton 2009-08-06 10:03:28 EDT
...and only occurs on RHEL5 on every 2nd run of the reproducer.
Comment 6 Jeff Layton 2009-08-20 13:20:49 EDT
I believe this is a regression that was probably introduced by the patch for bug 321111. -77.el5 doesn't show this behavior, -79.el5 does. The only NFS patch that went in during that period was the patch for bug 321111.
Comment 7 Jeff Layton 2009-08-21 08:48:52 EDT
Created attachment 358236
patch -- fix regression in nfs_open_revalidate

I think I found the bug. This seems to fix it and a quick run with cthon didn't show any regressions. I'll add it to my test kernels to get it some testing exposure, but I think it's correct.
Comment 9 Jeff Layton 2009-09-01 07:53:40 EDT
I've added the above patch to my test kernels here:

http://people.redhat.com/jlayton

...could you test it and report back whether it fixes the issue for you (and whether you see any other problems)?
Comment 11 Rhu 2009-09-07 08:17:06 EDT
Thank you for your work in fixing this bug. I have tested the test kernel you supplied (kernel-2.6.18-164.el5.jtltest.84.x86_64.rpm) and I can confirm that the problem seems to be gone. No unexpected problems arose, NFSv4 worked as expected, and I was unable to reproduce the problem. This seems to fix the problem with no other side effects.

Many thanks for your time
Comment 12 RHEL Product and Program Management 2009-09-25 13:36:49 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 13 Don Zickus 2009-10-06 15:38:33 EDT
in kernel-2.6.18-168.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 16 Chris Ward 2010-02-11 05:30:05 EST
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.
Comment 19 errata-xmlrpc 2010-03-30 03:43:18 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
