Description of problem:
lsof output of NFS mounts is incoherent

Version-Release number of selected component (if applicable):
RHEL5 (lsof-4.78-3)

How reproducible:
Set up an NFS server with two exports from the same file system, then mount both on a client. Open files on the client so that they show up in lsof output.

Steps to Reproduce:

1) /etc/exports on the server:

    /vol/sat_common/fw_otis_logs *(rw)
    /vol/sat_common/j2re_bin/linux *(rw)

    # ll /vol/sat_common/fw_otis_logs
    total 4
    -rw-r--r-- 1 root root 0 Dec 10 14:33 file1 <------

    # ll /vol/sat_common/j2re_bin/linux
    total 4
    -rw-r--r-- 1 root root 0 Dec 10 14:33 file2 <------

2) Output of 'mount' on the client:

    10.65.210.49:/vol/sat_common/fw_otis_logs on /opt/ops_mgmt/services type nfs (rw,addr=10.65.210.49)
    10.65.210.49:/vol/sat_common/j2re_bin/linux on /opt/java type nfs (rw,addr=10.65.210.49)

Issue:
On the NFS client, open 'file1' and 'file2' with the 'vi' editor in two different terminals:

    # vi /opt/ops_mgmt/services/file1
    # vi /opt/java/file2

In a third terminal, run 'lsof -N':

    COMMAND PID   USER FD TYPE DEVICE SIZE  NODE   NAME
    vi      20897 root 4u REG  0,25   12288 453393 /opt/ops_mgmt/services/.file1.swp (10.65.210.49:/vol/sat_common/j2re_bin/linux) <------
    vi      20905 root 4u REG  0,25   4096  453394 /opt/java/.file2.swp (10.65.210.49:/vol/sat_common/j2re_bin/linux)

The output above says '.file1.swp' is from '10.65.210.49:/vol/sat_common/j2re_bin/linux', which is wrong. It should be shown as coming from 10.65.210.49:/vol/sat_common/fw_otis_logs.

Actual results:
The NFS location shown in the output is not correct.

Expected results:
The output should show the proper NFS location.

Additional info:
Appears to be related to how the kernel treats the super blocks.
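Both lsof lines in the report show DEVICE 0,25. One way to confirm that the two mount points really report the same device number is to compare what stat returns for each. The following is a minimal Python sketch for that check; the helper names device_of and share_device are made up for illustration and are not part of lsof or any other tool.

```python
import os


def device_of(path):
    """Return the (major, minor) device number that stat reports for path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def share_device(path_a, path_b):
    """Return True if stat reports the same st_dev for both paths."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

On the client from this report, share_device("/opt/ops_mgmt/services", "/opt/java") would be expected to return True, since lsof lists the same device for files under both mount points.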
More information about the problem:

lsof relies on the device numbers returned by the stat call to determine which share a particular file belongs to. For NFS, the device numbers returned by the stat call are allocated per superblock.

Older versions of the Linux kernel allocated a different super_block (the kernel structure which represents a mount within the kernel) for each mount of the same NFS share. Newer versions of the kernel use the network address of the server and the fsid (provided by the NFS server) to detect whether the NFS share being mounted is the same as an NFS share which is already mounted. If an existing NFS share is detected, the superblock structure of the old NFS share is shared with the new NFS mount point. The side effect is that the device numbers shown are the same for both NFS shares. This is technically correct, because it is essentially the same device which is being exported by the NFS server.

This change was introduced in version 2.6.18. The old behaviour was considered unsafe since the data cache is associated with the superblock. Earlier versions created a new superblock for each mount, which meant that the same files and directories could have different data caches within the same machine, and these could go out of sync when any one of them was updated.

lsof trips up on later versions of the kernel because the NFS client shares the same super_block structure between the mounts, which in turn means they use the same device number. From an NFS perspective, this behaviour is as per design. I would say that this is an lsof bug: lsof should take this sharing of superblocks by NFS clients into consideration.

The possible workarounds are:
1) Use different fsids on the NFS server. This is the easiest and best option.
2) Use the nosharecache mount option.
3) Replace lsof with another simple utility which returns the NFS mount points.
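To make the failure mode and workaround 3 concrete, here is a rough Python sketch. It is a simplified model, not lsof's actual code: source_by_device mimics a lookup keyed only on the device number, while nfs_source_for resolves the share from the longest matching mount point instead. The SAMPLE_MOUNTS text is the 'mount' output from this report rewritten by hand in /proc/mounts layout, and the (0, 25) device pair is taken from the lsof output; all function names are hypothetical.

```python
import os

# Assumption: the client's two NFS mounts, rewritten in /proc/mounts layout.
SAMPLE_MOUNTS = """\
10.65.210.49:/vol/sat_common/fw_otis_logs /opt/ops_mgmt/services nfs rw,addr=10.65.210.49 0 0
10.65.210.49:/vol/sat_common/j2re_bin/linux /opt/java nfs rw,addr=10.65.210.49 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into (source, mount_point, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def source_by_device(mounts, device_of_mount):
    """Simplified model of a device-keyed lookup.

    device_of_mount maps mount point -> (major, minor). When two mounts
    share a device number, the later entry overwrites the earlier one.
    """
    table = {}
    for source, mount_point, _fstype in mounts:
        table[device_of_mount[mount_point]] = source
    return table


def nfs_source_for(path, mounts):
    """Return the NFS source whose mount point is the longest prefix of path."""
    path = os.path.normpath(path)
    best = None
    for source, mount_point, fstype in mounts:
        if not fstype.startswith("nfs"):
            continue
        mp = os.path.normpath(mount_point)
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best[1]):
                best = (source, mp)
    return best[0] if best else None


mounts = parse_mounts(SAMPLE_MOUNTS)

# Both mounts share device (0, 25), as in the lsof output, so a
# device-keyed table can only remember one source.
shared = {"/opt/ops_mgmt/services": (0, 25), "/opt/java": (0, 25)}
by_device = source_by_device(mounts, shared)

# Path-based resolution keeps the two shares apart.
file1_source = nfs_source_for("/opt/ops_mgmt/services/.file1.swp", mounts)
file2_source = nfs_source_for("/opt/java/.file2.swp", mounts)
```

On a real client, the text would come from reading /proc/mounts rather than from a hard-coded string. This sketch does not handle mount points containing escaped characters (for example spaces written as \040) or paths reached through symlinks.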
fuser exhibits similar behavior.
This seems to be the patch that triggered this issue: http://linux-nfs.org/Linux-2.6.x/2.6.18-rc4/linux-2.6.18-032-nfs-unify-sb.dif The patch includes this note: Signed-Off-By: David Howells <dhowells> Signed-off-by: Trond Myklebust <Trond.Myklebust>
I have tested fuser from the latest version of psmisc (22.13) and it still exhibits the same issue.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0206.html