+++ This bug was initially created as a clone of Bug #705424 +++

Description of problem:
lsof output of NFS mounts is incoherent.

Version-Release number of selected component (if applicable):
RHEL5 (lsof-4.78-3)

How reproducible:
Set up an NFS server with two exports from the same file system. Then mount these on a client. Open files on the client so that they show up in lsof output.

Steps to Reproduce:
1) /etc/exports

/vol/sat_common/fw_otis_logs *(rw)
/vol/sat_common/j2re_bin/linux *(rw)

# ll /vol/sat_common/fw_otis_logs
total 4
-rw-r--r-- 1 root root 0 Dec 10 14:33 file1 <------

# ll /vol/sat_common/j2re_bin/linux
total 4
-rw-r--r-- 1 root root 0 Dec 10 14:33 file2 <------

2) Output of 'mount' on the client:

10.65.210.49:/vol/sat_common/fw_otis_logs on /opt/ops_mgmt/services type nfs (rw,addr=10.65.210.49)
10.65.210.49:/vol/sat_common/j2re_bin/linux on /opt/java type nfs (rw,addr=10.65.210.49)

Issue:
On the NFS client, use the 'vi' editor to open 'file1' and 'file2' in different terminals:

# vi /opt/ops_mgmt/services/file1
# vi /opt/java/file2

In a third terminal, run 'lsof -N':

COMMAND   PID USER   FD   TYPE DEVICE  SIZE   NODE NAME
vi      20897 root   4u   REG   0,25 12288 453393 /opt/ops_mgmt/services/.file1.swp (10.65.210.49:/vol/sat_common/j2re_bin/linux) <------
vi      20905 root   4u   REG   0,25  4096 453394 /opt/java/.file2.swp (10.65.210.49:/vol/sat_common/j2re_bin/linux)

The output above says '.file1.swp' is from '10.65.210.49:/vol/sat_common/j2re_bin/linux', which is wrong. It should be shown as coming from '10.65.210.49:/vol/sat_common/fw_otis_logs'.

Actual results:
The NFS location in the output is not correct.

Expected results:
The output should show the correct NFS location.

Additional info:
Appears to be related to how the kernel treats the super blocks.

--- Additional comment from sprabhu on 2011-05-18 05:47:45 EDT ---

More information about the problem:

lsof relies on the device numbers returned by the stat call to determine which share a particular file belongs to.
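The failure mode is easier to see in a simplified Python model of a lookup that is keyed only by device number. This is not lsof's source code. The names `dev_of`, `build_dev_index` and `ambiguous_devices` are hypothetical. The mount table is taken from the report above, where files under both mounts show device `0,25`. Because the key collides, whichever mount is recorded last answers for every file on that device. In this model that happens to match the wrong attribution of `.file1.swp` seen above.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple

Dev = Tuple[int, int]  # (major, minor), as in lsof's DEVICE column, e.g. 0,25


def dev_of(path: str) -> Dev:
    """Return the (major, minor) device number that stat() reports for path."""
    st = os.stat(path)
    return (os.major(st.st_dev), os.minor(st.st_dev))


def build_dev_index(mounts: List[Tuple[str, Dev]]) -> Dict[Dev, str]:
    """Map device number -> mount source; a later mount with the same device overwrites."""
    index: Dict[Dev, str] = {}
    for source, dev in mounts:
        index[dev] = source
    return index


def ambiguous_devices(mounts: List[Tuple[str, Dev]]) -> Dict[Dev, List[str]]:
    """Return every device number that more than one mount reports."""
    by_dev: Dict[Dev, List[str]] = defaultdict(list)
    for source, dev in mounts:
        by_dev[dev].append(source)
    return {dev: srcs for dev, srcs in by_dev.items() if len(srcs) > 1}


# Mount table from the report: files on both shares show device 0,25 on the client.
MOUNTS = [
    ("10.65.210.49:/vol/sat_common/fw_otis_logs", (0, 25)),
    ("10.65.210.49:/vol/sat_common/j2re_bin/linux", (0, 25)),
]

index = build_dev_index(MOUNTS)
# .file1.swp really lives on fw_otis_logs, but device 0,25 maps to the last mount seen.
print(index[(0, 25)])  # 10.65.210.49:/vol/sat_common/j2re_bin/linux
print(ambiguous_devices(MOUNTS))
```

On an affected client, `dev_of("/opt/ops_mgmt/services")` and `dev_of("/opt/java")` would be expected to return the same pair. `ambiguous_devices` is one cheap way for a tool to detect that a device number alone cannot identify the share.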
For NFS, the device numbers returned by the stat call are allocated within the superblock. Older versions of the Linux kernel allocated a separate super_block (the kernel structure that represents a mount) for each mount of the same NFS share. Newer kernels use the network address of the server and the fsid (provided by the NFS server) to detect whether the NFS share being mounted is the same as an NFS share that is already mounted. If an existing NFS share is detected, the superblock structure of the existing mount is shared with the new mount point. The side effect is that both NFS shares report the same device numbers. This is technically correct, because it is essentially the same device that is being exported by the NFS server.

This change was introduced in version 2.6.18. The old behaviour was considered unsafe because the data cache is associated with the superblock. Older kernels created a new superblock each time, so the same files and directories could have different data caches on the same machine, and these caches could go out of sync when any one of them was updated.

lsof trips up on later kernels because the NFS client allocates the same super_block structure, which in turn uses the same device numbers. From an NFS perspective, this behaviour is as designed. I would say that this is an lsof bug: lsof should take this sharing of superblocks by NFS clients into consideration.

The possible workarounds are:
1) Use different fsids on the NFS server. This is the easiest and best option.
2) Use the nosharecache mount option.
3) Replace lsof with another simple utility that returns the NFS mount points.

--- Additional comment from kelly.setzer on 2011-05-18 11:28:01 EDT ---

fuser exhibits similar behavior.
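The sharing rule described in sprabhu's comment can be illustrated with a toy Python model. In this model, superblocks are cached under a key of (server address, fsid), and each superblock carries one device number. Everything here is hypothetical and only mirrors that description: `ToyNfsClient`, `SuperBlock`, the fsid strings, the extra mount points, and the starting minor number 25. The real kernel comparison may consider more than these two fields. The model also shows why workarounds 1 and 2 help: a distinct fsid, or `nosharecache`, yields a fresh superblock and therefore a distinct device number.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SuperBlock:
    dev: Tuple[int, int]  # device number that stat() would report as st_dev


class ToyNfsClient:
    """Toy model of NFS superblock sharing; not kernel code."""

    def __init__(self) -> None:
        self._minors = itertools.count(25)  # arbitrary start, mimicking "0,25"
        self._shared: Dict[Tuple[str, str], SuperBlock] = {}
        self.mounts: Dict[str, SuperBlock] = {}

    def mount(self, server_addr: str, fsid: str, mount_point: str,
              nosharecache: bool = False) -> Tuple[int, int]:
        """Mount and return the device number the new mount point would report."""
        key = (server_addr, fsid)
        sb: Optional[SuperBlock] = None if nosharecache else self._shared.get(key)
        if sb is None:
            sb = SuperBlock(dev=(0, next(self._minors)))  # fresh superblock
            if not nosharecache:
                self._shared[key] = sb  # later mounts with this key reuse it
        self.mounts[mount_point] = sb
        return sb.dev


client = ToyNfsClient()
# Two exports of the same file system: same server, same fsid -> shared superblock.
a = client.mount("10.65.210.49", "fsid-1", "/opt/ops_mgmt/services")
b = client.mount("10.65.210.49", "fsid-1", "/opt/java")
print(a, b)  # (0, 25) (0, 25)
# Workaround 1: a distinct fsid from the server gives a distinct superblock.
c = client.mount("10.65.210.49", "fsid-2", "/mnt/other")
# Workaround 2: nosharecache forces a fresh superblock even for the same fsid.
d = client.mount("10.65.210.49", "fsid-1", "/mnt/private", nosharecache=True)
print(c, d)  # (0, 26) (0, 27)
```

The trade-off in workaround 2 is visible here. The nosharecache mount gets its own superblock and therefore its own data cache, which is the situation the 2.6.18 change was meant to avoid. On the server side, workaround 1 is normally applied with the `fsid=` export option described in exports(5).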
This seems to be the patch that triggered this issue:

http://linux-nfs.org/Linux-2.6.x/2.6.18-rc4/linux-2.6.18-032-nfs-unify-sb.dif

The patch includes this note:

Signed-Off-By: David Howells <dhowells>
Signed-off-by: Trond Myklebust <Trond.Myklebust>
I have tested fuser from the latest version of psmisc (22.13) and it still exhibits the same issue.
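fuser shows the same symptom, so workaround 3 from sprabhu's comment (a simple utility that reports NFS mount points) could avoid the device-number collision by resolving mounts from paths instead. The sketch below is a minimal Python version of that idea and assumes a Linux `/proc`. It parses `/proc/mounts`, reads each process's `/proc/<pid>/fd` symlinks, and assigns each path to the longest mount point that prefixes it. The function names `parse_mounts`, `mount_for_path` and `nfs_open_files` are hypothetical and are not part of lsof or psmisc.

```python
import os
import re
from typing import List, Optional, Tuple

# (source, mount point, fstype), e.g. ("10.65.210.49:/vol/...", "/opt/java", "nfs")
Mount = Tuple[str, str, str]
NFS_TYPES = ("nfs", "nfs4")


def _unescape(field: str) -> str:
    # /proc/mounts writes space, tab, newline and backslash as 3-digit octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/mounts-style text into (source, mount point, fstype) tuples."""
    mounts: List[Mount] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            source, mnt, fstype = (_unescape(f) for f in fields[:3])
            mounts.append((source, mnt, fstype))
    return mounts


def mount_for_path(path: str, mounts: List[Mount]) -> Optional[Mount]:
    """Return the mount whose mount point is the longest prefix of path.

    Matching is by path components, so /opt/java does not match /opt/javax.
    On equal length the later entry wins, since a later mount hides an earlier one.
    """
    best: Optional[Mount] = None
    for entry in mounts:
        mnt = entry[1]
        if path == mnt or path.startswith(mnt.rstrip("/") + "/"):
            if best is None or len(mnt) >= len(best[1]):
                best = entry
    return best


def nfs_open_files(pid: int, mounts: List[Mount]) -> List[Tuple[str, str]]:
    """Return (path, NFS source) for each open fd of pid that lies on an NFS mount."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd closed in the meantime, or no permission
        if not target.startswith("/"):
            continue  # sockets, pipes and anonymous inodes have no path
        entry = mount_for_path(target, mounts)
        if entry is not None and entry[2] in NFS_TYPES:
            found.append((target, entry[0]))
    return found


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        all_mounts = parse_mounts(f.read())
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            for path, source in nfs_open_files(int(pid), all_mounts):
                print(pid, path, f"({source})")
        except OSError:
            continue  # process exited, or its fd table is not readable
```

The mount point is taken from the path the kernel reports for the open file, not from `st_dev`. For that reason, two shares that share a superblock are still told apart. Known gaps in this sketch: it reads the global `/proc/mounts` and ignores per-process mount namespaces (`/proc/<pid>/mounts`). It also needs root to see other users' file descriptors.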
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate, in the next release of Red Hat Enterprise Linux.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the last planned RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX. To request that Red Hat re-consider this request, please re-open the bugzilla via appropriate support channels and provide additional business and/or technical details about its importance to you.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).