Bug 706126 - fuser output of NFS mounts is incoherent
Summary: fuser output of NFS mounts is incoherent
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: psmisc
Version: 5.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jaromír Cápík
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On: 705424 772991
Blocks: 743405
Reported: 2011-05-19 14:58 UTC by Anton Mark
Modified: 2018-11-30 23:08 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 705424
Environment:
Last Closed: 2014-06-02 13:22:01 UTC
Target Upstream Version:
Embargoed:


Description Anton Mark 2011-05-19 14:58:44 UTC
+++ This bug was initially created as a clone of Bug #705424 +++

Description of problem:
lsof output of NFS mounts is incoherent

Version-Release number of selected component (if applicable):
RHEL5 (lsof-4.78-3) 

How reproducible:
Set up an NFS server with two exports from the same file system. Then mount both exports on a client. Open files on the client so that they show up in lsof output.

Steps to Reproduce:
1) /etc/exports
 
/vol/sat_common/fw_otis_logs *(rw)
/vol/sat_common/j2re_bin/linux *(rw)

# ll /vol/sat_common/fw_otis_logs
total 4
-rw-r--r-- 1 root root 0 Dec 10 14:33 file1 <------

# ll /vol/sat_common/j2re_bin/linux
total 4
-rw-r--r-- 1 root root 0 Dec 10 14:33 file2 <------

2) Output of 'mount' on the client:

10.65.210.49:/vol/sat_common/fw_otis_logs on /opt/ops_mgmt/services type nfs (rw,addr=10.65.210.49)
10.65.210.49:/vol/sat_common/j2re_bin/linux on /opt/java type nfs (rw,addr=10.65.210.49)

Issue:

On the NFS client, use the 'vi' editor to open 'file1' and 'file2' in separate terminals:

# vi /opt/ops_mgmt/services/file1
# vi /opt/java/file2

In a third terminal, run 'lsof -N':

COMMAND   PID USER   FD   TYPE DEVICE  SIZE   NODE NAME
vi      20897 root    4u   REG   0,25 12288 453393 /opt/ops_mgmt/services/.file1.swp (10.65.210.49:/vol/sat_common/j2re_bin/linux) <------
vi      20905 root    4u   REG   0,25  4096 453394 /opt/java/.file2.swp (10.65.210.49:/vol/sat_common/j2re_bin/linux)

The output above says '.file1.swp' is from '10.65.210.49:/vol/sat_common/j2re_bin/linux', which is wrong; it should be shown as coming from '10.65.210.49:/vol/sat_common/fw_otis_logs'. Note that both entries report the same DEVICE (0,25).
  
Actual results:
The reported NFS location is not correct.

Expected results:
The output should show the correct NFS location for each file.

Additional info:
This appears to be related to how the kernel handles NFS superblocks.

--- Additional comment from sprabhu on 2011-05-18 05:47:45 EDT ---

More information about the problem:

lsof relies on the device numbers returned by the stat() call to determine which share a particular file belongs to. For NFS, the device number returned by stat() is allocated per superblock.

Older versions of the Linux kernel allocated a separate super_block (the kernel structure that represents a mount within the kernel) for each mount of the same NFS share.

On newer versions of the kernel, the client uses the network address of the server and the fsid (provided by the NFS server) to detect whether the NFS share being mounted is the same as an NFS share that is already mounted. If an existing NFS share is detected, the superblock structure of the old NFS share is shared with the new NFS mount point. The side effect is that both NFS mounts report the same device number. This is technically correct, because it is essentially the same device being exported by the NFS server. This change was introduced in version 2.6.18.
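The superblock sharing described above can be illustrated with a toy Python model. It encodes only the rule stated in this comment (the superblock is looked up by server address plus fsid) and is not the kernel's actual implementation; the class and method names, the `fsid-A` values and the starting minor number are hypothetical. It also shows why the first two workarounds listed further down help: a different fsid, or `nosharecache`, yields a separate superblock and therefore a different device number.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Superblock:
    server_addr: str
    fsid: str
    dev: tuple  # (major, minor): what stat() would report as st_dev


@dataclass
class NfsClientModel:
    """Toy model of client-side superblock sharing (NOT kernel code)."""
    # Anonymous device numbers use major 0; the starting minor is arbitrary.
    _next_minor: count = field(default_factory=lambda: count(25))
    _shared: dict = field(default_factory=dict)

    def mount(self, server_addr, export, fsid, nosharecache=False):
        # The export path is deliberately NOT part of the key: two exports
        # from the same server file system (same fsid) map to one superblock.
        key = (server_addr, fsid)
        if not nosharecache and key in self._shared:
            return self._shared[key]  # reuse the existing superblock
        sb = Superblock(server_addr, fsid, (0, next(self._next_minor)))
        if not nosharecache:
            self._shared[key] = sb  # make it available for later mounts
        return sb


client = NfsClientModel()
a = client.mount("10.65.210.49", "/vol/sat_common/fw_otis_logs", "fsid-A")
b = client.mount("10.65.210.49", "/vol/sat_common/j2re_bin/linux", "fsid-A")
c = client.mount("10.65.210.49", "/vol/sat_common/j2re_bin/linux", "fsid-A",
                 nosharecache=True)
print(a.dev, b.dev, c.dev)  # (0, 25) (0, 25) (0, 26)
```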

The old behaviour was considered unsafe because the data cache is associated with the superblock. Since earlier versions created a new superblock for every mount, the same files and directories could have different data caches on the same machine, and these caches could go out of sync when any one of them was updated.

lsof trips up on later kernels because the NFS client reuses the same super_block structure, which in turn means the same device number for both mounts. From an NFS perspective, this behaviour is by design.
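The device-number lookup described above can be sketched in Python as follows. This illustrates the general technique only and is not lsof's or fuser's actual code; the helper names `read_mounts`, `index_by_device` and `shares_for_device` are hypothetical. Because every mount backed by the same superblock reports the same `st_dev`, the lookup can return more than one candidate, and a tool that silently picks one of them can print the wrong share.

```python
import os
from collections import defaultdict


def read_mounts(path="/proc/mounts"):
    """Yield (source, mountpoint) pairs from a /proc/mounts-style file."""
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                # Octal escapes such as \040 (space) are not decoded here.
                yield fields[0], fields[1]


def index_by_device(mounts):
    """Group mounts by the st_dev that stat() reports for each mount point."""
    index = defaultdict(list)
    for source, mountpoint in mounts:
        try:
            dev = os.stat(mountpoint).st_dev
        except OSError:
            continue  # skip mount points that cannot be stat()ed
        index[dev].append((source, mountpoint))
    return index


def shares_for_device(dev, index):
    """All mounts whose device number matches `dev`; >1 entry is ambiguous."""
    return index.get(dev, [])


if __name__ == "__main__":
    import sys

    paths = sys.argv[1:]
    if paths:
        index = index_by_device(read_mounts())
        for p in paths:
            print(p, shares_for_device(os.stat(p).st_dev, index))
```

On the client described in this report, passing `/opt/ops_mgmt/services/.file1.swp` would be expected to list both NFS mounts, since they share one superblock and hence one device number.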

I would say that this is an lsof bug: lsof should take this sharing of superblocks by NFS clients into account.

The possible workarounds are:
1) Use different fsids on the NFS server. This is the easiest and best option.
2) Use the nosharecache mount option.
3) Replace lsof with another simple utility that reports the NFS mount points.
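Workaround 3 could look roughly like the sketch below: instead of matching device numbers, map each file to the mount whose mount point is the longest prefix of the file's resolved path, which is unaffected by superblock sharing. This is an assumption-laden illustration, not a drop-in lsof or fuser replacement; `owning_mount` is a hypothetical helper name, and the root device in the sample table is made up.

```python
import os


def owning_mount(path, mounts):
    """Return the (source, mountpoint) pair whose mount point is the longest
    prefix of `path`, or None. `mounts` is an iterable of (source, mountpoint)
    pairs in mount order, e.g. as read from /proc/mounts."""
    target = os.path.realpath(path)
    best, best_len = None, -1
    for source, mountpoint in mounts:
        mp = os.path.normpath(mountpoint)
        if os.path.commonpath([target, mp]) == mp and len(mp) >= best_len:
            # '>=' lets a later entry for the same mount point win, because a
            # later mount on that directory hides the earlier one.
            best, best_len = (source, mountpoint), len(mp)
    return best


mounts = [
    ("/dev/sda1", "/"),  # hypothetical root device
    ("10.65.210.49:/vol/sat_common/fw_otis_logs", "/opt/ops_mgmt/services"),
    ("10.65.210.49:/vol/sat_common/j2re_bin/linux", "/opt/java"),
]
print(owning_mount("/opt/ops_mgmt/services/.file1.swp", mounts)[0])
# prints: 10.65.210.49:/vol/sat_common/fw_otis_logs
```

In a real utility the file paths would come from the `/proc/<pid>/fd/` symlinks (for example via `os.readlink`) rather than being typed in; bind mounts and deleted files are not handled by this sketch.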

--- Additional comment from kelly.setzer on 2011-05-18 11:28:01 EDT ---

fuser exhibits similar behavior.

Comment 1 Kelly Setzer 2011-05-19 21:24:12 UTC
This seems to be the patch that triggered this issue:  http://linux-nfs.org/Linux-2.6.x/2.6.18-rc4/linux-2.6.18-032-nfs-unify-sb.dif

The patch includes this note:
Signed-Off-By: David Howells <dhowells>
Signed-off-by: Trond Myklebust <Trond.Myklebust>

Comment 2 Kelly Setzer 2011-05-19 21:25:03 UTC
I have tested fuser from the latest version of psmisc (22.13) and it still exhibits the same issue.

Comment 5 RHEL Program Management 2011-09-23 00:33:25 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 8 RHEL Program Management 2012-04-02 10:47:29 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 15 RHEL Program Management 2013-05-01 06:49:27 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 19 RHEL Program Management 2014-03-07 12:17:33 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the last planned RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX. To request that Red Hat re-consider this request, please re-open the bugzilla via appropriate support channels and provide additional business and/or technical details about its importance to you.

Comment 20 RHEL Program Management 2014-06-02 13:22:01 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

