Bug 624131

Summary: First attempt at nfs mounting an ext3/ext4/xfs filesystem always fails with stale NFS handle
Product: Red Hat Enterprise Linux 6
Reporter: Barry Marson <bmarson>
Component: kernel
Assignee: J. Bruce Fields <bfields>
Status: CLOSED WORKSFORME
QA Contact: Filesystem QE <fs-qe>
Severity: high
Priority: low
Version: 6.0
CC: bfields, esandeen, jlayton, kzhang, perfbz, rwheeler, steved
Keywords: RHELNAK
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-09-14 12:23:20 UTC

Attachments:
ethereal file of failed mount attempt. cmd was: mount -t nfs -o nfsvers=3 sfss1:/sfs1 /mnt (flags: none)

Description Barry Marson 2010-08-13 18:52:34 UTC

Created attachment 438741 [details]
ethereal file of failed mount attempt. cmd was: mount -t nfs -o nfsvers=3 sfss1:/sfs1 /mnt

Description of problem:

I've been running into a stale NFS file handle issue during client mount since the beginning of RHEL6 testing, but now the failure seems to be happening more often and is affecting certain tests.

When I do my NFS server testing from test to test, the nfs service is stopped, the file systems are unmounted, they are then recreated (mkfs) and remounted, the networks supporting NFS are restarted and finally nfs is started.  This is the way I have been doing it for years.
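For anyone trying to reproduce this, the reset cycle described above might be sketched roughly as follows. This is not the actual harness script: the `run` and `reset_export` helpers are hypothetical, the device, mount point, and filesystem type are placeholders passed as arguments, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical sketch of the per-test reset cycle; not the real harness.
# With DRY_RUN=1 (the default) commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reset_export DEVICE MOUNTPOINT FSTYPE
reset_export() {
    dev=$1
    mnt=$2
    fstype=$3
    # mkfs.xfs takes -f and mke2fs takes -F to overwrite an existing filesystem.
    case $fstype in
        xfs)  force=-f ;;
        ext*) force=-F ;;
        *)    echo "unsupported fstype: $fstype" >&2; return 1 ;;
    esac
    run service nfs stop &&
    run umount "$mnt" &&
    run mkfs -t "$fstype" "$force" "$dev" &&
    run mount "$dev" "$mnt" &&
    run service network restart &&
    run service nfs start
}
```

Calling `reset_export /dev/vdb /exports xfs` in dry-run mode prints the six steps in order. The first client-side `mount -t nfs -o nfsvers=3 sfss1:/sfs1 /mnt` after such a cycle is the one that returns the stale handle.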

The problem is that for ext3, ext4, and xfs I get a stale NFS handle on the first mount attempt from a client.  ext2 and gfs2 do not fail.  If my test harness doesn't try an initial mount first, the benchmark innards will fail.

My RHEL6 server has been updated to SNAP 10 and running the -59 kernel.

This issue never happened with RHEL5 server.  The clients were all running an old version of RHEL4.  In fact they were at 2.6.9-27.ELsmp.  I brought them up to 2.6.9-89.ELsmp yet the problem persists.

With steved's help, I captured the ethereal log attempt at mounting.  It is attached.

The reason I'm concerned so much now is I have been unable to test one of those specific file systems successfully because of "Stale NFS errors" shortly after the benchmark tries to start.

Version-Release number of selected component (if applicable):
RHEL6 - SNAP 10  -59 kernel

How reproducible:
every time

Steps to Reproduce:
1. Running SPECsfs on the BIGI testbed

Comment 2 RHEL Program Management 2010-08-13 19:17:50 UTC

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 RHEL Program Management 2010-08-18 21:24:00 UTC

Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 4 Barry Marson 2010-08-19 12:41:17 UTC

While not formally bz'ed, this issue may be related to the problems we're running into when executing the SPECsfs benchmark. XFS filesystems presented by the NFS server return stale NFS handles to the clients within minutes (sometimes seconds) of starting.  This is the only presented filesystem type that does this ...

Barry

Comment 5 Eric Sandeen 2010-08-19 20:39:53 UTC

The xfs issue you ran into on SPECsfs, and the fix for it, were entirely xfs-specific; if you're seeing this problem across multiple filesystems I doubt that it's related to Dave's patch for bug #624860.

-Eric

Comment 6 RHEL Program Management 2011-01-07 04:26:57 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 7 Ric Wheeler 2011-01-07 17:55:42 UTC

Bruce, can you see if this still is an issue? If so, can we fix it for 6.1 or is this a 6.2 issue?

Comment 8 J. Bruce Fields 2011-01-08 00:58:33 UTC

The attached trace shows:

  client sends MNT for /sfs1
  server replies with filehandle
    01:00:06:00:00:00:08:00:00:00:00:00:00:00:00:00:00:00:00:00
  client sends FSINFO with that filehandle
  server replies with NFS3ERR_STALE

So, clearly a server bug.
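As a rough aid for reading the filehandle in the trace, the sketch below splits the leading bytes into fields. It assumes the header layout used by Linux nfsd filehandles (one byte each for version, auth type, fsid type, and fileid type, followed by the fsid); that layout is an assumption here, not something the trace itself states, and `decode_fh` is a hypothetical helper name.

```shell
#!/bin/sh
# decode_fh HEXBYTES
# HEXBYTES is a colon-separated filehandle as printed in the trace.
# Assumed layout: version, auth type, fsid type, fileid type, then the fsid.
decode_fh() (
    IFS=:
    set -- $1    # split the handle on ":" into positional parameters
    printf 'version=%d auth_type=%d fsid_type=%d fileid_type=%d\n' \
        "0x$1" "0x$2" "0x$3" "0x$4"
    shift 4
    # Inside this subshell IFS is ":", so "$*" rejoins the bytes with colons.
    printf 'fsid_len=%d fsid=%s\n' "$#" "$*"
)
```

Run against the handle above, the first line should report an fsid type of 6. As far as I can tell that type denotes a UUID-based fsid, which would fit a server that cannot map the handle back to an export.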

I tried running

  mkfs.xfs -f /dev/vdb
  mount /dev/vdb /exports
  service nfs start
  exportfs -orw '*:/exports'
  mount -onfsvers=3 localhost:/exports /mnt/
  umount /mnt/
  umount /exports

a few times in a loop on an rhel6 guest and didn't see any failures.

So I'm stuck for now.

Barry, are you still seeing this?

Comment 9 Barry Marson 2011-01-08 16:25:43 UTC

Bruce,

I'm still seeing this ... at least with the -71 kernel.

I noticed that I had not re-run exportfs after building the filesystems, as you did after bringing NFS online.  Doing so had no effect, nor did a showmount -e from the client just before the mount attempt.

Barry

Comment 10 RHEL Program Management 2011-02-01 06:04:57 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 11 Ric Wheeler 2011-02-01 12:50:19 UTC

Is this still an issue with the latest 6.1 code?

Thanks!

Comment 12 Barry Marson 2011-02-01 15:47:58 UTC

While I should consider upgrading the client side kernel, I've locked it down for years for way back testing ...

As of now, a 2.6.9-89.ELsmp client trying to mount from a 2.6.32-105.el6.x86_64 server still fails.

Barry

Comment 13 J. Bruce Fields 2011-02-01 17:22:25 UTC

Could I get a look at the exact scripts that are doing the mkfs, nfsd start, etc.?  I just want to make sure it's not doing anything unusual.

Comment 14 RHEL Program Management 2011-02-01 18:33:31 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 15 J. Bruce Fields 2011-02-01 21:55:43 UTC

Looking at /proc/net/rpc/nfsd.fh/content after a failed mount, it looks like mountd is failing to resolve the uuid; I wonder if this is the same problem as http://www.spinics.net/lists/linux-nfs/msg00876.html (or something similar).

Comment 16 J. Bruce Fields 2011-02-03 03:35:29 UTC

I found a similar problem on an RHEL6 test machine: if I shut down nfs, unmount /dev/vdb (which holds my exported filesystem), re-mkfs /dev/vdb, remount it, restart nfs, and try to mount it, the mount succeeds--but, interestingly, comparing 'blkid /dev/vdb' with the export cache (/proc/net/rpc/nfsd.export/content) shows that mountd is still using the uuid of the *old* filesystem.
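The comparison between blkid and the export cache can be scripted along these lines. This is only a sketch: the helper names are made up, and it assumes the export cache prints the uuid as colon-separated hex groups in a `uuid=` option while blkid prints the usual dashed form, so both are normalized before comparing.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the on-disk uuid with mountd's cached one.

# normalize_uuid UUID: drop ":" and "-" separators and lowercase the hex digits.
normalize_uuid() {
    printf '%s' "$1" | tr -d ':-' | tr 'A-Z' 'a-z'
}

# cached_uuid EXPORT_PATH [CACHE_FILE]
# Print the uuid= value recorded for EXPORT_PATH in the export cache.
cached_uuid() {
    awk -v p="$1" '
        $1 == p && match($0, /uuid=[0-9a-fA-F:]+/) {
            print substr($0, RSTART + 5, RLENGTH - 5)
        }' "${2:-/proc/net/rpc/nfsd.export/content}"
}

# uuid_is_current DEVICE EXPORT_PATH
# Succeeds only if the cache holds the uuid currently on DEVICE (needs root).
uuid_is_current() {
    on_disk=$(blkid -o value -s UUID "$1") || return 2
    cached=$(cached_uuid "$2")
    [ "$(normalize_uuid "$on_disk")" = "$(normalize_uuid "$cached")" ]
}
```

For example, `uuid_is_current /dev/vdb /exports || echo "mountd has a stale uuid"` after the re-mkfs and nfs restart would flag the situation described above.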

However, if instead of doing "service nfs stop" and "service nfs start" to stop and start nfs, I *just* stop and start rpc.mountd by hand, then mountd gets updated information.

Stripping out code from /etc/init.d/nfs, I eventually replaced the "start" and "stop" cases by exactly the commands I was using to start and stop rpc.mountd by hand, and still saw the difference in behavior.

My only remaining idea was that it could be some selinux rule; and indeed: looking at straces of rpc.mountd in both cases, I see that in one an open of /dev/vdb fails, and in the other it succeeds; and after "setenforce 0", everything works.  So in my case selinux appears to be preventing libblkid from getting a current uuid.  Perhaps it is in your case too.

Could you try turning off selinux and seeing if the problem is still reliably reproducible?
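To check for the same symptom, something like the following could be used to pick failed opens of the exported device out of an strace log of rpc.mountd. The `failed_opens` helper and the trace file path are placeholders for illustration, not the commands actually used on the test machine.

```shell
#!/bin/sh
# failed_opens DEVICE TRACEFILE
# Print open()/openat() calls on DEVICE that returned -1 in an strace log.
failed_opens() {
    grep -E "open(at)?\\(.*\"$1\".*= -1 " "$2"
}

# Example session (needs root; /tmp/mountd.trace is a placeholder path):
#   getenforce
#   strace -f -e trace=open,openat -o /tmp/mountd.trace rpc.mountd -F
#   failed_opens /dev/vdb /tmp/mountd.trace
```

A line ending in something like "= -1 EACCES (Permission denied)" for the exported device while `getenforce` reports Enforcing would point at the selinux case; no output suggests looking elsewhere.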

Comment 17 Barry Marson 2011-02-03 05:19:43 UTC

selinux is disabled in the clients via /etc/selinux/config

the server has selinux=0 on the boot line

Barry

Comment 18 Ric Wheeler 2011-03-17 19:07:59 UTC

Looks like this is too late for 6.1...

Comment 19 J. Bruce Fields 2011-09-13 21:25:17 UTC

Sorry, I was never able to duplicate this or work out what's going on here; are you still seeing the problem?

Comment 20 Ric Wheeler 2011-09-13 23:26:53 UTC

If not, let's close this BZ until we see it again....