Bug 720712

Summary: ls hangs for a specific directory (nfsv3) in kernels starting at -157
Product: Red Hat Enterprise Linux 6 Reporter: Jeff Moyer <jmoyer>
Component: kernel Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA QA Contact: Petr Beňas <pbenas>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1 CC: bfields, jiali, jlayton, kzhang, pbenas, pstehlik, rwheeler, steved, yanwang
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-170.el6 Doc Type: Bug Fix
Last Closed: 2011-12-06 13:49:33 UTC Type: ---
Attachments:
capture showing what the server is sending (flags: none)

Description Jeff Moyer 2011-07-12 15:06:22 UTC
Description of problem:
Starting with the -157 kernel, ls on ~jmoyer/News/drafts/drafts hangs.  The filesystem containing that directory is mounted as follows:

homedirs.bos.redhat.com:/vol/data/home/boston /home/boston nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.16.255.101,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.16.255.101 0 0
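
For readers who want to confirm the NFS version and transport from an entry like the one above, here is a minimal sketch that splits a /proc/mounts-style line into its fields and options. The function name parse_mount_entry is hypothetical and introduced only for illustration; the sketch assumes the usual six whitespace-separated fields.

```python
def parse_mount_entry(line):
    """Split one /proc/mounts-style line into its six fields.

    Assumes whitespace-separated fields, which holds for /proc/mounts
    because spaces inside paths are escaped there.
    """
    device, mountpoint, fstype, options, dump, passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flag-style options such as "rw" or "hard" carry no value.
        opts[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
        "dump": int(dump),
        "passno": int(passno),
    }


# The mount entry quoted in this report.
entry = parse_mount_entry(
    "homedirs.bos.redhat.com:/vol/data/home/boston /home/boston nfs "
    "rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,"
    "timeo=600,retrans=2,sec=sys,mountaddr=10.16.255.101,mountvers=3,"
    "mountport=4046,mountproto=udp,local_lock=none,addr=10.16.255.101 0 0"
)
print(entry["fstype"], entry["options"]["vers"], entry["options"]["proto"])
```

Here the options show an NFSv3 mount over TCP with the hard option set, which matches the summary of this bug.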

Looking into it further, the ls process was stuck in a loop doing readdir(), iterating over the same list of entries infinitely.
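
The symptom described above (the same entries returned over and over) can be checked from user space without letting the process grow until the OOM killer fires. The following is a minimal sketch, not something used in this report: list_dir_guarded and its max_entries limit are hypothetical names, and it assumes that a looping readdir() surfaces through os.scandir() as repeated entry names.

```python
import os
import tempfile


def list_dir_guarded(path, max_entries=100000):
    """List a directory, aborting as soon as any entry name repeats."""
    seen = set()
    with os.scandir(path) as entries:
        for count, entry in enumerate(entries, start=1):
            if entry.name in seen:
                # A healthy directory never yields the same name twice.
                raise RuntimeError(
                    f"entry {entry.name!r} returned twice while reading {path!r}"
                )
            if count > max_entries:
                raise RuntimeError(f"more than {max_entries} entries in {path!r}")
            seen.add(entry.name)
    return sorted(seen)


# Demonstration on a healthy temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1.orig", "12", "7"):
        open(os.path.join(tmp, name), "w").close()
    names = list_dir_guarded(tmp)
print(names)
```

On an affected client, pointing the function at the problem directory would be expected to raise quickly instead of hanging, provided the assumption above holds.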

I backported the following 3 commits, and the behaviour is better:
commit 8ef2ce3e16d9bec6cf015207c1c82a5b864046ac
        NFS: Detect loops in a readdir due to bad cookies
commit 480c2006ebb44ae03165695db7b3e38c04e0d102
        NFS: Create nfs_open_dir_context
commit e47c085afb3d16cbc6a4bfb10a3b074bb7c58998
        NFS: Ensure that we update the readdir filp->f_pos correctly

The output now looks like this:

$ ls ~/News/drafts/drafts/
ls: reading directory /home/boston/jmoyer/News/drafts/drafts/: Too many levels of symbolic links
1.orig   15       19~      24       29       33.orig  36~      40       44
11.orig  16       20       25       30       34       37       41       7
12       17       21       25.orig  31       35       37.orig  42       8.orig
12.orig  18       22       26       31.orig  35.orig  38       43
13       19       23       27       32       36       38.orig  43.orig
14       19.orig  23.orig  28       33       36.orig  39       43~

dmesg contains:
NFS: directory drafts/drafts contains a readdir loop.  Please contact your server vendor.  Offending cookie: 2
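
The idea behind a cookie-based loop check can be sketched as follows. This is a simplified model and not the kernel's implementation: read_directory, the fetch_page callback, and the GOOD and BUGGY page tables are all hypothetical. Each simulated READDIR reply is a list of (name, cookie) pairs, and the client resumes from the last cookie it received, so a cookie that reappears means the listing would restart at a point already visited. The "Too many levels of symbolic links" text printed by ls is the standard message for ELOOP, which the sketch raises for that reason.

```python
import errno


def read_directory(fetch_page, start_cookie=0):
    """Read all names, raising ELOOP if a resume cookie is seen twice.

    fetch_page(cookie) must return (entries, eof), where entries is a
    list of (name, cookie) pairs as in a simplified READDIR reply.
    """
    names = []
    seen_cookies = set()
    cookie = start_cookie
    while True:
        entries, eof = fetch_page(cookie)
        for name, next_cookie in entries:
            if next_cookie in seen_cookies:
                raise OSError(
                    errno.ELOOP,
                    f"readdir loop, offending cookie: {next_cookie}",
                )
            seen_cookies.add(next_cookie)
            names.append(name)
            cookie = next_cookie
        if eof or not entries:
            return names


# Hypothetical well-behaved server: every entry has a distinct cookie.
GOOD = {0: ([("a", 1), ("b", 2)], False), 2: ([("c", 3)], True)}
# Hypothetical misbehaving server: cookie 2 is handed out a second time.
BUGGY = {0: ([("a", 1), ("b", 2)], False), 2: ([("c", 3), ("d", 2)], False)}

good_names = read_directory(lambda c: GOOD[c])

try:
    read_directory(lambda c: BUGGY[c])
    loop_errno = None
except OSError as err:
    loop_errno = err.errno

print(good_names, loop_errno == errno.ELOOP)
```

Without the check, the second table would make the client request cookie 2 forever, which mirrors the endless readdir() loop described above.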

On older kernels, this message was never printed, so I'm left wondering whether there really is a server problem, or if we've introduced a bug.

Version-Release number of selected component (if applicable):
kernel-2.6.32-157.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. ls ~jmoyer/News/drafts/drafts
  
Actual results:
ls hangs, eating up memory until the OOM killer gets it

Expected results:
ls returns results

Additional info:

Comment 3 Jeff Layton 2011-07-12 15:22:40 UTC
The capture shows the server sending '.' and '..' in both readdir replies. So this is probably a server bug, but it's still a regression as the client no longer deals with this as well as it used to.

Comment 5 Jeff Layton 2011-07-12 18:33:14 UTC
(In reply to comment #3)
> The capture shows the server sending '.' and '..' in both readdir replies. So
> this is probably a server bug, but it's still a regression as the client no
> longer deals with this as well as it used to.

Disregard this. I didn't notice that these readdirs were for entirely different dir filehandles. I don't see a problem right offhand with the readdir reply. From what I can tell each entry has its own cookie.

Comment 6 Kyle McMartin 2011-07-20 14:33:38 UTC
Patch(es) available on kernel-2.6.32-170.el6

Comment 9 Jian Li 2011-07-25 09:41:15 UTC
Setting qa_ack+, since the reporter states:
"
How reproducible:
100%
"

Comment 13 errata-xmlrpc 2011-12-06 13:49:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html