Bug 785931 - NFS regression - READDIRPLUS / uninterruptible sleep / rpc_wait_bit_killable / huge LOAD
Summary: NFS regression - READDIRPLUS / uninterruptible sleep / rpc_wait_bit_killable ...
Keywords:
Status: CLOSED DUPLICATE of bug 819891
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Jeff Layton
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard: NFS
Depends On:
Blocks:
 
Reported: 2012-01-30 22:21 UTC by Marcus Alves Grando
Modified: 2018-11-29 20:24 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-06 01:51:01 UTC
Target Upstream Version:
Embargoed:


Attachments
rpcdebug -m rpc -s all on 2.6.32-71.29.1.el6.x86_64 (1.61 MB, application/octet-stream)
2012-03-12 22:02 UTC, Marcus Alves Grando
no flags
rpcdebug -m rpc -s all on 2.6.32-244.el6.x86_64 (5.10 MB, application/octet-stream)
2012-03-12 22:03 UTC, Marcus Alves Grando
no flags

Description Marcus Alves Grando 2012-01-30 22:21:38 UTC
Hello guys,

DESCRIPTION:

We are running imap and pop3 servers on EL6.2 over NFS. Our NFS servers are EMC (NS960 and VNX). Since EL6.1, almost all imap/pop3 processes end up in D state and performance drops. The latest kernel that works fine is 2.6.32-71.29.1.

PROBLEMS:

1. During a problem with 2.6.32-71 we moved to 2.6.32-220.4.1, and new problems appeared.
Right now, on the .71.29.1 kernel, the flush process sometimes eats all CPU even with no users logged in. This problem looks like https://lkml.org/lkml/2011/9/26/387. I applied those two commits on top of 2.6.32-220.4.1 and the flush CPU problem went away.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=5547e8aac6f71505d621a612de2fca0dd988b439
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=59b7c05fffba030e5d9e72324691e2f99aa69b79

2. After that we found another problem: "NFS: directory XXX contains a readdir loop.  Please contact your server vendor.  Offending cookie: 999"
It seems like http://wiki.linux-nfs.org/wiki/index.php/NFS:_directory_XXX_contains_a_readdir_loop_seems_to_be_triggered_by_well-behaving_server
I applied the patch mentioned there and the problem went away.

3. Back to the load / D-state processes. I have tried applying all of these fixes:

- NFS: Fix a hang in the writeback path
- nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pages
- nfs: don't try to migrate pages with active requests

but none of them fixed my problem. Some outputs:

# echo 0 > /proc/sys/sunrpc/rpc_debug
-pid- flgs status -client- --rqstp- -timeout ---ops--
10065 0880      0 ffff8801af083a00 ffff8801e77d2260    15000 ffffffffa03334c0 nfsv3 LINK a:call_status q:xprt_pending
10032 0880      0 ffff8801f6104e00 ffff8801e77d2ee0    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
 9990 0880      0 ffff8801eab49800 ffff8801e77d20d0    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
10078 0880      0 ffff8801eab49800 ffff8801e77d1770    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
 9997 0880      0 ffff88020a782600 ffff880214220960    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
 9879 0880      0 ffff8801ff06e400 ffff8801c3650190    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
 9780 0880      0 ffff8801b40dc000 ffff8801c3650320    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
10057 0801      0 ffff88020e97cc00 ffff8801c36515e0    15000 ffffffffa03e09f0 nfsv3 READ a:call_status q:xprt_pending
 9845 0880      0 ffff8801fa10e200 ffff880210951770    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
 9999 0880      0 ffff880211c42c00 ffff8802109515e0    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
10016 0880      0 ffff880211c42c00 ffff880210950fa0    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
10013 0880      0 ffff8801f6104400 ffff880210bb0fa0    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending
10020 0880      0 ffff8801dd335000 ffff880210bb0320    15000 ffffffffa03334c0 nfsv3 READDIRPLUS a:call_status q:xprt_pending

# echo w > /proc/sysrq-trigger
trrimapd      D 0000000000000000     0  7953   1144 0x00000000
 ffff8803b1f51a08 0000000000000082 ffff8803b1f519c8 ffffffffa031bb6e
 0000000000000000 0000000000000000 ffff8803b1f519a8 0000000000000001
 ffff8803c23c4678 ffff8803b1f51fd8 000000000000f4e8 ffff8803c23c4678
Call Trace:
 [<ffffffffa031bb6e>] ? xs_send_kvec+0x8e/0xa0 [sunrpc]
 [<ffffffffa031ed30>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
 [<ffffffffa031ed54>] rpc_wait_bit_killable+0x24/0x40 [sunrpc]
 [<ffffffff814ec00f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03196d9>] ? xprt_release_xprt+0x89/0x90 [sunrpc]
 [<ffffffffa031ed30>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
 [<ffffffff814ec0b8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81090770>] ? wake_bit_function+0x0/0x50
 [<ffffffffa0316c5c>] ? call_transmit+0x1ec/0x2c0 [sunrpc]
 [<ffffffffa031f369>] __rpc_execute+0x189/0x2a0 [sunrpc]
 [<ffffffffa031f4c3>] rpc_execute+0x43/0x50 [sunrpc]
 [<ffffffffa0317cc5>] rpc_run_task+0x75/0x90 [sunrpc]
 [<ffffffffa0317de2>] rpc_call_sync+0x42/0x70 [sunrpc]
 [<ffffffffa03bc565>] nfs3_rpc_wrapper.clone.0+0x35/0x80 [nfs]
 [<ffffffffa03bcff4>] nfs3_proc_readdir+0xd4/0x160 [nfs]
 [<ffffffffa03a59b4>] nfs_readdir_xdr_to_array+0x1d4/0x2b0 [nfs]
 [<ffffffff81126a08>] ? page_cache_sync_readahead+0x38/0x50
 [<ffffffffa03a5ab6>] nfs_readdir_filler+0x26/0xa0 [nfs]
 [<ffffffff8111227b>] do_read_cache_page+0x7b/0x170
 [<ffffffffa03a5a90>] ? nfs_readdir_filler+0x0/0xa0 [nfs]
 [<ffffffff81189900>] ? filldir+0x0/0xe0
 [<ffffffff81189900>] ? filldir+0x0/0xe0
 [<ffffffff811123b9>] read_cache_page_async+0x19/0x20
 [<ffffffff811123ce>] read_cache_page+0xe/0x20
 [<ffffffffa03a5c7a>] nfs_readdir+0x14a/0x580 [nfs]
 [<ffffffffa03bf0d0>] ? nfs3_decode_dirent+0x0/0x3d0 [nfs]
 [<ffffffff81189900>] ? filldir+0x0/0xe0
 [<ffffffff81189b70>] vfs_readdir+0xc0/0xe0
 [<ffffffff81189cf5>] sys_getdents+0x85/0xf0
 [<ffffffff814edc55>] ? page_fault+0x25/0x30
 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
trrimapd      D 0000000000000000     0  7955   1144 0x00000000
 ffff8803b1ff9a08 0000000000000082 0000000000000000 ffffffffa031bb6e
 0000000000000000 0000000000000000 ffff8803b1ff99a8 0000000000000001
 ffff8803c618f0f8 ffff8803b1ff9fd8 000000000000f4e8 ffff8803c618f0f8
Call Trace:
 [<ffffffffa031bb6e>] ? xs_send_kvec+0x8e/0xa0 [sunrpc]
 [<ffffffffa031ed30>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
 [<ffffffffa031ed54>] rpc_wait_bit_killable+0x24/0x40 [sunrpc]
 [<ffffffff814ec00f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03196d9>] ? xprt_release_xprt+0x89/0x90 [sunrpc]
 [<ffffffffa031ed30>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
 [<ffffffff814ec0b8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81090770>] ? wake_bit_function+0x0/0x50
 [<ffffffffa0316c5c>] ? call_transmit+0x1ec/0x2c0 [sunrpc]
 [<ffffffffa031f369>] __rpc_execute+0x189/0x2a0 [sunrpc]
 [<ffffffffa031f4c3>] rpc_execute+0x43/0x50 [sunrpc]
 [<ffffffffa0317cc5>] rpc_run_task+0x75/0x90 [sunrpc]
 [<ffffffffa0317de2>] rpc_call_sync+0x42/0x70 [sunrpc]
 [<ffffffffa03bc565>] nfs3_rpc_wrapper.clone.0+0x35/0x80 [nfs]
 [<ffffffffa03bcff4>] nfs3_proc_readdir+0xd4/0x160 [nfs]
 [<ffffffffa03a59b4>] nfs_readdir_xdr_to_array+0x1d4/0x2b0 [nfs]
 [<ffffffffa03a5ab6>] nfs_readdir_filler+0x26/0xa0 [nfs]
 [<ffffffff8111227b>] do_read_cache_page+0x7b/0x170
 [<ffffffffa03a5a90>] ? nfs_readdir_filler+0x0/0xa0 [nfs]
 [<ffffffff81189900>] ? filldir+0x0/0xe0
 [<ffffffff811123b9>] read_cache_page_async+0x19/0x20
 [<ffffffff811123ce>] read_cache_page+0xe/0x20
 [<ffffffffa03a5c7a>] nfs_readdir+0x14a/0x580 [nfs]
 [<ffffffffa03bf0d0>] ? nfs3_decode_dirent+0x0/0x3d0 [nfs]
 [<ffffffff81189900>] ? filldir+0x0/0xe0
 [<ffffffff81189b70>] vfs_readdir+0xc0/0xe0
 [<ffffffff81189cf5>] sys_getdents+0x85/0xf0
 [<ffffffff814edc55>] ? page_fault+0x25/0x30
 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b

Remaining traces suppressed, because all the processes show the same trace.

Anything else I can do?

Best regards

Comment 2 Marcus Alves Grando 2012-01-31 01:47:40 UTC
Hello,

Maybe it's related to this one: https://bugzilla.redhat.com/show_bug.cgi?id=688130

I'm trying a run with "nordirplus". So far it looks fine. Tomorrow, under load, it will be possible to confirm.
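For anyone wanting to try the same workaround, the relevant NFSv3 mount option looks like this (a sketch only; the server name and paths below are placeholders, not from this report):

```shell
# Disable READDIRPLUS on an NFSv3 mount (placeholder server/paths; needs root).
mount -t nfs -o vers=3,nordirplus server:/export/mail /mnt/mail

# Equivalent /etc/fstab entry:
# server:/export/mail  /mnt/mail  nfs  vers=3,nordirplus  0 0
```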

I don't know exactly how this is fixed in kernel.org.

There's another BZ related to my third problem. Maybe close that one as a duplicate of this one, since there are other fixes that need to be backported.
https://bugzilla.redhat.com/show_bug.cgi?id=785830

Best regards

Comment 3 Marcus Alves Grando 2012-01-31 14:32:08 UTC
OK. We are testing these scenarios:

2.6.32-71.29.1                              - Load +- 41
2.6.32-220.4.1 + patches above + nordirplus - Load +- 25
2.6.32-71                                   - Load +- 25

With nordirplus it works fine. Do you need anything else?

Best regards

Comment 6 Marcus Alves Grando 2012-03-12 22:01:23 UTC
Hello guys,

I tested another kernel, 2.6.32-244.el6.x86_64, provided by Red Hat support, and it has the same problem as all the others.

To keep things simple I tested only two kernels (2.6.32-71.29.1.el6.x86_64 and 2.6.32-244.el6.x86_64). I captured "rpcdebug -m rpc -s all" output for 2 seconds on each and will attach it to this BZ.

Looking at nfsstat during this period, something is really wrong. See below:

* 2.6.32-71.29.1.el6.x86_64:
RPC:
	calls: 7365
	retrans: 0
	authrefrsh: 0
NFS3:
	null: 0
	getattr: 1426
	setattr: 126
	lookup: 973
	access: 1607
	readlink: 0
	read: 1790
	write: 72
	create: 7
	mkdir: 0
	symlink: 0
	mknod: 0
	remove: 119
	rmdir: 0
	rename: 49
	link: 85
	readdir: 660
	readdirplus: 390
	fsstat: 0
	fsinfo: 0
	pathconf: 0
	commit: 62

* 2.6.32-244.el6.x86_64:
RPC:
	calls: 15963
	retrans: 0
	authrefrsh: 15964
NFS3:
	null: 0
	getattr: 2187
	setattr: 44
	lookup: 636
	access: 1781
	readlink: 0
	read: 8394
	write: 54
	create: 6
	mkdir: 0
	symlink: 0
	mknod: 0
	remove: 71
	rmdir: 0
	rename: 56
	link: 42
	readdir: 0
	readdirplus: 2647
	fsstat: 0
	fsinfo: 0
	pathconf: 0
	commit: 46

The 2.6.32-244.el6.x86_64 server is running with 1/4 of the users compared with the other server.

Looking at nfsstat you can see a huge increase in 'authrefrsh', and that all readdir requests are going to readdirplus. Besides, the server with 1/4 of the users is making 2x as many calls.
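The shift is easy to quantify from the two nfsstat extracts above; a throwaway sketch (the counts are copied from this comment, nothing else is measured):

```shell
# Compare readdir vs readdirplus between the two kernels, using the counts
# quoted above (71.29.1: readdir=660 readdirplus=390; -244: readdir=0
# readdirplus=2647).
awk 'BEGIN {
    old_rd = 660; old_rdp = 390;   # 2.6.32-71.29.1.el6
    new_rd = 0;   new_rdp = 2647;  # 2.6.32-244.el6
    printf "old readdirplus share: %.0f%%\n", 100 * old_rdp / (old_rd + old_rdp)
    printf "new readdirplus share: %.0f%%\n", 100 * new_rdp / (new_rd + new_rdp)
}'
```

Run as-is, this reports a 37% readdirplus share on the old kernel versus 100% on the new one.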

Comparing rpcdebug, I found this:

* 2.6.32-71.29.1.el6.x86_64:
kernel: RPC: 34691 call_start nfs3 proc READDIRPLUS (sync)
kernel: RPC: 34691 call_reserve (status 0)
kernel: RPC: 34691 reserved req ffff8801b8173598 xid d54ee4ec
kernel: RPC: 34691 call_reserveresult (status 0)
kernel: RPC: 34691 call_allocate (status 0)
kernel: RPC: 34691 allocated buffer of size 572 at ffff88021efbb800

* 2.6.32-244.el6.x86_64:
kernel: RPC: 17812 call_start nfs3 proc READDIRPLUS (sync)
kernel: RPC: 17812 call_reserve (status 0)
kernel: RPC: 17812 reserved req ffff8801b84681b0 xid 890edccb
kernel: RPC: 17812 call_reserveresult (status 0)
kernel: RPC: 17812 call_refresh (status 0)
kernel: RPC: 17812 refreshing UNIX cred ffff8803f0a3e900
kernel: RPC: 17812 call_refreshresult (status 0)
kernel: RPC: 17812 call_allocate (status 0)
kernel: RPC: 17812 allocated buffer of size 572 at ffff8801c8683000

I don't know why 'call_refresh' is being called after 'call_reserveresult'; with the old behaviour that doesn't happen.
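The state-machine steps for one RPC task can be pulled out of a capture like the attachments with a short pipeline (a sketch: `rpc-trace.txt` is a hypothetical file name, and the sample lines are the ones quoted in this comment, not the full attachment):

```shell
# Build a stand-in trace file from the lines quoted above; with a real
# capture you would skip this step and use the saved log instead.
cat > rpc-trace.txt <<'EOF'
kernel: RPC: 17812 call_start nfs3 proc READDIRPLUS (sync)
kernel: RPC: 17812 call_reserve (status 0)
kernel: RPC: 17812 reserved req ffff8801b84681b0 xid 890edccb
kernel: RPC: 17812 call_reserveresult (status 0)
kernel: RPC: 17812 call_refresh (status 0)
kernel: RPC: 17812 refreshing UNIX cred ffff8803f0a3e900
kernel: RPC: 17812 call_refreshresult (status 0)
kernel: RPC: 17812 call_allocate (status 0)
EOF

# Print just the call_* state names for task 17812, in order; on the -244
# kernel the extra call_refresh/call_refreshresult steps show up here.
awk '$2 == "RPC:" && $3 == "17812" && $4 ~ /^call_/ { print $4 }' rpc-trace.txt
```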

Best regards

Comment 7 Marcus Alves Grando 2012-03-12 22:02:33 UTC
Created attachment 569511 [details]
rpcdebug -m rpc -s all on 2.6.32-71.29.1.el6.x86_64

Comment 8 Marcus Alves Grando 2012-03-12 22:03:20 UTC
Created attachment 569512 [details]
rpcdebug -m rpc -s all on 2.6.32-244.el6.x86_64

Comment 10 Tamas Vincze 2012-03-23 19:58:04 UTC
I'm using CentOS kernel 2.6.32-220.7.1.el6.x86_64 and I also see the authrefrsh count skyrocketing:

Client rpc stats:
calls      retrans    authrefrsh
105283279   0          105285429

Client nfs v3:
null         getattr      setattr      lookup       access       readlink     
0         0% 55955     0% 4901      0% 8735      0% 10309     0% 87        0% 
read         write        create       mkdir        symlink      mknod        
54463085 51% 50703785 48% 4352      0% 193       0% 176       0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
1136      0% 54        0% 264       0% 2         0% 0         0% 905       0% 
fsstat       fsinfo       pathconf     commit       
1965      0% 10        0% 5         0% 0         0% 

I'm working with large files (tens of GB), and in the middle of sequential reads and writes the throughput stalls to a few KB/s. If I run an ls (from a separate terminal) of the directory holding those files, the speed resumes to the normal 100MB/s while ls itself runs for 3-4 minutes.

NFSv4 previously crashed the whole box; that's why I switched to NFSv3.

Comment 11 Jeff Layton 2012-03-27 11:28:22 UTC
I think the first thing that needs to happen here is to define this problem better. It looks like there are 3-4 different problem reports all
squashed together here. Some of those are likely already fixed in 6.3 kernels.

I'm really only interested in problems that are not already fixed in 6.3
kernels. What we probably need is to have the reporters of this bug test some
very recent 6.3 kernels (especially those with the appropriate readdir and
other NFS fixes), and then restate the problems that you're seeing on top
of that.

For instance, Marcus said:

"I testes another kernel 2.6.32-244.el6.x86_64 provided by RedHat support, and
has the same problem with all others."

...what I'm not clear on is which problem you mean since you enumerated 3-4
different problems originally. He also said:

"Looking the nfsstat you can see a huge increase in 'authrefrsh' and that all
requests of readdir is goning to readdirplus."

I'm not sure we can call the authrefresh counter increasing a "problem". It
may just be an artefact of other changes in the RPC layer. IOW, it may be a 
possible symptom, but is not a problem in and of itself. In any case, it
does not indicate an increase in the number of RPC calls, but rather an
increase in the number of calls to refresh the credentials for making a new
RPC call. For anything but GSSAPI, that's basically just a set_bit call.

The change to use readdirplus more widely is also expected. A patch series
added to RHEL6.1 removed the readdir plus directory size limit. That may or
may not have a performance impact with your workload.

If you're experiencing performance issues, then it's critical that we
understand what is actually slow, preferably at the system call level. Only
with that info will we be able to make progress here. You'll also need to
help quantify the slowness. IOW, we need to understand what got slower and
how much slower it is. Without that, we can't quantify any possible performance
improvement that we might make.

Comment 12 Marcus Alves Grando 2012-03-27 17:40:09 UTC
Hello Jeff,

(In reply to comment #11)
> I think the first thing that needs to happen here is to define this problem
> better. It looks like there are 3-4 different problem reports all
> squashed together here. Some of those are likely already fixed in 6.3 kernels.
> 
> I'm really only interested in problems that are not already fixed in 6.3
> kernels. What we probably need to have the reporters of this bug test some
> very recent 6.3 kernels (especially those with the appropriate readdir and
> other NFS fixes), and then restate the problems that you're seeing on top
> of that.
> 
> For instance, Marcus said:
> 
> "I testes another kernel 2.6.32-244.el6.x86_64 provided by RedHat support, and
> has the same problem with all others."

Sure, I'll test the latest 6.3 kernel and describe the problems better. Can you provide the latest 6.3 kernel with the perf rpm? With perf top I see a number of calls that I don't see on the latest 6.1 kernel, like below:

   PerfTop:    3486 irqs/sec  kernel:74.6%  exact:  0.0% [1000Hz cycles],  (all, 12 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------                                                              

             samples  pcnt function                            DSO
             _______ _____ ___________________________________ _________________________________________________________________________________

             3346.00  5.2% __GI_vfprintf                       /lib64/libc-2.12.so                                                              
             2169.00  3.4% inflate                             /lib64/libz.so.1.2.3                                                             
             1610.00  2.5% nfs_permission                      /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/fs/nfs/nfs.ko                  
             1442.00  2.3% _IO_default_xsputn_internal         /lib64/libc-2.12.so                                                              
             1437.00  2.2% inflate_fast                        /lib64/libz.so.1.2.3                                                             
             1110.00  1.7% __GI_____strtoll_l_internal         /lib64/libc-2.12.so                                                              
              988.00  1.5% find_inode                          [kernel.kallsyms]                                                                
              922.00  1.4% rpc_wake_up_queued_task             /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/net/sunrpc/sunrpc.ko           
              913.00  1.4% _spin_lock                          [kernel.kallsyms]                                                                
              890.00  1.4% __d_lookup                          [kernel.kallsyms]                                                                
              850.00  1.3% TrrMailFolderGetNextMessage         /usr/lib/libtrrmail.so.3.5.1                                                     
              836.00  1.3% bnx2_poll_work                      /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/drivers/net/bnx2.ko            
              801.00  1.3% __memcpy                            [kernel.kallsyms]                                                                
              692.00  1.1% find_busiest_group                  [kernel.kallsyms]                                                                
              685.00  1.1% rpc_free_client                     /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/net/sunrpc/sunrpc.ko           
              663.00  1.0% find_next_bit                       [kernel.kallsyms]                                                                
              641.00  1.0% copy_user_generic_string            [kernel.kallsyms]                                                                
              610.00  1.0% clear_page_c                        [kernel.kallsyms]                                                                
              596.00  0.9% __strchr_sse2                       /lib64/libc-2.12.so                                                              
              570.00  0.9% generic_pkt_to_tuple                /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/net/netfilter/nf_conntrack.ko  
              541.00  0.8% _itoa_word                          /lib64/libc-2.12.so                                                              
              535.00  0.8% memcpy                              /lib64/libc-2.12.so                                                              
              485.00  0.8% __list_add                          [kernel.kallsyms]                                                                
              465.00  0.7% xs_udp_timer                        /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/net/sunrpc/sunrpc.ko           
              437.00  0.7% nf_conntrack_free                   /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/net/netfilter/nf_conntrack.ko  
              436.00  0.7% kmem_cache_alloc                    [kernel.kallsyms]                                                                
              429.00  0.7% inode_init_once                     [kernel.kallsyms]                                                                
              424.00  0.7% _int_malloc                         /lib64/libc-2.12.so                                                              
              423.00  0.7% kfree                               [kernel.kallsyms]                                                                
              419.00  0.7% nfs_post_op_update_inode            /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/fs/nfs/nfs.ko                  
              411.00  0.6% TrrMailLog                          /usr/lib/libtrrmail.so.3.5.1                                                     
              404.00  0.6% crc32                               /lib64/libz.so.1.2.3                                                             
              394.00  0.6% TrrMailMsgGetPath                   /usr/lib/libtrrmail.so.3.5.1                                                     
              392.00  0.6% nfs_cache_upcall                    /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/fs/nfs/nfs.ko

> 
> ...what I'm not clear on is which problem you mean since you enumerated 3-4
> different problems originally. He also said:
> 
> "Looking the nfsstat you can see a huge increase in 'authrefrsh' and that all
> requests of readdir is goning to readdirplus."
> 
> I'm not sure we can call the authrefresh counter increasing a "problem". It
> may just be an artefact of other changes in the RPC layer. IOW, it may be a 
> possible symptom, but is not a problem in and of itself. In any case, it
> does not indicate an increase in the number of RPC calls, but rather an
> increase in the number of calls to refresh the credentials for making a new
> RPC call. For anything but GSSAPI, that's basically just a set_bit call.

Looking at the number of RPC calls, it seems nothing changed. It's almost equal between the 6.1 and 6.3 kernels.

> 
> The change to use readdirplus more widely is also expected. A patch series
> added to RHEL6.1 removed the readdir plus directory size limit. That may or
> may not have a performance impact with your workload.
> 
> If you're experiencing performance issues, then it's critical that we
> understand what is actually slow, preferably at the system call level. Only
> with that info will we be able to make progress here. You'll also need to
> help quantify the slowness. IOW, we need to understand what got slower and
> how much slower it is. Without that, we can't quantify any possible performance
> improvement that we might make.

I'll try to figure it out with the latest kernel.

Thanks

Comment 13 Jeff Layton 2012-03-27 18:03:12 UTC
(In reply to comment #12)

> Sure, I'll test the latest 6.3 kernel and describe better the problems. Can you
> provide the latest 6.3 kernel with perf rpm? With perf top I saw a number of
> calls that I can't see on the latest 6.1 kernel, like below:
> 

No, that will need to be done by the support person handling the support case.
Please ask them to give you the latest kernel possible so we can rule out any
fixes that are already slated for 6.3. As of today that would be -257.el6.

Comment 14 Marcus Alves Grando 2012-03-27 18:11:11 UTC
(In reply to comment #13)
> (In reply to comment #12)
> 
> > Sure, I'll test the latest 6.3 kernel and describe better the problems. Can you
> > provide the latest 6.3 kernel with perf rpm? With perf top I saw a number of
> > calls that I can't see on the latest 6.1 kernel, like below:
> > 
> 
> No, that will need to be done by the support person handling the support case.
> Please ask them to give you the latest kernel possible so we can rule out any
> fixes that are already slated for 6.3. As of today that would be -257.el6.

OK.

Comment 17 Jeff Layton 2012-05-01 13:09:44 UTC
Any more word on this? Did the latest kernel help anything?

Comment 19 RHEL Program Management 2012-07-02 15:45:13 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 20 Marcus Alves Grando 2012-11-06 00:47:18 UTC
Hello Jeff,

Over the last two days I returned to this problem, and I have found the cause.

It's related to the removal of the limit on using READDIRPLUS ("NFS: remove readdir plus limit").

The fix below ("NFS: Adapt readdirplus to application usage patterns") solves my problem. Can you add it to the default EL kernel?

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=d69ee9b85541a69a1092f5da675bd23256dc62af

All the other problems reported at the beginning are fixed as of -279.11.1.

Best regards

Comment 21 Jeff Layton 2012-11-06 01:51:01 UTC

*** This bug has been marked as a duplicate of bug 819891 ***

