Bug 146860

Summary: directory lookup contention for dcache_lock
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED WONTFIX
Severity: low
Priority: medium
Reporter: Kurtis Rader <kdrader>
Assignee: Alexander Viro <aviro>
QA Contact: Brian Brock <bbrock>
CC: k.georgiou, lwoodman, petrides, riel, sct, tao
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:07:56 UTC
Attachments: source and makefile to recreate problem

Description Kurtis Rader 2005-02-02 04:57:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
A high pathname lookup rate by two tasks for objects in the same
directory, on an SMP system, can result in one or more CPUs spinning in
kernel mode for extended periods.  I found that running a single
instance of the attached program would sporadically induce high CPU
load, and that I could induce the problem fairly reliably by forcing
each process to run on a different CPU.  For example,

    taskset 01 ./testd_c ; taskset 02 ./testd_c

This will often leave one CPU idle and the other 100% in system mode.
Occasionally both CPUs will be 100% busy in system mode.

A profiled kernel shows the following:

c029fa10 6767     0.955674    direct_strncpy_from_user
c017ee1a 9521     1.34461     .text.lock.dcache
c017d710 10661    1.50561     dput
c0172750 11062    1.56224     path_release
c029fc95 12025    1.69824     .text.lock.dec_and_lock
c0172620 13250    1.87124     permission
c0172ac0 13842    1.95484     link_path_walk
c0173470 39061    5.51641     path_init
c029fc50 75117    10.6084     atomic_dec_and_lock
c017e290 96373    13.6103     d_lookup

The time in atomic_dec_and_lock() apparently comes from this statement
at the top of the dput() function:

        if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
                return;
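
For readers unfamiliar with the primitive, here is a minimal userspace
sketch of the decrement-and-lock pattern.  It is not the kernel source:
dec_and_lock() is a hypothetical name, a pthread mutex stands in for
the dcache_lock spinlock, and the lock-free fast path shown is the
usual shape of this primitive rather than something confirmed for the
RHEL 3 kernel.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogue of a decrement-and-lock primitive (hypothetical
 * name; a mutex replaces the kernel spinlock).
 *
 * Returns true with the lock held if the count dropped to zero,
 * false with the lock released otherwise.
 */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
    int old = atomic_load(count);

    /* Fast path: while the result stays above zero, no lock is needed. */
    while (old > 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
    }

    /* Slow path: the count may reach zero, so serialize on the lock. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;            /* reached zero; caller holds the lock */
    pthread_mutex_unlock(lock);
    return false;
}
```

If each lookup drops the last reference to the same object, every call
takes the slow path, so all callers serialize on the one lock.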

The problem also occurs on the older RHEL 2.1 kernels, but much less
frequently.


Version-Release number of selected component (if applicable):
kernel-2.4.21-27.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
Run the attached program on a system with two CPUs. Force each task to
run on a different CPU using the taskset(1) command.    

Actual Results:  Occasionally one task will spend 100% of its time in
kernel mode for extended periods.

Expected Results:  Accumulated CPU time per task will increase at a
steady rate in direct proportion to the number of stat() calls.
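
One way to check the expected behaviour is to measure the system CPU
time consumed by fixed-size batches of stat() calls.  The sketch below
is not the attached program; the function names, the path, and the
batch size are assumptions chosen for illustration.

```c
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>

/* System-mode CPU time used so far by this process, in seconds. */
static double system_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}

/*
 * Call stat() on path `calls` times and return the system CPU time the
 * batch consumed, in seconds.  The stat() result is ignored on purpose:
 * a failed lookup still has to walk the path.
 */
static double time_stat_batch(const char *path, long calls)
{
    struct stat sb;
    double before = system_seconds();

    for (long i = 0; i < calls; i++)
        (void)stat(path, &sb);

    return system_seconds() - before;
}
```

Calling time_stat_batch() repeatedly with the same path and batch size
should return roughly equal values if CPU time grows in proportion to
the number of stat() calls.  A batch that costs far more than its
neighbours would match the spinning described under Actual Results.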

Additional info:

Comment 1 Kurtis Rader 2005-02-02 04:58:51 UTC
Created attachment 110543
source and makefile to recreate problem

Comment 2 Kurtis Rader 2005-02-03 23:14:57 UTC
Please note that the contention, while sporadic, is severe when it
does occur: a task can spin in the kernel (apparently trying to
acquire dcache_lock) for upwards of ten seconds.

Comment 3 RHEL Program Management 2007-10-19 19:07:56 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.