Bug 117400

Summary: dentunusd reported as around 4 billion
Product: Red Hat Enterprise Linux 2.1
Component: kernel
Version: 2.1
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: medium
Hardware: i686
OS: Linux
Reporter: Angus Thomas <athomas>
Assignee: Don Howard <dhoward>
QA Contact: Brian Brock <bbrock>
CC: bnocera, burks, jlayton
Doc Type: Bug Fix
Last Closed: 2007-01-31 00:36:51 UTC
Bug Blocks: 143573
Attachments: Ensure that dcache_lock is held when updating dentry_stat. (no flags)

Description Angus Thomas 2004-03-03 16:58:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5)
Gecko/20031007 Firebird/0.7

Description of problem:
[thomasaa@xstm7955pap vm]$ sar -v
Linux 2.4.9-e.12enterprise (xstm7955pap)        03/03/04
07:20:10    4294967288       321      3.92      2128         0     0.00         0      0.00         0      0.00
07:30:05         1084       347      4.24      3010         0     0.00         0      0.00         0      0.00


The 4 billion figure for dentunusd has to be wrong. It was discussed here:

https://listman.redhat.com/archives/ext3-users/2002-September/msg00061.html

It looks like a signed counter is going negative and being reported as an
unsigned int. This may be a kernel bug rather than a sysstat bug.
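
(For illustration only, not from the original report: a small negative signed
counter printed as an unsigned 32-bit value comes out in the 4-billion range,
which matches the figure above.)

    #include <stdio.h>

    int main(void)
    {
        /* A signed counter that has been decremented to -8 ... */
        int nr_unused = -8;

        /* ... prints as 2^32 - 8 = 4294967288 when treated as unsigned,
         * exactly the dentunusd value in the sar output above. */
        printf("%u\n", (unsigned int)nr_unused);
        return 0;
    }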



Version-Release number of selected component (if applicable):
sysstat-4.0.1-2

How reproducible:
Sometimes

Steps to Reproduce:
1. run sar -v
2. Look for huge figures in the dentunusd column
3.
    

Actual Results:  See pasted sar output in description

Expected Results:  Sane figures

Additional info:

Comment 1 Angus Thomas 2004-03-03 17:07:01 UTC
Machine is a PowerEdge 6650.

lsmod looks like this:

Module                  Size  Used by    Tainted: P
aic7xxx               127264   0 (unused)
ide-tape               61120   0 (autoclean)
lp                      8224   0 (autoclean)
parport                38144   0 (autoclean) [lp]
sr_mod                 17592   0 (autoclean) (unused)
st                     30548   0 (autoclean) (unused)
ide-cd                 35328   0 (autoclean)
cdrom                  35520   0 (autoclean) [sr_mod ide-cd]
sg                     35076   0 (autoclean)
nfs                    92192 112 (autoclean)
lockd                  61184   1 (autoclean) [nfs]
sunrpc                 86128   1 (autoclean) [nfs lockd]
esm                    78310   1
autofs4                12644   9 (autoclean)
bcm5700                73156   1
ext3                   74240   5
jbd                    55176   5 [ext3]
megaraid               28288   7
sd_mod                 13888   6
scsi_mod              125980   6 [aic7xxx sr_mod st sg megaraid sd_mod]


Comment 2 Angus Thomas 2004-03-19 16:19:36 UTC
Not necessarily a sar bug: sar is failing to handle negative values
reported from /proc.

Specifically:

[thomasaa@xldn0145pap bin]$ cat /proc/sys/fs/dentry-state
337518  -953    45      0       0       0

A negative value for unused dentry objects makes no sense, so this is really
a kernel bug in the 2.4.9-e.25enterprise kernel.
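
(As an aside, here is a hypothetical user-space check, not something from this
report, that reads the same file and flags the underflow by treating the
fields as signed ints:)

    #include <stdio.h>

    int main(void)
    {
        /* /proc/sys/fs/dentry-state fields: nr_dentry, nr_unused,
         * age_limit, want_pages, followed by two unused fields. */
        FILE *f = fopen("/proc/sys/fs/dentry-state", "r");
        int nr_dentry, nr_unused;

        if (!f)
            return 1;
        if (fscanf(f, "%d %d", &nr_dentry, &nr_unused) == 2 && nr_unused < 0)
            printf("dentry underflow: nr_unused = %d\n", nr_unused);
        fclose(f);
        return 0;
    }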




Comment 3 Jason Burks 2004-10-19 18:00:40 UTC
Has a fix for this bug been generated?  I too am running 2.1AS (with
kernel 2.4.9-e.27enterprise) and experiencing the same dentunusd
issue.  After running the system for months at a time with 2.5GB of
RAM, we recently added more memory (now at 6GB total) and have been
having lots of i/o problems since then (and uptime hasn't made it past
4 days without requiring a reboot).  

Process counts get up into the hundreds, most waiting on i/o, with
loadavg over 50 (and I've seen it as high as 80 or 90 sustained).  In
looking at sar output to try to help determine the problem, I noticed
this same dentunusd output as reported in this bug.  Any suggestions?

The machine is a Compaq DL-380 with dual 2.8GHz P4s.  I can provide
more details as needed.  It runs an internal SourceForge instance, so
disk i/o is critical (lots and lots of CVS repositories served).  I
have no idea if I'm barking up the wrong tree here or not, but any
help you could provide would be appreciated.

Comment 4 Don Howard 2006-01-28 01:24:12 UTC
It looks like a negative value for dentry_stat->nr_unused would cause a vicious
cycle of dcache pruning (note that all of this is signed int arithmetic):

    int shrink_dcache_memory(int priority, unsigned int gfp_mask)
    {
    ...
        if (priority)
                count = dentry_stat.nr_unused / priority;

        prune_dcache(count);
    ...



    void prune_dcache(int count)
    {
    ...
                dentry_stat.nr_unused--;

                /* Unused dentry with a count? */
                if (atomic_read(&dentry->d_count))
                        BUG();

                prune_one_dentry(dentry);
                if (!--count)
                        break;
    ...

If I'm interpreting this correctly, a negative count means the check
(!--count) never fires, so prune_dcache() keeps going until the unused list
is exhausted. I'd expect that to bring a machine with much disk IO activity
to its knees.
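
(A quick user-space sketch of that failure mode, under the assumption above;
the names and numbers are made up for illustration and this is not kernel code:)

    #include <stdio.h>

    int main(void)
    {
        int count = -953 / 6;   /* stand-in for dentry_stat.nr_unused / priority */
        int unused = 100000;    /* pretend size of the dentry_unused list */
        int pruned = 0;

        while (unused > 0) {
            unused--;           /* stand-in for prune_one_dentry() */
            pruned++;
            if (!--count)       /* never true while count stays negative */
                break;
        }

        /* The loop only stops once the whole unused list is gone. */
        printf("pruned %d entries, count finished at %d\n", pruned, count);
        return 0;
    }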

Comment 5 Jason Burks 2006-01-30 12:57:08 UTC
Yes, my system does get brought to its knees when the disk IO gets to even 
moderate levels.  Is there any plan to provide a fix for this issue?

Jason Burks
GE Global Research

Comment 6 Don Howard 2006-01-30 20:28:37 UTC
*** Bug 169635 has been marked as a duplicate of this bug. ***

Comment 7 Don Howard 2006-02-02 01:49:04 UTC
Can folks who are experiencing this bug please test with the following kernel?

http://people.redhat.com/dhoward/bz117400/

Please note that this kernel is provided for debug/testing purposes, and is not
officially supported.

Comment 8 Jason Burks 2006-02-02 14:35:37 UTC
Don,

I'll have to see whether I can figure out a way to test that kernel.  This is 
happening on a production machine and not any of my test machines.  I'm not 
sure I can duplicate the production load on my test boxes, but I'll see what I 
can do.

By the way, I've also noticed a similar signed/unsigned problem with the 
inode-sz field in my sar output.  It generally corresponds to when my 
dentunusd number gets way up there, but not always.  For instance, my machine 
crashed around 2:15 am or so today, and here is the relevant sar output from 
that time window:

12:00:01 AM dentunusd   file-sz  %file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
01:00:00 AM      3117       600      0.60      3629         0      0.00         0      0.00         0      0.00
01:10:00 AM 4294966953       638      0.64       776         0      0.00         0      0.00         0      0.00
01:30:10 AM 4294966355       405      0.41       660         0      0.00         0      0.00         1      0.10
01:40:00 AM 4294966461       657      0.66       671         0      0.00         0      0.00         2      0.20
01:50:00 AM 4294966443       555      0.85       731         0      0.00         0      0.00         4      0.39
02:00:01 AM 4294966236       569      0.87       653         0      0.00         0      0.00         4      0.39
02:10:01 AM 4294966199       500      0.76       733         0      0.00         0      0.00         4      0.39
03:20:00 AM      8000        43      0.07      4731         0      0.00         0      0.00         0      0.00
03:30:00 AM    157474       114      0.17    134079         0      0.00         0      0.00         0      0.00
03:40:00 AM     49440       210      0.32     32106         0      0.00         0      0.00         0      0.00
03:50:00 AM     15827       130      0.20     14193         0      0.00         0      0.00         0      0.00

As you can see, sar read the counters at 2:10 am and then not again until
after 3:00 am.  I had to power-cycle the box after waiting nearly 45 minutes
for shutdown -r to finish.  So while you're looking into the dentry cache
values, you might also check whether this is related.

I'll let you know about testing that kernel.

Thanks,
Jason Burks
GE Global Research

Comment 11 Jason Burks 2006-03-29 20:50:43 UTC
Don,

I was finally able to deploy your patched kernel to my production system
(unfortunately the only place I could reproduce the issue).  It's now been up
for 12 days with no hiccups whatsoever.  The system load numbers are much more
stable, old processes waiting on i/o finish up and go away properly, and the
problem seems to have gone away.  With the old kernel, running a full backup of
the system would almost always kill the box.  We've run the backup 12 nights in
a row with the new kernel and haven't had any problems.

Until the system has been up a full month, I won't call the patch a success, but
for now, it sure looks like it is.

Regards,
Jason Burks
GE Global Research

Comment 12 Don Howard 2006-03-29 21:16:23 UTC
Jason -

Thanks for the feedback; I look forward to hearing how the patch fares over the
next few weeks.  I'd also be interested to know whether you see any kernel
warning messages like the following:

    "prune_dcache(): negative count"

Comment 13 Jason Burks 2006-03-30 13:05:15 UTC
Don,

I don't see any kernel warnings like that in anything in /var/log.  Where 
would that warning appear if it showed up?

-Jason

Comment 14 Don Howard 2006-03-30 17:03:00 UTC
Jason -

If they appear, I'd expect to see them in /var/log/messages.

BTW: do you use any third party filesystems?  
What does /proc/modules look like on this system?

Comment 15 Jason Burks 2006-03-30 17:10:20 UTC
Don,

Okay, I checked /var/log/messages and don't see any messages about prune_dcache.

--snip--
[root@openge log]# cat /proc/modules
cpqci                  28232   0 (unused)
tg3                    61984   2
lpfcdfc               150016   2
vxportal-3.4.3          3152   0 (autoclean) (unused)
qlog-3.4.3             53632   0 (autoclean) (unused)
vxfs-3.4.3           1054972   1 (autoclean) [vxportal-3.4.3 qlog-3.4.3]
vxspec                  5796   2
vxio                  659168   2
vxdmp                 121176  59
usb-ohci               23488   0 (unused)
usbcore                68896   1 [usb-ohci]
ext3                   71264  10
jbd                    55700  10 [ext3]
lpfc                  224768 112 [lpfcdfc]
cciss                  47648  12
sd_mod                 14048 112
scsi_mod              128508   2 [lpfc cciss sd_mod]
--snip--

As you can see, yes, we use Veritas vxfs for the filesystem on our SAN.  

-Jason

Comment 16 Jason Burks 2006-04-20 18:28:21 UTC
Don,

Just wanted to update you on my production server.  We're currently at 33+ 
days of uptime and counting.  There have been no load problems at all, and I 
have not seen any log messages about negative cache counts.  I think your 
patch has solved my problem.  

-Jason

Comment 20 Don Howard 2006-05-09 00:09:06 UTC
An updated test kernel is available at http://people.redhat.com/dhoward/bz117400/

The included patch addresses racy updates to dentry_stat.  Testing and feedback
are requested.



Comment 21 Jason Burks 2006-05-09 17:59:38 UTC
Don,

Please provide more details on what this latest patch does.  The previously 
provided test kernel has been running very well for us.  I'd like to know what 
symptoms I should be looking for to see if I'm experiencing the race 
condition.  As my machine is a large production environment and I have not 
been able to reproduce the problems in my test environment, I'm reluctant to 
start playing with other kernels unless I know there is a specific need.

Thanks,
Jason

Comment 22 Don Howard 2006-05-09 19:58:00 UTC
Hi Jason -

The previous patch treated the symptom - it reset the dentry_stat counters when
they became negative. 

The updated kernel contains a patch that arranges for dentry_stat updates to
happen only when the caller is holding dcache_lock.  This should keep the
counters from being decremented below zero.
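
For anyone curious about the shape of such a change, here is a user-space
sketch of the idea (an illustration only, not the attached patch): every
update to the shared counter happens with the lock held, so concurrent
removals cannot race each other and drive the count below zero.

    #include <pthread.h>
    #include <stdio.h>

    /* Stand-ins for dcache_lock and dentry_stat.nr_unused. */
    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
    static int nr_unused;

    /* Add an entry to the "unused" list: counter bumped under the lock. */
    static void unused_add(void)
    {
        pthread_mutex_lock(&cache_lock);
        nr_unused++;
        pthread_mutex_unlock(&cache_lock);
    }

    /* Remove an entry: the check and the decrement are atomic with
     * respect to other callers, so nr_unused cannot go negative. */
    static void unused_del(void)
    {
        pthread_mutex_lock(&cache_lock);
        if (nr_unused > 0)
            nr_unused--;
        pthread_mutex_unlock(&cache_lock);
    }

    int main(void)
    {
        unused_add();
        unused_del();
        unused_del();   /* second removal is a no-op instead of underflowing */
        printf("nr_unused = %d\n", nr_unused);
        return 0;
    }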


Comment 23 Don Howard 2006-05-09 20:00:47 UTC
Created attachment 128808 [details]
Ensure that dcache_lock is held when updating dentry_stat.

Comment 25 Don Howard 2006-09-18 16:47:06 UTC
Are folks still seeing negative dentry values? There's been no feedback on the
updated test kernel. I need some positive testing feedback before the patch can
be included in official builds.

Comment 26 Frank Hirtz 2006-09-18 16:50:45 UTC
The client has not found this readily reproducible, and they're loath to deploy
a test kernel widely for something along these lines, so I'm unlikely to be able
to confirm the fix as such.


Comment 27 Jason Burks 2006-09-18 17:18:55 UTC
I'm sorry I did not reply earlier.  We've been running this latest kernel 
(2.4.9-e.70.bz117400enterprise #1 SMP) since July without any issues.  I only 
have one box where the original error was occurring, so I can't say for sure 
that the problem is gone, only that it is fixed on my one box that was having 
the problem.

Regards,
Jason

Comment 28 Don Howard 2007-01-31 00:36:51 UTC
Interest in this issue appears to have waned.  I'm closing it now, due to lack
of firm feedback.

Eric Sandeen pointed out this patch as a possible fix:
http://linus.bkbits.net:8080/linux-2.5/?PAGE=gnupatch&REV=1.1722.49.23

If anyone is experiencing this issue, please re-open the ticket with details,
and I can provide a test kernel.

Thanks.