Bug 16105

Summary: Negative d_count (-1) under very heavy exec load
Product: [Retired] Red Hat Linux
Reporter: hjl
Component: kernel
Assignee: Michael K. Johnson <johnsonm>
Status: CLOSED NOTABUG
Severity: high
Priority: high
Version: 7.0
Hardware: i386
OS: Linux
URL: http://boudicca.tux.org/hypermail/linux-kernel/2000week11/0189.html
Whiteboard: Winston gold
Doc Type: Bug Fix
Last Closed: 2000-08-25 02:34:21 UTC

Description Red Hat Bugzilla 2000-08-13 18:11:07 UTC
On two occasions, under very heavy exec load, I got

Negative d_count (-1) for bin/gcc
Unable to handle kernel NULL pointer dereference at virtual address
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c0138fdd>]
EFLAGS: 00010286
eax: 00000025   ebx: e76a9080   ecx: 00000002   edx: 0000003c
esi: ffffffff   edi: 0805b000   ebp: c1c69a80   esp: f0ed9eec
ds: 0018   es: 0018   ss: 0018

It happened on both beta5 and rc1 on 2 different SMP machines with 2GB RAM
and 4GB RAM, respectively. I don't know how to trigger it reliably.

Could someone who is familiar with SMP and the fs code please take a look at it?

It sounds very similar to what happened to me. My SMP machines were
under very heavy exec load.


Comment 1 Red Hat Bugzilla 2000-08-13 18:59:02 UTC
This defect is considered MUST-FIX for Winston Gold-release

Comment 2 Red Hat Bugzilla 2000-08-14 20:13:27 UTC
Do you have a decoded oops log?

Comment 3 Red Hat Bugzilla 2000-08-16 22:02:01 UTC
Al says that it would take ~300K of patches to the VFS to fix this;
he said that most of his VFS "threading" work was actually fixing
races, and comparatively little of it was true threading.  We can't
fix this for 7.0, unfortunately.

Comment 4 Red Hat Bugzilla 2000-08-20 18:21:30 UTC
If it won't be fixed in 7.0, can you comment out

	*(int *)0 = 0;


        if (count >= 0) { 
                dentry->d_count = count;

        printk(KERN_CRIT "Negative d_count (%d) for %s/%s\n",
        *(int *)0 = 0;  

in dcache() in fs/dcache.c?

Comment 5 Red Hat Bugzilla 2000-08-20 21:00:51 UTC
Possible worse damage... :-(

Comment 6 Red Hat Bugzilla 2000-08-24 21:45:43 UTC
Something seems wrong since 2.2.16-17, and it is getting worse in 2.2.16-21.
I never saw this problem before on the same SMP machine doing basically the
same work. Now I am seeing it almost every time I do a parallel build of gcc.
Maybe some change since 2.2.16-17 aggravates the problem.

Comment 7 Red Hat Bugzilla 2000-08-24 21:56:05 UTC
I don't think that we've changed anything that would make this
worse.  Have you re-installed the old kernels you claim were
better to see if the problem gets better?

Comment 8 Red Hat Bugzilla 2000-08-24 22:15:37 UTC
It may take a while to verify, if it is possible at all. Another data point:
all those machines have four or more hard drives with many partitions
spread across them. Looking through the changes from 2.2.16-12 to
2.2.16-17: could linux-2.2.16-sard.patch cause the problem on SMP
machines with many hard drives/partitions?

Comment 9 Red Hat Bugzilla 2000-08-24 22:28:02 UTC
No, sard only reports information; it shouldn't have any effect on this.

Comment 10 Red Hat Bugzilla 2000-08-25 02:34:19 UTC
You are right. After backing sard out, I got it again. I will see what
I can find out. It will take me a while to get anywhere.

Comment 11 Red Hat Bugzilla 2000-08-25 20:35:53 UTC
After some investigation, it seems that I was running the wrong kernel.
After rebooting into the right kernel, I have been running my load test
for several hours now. Everything seems OK, so I am closing this for now.

Sorry for that. Thanks.