Bug 16105 - Negative d_count (-1) under very heavy exec load
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.0
Hardware: i386 Linux
Priority: high    Severity: high
Assigned To: Michael K. Johnson
URL: http://boudicca.tux.org/hypermail/lin...
Whiteboard: Winston gold

Reported: 2000-08-13 14:11 EDT by hjl
Modified: 2008-05-01 11:37 EDT

Doc Type: Bug Fix
Last Closed: 2000-08-24 22:34:21 EDT


Attachments: None
Description Red Hat Bugzilla 2000-08-13 14:11:07 EDT
On two occasions, under very heavy exec load, I got

Negative d_count (-1) for bin/gcc
Unable to handle kernel NULL pointer dereference at virtual address
00000000
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c0138fdd>]
EFLAGS: 00010286
eax: 00000025   ebx: e76a9080   ecx: 00000002   edx: 0000003c
esi: ffffffff   edi: 0805b000   ebp: c1c69a80   esp: f0ed9eec
ds: 0018   es: 0018   ss: 0018

It happened on both beta5 and rc1, on two different SMP machines with 2GB
and 4GB of RAM, respectively. I don't know how to trigger it reliably.
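
For a sense of what "heavy exec load" means here, a rough, hypothetical stress
program along these lines (the worker count, iteration count, and the /bin/true
target are arbitrary placeholders; the actual workloads were large parallel
builds) would exercise fork/exec in the same way:

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
        int nproc = 32;         /* concurrent workers (arbitrary)  */
        int iters = 1000;       /* execs per worker (arbitrary)    */
        int i, j;

        for (i = 0; i < nproc; i++) {
                if (fork() == 0) {
                        for (j = 0; j < iters; j++) {
                                pid_t pid = fork();
                                if (pid == 0) {
                                        /* each grandchild just execs a trivial binary */
                                        execl("/bin/true", "true", (char *) NULL);
                                        _exit(1);
                                }
                                if (pid > 0)
                                        waitpid(pid, NULL, 0);
                        }
                        _exit(0);
                }
        }
        while (wait(NULL) > 0)
                ;       /* reap the workers */
        return 0;
}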

Could someone who is familiar with SMP and fs please take a look at

http://boudicca.tux.org/hypermail/linux-kernel/2000week11/0189.html

It sounds very similar to what happened to me. My SMP machines were
under very heavy exec load.

H.J.
Comment 1 Red Hat Bugzilla 2000-08-13 14:59:02 EDT
This defect is considered MUST-FIX for Winston Gold-release
Comment 2 Red Hat Bugzilla 2000-08-14 16:13:27 EDT
Do you have a decoded oops log?
Comment 3 Red Hat Bugzilla 2000-08-16 18:02:01 EDT
Al says that it would take ~300K of patches to the VFS to fix this;
he said that most of his VFS "threading" work was actually fixing
races, and very little of it, by comparison, was truly about threading.
We can't fix this for 7.0, unfortunately.
Comment 4 Red Hat Bugzilla 2000-08-20 14:21:30 EDT
If it won't be fixed in 7.0, can you comment out

	*(int *)0 = 0;

from

out:
        if (count >= 0) { 
                dentry->d_count = count;
                return;
        }

        printk(KERN_CRIT "Negative d_count (%d) for %s/%s\n",
                count,
                dentry->d_parent->d_name.name,
                dentry->d_name.name);
        *(int *)0 = 0;  

in dput() in fs/dcache.c?
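
(That *(int *)0 = 0; is a deliberate write through a NULL pointer, which is
what produces the "NULL pointer dereference at virtual address 00000000" oops
quoted above. A sketch of the change being asked about, keeping the check and
the printk but neutralizing the forced oops, would be:)

out:
        if (count >= 0) {
                dentry->d_count = count;
                return;
        }

        printk(KERN_CRIT "Negative d_count (%d) for %s/%s\n",
                count,
                dentry->d_parent->d_name.name,
                dentry->d_name.name);
        /* *(int *)0 = 0; */    /* log the bad count instead of forcing an oops */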
Comment 5 Red Hat Bugzilla 2000-08-20 17:00:51 EDT
Possibly worse damage... :-(
Comment 6 Red Hat Bugzilla 2000-08-24 17:45:43 EDT
Something seems to have gone wrong since 2.2.16-17, and it is getting worse in
2.2.16-21. I never used to see this problem on the same SMP machine doing
basically the same work. Now I am seeing it almost every time I do a parallel
build of gcc. Maybe some change since 2.2.16-17 aggravates the problem.
Comment 7 Red Hat Bugzilla 2000-08-24 17:56:05 EDT
I don't think that we've changed anything that would make this
worse.  Have you re-installed the old kernels you claim were
better to see if the problem gets better?
Comment 8 Red Hat Bugzilla 2000-08-24 18:15:37 EDT
It may take a while to verify, if it is possible at all. Another data point:
all of those machines have 4 or more hard drives with many partitions
spread across them. Looking through the changes from 2.2.16-12
to 2.2.16-17, could linux-2.2.16-sard.patch cause the problem on SMP
machines with many hard drives/partitions?
Comment 9 Red Hat Bugzilla 2000-08-24 18:28:02 EDT
No, sard just reports information; it shouldn't have any effect on this.
Comment 10 Red Hat Bugzilla 2000-08-24 22:34:19 EDT
You are right. After backing sard out, I got it again. I will see what
I can find out. It will take me a while to get anywhere.
Comment 11 Red Hat Bugzilla 2000-08-25 16:35:53 EDT
After some investigation, it seems that I was running the wrong kernel.
After rebooting into the right kernel, I have been running my load test
for several hours now. Everything seems OK, so I am closing this for now.

Sorry for that. Thanks.
