Bug 450194
Summary:          softlockup in dcache
Product:          Red Hat Enterprise Linux 5
Component:        kernel
Version:          5.1
Hardware:         All
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         medium
Priority:         medium
Target Milestone: rc
Target Release:   ---
Reporter:         Issue Tracker <tao>
Assignee:         Josef Bacik <jbacik>
QA Contact:       Red Hat Kernel QE team <kernel-qe>
CC:               dzickus, esandeen, jlayton, kbaxley, kgraham, staubach, steved, tao, woodard
Doc Type:         Bug Fix
Last Closed:      2009-02-24 17:12:49 UTC
Description
Issue Tracker
2008-06-05 20:10:10 UTC
Soft lockups of the following form were seen on atlas over the weekend. So far they all seem to be due to the same job being canceled due to timeout. Not sure why yet.

2008-04-13 06:58:14 BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ> [<ffffffff800b5df2>] softlockup_tick+0xd5/0xe7
 [<ffffffff800909f3>] tasklet_action+0x62/0xac
 [<ffffffff80093b03>] update_process_times+0x54/0x7a
 [<ffffffff800758d1>] smp_local_timer_interrupt+0x2c/0x61
 [<ffffffff8005d368>] call_softirq+0x1c/0x28
 [<ffffffff80075f99>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80039423>] generic_drop_inode+0x0/0x15a
 [<ffffffff8004b3cd>] shrink_dcache_parent+0x3f/0xe1
 [<ffffffff8004b3bc>] shrink_dcache_parent+0x2e/0xe1
 [<ffffffff88090418>] :nfs:nfs_lookup_revalidate+0x40e/0x4c8
 [<ffffffff8809ef59>] :nfs:nfs3_proc_access+0x12f/0x18e
 [<ffffffff800879b8>] find_busiest_group+0x20d/0x621
 [<ffffffff80063ae8>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8006373b>] __up_wakeup+0x35/0x67
 [<ffffffff8802d8f6>] :sunrpc:xprt_timer+0x0/0x7f
 [<ffffffff800122fe>] get_empty_filp+0x97/0x164
 [<ffffffff880e0b6c>] :aufs:aufs_d_revalidate+0x392/0x575
 [<ffffffff8809000a>] :nfs:nfs_lookup_revalidate+0x0/0x4c8
 [<ffffffff8009be75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000949e>] __d_lookup+0xb0/0xff
 [<ffffffff8000cbf5>] do_lookup+0x17d/0x1d4
 [<ffffffff80009893>] __link_path_walk+0x3a6/0xf42
 [<ffffffff8000e6dd>] link_path_walk+0x5c/0xe5
 [<ffffffff8009be75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000c8fc>] do_path_lookup+0x270/0x2e8
 [<ffffffff80023511>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001a911>] open_namei+0x73/0x6ed
 [<ffffffff80065ae4>] do_page_fault+0x506/0x838
 [<ffffffff80027305>] do_filp_open+0x1c/0x38
 [<ffffffff8009be75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800196cc>] do_sys_open+0x44/0xbe
 [<ffffffff8005c116>] system_call+0x7e/0x83

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

> Hi Mark,
> Here is a list of nodes that had the soft lock problem from over the
> weekend on atlas:
>
> 652: Has not been rebooted.
> 660: Has not been rebooted.
> 679: Has not been rebooted.
> 726: Has not been rebooted.
> 789: Has not been rebooted.
> 1000: Has not been rebooted.
> 1053: Has not been rebooted.
> 1055: Has not been rebooted.
> 1059: Has not been rebooted.

I spot checked these nodes and they do *not* show the signs of being run out of memory by the user job. In other words, this is a new kind of hang we haven't seen yet. The signature is that NFS calls appear in the soft lockup stack trace, e.g.:

 <EOI> [<ffffffff80039423>] generic_drop_inode+0x0/0x15a
 [<ffffffff8004b3cd>] shrink_dcache_parent+0x3f/0xe1
 [<ffffffff8004b3bc>] shrink_dcache_parent+0x2e/0xe1
 [<ffffffff88090418>] :nfs:nfs_lookup_revalidate+0x40e/0x4c8

The nodes have all recovered now, but it seems like they went "out to lunch" for perhaps as much as 10 minutes, which is why SLURM marked them down. Correlating the slurmd logs with these events shows that the nodes started hitting soft lockups right after SIGTERM was sent to the tasks of the job because the job had reached its time limit.

From the slurmd.log on an example node:

[Apr 13 06:57:45] debug2: Processing RPC: REQUEST_KILL_TIMELIMIT
[Apr 13 06:57:45] debug2: container signal 15 to job 48437.0
[Apr 13 07:08:25] Job 48437: timeout: sent SIGTERM to 1 active steps

Note that it appears to have taken over 10 minutes to print the first message after sending SIGTERM to the job. The time of these messages correlates precisely with the time of the soft lockup errors:

2008-04-13 06:57:54 BUG: soft lockup detected on CPU#1!

To determine what jobs were running on these nodes at the time, first let's see when these nodes went down (according to SLURM):

# atlas1 /root > sinfo -o "%40R %N" -n atlas[652,660,679,726,789,1000,1053,1055,1059]
REASON                                   NODELIST
Not responding [slurm@04/13-07:04:32]    atlas[652,660,679,726,789]
Not responding [slurm@04/12-01:49:42]    atlas1000
Not responding [slurm@04/12-01:48:02]    atlas[1053,1055,1059]

Now we can query what jobs were running on those nodes using sqlog. The trick is that the nodes might have been marked down a few minutes after the jobs failed, so a time range should be provided to the -t option:

# atlas1 /root > sqlog -t 04/13,6am..8am -n atlas[652,660,679,726,789] -o jobid,user,st,start,end,nnodes,nodes
JOBID  USER  ST  START           END             N  NODELIST
48433  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[724-731]
48434  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[675-679,721-723]
48436  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[657,660-666]
48437  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[648-649,651-656]
48424  ikuo  TO  04/12-14:57:22  04/13-06:57:41  8  atlas[786-793]

# atlas1 /root > sqlog -t 04/12,1am..2am -n atlas[1053,1055,1059],atlas1000 -o jobid,user,st,start,end,nnodes,nodes
JOBID  USER  ST  START           END             N  NODELIST
46254  ikuo  CA  04/11-09:43:44  04/12-01:44:22  8  atlas[985-987,998-1002]
46239  ikuo  CA  04/11-09:38:44  04/12-01:39:22  8  atlas[1052-1059]

Note that these are all the same user, but that the jobs were canceled in the second (actually the first, timeline-wise) case.

I verified that the job cancel operation also sends SIGTERM to the tasks. So it seems like something this user's job is doing when receiving SIGTERM is causing some kind of NFS issue on the nodes. I did log in to one of the user's existing jobs and verified that they have a signal handler for SIGTERM installed. It would be useful to know what that handler does.

Meanwhile, this issue has been opened as CHAOS BZ 285. mark woodard assigned to issue for LLNL (HPC). Status set to: Waiting on Tech

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Last night on Atlas, nodes 124, 191, and 203 hit this bug (according to Mark). This also caused bz#284 to happen. I will update that bug as well with this new information.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Mark writes:
> I received a copy of the user's job from Adam. But can't reproduce
> the issue as myself.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
Also from Mark: Lots more nodes hit this issue this weekend. All the same job.

# atlas1 /var/log/slurm > for f in `cat /tmp/nodes` ; do sqlog -E +12am -S -09:50:23 -n $f -L1 --no-running --no-header -o jid,user,start,end,st,nnodes,nodes; done | sort -u
18260  ikuo  05/02-15:19:01  05/03-07:19:35  TO  8  atlas[247-254]
18261  ikuo  05/02-15:19:01  05/03-07:19:35  TO  8  atlas[43-44,46,48,51,244-246]
18262  ikuo  05/02-15:19:49  05/03-07:20:35  TO  8  atlas[942-947,950,967]
18263  ikuo  05/02-15:19:49  05/03-07:20:35  TO  8  atlas[934-941]
18264  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[926-933]
18265  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[918-925]
18266  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[878-883,916-917]
18267  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[870-877]
18268  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[862-869]
18269  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[589-590,592-596,861]
18270  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[579-580,583-588]
18271  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[569-575,578]
18272  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[561-568]
18273  ikuo  05/02-15:19:51  05/03-07:20:35  TO  8  atlas[314-320,560]
18274  ikuo  05/02-15:19:51  05/03-07:20:35  TO  8  atlas[274-280,313]
18276  ikuo  05/02-15:22:43  05/03-07:23:35  TO  8  atlas[973,982,1124,1132,1140-1143]
18277  ikuo  05/02-15:22:44  05/03-07:23:35  TO  8  atlas[263-265,968-972]

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Yesterday Mark was able to reproduce the problem on a chaos kernel which didn't have aufs enabled.

13:56 <grondo> the fortran runtime is trying to find a message catalog, and is checking different paths
13:56 <neb> so it is the negative dcache entries.
13:57 <grondo> could be yes
13:57 <neb> that is interesting. I think I read something about NFS and negative dcache entries.
13:57 <grondo> though I don't really understand what negative dcache entry is
13:57 <grondo> spinning (soft lockup) process is stuck in dcache_lock
13:58 <grondo> some other processes are stuck waiting for kernel_lock
13:58 <grondo> and this one interesting process in nfs_lookup_revalidate->shrink_dcache_parent->nfs_getattr
13:58 <neb> it is basically a notation that something isn't there.
13:58 <grondo> ok
13:59 <neb> a long time ago BSD discovered that caching negatives as well as positives GREATLY improved system performance on file lookup.
14:00 <neb> IIRC it was like a 200% improvement on some benchmarks
14:00 <grondo> I'm wondering what has the kernel lock
14:02 <grondo> ah
14:02 <grondo> reproduced
14:02 <grondo> heh
14:02 <grondo> filling the dcache seems to have done the trick
14:02 <neb> nice
14:02 <grondo> hm, let me just make sure...
14:03 <grondo> yep
14:03 <grondo> node didn't seem to lock up as long, but still got the soft lockup warning
14:04 <grondo> now if it is 100% reproducible, I can verify it is aufs by running with bind mode
14:05 <grondo> I wonder now if I don't even need the user job to reproduce
14:06 <grondo> just get 8 threads to simultaneously try to open a non-existent file after running find / for awhile
14:14 <grondo> nice, seems reproducible, as long as we run the job for at least 5 minutes
<snip>
16:35 <grondo> hit bz285 soft lockup without aufs
16:35 <grondo> :(
16:35 <neb> using just bind mounts?
16:35 <grondo> yes
16:36 <grondo> I'll get a crashdump on chimp
16:37 <neb> Send me all the details and I'll play around with it on wopr tomorrow to make sure that I can reproduce it on a stock rh kernel.
16:37 <neb> if I can I'll send it up.
16:37 <grondo> hm, I don't think I can give you the reproducer
16:37 <grondo> I'll have to see if we can reproduce with something simpler
16:38 <neb> can you boot up a RHEL kernel.
16:41 <grondo> I could try
16:42 <neb> are you heading out in a few?
16:43 <grondo> about 20 minutes
16:44 <neb> OK if we pick this up 1st thing in the am?
16:44 <grondo> yeah
16:44 <neb> by the time I get the latest kernel down and then you install it then you'll have to leave
<snip>
11:01 <neb> I was unclear re: your softlockup reproducer yesterday. Did you or didn't you need the user's code to trigger the problem?
11:01 <grondo> so far I can't reproduce without the user's code

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Working on seeing if the problem will reproduce on -53.1.14 and -92. Having trouble with the stap script.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

10:46 <grondo> ok, I'm honing in on the nfs softlockup problem
10:46 <neb> what have you figured out?
10:46 <grondo> with an stap probe I'm seeing cases where tasks spend up to ~20s or more in prune_dcache
10:47 <grondo> prune_dcache is being called with huge counts in this case, like 20000+
10:47 <grondo> where the normal case is more like 1 or 2
10:47 <grondo> this wouldn't be a huge issue if just one task was in prune_dcache, but we seem to be hitting a case where 2, 3 or even more tasks are in prune_dcache
10:48 <grondo> Furthermore, when NFS calls shrink_dcache_parent() from nfs_lookup_revalidate(), it has the kernel lock held
10:48 <grondo> so various things, especially NFS, start backing up
10:48 <grondo> which is bad when you have NFS root
10:49 <grondo> Possible solutions
10:50 <grondo> 1. don't let `found' get too big in select_parent(). Return early if we've found 1K or so, so we don't get the case of 20,30,40K
10:51 <grondo> 2. Don't call shrink_dcache_parent() from nfs_lookup_revalidate() with the kernel lock held.
10:51 <grondo> I worry about #1 because I'm not sure if that function is used at umount time
10:52 <grondo> and I'm not sure if #2 is enough, since dcache_lock could still be heavily contended if we have a bunch of tasks in prune_dcache
10:52 <grondo> Hm, actually #1 might be ok, select_parent() already returns early if (found && need_resched())
10:53 <grondo> I could add another test (found >= 1024)
10:53 <grondo> I also want to work on a standalone reproducer
10:55 <neb> re #2 would it be possible to drop the kernel lock before calling shrink_dcache_parent() from nfs_lookup_revalidate?
10:55 <neb> and then pick it up again after?
10:56 <grondo> maybe
10:56 <grondo> though nfs may be depending on only one task being in that section of code at a time
10:57 <grondo> the only thing it does after, though, is d_drop(dentry)
10:57 <grondo> but it then unlocks the kernel, so I assume it has some reason for doing the d_drop first
10:58 <neb> my thoughts on that would be to try it and then we'd submit to the NFS maintainer here and see if he has a problem with it.
10:58 <grondo> however, it could set a flag, shrink_dcache_needed, and call the shrink_dcache_parent() after the unlock_kernel()
10:59 <grondo> I will try that. I fear it won't be enough though...
11:04 <grondo> hm
11:06 <grondo> Can't call shrink_dcache_parent(dentry) after d_drop(dentry)
11:07 <grondo> and seemingly can't unlock_kernel() until after d_drop(dentry)
11:08 <grondo> why doesn't NFS use its own locks?
11:10 <neb> last I heard, no one had put together a truly compelling case where it mattered for performance, and because NFS is so interrelated with all the other kernel subsystems, trying to make finer-grained locks led to problems with lock order inversions leading to deadlocks.
11:10 <neb> sorting out those deadlocks was deemed so mind-bogglingly difficult that until someone came up with a case where it truly mattered, they were going to defer.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

15:01 <grondo> so here's something else with this nfs soft lockup. If this stack trace is to be trusted
15:01 <grondo> #0 [ffff8102f4bab7b8] schedule at ffffffff80061f4e
15:01 <grondo> #1 [ffff8102f4bab7c0] nfs3_rpc_wrapper at ffffffff8809ea6c
15:01 <grondo> #2 [ffff8102f4bab800] nfs3_proc_getattr at ffffffff8809ed82
15:01 <grondo> #3 [ffff8102f4bab890] __first_cpu at ffffffff80142602
15:01 <grondo> #4 [ffff8102f4bab908] shrink_dcache_parent at ffffffff8004ba37
15:01 <grondo> #5 [ffff8102f4bab960] nfs_lookup_revalidate at ffffffff88090437
15:01 <grondo> then this task called schedule() with the dcache_lock held
15:01 <grondo> PID: 7073 TASK: ffff81020437b040 CPU: 5 COMMAND: "cp2k.popt"
15:01 <grondo> crash> p dcache_lock
15:01 <grondo> dcache_lock = $3 = {
15:01 <grondo>   raw_lock = {
15:01 <grondo>     slock = 0
15:01 <grondo>   },
15:01 <grondo>   magic = 3735899821,
15:01 <grondo>   owner_cpu = 5,
15:01 <grondo>   owner = 0xffff81020437b040
15:01 <grondo> }
15:04 <neb> OOH yuck that looks like a baddie to me.
15:04 <grondo> can that stack trace be possible?
15:05 <neb> getting from nfs3_rpc_wrapper to schedule() is totally doable.
15:06 <neb> shrink_dcache_parent to nfs3_proc_getattr is something that I haven't verified but I think it can be done.
15:06 <neb> I don't know about __first_cpu
15:07 <neb> so 5->4 makes sense and 4->2 sounds likely (don't know anything about 3) 2->1->0 is also likely right
15:08 <neb> I have no clue about #3 at all
15:08 <neb> that seems bogus to me.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Date: Wed, 07 May 2008 16:09:26 -0700
From: Ben Woodard <woodard>
User-Agent: Thunderbird 2.0.0.12 (X11/20080306)
MIME-Version: 1.0
To: Steve Dickson <steved>
CC: Kent Baxley <kbaxley>, Jeffrey Layton <jlayton>, Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
References: <20080507163254.260b900e.redhat.com>
In-Reply-To: <20080507163254.260b900e.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

We are getting a bit closer to the problem, I think. But I'm not well versed in the finer points of the dcache or NFS, so this is kind of mind-boggling to me. Here is what we are seeing:

#0 [ffff8102f4bab7b8] schedule at ffffffff80061f4e
#1 [ffff8102f4bab7c0] nfs3_rpc_wrapper at ffffffff8809ea6c
#2 [ffff8102f4bab800] nfs3_proc_getattr at ffffffff8809ed82
#3 [ffff8102f4bab890] __first_cpu at ffffffff80142602
#4 [ffff8102f4bab908] shrink_dcache_parent at ffffffff8004ba37
#5 [ffff8102f4bab960] nfs_lookup_revalidate at ffffffff88090437

Then this task called schedule() with the dcache_lock held:

PID: 7073 TASK: ffff81020437b040 CPU: 5 COMMAND: "cp2k.popt"
crash> p dcache_lock
dcache_lock = $3 = {
  raw_lock = {
    slock = 0
  },
  magic = 3735899821,
  owner_cpu = 5,
  owner = 0xffff81020437b040
}

I don't know what the heck that __first_cpu is doing in there; I think it must be something wrong with the backtraces.

So far we have only found one way to reproduce the problem, and it isn't transportable. We run find on / to heavily pre-populate the dcache with dentries that end up being unused. Then we run a program that triggers the problem. The problem shows up intermittently in the user's environment, but if we run the find, we can reliably reproduce it. The program seems to have 8 synchronized tasks that are all probing a path for a file. We think that they look for the file early on and that this preloads the cache with negative dentries. Then later on, after the local cache has timed out, all 8 processes respond to a SIGTERM, so they all start searching for the file again, and that is what causes the softlockup to appear.

We think what is going on is that we have a big dcache with lots of unused entries and some timed-out negative dcache entries from the previous search through the path looking for the file. Then, all at once, all the CPUs in the system try to look for the same file, which isn't there, in several places. This causes multiple tasks to try to shrink the dcache at the same time, getting hung up on the dcache_lock, and then we get a softlockup when one of the CPUs tries to revalidate the negative dentry and has to go out on the wire. When it goes out on the wire, it goes to sleep holding the dcache_lock.

Honestly, the finer points of dcache locking are beyond me, which is why I was looking for some help from you guys. Unfortunately, our only reproducer at the moment is an export-controlled code that can't leave the lab. Once we have a transportable reproducer, I should be able to pass it up to you guys in a more conventional format.

So in summary, the key points seem to be:

1. The dcache is large, 200K-400K entries, a lot of which are unused.
2. The user app hits negative dentries, which causes shrink_dcache_parent to get triggered in nfs_lookup_revalidate. We think that this requires
   1. that the path has been looked up previously, and
   2. that the dentries have to be revalidated, i.e. you hit the mount point's cache timeout, but we are not sure.
3. For some reason, not sure why, select_parent() is returning very large values for `found', which is passed as the count to prune_dcache(), literally multiple thousands. We think that this might be because there are a large number of unused dentries expiring under the mount point or something.
4. It seems like maybe the dcache_lock is heavily contended.
5. Plus we're doing all this with the kernel_lock held, which backs up the rest of NFS.
6. We tried limiting the value of `found' returned by select_parent() to 1024 or less, but that actually didn't seem to help much.

-ben

Kent Baxley wrote:
> Hey, guys.
>
> I know this is way, way, out of the normal process, but, Ben and the guys at LLNL have run into a weird problem, and they're looking for a nudge in the right direction. For those that have Issue Tracker access, the ticket can be found here:
>
> https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=178773
>
> It appears that they're hitting some sort of softlockup in dcache. The systems in question are all running RHEL5-based kernels. Over the weekend several of their cluster nodes went "out to lunch" for approximately 10 minutes. The nodes did eventually recover, and the common warning message and call trace on all of the systems so far seems to have been:
>
> BUG: soft lockup detected on CPU#1!
>
> ...
>
> <EOI> [<ffffffff80039423>] generic_drop_inode+0x0/0x15a
> [<ffffffff8004b3cd>] shrink_dcache_parent+0x3f/0xe1
> [<ffffffff8004b3bc>] shrink_dcache_parent+0x2e/0xe1
> [<ffffffff88090418>] :nfs:nfs_lookup_revalidate+0x40e/0x4c8
>
> (The full trace is logged in the summary of the Issue Tracker ticket).
>
> I believe Ben and LLNL are getting close to isolating the problem, but, at this point, they're getting into some unfamiliar territory and are looking for some tips on how to proceed. The last post to the Issue Tracker ticket details their most current findings and problems with regard to where to go next.
>
> Ben is also working on getting the folks at LLNL to try and reproduce the problem on stock -53.1.14 and -92 RHEL5 kernels.
>
> Anyway, if you guys have a second to look at the problem and can provide Ben with some info that can keep them going in the right direction, it would be greatly appreciated.
>
> Thanks for your time!
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

File uploaded: softlock-dcache.C

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132211

I've been trying to come up with a reproducer. So far I've failed. However, this is what I've come up with so far trying to capture what we think is going on with the problem.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

From: Steve Dickson <SteveD>
To: Ben Woodard <woodard>
CC: Kent Baxley <kbaxley>, Jeffrey Layton <jlayton>,
Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
Date: Thu, 08 May 2008 07:24:22 -0400
User-Agent: Thunderbird 2.0.0.12 (X11/20080226)
Ben Woodard wrote:
> We are getting a bit closer to the problem I think. But I'm not well
> versed in the finer points of the dcache or NFS and so this is kind of
> mind boggling to me. Here is what we are seeing:
>
> #0 [ffff8102f4bab7b8] schedule at ffffffff80061f4e
> #1 [ffff8102f4bab7c0] nfs3_rpc_wrapper at ffffffff8809ea6c
> #2 [ffff8102f4bab800] nfs3_proc_getattr at ffffffff8809ed82
> #3 [ffff8102f4bab890] __first_cpu at ffffffff80142602
> #4 [ffff8102f4bab908] shrink_dcache_parent at ffffffff8004ba37
> #5 [ffff8102f4bab960] nfs_lookup_revalidate at ffffffff88090437
It appears the reason shrink_dcache_parent() is being called is
due to something failing (i.e. goto out_bad). Do we know what is
failing and why? That failure (or error path) might be the real
cause of this lockup...
steved.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
From: Jeff Layton <jlayton>
To: Ben Woodard <woodard>
Cc: Steve Dickson <steved>, Kent Baxley <kbaxley>, Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
Date: Thu, 8 May 2008 07:14:49 -0400
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.12.9; i386-redhat-linux-gnu)

On Wed, 07 May 2008 16:09:26 -0700 Ben Woodard <woodard> wrote:

> We are getting a bit closer to the problem I think. But I'm not well
> versed in the finer points of the dcache or NFS and so this is kind of
> mind boggling to me. Here is what we are seeing:
> [remainder of Ben's message, quoted in full above, snipped]

Sounds ugly. I don't see any reason right offhand why we'd need to do a getattr when we're pruning the dcache, but dcache handling is very obtuse stuff. The locking there is some of the most fragile stuff in the kernel. Even very small, subtle changes can have large effects... and yes, the dcache_lock is very busy.

It's very good that you have a reproducer. I'd definitely recommend testing this on the latest and greatest RHEL5 kernel so that we can rule out existing changes (it seems like Eric Sandeen did some dcache changes a while back). Exactly what kernel are they running here? I don't see any mention of it in the IT.

--
Jeff Layton <jlayton>

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

09:39 <grondo> If that were the case, then the job container may facilitate the reproducer because 1. it signals all tasks in the job container at once and/or 2. it seems to send the signal to every task in the container, this might mean threads get the signal too. I should look at how kill_pg() works, maybe that is a misfeature of the job container thing
10:00 <neb> meeting
10:00 <grondo> arggh
10:02 <neb> it isn't thaaat bad -- kent is a nice guy and keeps the meetings short.
10:02 <grondo> so if your reproducer still doesn't work, can you try putting your children into a process group and use killpg()?
10:02 <grondo> also, allow the number of total children to be tunable
10:03 <grondo> and try with more tasks
10:04 <grondo> basically, reproduces much easier when using the job containers
10:11 <neb> does "much easier" mean haven't been able to reproduce it without process groups yet?
10:12 <grondo> yes, but I only tried 4-5 times
10:12 <grondo> but reproduced using job containers the second time
10:12 <neb> do you have some sample code on how to add tasks to process groups.
10:12 <neb> otherwise I have to look it up.
10:12 <neb> I don't know that part of posix programming off the top of my head.
10:12 <grondo> man 2 setpgid

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

I was able to reproduce the problem on a 92.el5 kernel:

2008-05-08 11:11:13 BUG: soft lockup - CPU#3 stuck for 10s! [cp2k.popt:6903]
CPU 3:
Modules linked in: job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sata_nv i2c_nforce2 libata pcspkr i2c_core ohci_hcd k8temp scsi_mod hwmon shpchp k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000
Pid: 6903, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff8002e7ea>] [<ffffffff8002e7ea>] prune_dcache+0x40/0x149
RSP: 0018:ffff8101f7649a38 EFLAGS: 00000287
RAX: 000000000000328e RBX: ffff8102e1378b70 RCX: 0000000000000067
RDX: ffff8102e1378bb0 RSI: ffffffff802f4080 RDI: ffffffff803c5480
RBP: 00000000000006f3 R08: ffff81010384b8e0 R09: ffff8100a1ed88e8
R10: 0000000000000202 R11: ffffffff80039784 R12: ffff81010384b8e0
R13: 0000000000000001 R14: ffff81010384b8e0 R15: ffff810099265810
FS: 0000000040100940(0063) GS:ffff8101039697c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaab56b0f4 CR3: 00000001e6b8f000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149
 [<ffffffff8004cd30>] shrink_dcache_parent+0x3f/0xe1
 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1
 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416
 [<ffffffff80089241>] __wake_up_common+0x3e/0x68
 [<ffffffff80089241>] __wake_up_common+0x3e/0x68
 [<ffffffff8002e254>] __wake_up+0x38/0x4f
 [<ffffffff800646ff>] __up_wakeup+0x35/0x67
 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3
 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f
 [<ffffffff80009523>] __d_lookup+0xb0/0xff
 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4
 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42
 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8
 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001a954>] open_namei+0x73/0x6d5
 [<ffffffff80017542>] cache_grow+0x356/0x395
 [<ffffffff8002732a>] do_filp_open+0x1c/0x38
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff80019720>] do_sys_open+0x44/0xbe
 [<ffffffff8005d116>] system_call+0x7e/0x83

2008-05-08 11:11:22 BUG: soft lockup - CPU#0 stuck for 10s! [slurmd:5993]
CPU 0:
Modules linked in: job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sata_nv i2c_nforce2 libata pcspkr i2c_core ohci_hcd k8temp scsi_mod hwmon shpchp k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000
Pid: 5993, comm: slurmd Tainted: G 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff80064b54>] [<ffffffff80064b54>] .text.lock.spinlock+0x2/0x30
RSP: 0018:ffff8101ffc8be10 EFLAGS: 00000286
RAX: ffff8102ffe0a800 RBX: ffff8101ffc8be68 RCX: 0000000000000000
RDX: ffff81009e530204 RSI: ffff8101ffc8be4f RDI: ffffffff803c5480
RBP: 00000000802f2630 R08: ffff8100cefdec00 R09: ffff81020385a000
R10: 0000000000000000 R11: 000000d000000028 R12: ffff81000b4f6400
R13: ffff8101ffc8bd88 R14: ffff8101ffc8be48 R15: 0000003000000018
FS: 00002aaaab991c00(0000) GS:ffffffff8039e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000040302da8 CR3: 00000001ff066000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80022655>] d_alloc+0x166/0x1a9
 [<ffffffff8020d0d6>] sock_attach_fd+0x56/0xfd
 [<ffffffff8020d92f>] sys_accept+0xd5/0x1ea
 [<ffffffff8006258f>] __sched_text_start+0x17f/0xaeb
 [<ffffffff8005d116>] system_call+0x7e/0x83

2008-05-08 11:11:23 BUG: soft lockup - CPU#3 stuck for 10s!
[cp2k.popt:6903] 2008-05-08 11:11:23 CPU 3: 2008-05-08 11:11:23 Modules linked in: job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sata_nv i2c_nforce2 libata pcspkr i2c_core ohci_hcd k8temp scsi_mod hwmon shpchp k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e10 00 2008-05-08 11:11:23 Pid: 6903, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1 2008-05-08 11:11:23 RIP: 0010:[<ffffffff8002e7ff>] [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 2008-05-08 11:11:23 RSP: 0018:ffff8101f7649a38 EFLAGS: 00000287 2008-05-08 11:11:23 RAX: 0000000000000dba RBX: ffff8102e1f01ed0 RCX: 0000000000000067 2008-05-08 11:11:23 RDX: ffff8102e1f01f10 RSI: ffffffff802f4080 RDI: ffffffff803c5480 2008-05-08 11:11:23 RBP: ffff8100a1ed88e8 R08: ffff81010384b8e0 R09: ffff8100a1ed88e8 2008-05-08 11:11:23 R10: 0000000000000202 R11: ffffffff80039784 R12: 0000000000000202 2008-05-08 11:11:23 R13: ffffffff80039784 R14: ffff8101f7649a28 R15: ffff810099265810 2008-05-08 11:11:23 FS: 0000000040100940(0063) GS:ffff8101039697c0(0000) knlGS:0000000000000000 2008-05-08 11:11:23 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 2008-05-08 11:11:23 CR2: 00002aaaab56b0f4 CR3: 00000001e6b8f000 CR4: 00000000000006e0 2008-05-08 11:11:23 2008-05-08 11:11:23 Call Trace: 2008-05-08 11:11:23 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 2008-05-08 11:11:23 [<ffffffff8004cd30>] shrink_dcache_parent+0x3f/0xe1 2008-05-08 11:11:23 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 2008-05-08 11:11:23 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 2008-05-08 11:11:23 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 11:11:23 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 11:11:23 [<ffffffff8002e254>] __wake_up+0x38/0x4f 2008-05-08 11:11:23 
[<ffffffff800646ff>] __up_wakeup+0x35/0x67 2008-05-08 11:11:23 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 2008-05-08 11:11:23 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f 2008-05-08 11:11:23 [<ffffffff80009523>] __d_lookup+0xb0/0xff 2008-05-08 11:11:23 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 2008-05-08 11:11:23 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 2008-05-08 11:11:23 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 2008-05-08 11:11:23 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 11:11:23 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 11:11:23 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 2008-05-08 11:11:23 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 2008-05-08 11:11:23 [<ffffffff8001a954>] open_namei+0x73/0x6d5 2008-05-08 11:11:23 [<ffffffff80017542>] cache_grow+0x356/0x395 2008-05-08 11:11:23 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 2008-05-08 11:11:23 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 11:11:23 [<ffffffff80019720>] do_sys_open+0x44/0xbe 2008-05-08 11:11:23 [<ffffffff8005d116>] system_call+0x7e/0x83 2008-05-08 11:11:23 This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 I was unable to reproduce this problem again after hitting it once on the nodes. I rebooted and tried again and hit the problem on the second try. I tried to crash the node, but kdump is not working on our HW in 92.el5 (might be a kexec-tools problem now that I think about it), but I did get a sysrq-w: 2008-05-08 13:35:45 BUG: soft lockup - CPU#5 stuck for 10s! 
[cp2k.popt:6403] 2008-05-08 13:35:45 CPU 5: 2008-05-08 13:35:45 Modules linked in: stap_6120(U) job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy i2c_nforce2 ohci_hcd pcspkr i2c_core sata_nv k8temp libata shpchp hwmon scsi_mod k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000 2008-05-08 13:35:45 Pid: 6403, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1 2008-05-08 13:35:45 RIP: 0010:[<ffffffff8002e7ff>] [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 2008-05-08 13:35:45 RSP: 0018:ffff8102eb3f3a38 EFLAGS: 00000206 2008-05-08 13:35:45 RAX: 0000000000002a6c RBX: ffff8102e73514b0 RCX: 0000000000000001 2008-05-08 13:35:45 RDX: ffff8102e73514f0 RSI: ffffffff802f4080 RDI: ffffffff803c5480 2008-05-08 13:35:45 RBP: ffff8101c9ba3300 R08: ffff81020384b8e0 R09: ffff8101c9ba3300 2008-05-08 13:35:45 R10: 0000000000000246 R11: ffffffff80039784 R12: 0000000000000246 2008-05-08 13:35:45 R13: ffffffff80039784 R14: ffff8102eb3f3a28 R15: ffff8101c9910300 2008-05-08 13:35:45 FS: 0000000040201940(0063) GS:ffff8102039c2140(0000) knlGS:0000000000000000 2008-05-08 13:35:45 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 2008-05-08 13:35:45 CR2: 00002aaaab56b0f4 CR3: 00000002fd458000 CR4: 00000000000006e0 2008-05-08 13:35:45 2008-05-08 13:35:45 Call Trace: 2008-05-08 13:35:45 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 2008-05-08 13:35:45 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 2008-05-08 13:35:45 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 2008-05-08 13:35:45 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 2008-05-08 13:35:45 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:45 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:45 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 
2008-05-08 13:35:45 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 2008-05-08 13:35:45 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f 2008-05-08 13:35:45 [<ffffffff80009523>] __d_lookup+0xb0/0xff 2008-05-08 13:35:45 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 2008-05-08 13:35:45 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 2008-05-08 13:35:45 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 2008-05-08 13:35:45 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:45 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 2008-05-08 13:35:45 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 2008-05-08 13:35:45 [<ffffffff8001a954>] open_namei+0x73/0x6d5 2008-05-08 13:35:45 [<ffffffff80017542>] cache_grow+0x356/0x395 2008-05-08 13:35:45 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 2008-05-08 13:35:45 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:45 [<ffffffff80019720>] do_sys_open+0x44/0xbe 2008-05-08 13:35:45 [<ffffffff8005d116>] system_call+0x7e/0x83 2008-05-08 13:35:45 2008-05-08 13:35:55 BUG: soft lockup - CPU#5 stuck for 10s! 
[cp2k.popt:6403] 2008-05-08 13:35:55 CPU 5: 2008-05-08 13:35:55 Modules linked in: stap_6120(U) job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy i2c_nforce2 ohci_hcd pcspkr i2c_core sata_nv k8temp libata shpchp hwmon scsi_mod k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000 2008-05-08 13:35:55 Pid: 6403, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1 2008-05-08 13:35:55 RIP: 0010:[<ffffffff8002e7ff>] [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 2008-05-08 13:35:55 RSP: 0018:ffff8102eb3f3a38 EFLAGS: 00000206 2008-05-08 13:35:55 RAX: 00000000000003cd RBX: ffff8100a6bd04b0 RCX: 0000000000000001 2008-05-08 13:35:55 RDX: ffff8100a6bd04f0 RSI: ffffffff802f4080 RDI: ffffffff803c5480 2008-05-08 13:35:55 RBP: ffff8101c9ba3300 R08: ffff81020384b8e0 R09: ffff8101c9ba3300 2008-05-08 13:35:55 R10: 0000000000000246 R11: ffffffff80039784 R12: 0000000000000246 2008-05-08 13:35:55 R13: ffffffff80039784 R14: ffff8102eb3f3a28 R15: ffff8101c9910300 2008-05-08 13:35:55 FS: 0000000040201940(0063) GS:ffff8102039c2140(0000) knlGS:0000000000000000 2008-05-08 13:35:55 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 2008-05-08 13:35:55 CR2: 00002aaaab56b0f4 CR3: 00000002fd458000 CR4: 00000000000006e0 2008-05-08 13:35:55 2008-05-08 13:35:55 Call Trace: 2008-05-08 13:35:55 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 2008-05-08 13:35:55 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 2008-05-08 13:35:55 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 2008-05-08 13:35:55 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 2008-05-08 13:35:55 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:55 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:55 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 
2008-05-08 13:35:55 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 2008-05-08 13:35:55 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f 2008-05-08 13:35:55 [<ffffffff80009523>] __d_lookup+0xb0/0xff 2008-05-08 13:35:55 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 2008-05-08 13:35:55 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 2008-05-08 13:35:55 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 2008-05-08 13:35:55 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:55 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 2008-05-08 13:35:55 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 2008-05-08 13:35:55 [<ffffffff8001a954>] open_namei+0x73/0x6d5 2008-05-08 13:35:55 [<ffffffff80017542>] cache_grow+0x356/0x395 2008-05-08 13:35:55 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 2008-05-08 13:35:55 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:55 [<ffffffff80019720>] do_sys_open+0x44/0xbe 2008-05-08 13:35:55 [<ffffffff8005d116>] system_call+0x7e/0x83 2008-05-08 13:35:55 chimp2@root:conman chimp4 <ConMan> Connection to console [chimp4] opened. 
SysRq : Show CPUs CPU2: ffff810103933f48 0000000000000000 ffff8101fd6836e0 ffffffff8019cf20 0000000000000000 ffff810103841400 ffff8100cdceb790 ffffffff8019cf4f ffffffff8005e2fc ffffffff80022dba ffff8101fd6c77a0 0000000000000246 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80062569>] __sched_text_start+0x159/0xaeb [<ffffffff800624fe>] __sched_text_start+0xee/0xaeb [<ffffffff80064993>] __down+0xc3/0xd8 [<ffffffff8008ac03>] default_wake_function+0x0/0xe [<ffffffff8002e254>] __wake_up+0x38/0x4f [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80064c5f>] .text.lock.kernel_lock+0x1d/0x1f [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80062fa7>] thread_return+0xac/0xdf [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80063015>] cond_resched+0x3b/0x42 [<ffffffff8000d043>] dput+0x1d/0x114 [<ffffffff800e2014>] prune_one_dentry+0x66/0x76 [<ffffffff8002e88a>] prune_dcache+0xe0/0x149 [<ffffffff8004cd38>] shrink_dcache_parent+0x47/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff8809d931>] :nfs:nfs3_proc_access+0x12f/0x18e [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] 
__path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80066852>] do_page_fault+0x4fe/0x830 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU1: ffff8103038cbdc8 0000000000000000 0000000000000001 0000000000000000 ffffffff8019cf20 ffff8100a4c61988 0000000000000008 ffffffff8019cf4f ffff8100a4c61988 ffffffff80091fe2 ffffffff8030f4a0 0000000000000077 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff80091fe2>] on_each_cpu+0x19/0x22 [<ffffffff8019ce55>] __handle_sysrq+0x90/0x121 [<ffffffff801ab3a7>] receive_chars+0x11f/0x26d [<ffffffff8003d671>] lock_timer_base+0x1b/0x3c [<ffffffff801ac44e>] serial8250_interrupt+0x7a/0xfe [<ffffffff8001097e>] handle_IRQ_event+0x29/0x58 [<ffffffff800b71de>] __do_IRQ+0xa4/0x103 [<ffffffff80011ed2>] __do_softirq+0x5e/0xd6 [<ffffffff8006c3f4>] do_IRQ+0xe7/0xf5 [<ffffffff8005d615>] ret_from_intr+0x0/0xa <EOI> [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80064969>] __down+0x99/0xd8 [<ffffffff8008ac03>] default_wake_function+0x0/0xe [<ffffffff8802f98a>] :sunrpc:rpcauth_lookup_credcache+0x12e/0x24c [<ffffffff8002e254>] __wake_up+0x38/0x4f [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 [<ffffffff8809008e>] :nfs:nfs_access_get_cached+0xab/0xfa [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8009dde2>] 
autoremove_wake_function+0x0/0x2e [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80066852>] do_page_fault+0x4fe/0x830 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU0: ffffffff80416f48 0000000000000000 ffff8103ff41bda0 ffffffff8019cf20 0000000000000000 ffff81020386d380 ffff8103ff4d2df8 ffffffff8019cf4f ffff8103ff4d2df8 ffffffff80022dba ffff8100ab345810 ffff8103ff4d2df8 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80064b54>] .text.lock.spinlock+0x2/0x30 [<ffffffff80022655>] d_alloc+0x166/0x1a9 [<ffffffff800e88e6>] dcache_dir_open+0x0/0x26 [<ffffffff800e88fa>] dcache_dir_open+0x14/0x26 [<ffffffff8001e336>] __dentry_open+0xd9/0x1dc [<ffffffff80027338>] do_filp_open+0x2a/0x38 [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU3: ffff8103038f7f48 0000000000000000 ffff81020390de70 ffffffff8019cf20 0000000000000000 0000000000000001 ffffffff803c82c0 ffffffff8019cf4f ffffffff8019cf20 ffffffff80022dba ffffffff8006ad3b 0000000000000040 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8006ad3b>] default_idle+0x0/0x50 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff8006ad64>] default_idle+0x29/0x50 [<ffffffff80048b1d>] cpu_idle+0x95/0xb8 [<ffffffff80076667>] start_secondary+0x45a/0x469 CPU5: ffff81030391ff48 0000000000000000 ffff8102eb3f37b0 ffffffff8019cf20 0000000000000000 ffff810203849980 0000000000000000 ffffffff8019cf4f 
ffffffff8005e2fc ffffffff80022dba ffffffff803053c0 0000000000000246 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80064aa8>] _spin_unlock_irqrestore+0x8/0x9 [<ffffffff8006498e>] __down+0xbe/0xd8 [<ffffffff8008ac03>] default_wake_function+0x0/0xe [<ffffffff8002e254>] __wake_up+0x38/0x4f [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff80064c5f>] .text.lock.kernel_lock+0x1d/0x1f [<ffffffff80062fa7>] thread_return+0xac/0xdf [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff88090564>] :nfs:nfs_dentry_iput+0x0/0x4e [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80063015>] cond_resched+0x3b/0x42 [<ffffffff8000d043>] dput+0x1d/0x114 [<ffffffff800e2014>] prune_one_dentry+0x66/0x76 [<ffffffff8002e88a>] prune_dcache+0xe0/0x149 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80017542>] cache_grow+0x356/0x395 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8005be70>] 
cache_alloc_refill+0x106/0x186 [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU4: ffff81010399ff48 0000000000000000 ffff81000cd7de70 ffffffff8019cf20 0000000000000000 0000000000000001 ffffffff803c82e0 ffffffff8019cf4f ffffffff8019cf20 ffffffff80022dba ffffffff8006ad3b 0000000000000040 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8006ad3b>] default_idle+0x0/0x50 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff8006ad64>] default_idle+0x29/0x50 [<ffffffff80048b1d>] cpu_idle+0x95/0xb8 [<ffffffff80076667>] start_secondary+0x45a/0x469 CPU7: ffff8103039cff48 0000000000000000 ffff810203a17e70 ffffffff8019cf20 0000000000000000 0000000000000001 ffffffff803c8340 ffffffff8019cf4f ffffffff8019cf20 ffffffff80022dba ffffffff8006ad3b 0000000000000040 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8006ad3b>] default_idle+0x0/0x50 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff8006ad64>] default_idle+0x29/0x50 [<ffffffff80048b1d>] cpu_idle+0x95/0xb8 [<ffffffff80076667>] start_secondary+0x45a/0x469 CPU6: ffff8101039c7f48 0000000000000000 ffff8103fffad7b0 ffffffff8019cf20 0000000000000000 ffff810303841400 ffff8100cdceb790 ffffffff8019cf4f ffffffff8005e2fc ffffffff80022dba ffffffff803053c0 0000000000000246 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80064aa8>] _spin_unlock_irqrestore+0x8/0x9 [<ffffffff8006498e>] __down+0xbe/0xd8 [<ffffffff8008ac03>] 
default_wake_function+0x0/0xe [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80064c5f>] .text.lock.kernel_lock+0x1d/0x1f [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80062fa7>] thread_return+0xac/0xdf [<ffffffff88090564>] :nfs:nfs_dentry_iput+0x0/0x4e [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80063015>] cond_resched+0x3b/0x42 [<ffffffff8000d043>] dput+0x1d/0x114 [<ffffffff800e2014>] prune_one_dentry+0x66/0x76 [<ffffffff8002e88a>] prune_dcache+0xe0/0x149 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80017542>] cache_grow+0x356/0x395 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 Created attachment 915143 [details]
Comment
(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
I added the following debug code to nfs_lookup_revalidate():

--- linux+rh+chaos.orig/fs/nfs/dir.c
+++ linux+rh+chaos/fs/nfs/dir.c
@@ -752,6 +752,9 @@ static int nfs_lookup_revalidate(struct
 		dfprintk(LOOKUPCACHE, "%s: %s/%s has dud inode\n",
 			__FUNCTION__, dentry->d_parent->d_name.name,
 			dentry->d_name.name);
+		printk("nfs_lookup_revalidate: is_bad_inode: %s/%s\n",
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
 		goto out_bad;
 	}
@@ -764,19 +767,35 @@ static int nfs_lookup_revalidate(struct
 		goto out_valid;
 	}
-	if (NFS_STALE(inode))
+	if (NFS_STALE(inode)) {
+		printk("nfs_lookup_revalidate: NFS_STALE(inode): %s/%s\n",
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
 		goto out_bad;
+	}
-	if (nfs_register_inode_group(dir))
+	if (nfs_register_inode_group(dir)) {
+		printk("nfs_lookup_revalidate: "
+			"nfs_register_inode_group(dir): %s/%s\n",
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
 		goto out_bad;
+	}
 	error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
-	if (error)
+	if (error) {
+		printk("nfs_lookup_revalidate: "
+			"NFS_PROTO(dir)->lookup(dir): error=%d\n", error);
 		goto out_bad;
-	if (nfs_compare_fh(NFS_FH(inode), &fhandle))
+	}
+	if (nfs_compare_fh(NFS_FH(inode), &fhandle)) {
+		printk("nfs_lookup_revalidate: nfs_compare_fh=true\n");
 		goto out_bad;
-	if ((error = nfs_refresh_inode(inode, &fattr)) != 0)
+	}
+	if ((error = nfs_refresh_inode(inode, &fattr)) != 0) {
+		printk("nfs_lookup_revalidate: nfs_refresh_inode=%d\n", error);
 		goto out_bad;
+	}
 	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 out_valid:

to determine how we're getting into shrink_dcache_parent(). I saw the following output on a node that hit the issue:

2008-05-08 17:07:52 nfs_lookup_revalidate: NFS_PROTO(dir)->lookup(dir): error=-512
2008-05-08 17:08:01 BUG: soft lockup detected on CPU#0! 
2008-05-08 17:08:01 2008-05-08 17:08:01 Call Trace: 2008-05-08 17:08:01 <IRQ> [<ffffffff800b5e1b>] softlockup_tick+0xd5/0xf0 2008-05-08 17:08:01 [<ffffffff80093910>] update_process_times+0x54/0x7a 2008-05-08 17:08:01 [<ffffffff80075877>] smp_local_timer_interrupt+0x2c/0x61 2008-05-08 17:08:01 [<ffffffff80075f3f>] smp_apic_timer_interrupt+0x41/0x47 2008-05-08 17:08:01 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c 2008-05-08 17:08:01 <EOI> [<ffffffff80007708>] _raw_spin_lock+0x6d/0xeb 2008-05-08 17:08:01 [<ffffffff8002292d>] d_alloc+0x17c/0x1c1 2008-05-08 17:08:01 [<ffffffff8020b46a>] sock_attach_fd+0x56/0xfd 2008-05-08 17:08:01 [<ffffffff80012429>] get_empty_filp+0x10d/0x177 2008-05-08 17:08:01 [<ffffffff8020bcc3>] sys_accept+0xd5/0x1ea 2008-05-08 17:08:01 [<ffffffff800312f5>] do_fork+0x10e/0x187 2008-05-08 17:08:01 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:01 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:01 2008-05-08 17:08:01 BUG: soft lockup detected on CPU#1! 
2008-05-08 17:08:01 2008-05-08 17:08:01 Call Trace: 2008-05-08 17:08:01 <IRQ> [<ffffffff800b5e1b>] softlockup_tick+0xd5/0xf0 2008-05-08 17:08:01 [<ffffffff80090768>] tasklet_action+0x62/0xac 2008-05-08 17:08:01 [<ffffffff80093910>] update_process_times+0x54/0x7a 2008-05-08 17:08:01 [<ffffffff80075877>] smp_local_timer_interrupt+0x2c/0x61 2008-05-08 17:08:01 [<ffffffff8005d368>] call_softirq+0x1c/0x28 2008-05-08 17:08:02 [<ffffffff80075f3f>] smp_apic_timer_interrupt+0x41/0x47 2008-05-08 17:08:02 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c 2008-05-08 17:08:02 <EOI> [<ffffffff8002eb1b>] prune_dcache+0x53/0x14d 2008-05-08 17:08:02 [<ffffffff8002eaf3>] prune_dcache+0x2b/0x14d 2008-05-08 17:08:02 [<ffffffff8004ba11>] shrink_dcache_parent+0x1c/0xf6 2008-05-08 17:08:02 [<ffffffff88090496>] :nfs:nfs_lookup_revalidate+0x470/0x530 2008-05-08 17:08:02 [<ffffffff800871c5>] __wake_up_common+0x3e/0x68 2008-05-08 17:08:02 [<ffffffff88031c72>] :sunrpc:rpcauth_lookup_credcache+0x193/0x24d 2008-05-08 17:08:02 [<ffffffff8002e491>] __wake_up+0x38/0x4f 2008-05-08 17:08:02 [<ffffffff8809114a>] :nfs:nfs_access_get_cached+0xba/0xfd 2008-05-08 17:08:02 [<ffffffff80009524>] __d_lookup+0xd3/0x110 2008-05-08 17:08:02 [<ffffffff8000cd2a>] do_lookup+0x17d/0x1d4 2008-05-08 17:08:02 [<ffffffff80009913>] __link_path_walk+0x3b2/0xf62 2008-05-08 17:08:02 [<ffffffff8000e80c>] link_path_walk+0x5c/0xe5 2008-05-08 17:08:02 [<ffffffff8009bd69>] autoremove_wake_function+0x0/0x2e 2008-05-08 17:08:02 [<ffffffff80015b07>] get_unused_fd+0xf9/0x107 2008-05-08 17:08:02 [<ffffffff8000ca32>] do_path_lookup+0x274/0x2eb 2008-05-08 17:08:02 [<ffffffff800237e4>] __path_lookup_intent_open+0x56/0x96 2008-05-08 17:08:02 [<ffffffff8001aab4>] open_namei+0x73/0x6d9 2008-05-08 17:08:02 [<ffffffff8001761f>] cache_grow+0x362/0x38d 2008-05-08 17:08:02 [<ffffffff80027607>] do_filp_open+0x1c/0x38 2008-05-08 17:08:02 [<ffffffff8009bd69>] autoremove_wake_function+0x0/0x2e 2008-05-08 17:08:02 [<ffffffff80015b07>] 
get_unused_fd+0xf9/0x107 2008-05-08 17:08:02 [<ffffffff80019869>] do_sys_open+0x44/0xbe 2008-05-08 17:08:02 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:02 2008-05-08 17:08:10 BUG: soft lockup detected on CPU#6! 2008-05-08 17:08:10 2008-05-08 17:08:10 Call Trace: 2008-05-08 17:08:10 <IRQ> [<ffffffff800b5e1b>] softlockup_tick+0xd5/0xf0 2008-05-08 17:08:10 [<ffffffff80093910>] update_process_times+0x54/0x7a 2008-05-08 17:08:10 [<ffffffff80075877>] smp_local_timer_interrupt+0x2c/0x61 2008-05-08 17:08:10 [<ffffffff8005d368>] call_softirq+0x1c/0x28 2008-05-08 17:08:10 [<ffffffff80075f3f>] smp_apic_timer_interrupt+0x41/0x47 2008-05-08 17:08:10 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c 2008-05-08 17:08:10 <EOI> [<ffffffff8002c801>] dummy_inode_permission+0x0/0x3 2008-05-08 17:08:10 [<ffffffff8000770a>] _raw_spin_lock+0x6f/0xeb 2008-05-08 17:08:10 [<ffffffff8002292d>] d_alloc+0x17c/0x1c1 2008-05-08 17:08:10 [<ffffffff800e78ad>] dcache_dir_open+0x0/0x26 2008-05-08 17:08:10 [<ffffffff800e78c1>] dcache_dir_open+0x14/0x26 2008-05-08 17:08:10 [<ffffffff8001e4b6>] __dentry_open+0xd9/0x1e2 2008-05-08 17:08:10 [<ffffffff80027615>] do_filp_open+0x2a/0x38 2008-05-08 17:08:10 [<ffffffff80015b07>] get_unused_fd+0xf9/0x107 2008-05-08 17:08:10 [<ffffffff80019869>] do_sys_open+0x44/0xbe 2008-05-08 17:08:10 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:10 This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 File uploaded: softlock-dcache.C This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 it_file 132459 Still haven't figured out a reproducer but have been working on it. We think that it has something to do with the job controller sending a TERM signal to all the processes pretty much simultaneously. Added a bunch of command line arguments which hopefully will allow us to tune things and cause the problem more easily. 
softlock-dcache [options]
  --dirtymem      allocate a bunch of memory and then dirty it
  --tasks <num>   how many processes to run
  --grouptasks    make all the tasks part of a process group
  --finddir       directory to do the find in
  --help          print this message

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

File softlock-dcache.C became obsolete.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

File uploaded: job.patch
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132461

File uploaded: job-fix-resource-leaks.patch
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132462

File uploaded: job-fix-unchecked.patch
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132463

Date: Thu, 08 May 2008 18:08:48 -0700
From: Ben Woodard <woodard>
User-Agent: Thunderbird 2.0.0.12 (X11/20080306)
MIME-Version: 1.0
To: Steve Dickson <SteveD>
CC: Kent Baxley <kbaxley>, Jeffrey Layton <jlayton>, Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
References: <20080507163254.260b900e.redhat.com> <482236A6.4020909> <4822E2E6.1030107>
In-Reply-To: <4822E2E6.1030107>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Jeff Layton wrote:
> Sounds ugly. I don't see any reason right offhand why we'd need to do a
> getattr when we're pruning the dcache, but dcache handling is very
> obtuse stuff.

Steve Dickson wrote:
> It appears the reason shrink_dcache_parent() is being called is
> due to something failing (i.e. goto out_bad). Do we know what is
> failing and why? That failure (or error path) might be the real
> cause of this lockup... 
Mark instrumented nfs_lookup_revalidate() to see how we're getting into shrink_dcache_parent():

    error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
    if (error) {
        printk("nfs_lookup_revalidate: NFS_PROTO(dir)->lookup(dir): error=%d\n", error);
        goto out_bad;
    }

What we get is:

2008-05-08 17:07:52 nfs_lookup_revalidate: NFS_PROTO(dir)->lookup(dir): error=-512

-512 is -ERESTARTSYS

We think that this is set in nfs3_rpc_wrapper() as a result of EJUKEBOX. I had never heard of EJUKEBOX before.

#define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */

Jeff Layton wrote:
> Exactly what kernel are they running here? I don't see any mention of
> it in the IT.

We were able to reproduce this on the -92 release of the kernel with one added kernel module. This is a job container module. I attached the three patches that make up this module to the IT. We currently do not think the module is causing the problem; we think it allows us to expose the underlying race with a higher probability. It seems to take around 4 iterations before we can trigger it with the -92 kernel. We think that given enough iterations we will be able to hit it with the -92 kernel without the module, but since the test takes so much time to run, we haven't tried more than a handful of iterations.

> The locking there is some of the most fragile stuff in the
> kernel. Even very small, subtle changes can have large effects...and
> yes, the dcache_lock is very busy.
>
> It's very good that you have a reproducer. I'd definitely recommend
> testing this on the latest and greatest RHEL5 kernel so that we can
> rule out existing changes (it seems like Eric Sandeen did some dcache
> changes a while back).

Do you guys have any ideas of things we can try? Or any ideas of what the problem might be?
-ben

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Info from Steve Dickson:

> -512 is -ERESTARTSYS
>
> We think that this is set in nfs3_rpc_wrapper() as a result of EJUKEBOX.
> I never ever heard of EJUKEBOX before.
> #define EJUKEBOX 528 /* Request initiated, but will not complete
> before timeout */

Using tshark, what is the error that's being returned by the server? Also, who is the server?

> Jeff Layton wrote:
>> Exactly what kernel are they running here? I don't see any mention of
>> it in the IT.
>
> We were able to reproduce this on the -92 release of the kernel with one
> added kernel module. This is a job container module. I attached the
> three patches that make up this module to the IT. We currently do not
> think the module is causing the problem, we think that this allows us to
> expose the underlying race at a higher probability. It seems to take
> around 4 iterations before we can trigger it with the -92 kernel. We
> think that given enough iterations we will be able to hit it with the
> -92 kernel without the module but since the test takes so much time to
> run, we haven't tried more than a handful of iterations.
>
>> The locking there is some of the most fragile stuff in the
>> kernel. Even very small, subtle changes can have large effects...and
>> yes, the dcache_lock is very busy.
>>
>> It's very good that you have a reproducer. I'd definitely recommend
>> testing this on the latest and greatest RHEL5 kernel so that we can
>> rule out existing changes (it seems like Eric Sandeen did some dcache
>> changes a while back).
>
> Do you guys have any ideas of things we can try? Or any ideas of what
> the problem might be?

I would look in some error path where a lock is not being released... Unfortunately, it could be anywhere, including in the rpc code.

steved.
This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Jeff Layton input:

> We think that this is set in nfs3_rpc_wrapper() as a result of EJUKEBOX.
> I never ever heard of EJUKEBOX before.
> #define EJUKEBOX 528 /* Request initiated, but will not complete
> before timeout */

ERESTARTSYS is also often returned when a call is interrupted by a signal. It might be worthwhile to check and see if there's a signal pending when you get this error.

> Jeff Layton wrote:
> > Exactly what kernel are they running here? I don't see any mention of
> > it in the IT.
>
> We were able to reproduce this on the -92 release of the kernel with one
> added kernel module. This is a job container module. I attached the
> three patches that make up this module to the IT. We currently do not
> think the module is causing the problem, we think that this allows us to
> expose the underlying race at a higher probability. It seems to take
> around 4 iterations before we can trigger it with the -92 kernel. We
> think that given enough iterations we will be able to hit it with the
> -92 kernel without the module but since the test takes so much time to
> run, we haven't tried more than a handful of iterations.

Ok, good to know.

> > The locking there is some of the most fragile stuff in the
> > kernel. Even very small, subtle changes can have large effects...and
> > yes, the dcache_lock is very busy.
> >
> > It's very good that you have a reproducer. I'd definitely recommend
> > testing this on the latest and greatest RHEL5 kernel so that we can
> > rule out existing changes (it seems like Eric Sandeen did some dcache
> > changes a while back).
>
> Do you guys have any ideas of things we can try? Or any ideas of what
> the problem might be?

Whenever I see an ERESTARTSYS, my immediate thought is "signal". That's probably a more likely scenario than the server returning error, but I could be wrong here. Either way, I agree with Steve.
It sounds like there may be a codepath that causes a return from shrink_dcache_parent w/o releasing the lock. If I'm correct about the signals, then it may have something to do with calling into there with a signal pending.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Jeff Layton input, cont'd:

Jeff Layton <jlayton> wrote:
>
> Whenever I see an ERESTARTSYS, my immediate thought is "signal". That's
> probably a more likely scenario than the server returning error, but I
> could be wrong here.

Actually, I'm not wrong here. If you look closer at nfs3_rpc_wrapper(), you'll see that it only returns -ERESTARTSYS if the task is signaled. Also, rpc_call_sync and its lower functions can return -ERESTARTSYS for the same reason. I'd definitely look closely at that.

One possibility might be to experiment with a fault-injection patch. Change nfs_lookup_revalidate so that it can predictably call shrink_dcache_parent with a signal pending and see if that makes the problem happen. It may cause the whole thing to crash, but it might be worth experimenting with :-)

> Either way, I agree with Steve. It sounds like there may be a codepath
> that causes a return from shrink_dcache_parent w/o releasing the lock.
> If I'm correct about the signals then it may have something to do with
> calling into there with a signal pending.

I'd also add some instrumentation to check the dcache_lock after return from shrink_dcache_parent in this function. If it's locked, see if the owner is the current task.

Good luck!
-- Jeff Layton <jlayton>

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File uploaded: softlock-dcache.C

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 132865

I finally got something that causes a softlockup, but it seems to be a different softlockup, possibly related to utrace:

strace -o foo -ff ./softlock-dcache --grouptasks --dirtymem

On our own modified kernel the softlockup looks like this:

2008-05-12 19:03:06 BUG: soft lockup detected on CPU#2!
2008-05-12 19:03:06
2008-05-12 19:03:06 Call Trace:
2008-05-12 19:03:06 <IRQ> [<ffffffff800b5ddb>] softlockup_tick+0xd5/0xe7
2008-05-12 19:03:06 [<ffffffff80093aec>] update_process_times+0x54/0x7a
2008-05-12 19:03:06 [<ffffffff800758b6>] smp_local_timer_interrupt+0x2c/0x61
2008-05-12 19:03:06 [<ffffffff8005d368>] call_softirq+0x1c/0x28
2008-05-12 19:03:06 [<ffffffff80075f7e>] smp_apic_timer_interrupt+0x41/0x47
2008-05-12 19:03:06 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
2008-05-12 19:03:06 <EOI> [<ffffffff880942de>] :nfs:nfs_destroy_inode+0x0/0x13
2008-05-12 19:03:06 [<ffffffff800c3481>] shrink_inactive_list+0x75e/0x7e7
2008-05-12 19:03:06 [<ffffffff800c3427>] shrink_inactive_list+0x704/0x7e7
2008-05-12 19:03:06 [<ffffffff8009bc9b>] keventd_create_kthread+0x0/0x61
2008-05-12 19:03:06 [<ffffffff880942de>] :nfs:nfs_destroy_inode+0x0/0x13
2008-05-12 19:03:06 [<ffffffff880942de>] :nfs:nfs_destroy_inode+0x0/0x13
2008-05-12 19:03:06 [<ffffffff80012b15>] shrink_zone+0xf6/0x11c
2008-05-12 19:03:06 [<ffffffff8009be5e>] autoremove_wake_function+0x0/0x2e
2008-05-12 19:03:06 [<ffffffff800c3c05>] try_to_free_pages+0x197/0x2b9
2008-05-12 19:03:06 [<ffffffff8000f135>] __alloc_pages+0x1cb/0x2ce
2008-05-12 19:03:06 [<ffffffff80008b95>] __handle_mm_fault+0x59b/0xdf4
2008-05-12 19:03:06 [<ffffffff80065a96>] do_page_fault+0x4b8/0x81d
2008-05-12 19:03:06 [<ffffffff80061f42>] thread_return+0x0/0xeb
2008-05-12 19:03:06 [<ffffffff800ba742>] utrace_quiescent+0x20f/0x256
2008-05-12 19:03:06 [<ffffffff8005ce55>] error_exit+0x0/0x84
2008-05-12 19:03:06
2008-05-12 19:03:07 BUG: soft lockup detected on CPU#3!
2008-05-12 19:03:07
2008-05-12 19:03:07 Call Trace:
2008-05-12 19:03:07 <IRQ> [<ffffffff800b5ddb>] softlockup_tick+0xd5/0xe7
2008-05-12 19:03:07 [<ffffffff800909dc>] tasklet_action+0x62/0xac
2008-05-12 19:03:07 [<ffffffff80093aec>] update_process_times+0x54/0x7a
2008-05-12 19:03:07 [<ffffffff800758b6>] smp_local_timer_interrupt+0x2c/0x61
2008-05-12 19:03:07 [<ffffffff8005d368>] call_softirq+0x1c/0x28
2008-05-12 19:03:07 [<ffffffff80075f7e>] smp_apic_timer_interrupt+0x41/0x47
2008-05-12 19:03:07 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
2008-05-12 19:03:07 <EOI> [<ffffffff800c299c>] shrink_active_list+0x1b0/0x426
2008-05-12 19:03:07 [<ffffffff800c2911>] shrink_active_list+0x125/0x426
2008-05-12 19:03:07 [<ffffffff80012af7>] shrink_zone+0xd8/0x11c
2008-05-12 19:03:07 [<ffffffff8009be5e>] autoremove_wake_function+0x0/0x2e
2008-05-12 19:03:07 [<ffffffff800c3c05>] try_to_free_pages+0x197/0x2b9
2008-05-12 19:03:07 [<ffffffff8000f135>] __alloc_pages+0x1cb/0x2ce
2008-05-12 19:03:07 [<ffffffff80008b95>] __handle_mm_fault+0x59b/0xdf4
2008-05-12 19:03:07 [<ffffffff80065a96>] do_page_fault+0x4b8/0x81d
2008-05-12 19:03:07 [<ffffffff80061f42>] thread_return+0x0/0xeb
2008-05-12 19:03:07 [<ffffffff800ba742>] utrace_quiescent+0x20f/0x256
2008-05-12 19:03:07 [<ffffffff8005ce55>] error_exit+0x0/0x84
2008-05-12 19:03:07

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Gave Roland a heads-up re the softlockup that I did manage to trigger; this is how he responded:

That is just a false backtrace from the kernel's lossy method and the address having been there on that kernel stack before. The meaningful part of those traces is the do_page_fault path. There is no utrace-related call in the real call chain for those threads.
Thanks,
Roland

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File softlock-dcache.C became obsolete.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File softlock-dcache.C became obsolete.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File uploaded: softlock-dcache.C

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 133546

Another version of softlock-dcache; still no dice reproducing the problem.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Mark, can you please take a look at my current version of the reproducer and see if you can spot anything that I missed? Other than an account on chimp, what is preventing me from playing with the reproducer that you have right now? Maybe I'll observe something that you didn't. Can the code run on a machine other than chimp?

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

>josef< Wanted to ask you about: http://kerneltrap.org/mailarchive/linux-fsdevel/2008/4/4/1351064
>josef< it looks like you did that for RHEL3 -- what about RHEL5?
>josef< I think it might solve the problem that we are seeing here at LLNL.
>josef< can you look at https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=178773 and tell me what you think?
<josef> rhel5 was completely different
<josef> rhel3 i just did that to get it fixed
<josef> its the same problem
<josef> my preference would be to see where dchinner's work went
<neb> just to verify you mean, 178773 is the same problem as is in that lkml thread.
<josef> i believe so
<neb> do you have a ref to dchinner's work that I can look at?
<josef> http://lwn.net/Articles/185465/
<neb> NM found it next message
<josef> the cond_resched_lock() in the for loop in the shrink_dcache stuff would fix the softlockups
<josef> and that's upstream
<neb> how could cond_resched_lock() interact with the latency of NFS dentry revalidation?
<neb> BTW we don't mount/unmount
<josef> well if you are in the reclaim code via the nfs dentry revalidation, doing the cond_resched is going to make things slower since it pops it off of the processor to let something else run
<neb> We were sort of working on the theory that maybe the BKL taken in the lookup path might be hamstringing forward progress.
<josef> it will block if the process is holding it, yes
<josef> but if you resched the bkl is dropped
<josef> so if the waiters are also waiting on the bkl they'll be fine
<neb> one challenging thing is that all the cores are running the same code and they are all entering the lookup path at practically the same time and all trying to shrink the dcache.
<josef> well then they'll be rescheduling and other stuff will be allowed to work, and eventually the dcache will be shrunk enough to continue to work
<neb> we would hope so, but the machine goes catatonic and then we get a soft lockup.
<neb> Machine doesn't appear to recover.
<josef> right, that's what the cond_resched would fix
<neb> This one? http://lxr.devel.redhat.com/RHEL5+2.6.18-53.1.14.el5/fs/dcache.c#L409
<josef> hm yeah, is that the kernel they are on?
<neb> yeah
<josef> well shit
<neb> double shit
<neb> ;-)
<neb> That is sort of why we were trying to come up with a way that it might be the BKL that NFS takes.
<josef> may want to run this by jlayton, as he knows the NFS side much better than i do
<neb> In all honesty, I don't think we had it clear enough in our minds to articulate until just now, but I guess we were thinking that even though cond_resched_lock drops the dcache_lock, it isn't dropping the BKL that NFS needs to progress.
<josef> well when the process is scheduled the BKL is dropped automatically
<neb> fair enough. It was just a hypothesis that we were starting to consider.
<josef> well now that i think about it it's all done on a counter
<josef> so if a bunch of processes are doing stuff with it then it may starve other stuff
<neb> In this case it's an 8-core machine with all CPUs doing exactly the same thing at practically the same time.
<josef> that is awesome
<josef> i'd like to take some time and rework the dcache reclamation stuff
<neb> HPC codes do that kind of thing all the time.
<neb> Here's the actual scenario: an HPC job computing on all 8 cores runs out of its time allocation, the job scheduler sends the session a killpg(SIGTERM), and so all processes do their final checkpoint before they die.
<neb> Can I have the TAM take this issue and send it up to you and jlayton?
<josef> yeah
<neb> I've been trying to come up with a portable reproducer for 3 weeks but have so far failed.
<neb> thanks.
<josef> np, make sure to hit jlayton, since i'm going on a week vacation starting tomorrow :)

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

http://lkml.org/lkml/2008/5/21/580 seems to be the final disposition of Chinner's patch.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Posting a link to Ben's email to rhkl:
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-June/msg00009.html

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

From: David Mair <dmair>
To: Kent Baxley <kbaxley>
Subject: Re: IT 178773
Date: Tue, 3 Jun 2008 14:58:12 -0400
Reply-To: David Mair <dmair>
User-Agent: KMail/1.9.9
Organization: Red Hat

Ack. If you already have Layton and Josef involved then (as if I need to say this) place that in the ticket. I have no problem with that at all.
This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Date problem reported: 5-6-08

Problem:

1) RHEL5.1, x86_64, with all compute nodes being diskless. Everything is done over NFS.

2) The nodes, over the last few months, have been running into a softlockup condition inside of dcache. When the systems get into this state, they're basically catatonic. The lengthy explanation can be found here:
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-June/msg00009.html

3) The problem was reproduced on stock 2.6.18-53.1.14 kernels as well as 2.6.18-92 based ones.

4) Ben has not been able to come up with a "portable" (i.e. something that we can use here) reproducer. The software that manifests the problem is a huge program that can't be shipped off-site. What we've got here is basically an 8-core machine with all CPUs doing exactly the same thing at practically the same time.

5) Ben and I have been touching base with Jeff Layton and Josef on this, and they appear to be ok with us sending it up.

What is needed from SEG: As stated, some direct conversations have taken place with engineering, and I got the OK from Dave Mair to send this on up to engineering. Thanks.

Issue escalated to Support Engineering Group by: kbaxley.
Category set to: Kernel
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File uploaded: split-lru-old.patch

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 136542

File uploaded: split-lru-rhel5.patch

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 136543

I tried to take Kentaro Makita's patch from http://lkml.org/lkml/2008/5/21/580 and backport it to RHEL5, but I failed because too much of the context had changed. I felt I'd likely screw something up trying to hand-apply every part of the patch, and I was afraid I might introduce an additional bug.
So as a starting point, I grabbed David Chinner's patch from http://lkml.org/lkml/2006/5/25/318 and had a lot more success applying it. It wasn't that difficult to sort out the few conflicts where the patch failed to apply. What I started out with is the patch called split-lru-old.patch and what I ended up with is split-lru-rhel5.patch, both attached. A kernel built from these booted just fine, but unfortunately the softlockup in the dcache still reproduced.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Created attachment 308995 [details]
backport of dcache sb patch.
Well, this compiles; let me know if this works. This is the patch that is upstream. If it doesn't work, I'd like to see the softlockup messages.
Looks like we are going to need something more than that.

16:19 <grondo> Ok, hit dcache lockup right away with the new patch

What do we try next?

This event sent from IssueTracker by woodard issue 178773

Gonna need to see the softlockup messages.

Updating PM score.

Closing due to inactivity.

*** This bug has been marked as a duplicate of bug 526612 ***