Bug 450194
Summary:          softlockup in dcache
Product:          Red Hat Enterprise Linux 5
Component:        kernel
Version:          5.1
Hardware:         All
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         medium
Priority:         medium
Target Milestone: rc
Target Release:   ---
Reporter:         Issue Tracker <tao>
Assignee:         Josef Bacik <jbacik>
QA Contact:       Red Hat Kernel QE team <kernel-qe>
CC:               dzickus, esandeen, jlayton, kbaxley, kgraham, staubach, steved, tao, woodard
Doc Type:         Bug Fix
Last Closed:      2009-02-24 17:12:49 UTC
Description
Issue Tracker
2008-06-05 20:10:10 UTC
Soft lockups of the following form were seen on atlas over the weekend. So far they all seem to be due to the same job being canceled due to timeout. Not sure why yet.

2008-04-13 06:58:14 BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ> [<ffffffff800b5df2>] softlockup_tick+0xd5/0xe7
 [<ffffffff800909f3>] tasklet_action+0x62/0xac
 [<ffffffff80093b03>] update_process_times+0x54/0x7a
 [<ffffffff800758d1>] smp_local_timer_interrupt+0x2c/0x61
 [<ffffffff8005d368>] call_softirq+0x1c/0x28
 [<ffffffff80075f99>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80039423>] generic_drop_inode+0x0/0x15a
 [<ffffffff8004b3cd>] shrink_dcache_parent+0x3f/0xe1
 [<ffffffff8004b3bc>] shrink_dcache_parent+0x2e/0xe1
 [<ffffffff88090418>] :nfs:nfs_lookup_revalidate+0x40e/0x4c8
 [<ffffffff8809ef59>] :nfs:nfs3_proc_access+0x12f/0x18e
 [<ffffffff800879b8>] find_busiest_group+0x20d/0x621
 [<ffffffff80063ae8>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8006373b>] __up_wakeup+0x35/0x67
 [<ffffffff8802d8f6>] :sunrpc:xprt_timer+0x0/0x7f
 [<ffffffff800122fe>] get_empty_filp+0x97/0x164
 [<ffffffff880e0b6c>] :aufs:aufs_d_revalidate+0x392/0x575
 [<ffffffff8809000a>] :nfs:nfs_lookup_revalidate+0x0/0x4c8
 [<ffffffff8009be75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000949e>] __d_lookup+0xb0/0xff
 [<ffffffff8000cbf5>] do_lookup+0x17d/0x1d4
 [<ffffffff80009893>] __link_path_walk+0x3a6/0xf42
 [<ffffffff8000e6dd>] link_path_walk+0x5c/0xe5
 [<ffffffff8009be75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000c8fc>] do_path_lookup+0x270/0x2e8
 [<ffffffff80023511>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001a911>] open_namei+0x73/0x6ed
 [<ffffffff80065ae4>] do_page_fault+0x506/0x838
 [<ffffffff80027305>] do_filp_open+0x1c/0x38
 [<ffffffff8009be75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800196cc>] do_sys_open+0x44/0xbe
 [<ffffffff8005c116>] system_call+0x7e/0x83

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

> Hi Mark,
> Here is a list of nodes that had the soft lock problem from over the
> weekend on atlas:
>
> 652: Has not been rebooted.
> 660: Has not been rebooted.
> 679: Has not been rebooted.
> 726: Has not been rebooted.
> 789: Has not been rebooted.
> 1000: Has not been rebooted.
> 1053: Has not been rebooted.
> 1055: Has not been rebooted.
> 1059: Has not been rebooted.

I spot checked these nodes and they do *not* show the signs of being run out of memory by the user job. In other words, this is a new kind of hang we haven't seen yet. The signature is that NFS calls appear in the soft lockup stack trace, e.g.:

 <EOI> [<ffffffff80039423>] generic_drop_inode+0x0/0x15a
 [<ffffffff8004b3cd>] shrink_dcache_parent+0x3f/0xe1
 [<ffffffff8004b3bc>] shrink_dcache_parent+0x2e/0xe1
 [<ffffffff88090418>] :nfs:nfs_lookup_revalidate+0x40e/0x4c8

The nodes have all recovered now, but it seems like they went "out to lunch" for perhaps as much as 10 minutes, which is why SLURM marked them down. Correlating the slurmd logs with these events shows that the nodes started hitting soft lockups right after SIGTERM was sent to the tasks of the job because the job had reached its time limit.

From the slurmd.log on an example node:

[Apr 13 06:57:45] debug2: Processing RPC: REQUEST_KILL_TIMELIMIT
[Apr 13 06:57:45] debug2: container signal 15 to job 48437.0
[Apr 13 07:08:25] Job 48437: timeout: sent SIGTERM to 1 active steps

Note that it appears to have taken over 10 minutes to print the first message after sending SIGTERM to the job. The time of these messages correlates precisely with the time of the soft lockup errors:

2008-04-13 06:57:54 BUG: soft lockup detected on CPU#1!

To determine what jobs were running on these nodes at the time, first let's see when these nodes went down (according to SLURM):

# atlas1 /root > sinfo -o "%40R %N" -n atlas[652,660,679,726,789,1000,1053,1055,1059]
REASON                                   NODELIST
Not responding [slurm@04/13-07:04:32]    atlas[652,660,679,726,789]
Not responding [slurm@04/12-01:49:42]    atlas1000
Not responding [slurm@04/12-01:48:02]    atlas[1053,1055,1059]

Now we can query what jobs were running on those nodes using sqlog. The trick is that the nodes might have been marked down a few minutes after the jobs failed, so a time range should be provided to the -t option:

# atlas1 /root > sqlog -t 04/13,6am..8am -n atlas[652,660,679,726,789] -o jobid,user,st,start,end,nnodes,nodes
JOBID  USER  ST  START           END             N  NODELIST
48433  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[724-731]
48434  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[675-679,721-723]
48436  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[657,660-666]
48437  ikuo  TO  04/12-14:57:23  04/13-06:57:41  8  atlas[648-649,651-656]
48424  ikuo  TO  04/12-14:57:22  04/13-06:57:41  8  atlas[786-793]

# atlas1 /root > sqlog -t 04/12,1am..2am -n atlas[1053,1055,1059],atlas1000 -o jobid,user,st,start,end,nnodes,nodes
JOBID  USER  ST  START           END             N  NODELIST
46254  ikuo  CA  04/11-09:43:44  04/12-01:44:22  8  atlas[985-987,998-1002]
46239  ikuo  CA  04/11-09:38:44  04/12-01:39:22  8  atlas[1052-1059]

Note that these are all the same user, but that the jobs were canceled in the second (actually the first, timeline-wise) case.

I verified that the job cancel operation also sends SIGTERM to the tasks. So it seems like something this user's job is doing when receiving SIGTERM is causing some kind of NFS issue on the nodes. I did log in to one of the user's existing jobs and verified that they have a signal handler for SIGTERM installed. It would be useful to know what that handler does.

Meanwhile, this issue has been opened as CHAOS BZ 285. mark woodard assigned to issue for LLNL (HPC). Status set to: Waiting on Tech

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Last night on Atlas, nodes 124, 191, and 203 hit this bug (according to Mark). This also caused bz#284 to happen. I will update that bug as well with this new information.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Mark writes:
> I received a copy of the user's job from Adam. But can't reproduce
> the issue as myself.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
Also from Mark: Lots more nodes hit this issue this weekend. All the same job.

# atlas1 /var/log/slurm > for f in `cat /tmp/nodes` ; do sqlog -E +12am -S -09:50:23 -n $f -L1 --no-running --no-header -o jid,user,start,end,st,nnodes,nodes; done | sort -u
18260  ikuo  05/02-15:19:01  05/03-07:19:35  TO  8  atlas[247-254]
18261  ikuo  05/02-15:19:01  05/03-07:19:35  TO  8  atlas[43-44,46,48,51,244-246]
18262  ikuo  05/02-15:19:49  05/03-07:20:35  TO  8  atlas[942-947,950,967]
18263  ikuo  05/02-15:19:49  05/03-07:20:35  TO  8  atlas[934-941]
18264  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[926-933]
18265  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[918-925]
18266  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[878-883,916-917]
18267  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[870-877]
18268  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[862-869]
18269  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[589-590,592-596,861]
18270  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[579-580,583-588]
18271  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[569-575,578]
18272  ikuo  05/02-15:19:50  05/03-07:20:35  TO  8  atlas[561-568]
18273  ikuo  05/02-15:19:51  05/03-07:20:35  TO  8  atlas[314-320,560]
18274  ikuo  05/02-15:19:51  05/03-07:20:35  TO  8  atlas[274-280,313]
18276  ikuo  05/02-15:22:43  05/03-07:23:35  TO  8  atlas[973,982,1124,1132,1140-1143]
18277  ikuo  05/02-15:22:44  05/03-07:23:35  TO  8  atlas[263-265,968-972]

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Yesterday Mark was able to reproduce the problem on a chaos kernel which didn't have aufs enabled.

13:56 <grondo> the fortran runtime is trying to find a message catalog, and is checking different paths
13:56 <neb> so it is the negative dcache entries.
13:57 <grondo> could be yes
13:57 <neb> that is interesting. I think I read something about NFS and negative dcache entries.
13:57 <grondo> though I don't really understand what negative dcache entry is
13:57 <grondo> spinning (soft lockup) process is stuck in dcache_lock
13:58 <grondo> some other processes are stuck waiting for kernel_lock
13:58 <grondo> and this one interesting process in nfs_lookup_revalidate->shrink_dcache_parent->nfs_getattr
13:58 <neb> it is basically a notation that something isn't there.
13:58 <grondo> ok
13:59 <neb> a long time ago BSD discovered that caching negatives as well as positives GREATLY improved system performance on file lookup.
14:00 <neb> IIRC it was like a 200% improvement on some benchmarks
14:00 <grondo> I'm wondering what has the kernel lock
14:02 <grondo> ah
14:02 <grondo> reproduced
14:02 <grondo> heh
14:02 <grondo> filling the dcache seems to have done the trick
14:02 <neb> nice
14:02 <grondo> hm, let me just make sure...
14:03 <grondo> yep
14:03 <grondo> node didn't seem to lock up as long, but still got the soft lockup warning
14:04 <grondo> now if it is 100% reproducible, I can verify it is aufs by running with bind mode
14:05 <grondo> I wonder now if I don't even need the user job to reproduce
14:06 <grondo> just get 8 threads to simultaneously try to open a non-existent file after running find / for awhile
14:14 <grondo> nice, seems reproducible, as long as we run the job for at least 5 minutes
<snip>
16:35 <grondo> hit bz285 soft lockup without aufs
16:35 <grondo> :(
16:35 <neb> using just bind mounts?
16:35 <grondo> yes
16:36 <grondo> I'll get a crashdump on chimp
16:37 <neb> Send me all the details and I'll play around with it on wopr tomorrow to make sure that I can reproduce it on a stock rh kernel.
16:37 <neb> if I can I'll send it up.
16:37 <grondo> hm, I don't think I can give you the reproducer
16:37 <grondo> I'll have to see if we can reproduce with something simpler
16:38 <neb> can you boot up a RHEL kernel.
16:41 <grondo> I could try
16:42 <neb> are you heading out in a few?
16:43 <grondo> about 20 minutes
16:44 <neb> OK if we pick this up 1st thing in the am?
16:44 <grondo> yeah
16:44 <neb> by the time I get the latest kernel down and then you install it then you'll have to leave
<snip>
11:01 <neb> I was unclear re: your softlockup reproducer yesterday. Did you or didn't you need the user's code to trigger the problem?
11:01 <grondo> so far I can't reproduce without the user's code

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Working on seeing if the problem will reproduce on -53.1.14 and -92. Having trouble with the stap script.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

10:46 <grondo> ok, I'm honing in on the nfs softlockup problem
10:46 <neb> what have you figured out?
10:46 <grondo> with an stap probe I'm seeing cases where tasks spend up to ~20s or more in prune_dcache
10:47 <grondo> prune_dcache is being called with huge counts in this case, like 20000+
10:47 <grondo> where the normal case is more like 1 or 2
10:47 <grondo> this wouldn't be a huge issue if just one task was in prune_dcache, but we seem to be hitting a case where 2, 3 or even more tasks are in prune_dcache
10:48 <grondo> Furthermore, when NFS calls shrink_dcache_parent() from nfs_lookup_revalidate(), it has the kernel lock held
10:48 <grondo> so various things, especially NFS, start backing up
10:48 <grondo> which is bad when you have NFS root
10:49 <grondo> Possible solutions
10:50 <grondo> 1. don't let `found' get too big in select_parent(). Return early if we've found 1K or so, so we don't get the case of 20,30,40K
10:51 <grondo> 2. Don't call shrink_dcache_parent() from nfs_lookup_revalidate() with the kernel lock held.
10:51 <grondo> I worry about #1 because I'm not sure if that function is used at umount time
10:52 <grondo> and I'm not sure if #2 is enough, since dcache_lock could still be heavily contended if we have a bunch of tasks in prune_dcache
10:52 <grondo> Hm, actually #1 might be ok, select_parent() already returns early if (found && need_resched())
10:53 <grondo> I could add another test (found >= 1024)
10:53 <grondo> I also want to work on a standalone reproducer
10:55 <neb> re #2 would it be possible to drop the kernel lock before calling shrink_dcache_parent() from nfs_lookup_revalidate?
10:55 <neb> and then pick it up again after?
10:56 <grondo> maybe
10:56 <grondo> though nfs may be depending on only one task being in that section of code at a time
10:57 <grondo> the only thing it does after, though, is d_drop(dentry)
10:57 <grondo> but it then unlocks the kernel, so I assume it has some reason for doing the d_drop first
10:58 <neb> my thoughts on that would be to try it and then we'd submit to the NFS maintainer here and see if he has a problem with it.
10:58 <grondo> however, it could set a flag, shrink_dcache_needed, and call the shrink_dcache_parent() after the unlock_kernel()
10:59 <grondo> I will try that. I fear it won't be enough though...
11:04 <grondo> hm
11:06 <grondo> Can't call shrink_dcache_parent(dentry) after d_drop(dentry)
11:07 <grondo> and seemingly can't unlock_kernel() until after d_drop(dentry)
11:08 <grondo> why doesn't NFS use its own locks?
11:10 <neb> last I heard, no one had put together a truly compelling case where it mattered for performance, and because NFS is so interrelated with all the other kernel subsystems, trying to make finer-grained locks led to problems with lock order inversions leading to deadlocks.
11:10 <neb> sorting out those deadlocks was deemed so mind-bogglingly difficult that until someone came up with a case where it truly mattered, they were going to defer.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

15:01 <grondo> so here's something else with this nfs soft lockup. If this stack trace is to be trusted
15:01 <grondo> #0 [ffff8102f4bab7b8] schedule at ffffffff80061f4e
15:01 <grondo> #1 [ffff8102f4bab7c0] nfs3_rpc_wrapper at ffffffff8809ea6c
15:01 <grondo> #2 [ffff8102f4bab800] nfs3_proc_getattr at ffffffff8809ed82
15:01 <grondo> #3 [ffff8102f4bab890] __first_cpu at ffffffff80142602
15:01 <grondo> #4 [ffff8102f4bab908] shrink_dcache_parent at ffffffff8004ba37
15:01 <grondo> #5 [ffff8102f4bab960] nfs_lookup_revalidate at ffffffff88090437
15:01 <grondo> then this task called schedule() with the dcache_lock held
15:01 <grondo> PID: 7073 TASK: ffff81020437b040 CPU: 5 COMMAND: "cp2k.popt"
15:01 <grondo> crash> p dcache_lock
15:01 <grondo> dcache_lock = $3 = {
15:01 <grondo>   raw_lock = {
15:01 <grondo>     slock = 0
15:01 <grondo>   },
15:01 <grondo>   magic = 3735899821,
15:01 <grondo>   owner_cpu = 5,
15:01 <grondo>   owner = 0xffff81020437b040
15:01 <grondo> }
15:04 <neb> OOH yuck that looks like a baddie to me.
15:04 <grondo> can that stack trace be possible?
15:05 <neb> getting from nfs3_rpc_wrapper to schedule() is totally doable.
15:06 <neb> shrink_dcache_parent to nfs3_proc_getattr is something that I haven't verified but I think it can be done.
15:06 <neb> I don't know about __first_cpu
15:07 <neb> so 5->4 makes sense and 4->2 sounds likely (don't know anything about 3) 2->1->0 is also likely right
15:08 <neb> I have no clue about #3 at all
15:08 <neb> that seems bogus to me.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

Date: Wed, 07 May 2008 16:09:26 -0700
From: Ben Woodard <woodard>
User-Agent: Thunderbird 2.0.0.12 (X11/20080306)
MIME-Version: 1.0
To: Steve Dickson <steved>
CC: Kent Baxley <kbaxley>, Jeffrey Layton <jlayton>, Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
References: <20080507163254.260b900e.redhat.com>
In-Reply-To: <20080507163254.260b900e.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

We are getting a bit closer to the problem, I think. But I'm not well versed in the finer points of the dcache or NFS, so this is kind of mind-boggling to me. Here is what we are seeing:

#0 [ffff8102f4bab7b8] schedule at ffffffff80061f4e
#1 [ffff8102f4bab7c0] nfs3_rpc_wrapper at ffffffff8809ea6c
#2 [ffff8102f4bab800] nfs3_proc_getattr at ffffffff8809ed82
#3 [ffff8102f4bab890] __first_cpu at ffffffff80142602
#4 [ffff8102f4bab908] shrink_dcache_parent at ffffffff8004ba37
#5 [ffff8102f4bab960] nfs_lookup_revalidate at ffffffff88090437

Then this task called schedule() with the dcache_lock held:

PID: 7073 TASK: ffff81020437b040 CPU: 5 COMMAND: "cp2k.popt"
crash> p dcache_lock
dcache_lock = $3 = {
  raw_lock = {
    slock = 0
  },
  magic = 3735899821,
  owner_cpu = 5,
  owner = 0xffff81020437b040
}

I don't know what the heck that __first_cpu is doing in there; I think it must be something wrong with the backtraces.

So far we have only found one way to reproduce the problem, and it isn't transportable. We run find on / to heavily pre-populate the dcache with dentries that end up being unused. Then we run a program that triggers the problem. The problem shows up intermittently in the user's environment, but if we run the find, we can reliably reproduce it. The program seems to have 8 synchronized tasks that are all probing a path for a file. We think that they look for the file early on and that this preloads the cache with negative dentries. Then later on, after the local cache has timed out, all 8 processes respond to a SIGTERM, so they all start searching for the file again, and that is what causes the softlockup to appear.

We think what is going on is that we have a big dcache with lots of unused entries and some timed-out negative dcache entries from the previous search through the path looking for the file. Then, all at once, all the CPUs in the system try to look for the same file, which isn't there, in several places. This causes multiple tasks to try to shrink the dcache at the same time, getting hung up on the dcache_lock, and then we get a softlockup when one of the CPUs tries to revalidate the negative dentry and has to go out on the wire. When it goes out on the wire, it goes to sleep holding the dcache_lock.

Honestly, the finer points of dcache locking are beyond me, which is why I was looking for some help from you guys. Unfortunately, our only reproducer at the moment is an export-controlled code that can't leave the lab. Once we have a transportable reproducer, I should be able to pass it up to you guys in a more conventional format.

So in summary, the key points seem to be:

1. The dcache is large, 200K-400K entries, a lot of which are unused.
2. The user app hits negative dentries, which causes shrink_dcache_parent to get triggered in nfs_lookup_revalidate. We think that this requires
   1. that the path has been looked up previously, and
   2. that the dentries have to be revalidated, i.e. you hit the mount point's cache timeout, but we are not sure.
3. For some reason, not sure why, select_parent() is returning very large values for `found', which is passed as the count to prune_dcache(), literally multiple thousands. We think that this might be because there are a large number of unused dentries expiring under the mount point or something.
4. It seems like maybe the dcache_lock is heavily contended.
5. Plus we're doing all this with the kernel_lock held, which backs up the rest of NFS.
6. We tried limiting the value of `found' returned by select_parent() to 1024 or less, but that actually didn't seem to help much.

-ben

Kent Baxley wrote:
> Hey, guys.
>
> I know this is way, way, out of the normal process, but, Ben and the guys at LLNL have run into a weird problem, and they're looking for a nudge in the right direction. For those that have Issue Tracker access, the ticket can be found here:
>
> https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=178773
>
> It appears that they're hitting some sort of softlockup in dcache. The systems in question are all running RHEL5-based kernels. Over the weekend several of their cluster nodes went "out to lunch" for approximately 10 minutes. The nodes did eventually recover, and the common warning message and call trace on all of the systems so far seems to have been:
>
> BUG: soft lockup detected on CPU#1!
>
> ...
>
> <EOI> [<ffffffff80039423>] generic_drop_inode+0x0/0x15a
> [<ffffffff8004b3cd>] shrink_dcache_parent+0x3f/0xe1
> [<ffffffff8004b3bc>] shrink_dcache_parent+0x2e/0xe1
> [<ffffffff88090418>] :nfs:nfs_lookup_revalidate+0x40e/0x4c8
>
> (The full trace is logged in the summary of the Issue Tracker ticket).
>
> I believe Ben and LLNL are getting close to isolating the problem, but, at this point, they're getting into some unfamiliar territory and are looking for some tips on how to proceed. The last post to the Issue Tracker ticket details their most current findings and problems with regard to where to go next.
>
> Ben is also working on getting the folks at LLNL to try and reproduce the problem on stock -53.1.14 and -92 RHEL5 kernels.
>
> Anyway, if you guys have a second to look at the problem and can provide Ben with some info that can keep them going in the right direction, it would be greatly appreciated.
>
> Thanks for your time!
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

File uploaded: softlock-dcache.C

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132211

I've been trying to come up with a reproducer. So far I've failed. However, this is what I've come up with so far trying to capture what we think is going on with the problem.

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

From: Steve Dickson <SteveD>
To: Ben Woodard <woodard>
CC: Kent Baxley <kbaxley>, Jeffrey Layton <jlayton>,
Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
Date: Thu, 08 May 2008 07:24:22 -0400
User-Agent: Thunderbird 2.0.0.12 (X11/20080226)
Ben Woodard wrote:
> We are getting a bit closer to the problem I think. But I'm not well
> versed in the finer points of the dcache or NFS and so this is kind of
> mind boggling to me. Here is what we are seeing:
>
> #0 [ffff8102f4bab7b8] schedule at ffffffff80061f4e
> #1 [ffff8102f4bab7c0] nfs3_rpc_wrapper at ffffffff8809ea6c
> #2 [ffff8102f4bab800] nfs3_proc_getattr at ffffffff8809ed82
> #3 [ffff8102f4bab890] __first_cpu at ffffffff80142602
> #4 [ffff8102f4bab908] shrink_dcache_parent at ffffffff8004ba37
> #5 [ffff8102f4bab960] nfs_lookup_revalidate at ffffffff88090437
It appears the reason shrink_dcache_parent() is being called is
due to something failing (i.e. goto out_bad). Do we know what is
failing and why? That failure (or error path) might be the real
cause of this lockup...
steved.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
From: Jeff Layton <jlayton>
To: Ben Woodard <woodard>
Cc: Steve Dickson <steved>, Kent Baxley <kbaxley>, Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
Date: Thu, 8 May 2008 07:14:49 -0400
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.12.9; i386-redhat-linux-gnu)

On Wed, 07 May 2008 16:09:26 -0700 Ben Woodard <woodard> wrote:

> We are getting a bit closer to the problem I think. But I'm not well
> versed in the finer points of the dcache or NFS and so this is kind of
> mind boggling to me. Here is what we are seeing:
> [remainder of Ben's message, quoted in full above, snipped]

Sounds ugly. I don't see any reason right offhand why we'd need to do a getattr when we're pruning the dcache, but dcache handling is very obtuse stuff. The locking there is some of the most fragile stuff in the kernel. Even very small, subtle changes can have large effects... and yes, the dcache_lock is very busy.

It's very good that you have a reproducer. I'd definitely recommend testing this on the latest and greatest RHEL5 kernel so that we can rule out existing changes (it seems like Eric Sandeen did some dcache changes a while back). Exactly what kernel are they running here? I don't see any mention of it in the IT.

--
Jeff Layton <jlayton>

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

09:39 <grondo> If that were the case, then the job container may facilitate the reproducer because 1. it signals all tasks in the job container at once and/or 2. it seems to send the signal to every task in the container, this might mean threads get the signal too. I should look at how kill_pg() works, maybe that is a misfeature of the job container thing
10:00 <neb> meeting
10:00 <grondo> arggh
10:02 <neb> it isn't thaaat bad -- kent is a nice guy and keeps the meetings short.
10:02 <grondo> so if your reproducer still doesn't work, can you try putting your children into a process group and use killpg()?
10:02 <grondo> also, allow the number of total children to be tunable
10:03 <grondo> and try with more tasks
10:04 <grondo> basically, reproduces much easier when using the job containers
10:11 <neb> does "much easier" mean haven't been able to reproduce it without process groups yet?
10:12 <grondo> yes, but I only tried 4-5 times
10:12 <grondo> but reproduced using job containers the second time
10:12 <neb> do you have some sample code on how to add tasks to process groups.
10:12 <neb> otherwise I have to look it up.
10:12 <neb> I don't know that part of posix programming off the top of my head.
10:12 <grondo> man 2 setpgid

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

I was able to reproduce the problem on a 92.el5 kernel:

2008-05-08 11:11:13 BUG: soft lockup - CPU#3 stuck for 10s! [cp2k.popt:6903]
CPU 3:
Modules linked in: job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sata_nv i2c_nforce2 libata pcspkr i2c_core ohci_hcd k8temp scsi_mod hwmon shpchp k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000
Pid: 6903, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff8002e7ea>] [<ffffffff8002e7ea>] prune_dcache+0x40/0x149
RSP: 0018:ffff8101f7649a38 EFLAGS: 00000287
RAX: 000000000000328e RBX: ffff8102e1378b70 RCX: 0000000000000067
RDX: ffff8102e1378bb0 RSI: ffffffff802f4080 RDI: ffffffff803c5480
RBP: 00000000000006f3 R08: ffff81010384b8e0 R09: ffff8100a1ed88e8
R10: 0000000000000202 R11: ffffffff80039784 R12: ffff81010384b8e0
R13: 0000000000000001 R14: ffff81010384b8e0 R15: ffff810099265810
FS: 0000000040100940(0063) GS:ffff8101039697c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaab56b0f4 CR3: 00000001e6b8f000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149
 [<ffffffff8004cd30>] shrink_dcache_parent+0x3f/0xe1
 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1
 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416
 [<ffffffff80089241>] __wake_up_common+0x3e/0x68
 [<ffffffff80089241>] __wake_up_common+0x3e/0x68
 [<ffffffff8002e254>] __wake_up+0x38/0x4f
 [<ffffffff800646ff>] __up_wakeup+0x35/0x67
 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3
 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f
 [<ffffffff80009523>] __d_lookup+0xb0/0xff
 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4
 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42
 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8
 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001a954>] open_namei+0x73/0x6d5
 [<ffffffff80017542>] cache_grow+0x356/0x395
 [<ffffffff8002732a>] do_filp_open+0x1c/0x38
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff80019720>] do_sys_open+0x44/0xbe
 [<ffffffff8005d116>] system_call+0x7e/0x83

2008-05-08 11:11:22 BUG: soft lockup - CPU#0 stuck for 10s! [slurmd:5993]
CPU 0:
Modules linked in: job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sata_nv i2c_nforce2 libata pcspkr i2c_core ohci_hcd k8temp scsi_mod hwmon shpchp k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000
Pid: 5993, comm: slurmd Tainted: G 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff80064b54>] [<ffffffff80064b54>] .text.lock.spinlock+0x2/0x30
RSP: 0018:ffff8101ffc8be10 EFLAGS: 00000286
RAX: ffff8102ffe0a800 RBX: ffff8101ffc8be68 RCX: 0000000000000000
RDX: ffff81009e530204 RSI: ffff8101ffc8be4f RDI: ffffffff803c5480
RBP: 00000000802f2630 R08: ffff8100cefdec00 R09: ffff81020385a000
R10: 0000000000000000 R11: 000000d000000028 R12: ffff81000b4f6400
R13: ffff8101ffc8bd88 R14: ffff8101ffc8be48 R15: 0000003000000018
FS: 00002aaaab991c00(0000) GS:ffffffff8039e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000040302da8 CR3: 00000001ff066000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80022655>] d_alloc+0x166/0x1a9
 [<ffffffff8020d0d6>] sock_attach_fd+0x56/0xfd
 [<ffffffff8020d92f>] sys_accept+0xd5/0x1ea
 [<ffffffff8006258f>] __sched_text_start+0x17f/0xaeb
 [<ffffffff8005d116>] system_call+0x7e/0x83

2008-05-08 11:11:23 BUG: soft lockup - CPU#3 stuck for 10s!
[cp2k.popt:6903] 2008-05-08 11:11:23 CPU 3: 2008-05-08 11:11:23 Modules linked in: job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sata_nv i2c_nforce2 libata pcspkr i2c_core ohci_hcd k8temp scsi_mod hwmon shpchp k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e10 00 2008-05-08 11:11:23 Pid: 6903, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1 2008-05-08 11:11:23 RIP: 0010:[<ffffffff8002e7ff>] [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 2008-05-08 11:11:23 RSP: 0018:ffff8101f7649a38 EFLAGS: 00000287 2008-05-08 11:11:23 RAX: 0000000000000dba RBX: ffff8102e1f01ed0 RCX: 0000000000000067 2008-05-08 11:11:23 RDX: ffff8102e1f01f10 RSI: ffffffff802f4080 RDI: ffffffff803c5480 2008-05-08 11:11:23 RBP: ffff8100a1ed88e8 R08: ffff81010384b8e0 R09: ffff8100a1ed88e8 2008-05-08 11:11:23 R10: 0000000000000202 R11: ffffffff80039784 R12: 0000000000000202 2008-05-08 11:11:23 R13: ffffffff80039784 R14: ffff8101f7649a28 R15: ffff810099265810 2008-05-08 11:11:23 FS: 0000000040100940(0063) GS:ffff8101039697c0(0000) knlGS:0000000000000000 2008-05-08 11:11:23 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 2008-05-08 11:11:23 CR2: 00002aaaab56b0f4 CR3: 00000001e6b8f000 CR4: 00000000000006e0 2008-05-08 11:11:23 2008-05-08 11:11:23 Call Trace: 2008-05-08 11:11:23 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 2008-05-08 11:11:23 [<ffffffff8004cd30>] shrink_dcache_parent+0x3f/0xe1 2008-05-08 11:11:23 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 2008-05-08 11:11:23 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 2008-05-08 11:11:23 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 11:11:23 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 11:11:23 [<ffffffff8002e254>] __wake_up+0x38/0x4f 2008-05-08 11:11:23 
[<ffffffff800646ff>] __up_wakeup+0x35/0x67 2008-05-08 11:11:23 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 2008-05-08 11:11:23 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f 2008-05-08 11:11:23 [<ffffffff80009523>] __d_lookup+0xb0/0xff 2008-05-08 11:11:23 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 2008-05-08 11:11:23 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 2008-05-08 11:11:23 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 2008-05-08 11:11:23 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 11:11:23 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 11:11:23 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 2008-05-08 11:11:23 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 2008-05-08 11:11:23 [<ffffffff8001a954>] open_namei+0x73/0x6d5 2008-05-08 11:11:23 [<ffffffff80017542>] cache_grow+0x356/0x395 2008-05-08 11:11:23 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 2008-05-08 11:11:23 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 11:11:23 [<ffffffff80019720>] do_sys_open+0x44/0xbe 2008-05-08 11:11:23 [<ffffffff8005d116>] system_call+0x7e/0x83 2008-05-08 11:11:23 This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 I was unable to reproduce this problem again after hitting it once on the nodes. I rebooted and tried again and hit the problem on the second try. I tried to crash the node, but kdump is not working on our HW in 92.el5 (might be a kexec-tools problem now that I think about it), but I did get a sysrq-w: 2008-05-08 13:35:45 BUG: soft lockup - CPU#5 stuck for 10s! 
[cp2k.popt:6403] 2008-05-08 13:35:45 CPU 5: 2008-05-08 13:35:45 Modules linked in: stap_6120(U) job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy i2c_nforce2 ohci_hcd pcspkr i2c_core sata_nv k8temp libata shpchp hwmon scsi_mod k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000 2008-05-08 13:35:45 Pid: 6403, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1 2008-05-08 13:35:45 RIP: 0010:[<ffffffff8002e7ff>] [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 2008-05-08 13:35:45 RSP: 0018:ffff8102eb3f3a38 EFLAGS: 00000206 2008-05-08 13:35:45 RAX: 0000000000002a6c RBX: ffff8102e73514b0 RCX: 0000000000000001 2008-05-08 13:35:45 RDX: ffff8102e73514f0 RSI: ffffffff802f4080 RDI: ffffffff803c5480 2008-05-08 13:35:45 RBP: ffff8101c9ba3300 R08: ffff81020384b8e0 R09: ffff8101c9ba3300 2008-05-08 13:35:45 R10: 0000000000000246 R11: ffffffff80039784 R12: 0000000000000246 2008-05-08 13:35:45 R13: ffffffff80039784 R14: ffff8102eb3f3a28 R15: ffff8101c9910300 2008-05-08 13:35:45 FS: 0000000040201940(0063) GS:ffff8102039c2140(0000) knlGS:0000000000000000 2008-05-08 13:35:45 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 2008-05-08 13:35:45 CR2: 00002aaaab56b0f4 CR3: 00000002fd458000 CR4: 00000000000006e0 2008-05-08 13:35:45 2008-05-08 13:35:45 Call Trace: 2008-05-08 13:35:45 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 2008-05-08 13:35:45 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 2008-05-08 13:35:45 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 2008-05-08 13:35:45 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 2008-05-08 13:35:45 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:45 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:45 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 
2008-05-08 13:35:45 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 2008-05-08 13:35:45 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f 2008-05-08 13:35:45 [<ffffffff80009523>] __d_lookup+0xb0/0xff 2008-05-08 13:35:45 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 2008-05-08 13:35:45 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 2008-05-08 13:35:45 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 2008-05-08 13:35:45 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:45 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 2008-05-08 13:35:45 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 2008-05-08 13:35:45 [<ffffffff8001a954>] open_namei+0x73/0x6d5 2008-05-08 13:35:45 [<ffffffff80017542>] cache_grow+0x356/0x395 2008-05-08 13:35:45 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 2008-05-08 13:35:45 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:45 [<ffffffff80019720>] do_sys_open+0x44/0xbe 2008-05-08 13:35:45 [<ffffffff8005d116>] system_call+0x7e/0x83 2008-05-08 13:35:45 2008-05-08 13:35:55 BUG: soft lockup - CPU#5 stuck for 10s! 
[cp2k.popt:6403] 2008-05-08 13:35:55 CPU 5: 2008-05-08 13:35:55 Modules linked in: stap_6120(U) job(U) w83627hf hwmon_vid i2c_isa ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_ucm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy i2c_nforce2 ohci_hcd pcspkr i2c_core sata_nv k8temp libata shpchp hwmon scsi_mod k8_edac edac_mc serio_raw nfs nfs_acl lockd fscache sunrpc e1000 2008-05-08 13:35:55 Pid: 6403, comm: cp2k.popt Tainted: G 2.6.18-92.el5 #1 2008-05-08 13:35:55 RIP: 0010:[<ffffffff8002e7ff>] [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 2008-05-08 13:35:55 RSP: 0018:ffff8102eb3f3a38 EFLAGS: 00000206 2008-05-08 13:35:55 RAX: 00000000000003cd RBX: ffff8100a6bd04b0 RCX: 0000000000000001 2008-05-08 13:35:55 RDX: ffff8100a6bd04f0 RSI: ffffffff802f4080 RDI: ffffffff803c5480 2008-05-08 13:35:55 RBP: ffff8101c9ba3300 R08: ffff81020384b8e0 R09: ffff8101c9ba3300 2008-05-08 13:35:55 R10: 0000000000000246 R11: ffffffff80039784 R12: 0000000000000246 2008-05-08 13:35:55 R13: ffffffff80039784 R14: ffff8102eb3f3a28 R15: ffff8101c9910300 2008-05-08 13:35:55 FS: 0000000040201940(0063) GS:ffff8102039c2140(0000) knlGS:0000000000000000 2008-05-08 13:35:55 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 2008-05-08 13:35:55 CR2: 00002aaaab56b0f4 CR3: 00000002fd458000 CR4: 00000000000006e0 2008-05-08 13:35:55 2008-05-08 13:35:55 Call Trace: 2008-05-08 13:35:55 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 2008-05-08 13:35:55 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 2008-05-08 13:35:55 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 2008-05-08 13:35:55 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 2008-05-08 13:35:55 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:55 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 2008-05-08 13:35:55 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 
2008-05-08 13:35:55 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 2008-05-08 13:35:55 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f 2008-05-08 13:35:55 [<ffffffff80009523>] __d_lookup+0xb0/0xff 2008-05-08 13:35:55 [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 2008-05-08 13:35:55 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 2008-05-08 13:35:55 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 2008-05-08 13:35:55 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:55 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 2008-05-08 13:35:55 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 2008-05-08 13:35:55 [<ffffffff8001a954>] open_namei+0x73/0x6d5 2008-05-08 13:35:55 [<ffffffff80017542>] cache_grow+0x356/0x395 2008-05-08 13:35:55 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 2008-05-08 13:35:55 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 2008-05-08 13:35:55 [<ffffffff80019720>] do_sys_open+0x44/0xbe 2008-05-08 13:35:55 [<ffffffff8005d116>] system_call+0x7e/0x83 2008-05-08 13:35:55 chimp2@root:conman chimp4 <ConMan> Connection to console [chimp4] opened. 
SysRq : Show CPUs CPU2: ffff810103933f48 0000000000000000 ffff8101fd6836e0 ffffffff8019cf20 0000000000000000 ffff810103841400 ffff8100cdceb790 ffffffff8019cf4f ffffffff8005e2fc ffffffff80022dba ffff8101fd6c77a0 0000000000000246 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80062569>] __sched_text_start+0x159/0xaeb [<ffffffff800624fe>] __sched_text_start+0xee/0xaeb [<ffffffff80064993>] __down+0xc3/0xd8 [<ffffffff8008ac03>] default_wake_function+0x0/0xe [<ffffffff8002e254>] __wake_up+0x38/0x4f [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80064c5f>] .text.lock.kernel_lock+0x1d/0x1f [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80062fa7>] thread_return+0xac/0xdf [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80063015>] cond_resched+0x3b/0x42 [<ffffffff8000d043>] dput+0x1d/0x114 [<ffffffff800e2014>] prune_one_dentry+0x66/0x76 [<ffffffff8002e88a>] prune_dcache+0xe0/0x149 [<ffffffff8004cd38>] shrink_dcache_parent+0x47/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff8809d931>] :nfs:nfs3_proc_access+0x12f/0x18e [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] 
__path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80066852>] do_page_fault+0x4fe/0x830 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU1: ffff8103038cbdc8 0000000000000000 0000000000000001 0000000000000000 ffffffff8019cf20 ffff8100a4c61988 0000000000000008 ffffffff8019cf4f ffff8100a4c61988 ffffffff80091fe2 ffffffff8030f4a0 0000000000000077 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff80091fe2>] on_each_cpu+0x19/0x22 [<ffffffff8019ce55>] __handle_sysrq+0x90/0x121 [<ffffffff801ab3a7>] receive_chars+0x11f/0x26d [<ffffffff8003d671>] lock_timer_base+0x1b/0x3c [<ffffffff801ac44e>] serial8250_interrupt+0x7a/0xfe [<ffffffff8001097e>] handle_IRQ_event+0x29/0x58 [<ffffffff800b71de>] __do_IRQ+0xa4/0x103 [<ffffffff80011ed2>] __do_softirq+0x5e/0xd6 [<ffffffff8006c3f4>] do_IRQ+0xe7/0xf5 [<ffffffff8005d615>] ret_from_intr+0x0/0xa <EOI> [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8002e7ff>] prune_dcache+0x55/0x149 [<ffffffff8002e7d7>] prune_dcache+0x2d/0x149 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80064969>] __down+0x99/0xd8 [<ffffffff8008ac03>] default_wake_function+0x0/0xe [<ffffffff8802f98a>] :sunrpc:rpcauth_lookup_credcache+0x12e/0x24c [<ffffffff8002e254>] __wake_up+0x38/0x4f [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 [<ffffffff8809008e>] :nfs:nfs_access_get_cached+0xab/0xfa [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8009dde2>] 
autoremove_wake_function+0x0/0x2e [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80066852>] do_page_fault+0x4fe/0x830 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU0: ffffffff80416f48 0000000000000000 ffff8103ff41bda0 ffffffff8019cf20 0000000000000000 ffff81020386d380 ffff8103ff4d2df8 ffffffff8019cf4f ffff8103ff4d2df8 ffffffff80022dba ffff8100ab345810 ffff8103ff4d2df8 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80064b54>] .text.lock.spinlock+0x2/0x30 [<ffffffff80022655>] d_alloc+0x166/0x1a9 [<ffffffff800e88e6>] dcache_dir_open+0x0/0x26 [<ffffffff800e88fa>] dcache_dir_open+0x14/0x26 [<ffffffff8001e336>] __dentry_open+0xd9/0x1dc [<ffffffff80027338>] do_filp_open+0x2a/0x38 [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU3: ffff8103038f7f48 0000000000000000 ffff81020390de70 ffffffff8019cf20 0000000000000000 0000000000000001 ffffffff803c82c0 ffffffff8019cf4f ffffffff8019cf20 ffffffff80022dba ffffffff8006ad3b 0000000000000040 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8006ad3b>] default_idle+0x0/0x50 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff8006ad64>] default_idle+0x29/0x50 [<ffffffff80048b1d>] cpu_idle+0x95/0xb8 [<ffffffff80076667>] start_secondary+0x45a/0x469 CPU5: ffff81030391ff48 0000000000000000 ffff8102eb3f37b0 ffffffff8019cf20 0000000000000000 ffff810203849980 0000000000000000 ffffffff8019cf4f 
ffffffff8005e2fc ffffffff80022dba ffffffff803053c0 0000000000000246 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80064aa8>] _spin_unlock_irqrestore+0x8/0x9 [<ffffffff8006498e>] __down+0xbe/0xd8 [<ffffffff8008ac03>] default_wake_function+0x0/0xe [<ffffffff8002e254>] __wake_up+0x38/0x4f [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff80064c5f>] .text.lock.kernel_lock+0x1d/0x1f [<ffffffff80062fa7>] thread_return+0xac/0xdf [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff88090564>] :nfs:nfs_dentry_iput+0x0/0x4e [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80063015>] cond_resched+0x3b/0x42 [<ffffffff8000d043>] dput+0x1d/0x114 [<ffffffff800e2014>] prune_one_dentry+0x66/0x76 [<ffffffff8002e88a>] prune_dcache+0xe0/0x149 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80017542>] cache_grow+0x356/0x395 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8005be70>] 
cache_alloc_refill+0x106/0x186 [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 CPU4: ffff81010399ff48 0000000000000000 ffff81000cd7de70 ffffffff8019cf20 0000000000000000 0000000000000001 ffffffff803c82e0 ffffffff8019cf4f ffffffff8019cf20 ffffffff80022dba ffffffff8006ad3b 0000000000000040 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8006ad3b>] default_idle+0x0/0x50 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff8006ad64>] default_idle+0x29/0x50 [<ffffffff80048b1d>] cpu_idle+0x95/0xb8 [<ffffffff80076667>] start_secondary+0x45a/0x469 CPU7: ffff8103039cff48 0000000000000000 ffff810203a17e70 ffffffff8019cf20 0000000000000000 0000000000000001 ffffffff803c8340 ffffffff8019cf4f ffffffff8019cf20 ffffffff80022dba ffffffff8006ad3b 0000000000000040 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8006ad3b>] default_idle+0x0/0x50 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff8006ad64>] default_idle+0x29/0x50 [<ffffffff80048b1d>] cpu_idle+0x95/0xb8 [<ffffffff80076667>] start_secondary+0x45a/0x469 CPU6: ffff8101039c7f48 0000000000000000 ffff8103fffad7b0 ffffffff8019cf20 0000000000000000 ffff810303841400 ffff8100cdceb790 ffffffff8019cf4f ffffffff8005e2fc ffffffff80022dba ffffffff803053c0 0000000000000246 Call Trace: <IRQ> [<ffffffff8019cf20>] showacpu+0x0/0x3b [<ffffffff8019cf4f>] showacpu+0x2f/0x3b [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff80022dba>] smp_call_function_interrupt+0x57/0x75 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c <EOI> [<ffffffff80064aa8>] _spin_unlock_irqrestore+0x8/0x9 [<ffffffff8006498e>] __down+0xbe/0xd8 [<ffffffff8008ac03>] 
default_wake_function+0x0/0xe [<ffffffff80064651>] __down_failed+0x35/0x3a [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff80064c5f>] .text.lock.kernel_lock+0x1d/0x1f [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80062fa7>] thread_return+0xac/0xdf [<ffffffff88090564>] :nfs:nfs_dentry_iput+0x0/0x4e [<ffffffff80039784>] generic_drop_inode+0x0/0x15a [<ffffffff8008c2ce>] __cond_resched+0x1c/0x44 [<ffffffff80063015>] cond_resched+0x3b/0x42 [<ffffffff8000d043>] dput+0x1d/0x114 [<ffffffff800e2014>] prune_one_dentry+0x66/0x76 [<ffffffff8002e88a>] prune_dcache+0xe0/0x149 [<ffffffff8004cd59>] shrink_dcache_parent+0x68/0xe1 [<ffffffff8004cd0d>] shrink_dcache_parent+0x1c/0xe1 [<ffffffff8808f41b>] :nfs:nfs_lookup_revalidate+0x35c/0x416 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff80089241>] __wake_up_common+0x3e/0x68 [<ffffffff800646ff>] __up_wakeup+0x35/0x67 [<ffffffff8002c525>] dummy_inode_permission+0x0/0x3 [<ffffffff80064c47>] .text.lock.kernel_lock+0x5/0x1f [<ffffffff80009523>] __d_lookup+0xb0/0xff [<ffffffff8000cc5e>] do_lookup+0x17d/0x1d4 [<ffffffff80009918>] __link_path_walk+0x3a6/0xf42 [<ffffffff8000e782>] link_path_walk+0x5c/0xe5 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 [<ffffffff8000c965>] do_path_lookup+0x270/0x2e8 [<ffffffff800234db>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001a954>] open_namei+0x73/0x6d5 [<ffffffff80017542>] cache_grow+0x356/0x395 [<ffffffff8002732a>] do_filp_open+0x1c/0x38 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186 [<ffffffff80019720>] do_sys_open+0x44/0xbe [<ffffffff8005d116>] system_call+0x7e/0x83 This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 Created attachment 915143 [details]
Comment
(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
I added the following debug code to nfs_lookup_revalidate():

--- linux+rh+chaos.orig/fs/nfs/dir.c
+++ linux+rh+chaos/fs/nfs/dir.c
@@ -752,6 +752,9 @@ static int nfs_lookup_revalidate(struct
 		dfprintk(LOOKUPCACHE, "%s: %s/%s has dud inode\n",
 			__FUNCTION__, dentry->d_parent->d_name.name,
 			dentry->d_name.name);
+		printk("nfs_lookup_revalidate: is_bad_inode: %s/%s\n",
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
 		goto out_bad;
 	}
@@ -764,19 +767,35 @@ static int nfs_lookup_revalidate(struct
 		goto out_valid;
 	}
-	if (NFS_STALE(inode))
+	if (NFS_STALE(inode)) {
+		printk("nfs_lookup_revalidate: NFS_STALE(inode): %s/%s\n",
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
 		goto out_bad;
+	}
-	if (nfs_register_inode_group(dir))
+	if (nfs_register_inode_group(dir)) {
+		printk("nfs_lookup_revalidate: "
+			"nfs_register_inode_group(dir): %s/%s\n",
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
 		goto out_bad;
+	}
 	error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
-	if (error)
+	if (error) {
+		printk("nfs_lookup_revalidate: "
+			"NFS_PROTO(dir)->lookup(dir): error=%d\n", error);
 		goto out_bad;
-	if (nfs_compare_fh(NFS_FH(inode), &fhandle))
+	}
+	if (nfs_compare_fh(NFS_FH(inode), &fhandle)) {
+		printk("nfs_lookup_revalidate: nfs_compare_fh=true\n");
 		goto out_bad;
-	if ((error = nfs_refresh_inode(inode, &fattr)) != 0)
+	}
+	if ((error = nfs_refresh_inode(inode, &fattr)) != 0) {
+		printk("nfs_lookup_revalidate: nfs_refresh_inode=%d\n", error);
 		goto out_bad;
+	}
 	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 out_valid:

to determine how we're getting into shrink_dcache_parent(). I saw the following output on a node that hit the issue:

2008-05-08 17:07:52 nfs_lookup_revalidate: NFS_PROTO(dir)->lookup(dir): error=-512
2008-05-08 17:08:01 BUG: soft lockup detected on CPU#0! 
2008-05-08 17:08:01 2008-05-08 17:08:01 Call Trace: 2008-05-08 17:08:01 <IRQ> [<ffffffff800b5e1b>] softlockup_tick+0xd5/0xf0 2008-05-08 17:08:01 [<ffffffff80093910>] update_process_times+0x54/0x7a 2008-05-08 17:08:01 [<ffffffff80075877>] smp_local_timer_interrupt+0x2c/0x61 2008-05-08 17:08:01 [<ffffffff80075f3f>] smp_apic_timer_interrupt+0x41/0x47 2008-05-08 17:08:01 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c 2008-05-08 17:08:01 <EOI> [<ffffffff80007708>] _raw_spin_lock+0x6d/0xeb 2008-05-08 17:08:01 [<ffffffff8002292d>] d_alloc+0x17c/0x1c1 2008-05-08 17:08:01 [<ffffffff8020b46a>] sock_attach_fd+0x56/0xfd 2008-05-08 17:08:01 [<ffffffff80012429>] get_empty_filp+0x10d/0x177 2008-05-08 17:08:01 [<ffffffff8020bcc3>] sys_accept+0xd5/0x1ea 2008-05-08 17:08:01 [<ffffffff800312f5>] do_fork+0x10e/0x187 2008-05-08 17:08:01 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:01 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:01 2008-05-08 17:08:01 BUG: soft lockup detected on CPU#1! 
2008-05-08 17:08:01 2008-05-08 17:08:01 Call Trace: 2008-05-08 17:08:01 <IRQ> [<ffffffff800b5e1b>] softlockup_tick+0xd5/0xf0 2008-05-08 17:08:01 [<ffffffff80090768>] tasklet_action+0x62/0xac 2008-05-08 17:08:01 [<ffffffff80093910>] update_process_times+0x54/0x7a 2008-05-08 17:08:01 [<ffffffff80075877>] smp_local_timer_interrupt+0x2c/0x61 2008-05-08 17:08:01 [<ffffffff8005d368>] call_softirq+0x1c/0x28 2008-05-08 17:08:02 [<ffffffff80075f3f>] smp_apic_timer_interrupt+0x41/0x47 2008-05-08 17:08:02 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c 2008-05-08 17:08:02 <EOI> [<ffffffff8002eb1b>] prune_dcache+0x53/0x14d 2008-05-08 17:08:02 [<ffffffff8002eaf3>] prune_dcache+0x2b/0x14d 2008-05-08 17:08:02 [<ffffffff8004ba11>] shrink_dcache_parent+0x1c/0xf6 2008-05-08 17:08:02 [<ffffffff88090496>] :nfs:nfs_lookup_revalidate+0x470/0x530 2008-05-08 17:08:02 [<ffffffff800871c5>] __wake_up_common+0x3e/0x68 2008-05-08 17:08:02 [<ffffffff88031c72>] :sunrpc:rpcauth_lookup_credcache+0x193/0x24d 2008-05-08 17:08:02 [<ffffffff8002e491>] __wake_up+0x38/0x4f 2008-05-08 17:08:02 [<ffffffff8809114a>] :nfs:nfs_access_get_cached+0xba/0xfd 2008-05-08 17:08:02 [<ffffffff80009524>] __d_lookup+0xd3/0x110 2008-05-08 17:08:02 [<ffffffff8000cd2a>] do_lookup+0x17d/0x1d4 2008-05-08 17:08:02 [<ffffffff80009913>] __link_path_walk+0x3b2/0xf62 2008-05-08 17:08:02 [<ffffffff8000e80c>] link_path_walk+0x5c/0xe5 2008-05-08 17:08:02 [<ffffffff8009bd69>] autoremove_wake_function+0x0/0x2e 2008-05-08 17:08:02 [<ffffffff80015b07>] get_unused_fd+0xf9/0x107 2008-05-08 17:08:02 [<ffffffff8000ca32>] do_path_lookup+0x274/0x2eb 2008-05-08 17:08:02 [<ffffffff800237e4>] __path_lookup_intent_open+0x56/0x96 2008-05-08 17:08:02 [<ffffffff8001aab4>] open_namei+0x73/0x6d9 2008-05-08 17:08:02 [<ffffffff8001761f>] cache_grow+0x362/0x38d 2008-05-08 17:08:02 [<ffffffff80027607>] do_filp_open+0x1c/0x38 2008-05-08 17:08:02 [<ffffffff8009bd69>] autoremove_wake_function+0x0/0x2e 2008-05-08 17:08:02 [<ffffffff80015b07>] 
get_unused_fd+0xf9/0x107 2008-05-08 17:08:02 [<ffffffff80019869>] do_sys_open+0x44/0xbe 2008-05-08 17:08:02 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:02 2008-05-08 17:08:10 BUG: soft lockup detected on CPU#6! 2008-05-08 17:08:10 2008-05-08 17:08:10 Call Trace: 2008-05-08 17:08:10 <IRQ> [<ffffffff800b5e1b>] softlockup_tick+0xd5/0xf0 2008-05-08 17:08:10 [<ffffffff80093910>] update_process_times+0x54/0x7a 2008-05-08 17:08:10 [<ffffffff80075877>] smp_local_timer_interrupt+0x2c/0x61 2008-05-08 17:08:10 [<ffffffff8005d368>] call_softirq+0x1c/0x28 2008-05-08 17:08:10 [<ffffffff80075f3f>] smp_apic_timer_interrupt+0x41/0x47 2008-05-08 17:08:10 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c 2008-05-08 17:08:10 <EOI> [<ffffffff8002c801>] dummy_inode_permission+0x0/0x3 2008-05-08 17:08:10 [<ffffffff8000770a>] _raw_spin_lock+0x6f/0xeb 2008-05-08 17:08:10 [<ffffffff8002292d>] d_alloc+0x17c/0x1c1 2008-05-08 17:08:10 [<ffffffff800e78ad>] dcache_dir_open+0x0/0x26 2008-05-08 17:08:10 [<ffffffff800e78c1>] dcache_dir_open+0x14/0x26 2008-05-08 17:08:10 [<ffffffff8001e4b6>] __dentry_open+0xd9/0x1e2 2008-05-08 17:08:10 [<ffffffff80027615>] do_filp_open+0x2a/0x38 2008-05-08 17:08:10 [<ffffffff80015b07>] get_unused_fd+0xf9/0x107 2008-05-08 17:08:10 [<ffffffff80019869>] do_sys_open+0x44/0xbe 2008-05-08 17:08:10 [<ffffffff8005c116>] system_call+0x7e/0x83 2008-05-08 17:08:10 This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 File uploaded: softlock-dcache.C This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773 it_file 132459 Still haven't figured out a reproducer but have been working on it. We think that it has something to do with the job controller sending a TERM signal to all the processes pretty much simultaneously. Added a bunch of command line arguments which hopefully will allow us to tune things and cause the problem more easily. 
softlock-dcache [options]
  --dirtymem      allocate a bunch of memory and then dirty it
  --tasks <num>   how many processes to run
  --grouptasks    make all the tasks part of a process group
  --finddir       directory to do the find in
  --help          print this message

This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

File softlock-dcache.C became obsolete.
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773

File uploaded: job.patch
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132461

File uploaded: job-fix-resource-leaks.patch
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132462

File uploaded: job-fix-unchecked.patch
This event sent from IssueTracker by streeter [SEG - Kernel]
issue 178773
it_file 132463

Date: Thu, 08 May 2008 18:08:48 -0700
From: Ben Woodard <woodard>
User-Agent: Thunderbird 2.0.0.12 (X11/20080306)
MIME-Version: 1.0
To: Steve Dickson <SteveD>
CC: Kent Baxley <kbaxley>, Jeffrey Layton <jlayton>, Peter Staubach <staubach>
Subject: Re: RHEL5 softlockups in dcache
References: <20080507163254.260b900e.redhat.com> <482236A6.4020909> <4822E2E6.1030107>
In-Reply-To: <4822E2E6.1030107>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Jeff Layton wrote:
> Sounds ugly. I don't see any reason right offhand why we'd need to do a
> getattr when we're pruning the dcache, but dcache handling is very
> obtuse stuff.

Steve Dickson wrote:
> It appears the reason shrink_dcache_parent() is being called is
> due to something failing (i.e. goto out_bad). Do we know what is
> failing and why? That failure (or error path) might be the real
> cause of this lockup... 
Mark instrumented nfs_lookup_revalidate() to see how we're getting into shrink_dcache_parent():

    error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
    if (error) {
        printk("nfs_lookup_revalidate: NFS_PROTO(dir)->lookup(dir): error=%d\n", error);
        goto out_bad;
    }

What we get is:

2008-05-08 17:07:52 nfs_lookup_revalidate: NFS_PROTO(dir)->lookup(dir): error=-512

-512 is -ERESTARTSYS

We think that this is set in nfs3_rpc_wrapper() as a result of EJUKEBOX. I had never heard of EJUKEBOX before.

#define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */

Jeff Layton wrote:
> Exactly what kernel are they running here? I don't see any mention of
> it in the IT.

We were able to reproduce this on the -92 release of the kernel with one added kernel module. This is a job container module. I attached the three patches that make up this module to the IT. We currently do not think the module is causing the problem; we think it allows us to expose the underlying race with a higher probability. It seems to take around 4 iterations before we can trigger it with the -92 kernel. We think that given enough iterations we will be able to hit it with the -92 kernel without the module, but since the test takes so much time to run, we haven't tried more than a handful of iterations.

> The locking there is some of the most fragile stuff in the
> kernel. Even very small, subtle changes can have large effects...and
> yes, the dcache_lock is very busy.
>
> It's very good that you have a reproducer. I'd definitely recommend
> testing this on the latest and greatest RHEL5 kernel so that we can
> rule out existing changes (it seems like Eric Sandeen did some dcache
> changes a while back).

Do you guys have any ideas of things we can try? Or any ideas of what the problem might be?
-ben

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Info from Steve Dickson:

> -512 is -ERESTARTSYS
>
> We think that this is set in nfs3_rpc_wrapper() as a result of EJUKEBOX.
> I never ever heard of EJUKEBOX before.
> #define EJUKEBOX 528 /* Request initiated, but will not complete
> before timeout */

Using tshark, what is the error that's being returned by the server? Also, who is the server?

> Jeff Layton wrote:
>> Exactly what kernel are they running here? I don't see any mention of
>> it in the IT.
>
> We were able to reproduce this on the -92 release of the kernel with one
> added kernel module. This is a job container module. I attached the
> three patches that make up this module to the IT. We currently do not
> think the module is causing the problem, we think that this allows us to
> expose the underlying race at a higher probability. It seems to take
> around 4 iterations before we can trigger it with the -92 kernel. We
> think that given enough iterations we will be able to hit it with the
> -92 kernel without the module but since the test takes so much time to
> run, we haven't tried more than a handful of iterations.
>
>> The locking there is some of the most fragile stuff in the
>> kernel. Even very small, subtle changes can have large effects...and
>> yes, the dcache_lock is very busy.
>>
>> It's very good that you have a reproducer. I'd definitely recommend
>> testing this on the latest and greatest RHEL5 kernel so that we can
>> rule out existing changes (it seems like Eric Sandeen did some dcache
>> changes a while back).
>
> Do you guys have any ideas of things we can try? Or any ideas of what
> the problem might be?

I would look in some error path where a lock is not being released... Unfortunately, it could be anywhere, including in the rpc code.

steved.
This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Jeff Layton input:

> We think that this is set in nfs3_rpc_wrapper() as a result of EJUKEBOX.
> I never ever heard of EJUKEBOX before.
> #define EJUKEBOX 528 /* Request initiated, but will not complete
> before timeout */

ERESTARTSYS is also often returned when a call is interrupted by a signal. It might be worthwhile to check and see if there's a signal pending when you get this error.

> Jeff Layton wrote:
> > Exactly what kernel are they running here? I don't see any mention of
> > it in the IT.
>
> We were able to reproduce this on the -92 release of the kernel with one
> added kernel module. This is a job container module. I attached the
> three patches that make up this module to the IT. We currently do not
> think the module is causing the problem, we think that this allows us to
> expose the underlying race at a higher probability. It seems to take
> around 4 iterations before we can trigger it with the -92 kernel. We
> think that given enough iterations we will be able to hit it with the
> -92 kernel without the module but since the test takes so much time to
> run, we haven't tried more than a handful of iterations.

Ok, good to know.

> > The locking there is some of the most fragile stuff in the
> > kernel. Even very small, subtle changes can have large effects...and
> > yes, the dcache_lock is very busy.
> >
> > It's very good that you have a reproducer. I'd definitely recommend
> > testing this on the latest and greatest RHEL5 kernel so that we can
> > rule out existing changes (it seems like Eric Sandeen did some dcache
> > changes a while back).
>
> Do you guys have any ideas of things we can try? Or any ideas of what
> the problem might be?

Whenever I see an ERESTARTSYS, my immediate thought is "signal". That's probably a more likely scenario than the server returning error, but I could be wrong here. Either way, I agree with Steve.
It sounds like there may be a codepath that causes a return from shrink_dcache_parent w/o releasing the lock. If I'm correct about the signals, then it may have something to do with calling into there with a signal pending.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Jeff Layton input, cont'd:

Jeff Layton <jlayton> wrote:
>
> Whenever I see an ERESTARTSYS, my immediate thought is "signal". That's
> probably a more likely scenario than the server returning error, but I
> could be wrong here.

Actually, I'm not wrong here. If you look closer at nfs3_rpc_wrapper(), you'll see that it only returns -ERESTARTSYS if the task is signaled. Also, rpc_call_sync and its lower functions can return -ERESTARTSYS for the same reason. I'd definitely look closely at that.

One possibility might be to experiment with a fault-injection patch. Change nfs_lookup_revalidate so that it can predictably call shrink_dcache_parent with a signal pending and see if that makes the problem happen. It may cause the whole thing to crash, but it might be worth experimenting with :-)

> Either way, I agree with Steve. It sounds like there may be a codepath
> that causes a return from shrink_dcache_parent w/o releasing the lock.
> If I'm correct about the signals then it may have something to do with
> calling into there with a signal pending.

I'd also add some instrumentation to check the dcache_lock after return from shrink_dcache_parent in this function. If it's locked, see if the owner is the current task.

Good luck!
-- Jeff Layton <jlayton>

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File uploaded: softlock-dcache.C

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 132865

I finally got something that causes a softlockup, but it seems to be a different softlockup, possibly related to utrace:

strace -o foo -ff ./softlock-dcache --grouptasks --dirtymem

On our own modified kernel the softlockup looks like this:

2008-05-12 19:03:06 BUG: soft lockup detected on CPU#2!
2008-05-12 19:03:06
2008-05-12 19:03:06 Call Trace:
2008-05-12 19:03:06 <IRQ> [<ffffffff800b5ddb>] softlockup_tick+0xd5/0xe7
2008-05-12 19:03:06 [<ffffffff80093aec>] update_process_times+0x54/0x7a
2008-05-12 19:03:06 [<ffffffff800758b6>] smp_local_timer_interrupt+0x2c/0x61
2008-05-12 19:03:06 [<ffffffff8005d368>] call_softirq+0x1c/0x28
2008-05-12 19:03:06 [<ffffffff80075f7e>] smp_apic_timer_interrupt+0x41/0x47
2008-05-12 19:03:06 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
2008-05-12 19:03:06 <EOI> [<ffffffff880942de>] :nfs:nfs_destroy_inode+0x0/0x13
2008-05-12 19:03:06 [<ffffffff800c3481>] shrink_inactive_list+0x75e/0x7e7
2008-05-12 19:03:06 [<ffffffff800c3427>] shrink_inactive_list+0x704/0x7e7
2008-05-12 19:03:06 [<ffffffff8009bc9b>] keventd_create_kthread+0x0/0x61
2008-05-12 19:03:06 [<ffffffff880942de>] :nfs:nfs_destroy_inode+0x0/0x13
2008-05-12 19:03:06 [<ffffffff880942de>] :nfs:nfs_destroy_inode+0x0/0x13
2008-05-12 19:03:06 [<ffffffff80012b15>] shrink_zone+0xf6/0x11c
2008-05-12 19:03:06 [<ffffffff8009be5e>] autoremove_wake_function+0x0/0x2e
2008-05-12 19:03:06 [<ffffffff800c3c05>] try_to_free_pages+0x197/0x2b9
2008-05-12 19:03:06 [<ffffffff8000f135>] __alloc_pages+0x1cb/0x2ce
2008-05-12 19:03:06 [<ffffffff80008b95>] __handle_mm_fault+0x59b/0xdf4
2008-05-12 19:03:06 [<ffffffff80065a96>] do_page_fault+0x4b8/0x81d
2008-05-12 19:03:06 [<ffffffff80061f42>] thread_return+0x0/0xeb
2008-05-12 19:03:06 [<ffffffff800ba742>] utrace_quiescent+0x20f/0x256
2008-05-12 19:03:06 [<ffffffff8005ce55>] error_exit+0x0/0x84
2008-05-12 19:03:06
2008-05-12 19:03:07 BUG: soft lockup detected on CPU#3!
2008-05-12 19:03:07
2008-05-12 19:03:07 Call Trace:
2008-05-12 19:03:07 <IRQ> [<ffffffff800b5ddb>] softlockup_tick+0xd5/0xe7
2008-05-12 19:03:07 [<ffffffff800909dc>] tasklet_action+0x62/0xac
2008-05-12 19:03:07 [<ffffffff80093aec>] update_process_times+0x54/0x7a
2008-05-12 19:03:07 [<ffffffff800758b6>] smp_local_timer_interrupt+0x2c/0x61
2008-05-12 19:03:07 [<ffffffff8005d368>] call_softirq+0x1c/0x28
2008-05-12 19:03:07 [<ffffffff80075f7e>] smp_apic_timer_interrupt+0x41/0x47
2008-05-12 19:03:07 [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
2008-05-12 19:03:07 <EOI> [<ffffffff800c299c>] shrink_active_list+0x1b0/0x426
2008-05-12 19:03:07 [<ffffffff800c2911>] shrink_active_list+0x125/0x426
2008-05-12 19:03:07 [<ffffffff80012af7>] shrink_zone+0xd8/0x11c
2008-05-12 19:03:07 [<ffffffff8009be5e>] autoremove_wake_function+0x0/0x2e
2008-05-12 19:03:07 [<ffffffff800c3c05>] try_to_free_pages+0x197/0x2b9
2008-05-12 19:03:07 [<ffffffff8000f135>] __alloc_pages+0x1cb/0x2ce
2008-05-12 19:03:07 [<ffffffff80008b95>] __handle_mm_fault+0x59b/0xdf4
2008-05-12 19:03:07 [<ffffffff80065a96>] do_page_fault+0x4b8/0x81d
2008-05-12 19:03:07 [<ffffffff80061f42>] thread_return+0x0/0xeb
2008-05-12 19:03:07 [<ffffffff800ba742>] utrace_quiescent+0x20f/0x256
2008-05-12 19:03:07 [<ffffffff8005ce55>] error_exit+0x0/0x84
2008-05-12 19:03:07

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Gave Roland a heads-up re the softlockup that I did manage to trigger; this is how he responded:

That is just a false backtrace from the kernel's lossy method and the address having been there on that kernel stack before. The meaningful part of those traces is the do_page_fault path. There is no utrace-related call in the real call chain for those threads.
Thanks,
Roland

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File softlock-dcache.C became obsolete.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File softlock-dcache.C became obsolete.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File uploaded: softlock-dcache.C

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 133546

Another version of softlock-dcache; still no dice reproducing the problem.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Mark, can you please take a look at my current version of the reproducer and see if you can spot anything that I missed? Other than an account on chimp, what is preventing me from playing with the reproducer that you have right now? Maybe I'll observe something that you didn't. Can the code run on a machine other than chimp?

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

>josef< Wanted to ask you about: http://kerneltrap.org/mailarchive/linux-fsdevel/2008/4/4/1351064
>josef< it looks like you did that for RHEL3 -- what about RHEL5?
>josef< I think it might solve the problem that we are seeing here at LLNL.
>josef< can you look at https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=178773 and tell me what you think?
<josef> rhel5 was completely different
<josef> rhel3 i just did that to get it fixed
<josef> its the same problem
<josef> my preference would be to see where dchinner's work went
<neb> just to verify you mean, 178773 is the same problem as is in that lkml thread.
<josef> i believe so
<neb> do you have a ref to dchinner's work that I can look at?
<josef> http://lwn.net/Articles/185465/
<neb> NM found it next message
<josef> the cond_resched_lock() in the for loop in the shrink_dcache stuff would fix the softlockups
<josef> and that's upstream
<neb> how could cond_resched_lock() interact with the latency of NFS dentry revalidation?
<neb> BTW we don't mount/unmount
<josef> well if you are in the reclaim code via the nfs dentry revalidation, doing the cond_resched is going to make things slower since it pops it off of the processor to let something else run
<neb> We were sort of working on the theory that maybe the BKL taken in the lookup path might be hamstringing forward progress.
<josef> it will block if the process is holding it, yes
<josef> but if you resched the bkl is dropped
<josef> so if the waiters are also waiting on the bkl they'll be fine
<neb> one challenging thing is that all the cores are running the same code and they are all entering the lookup path at practically the same time and all trying to shrink the dcache.
<josef> well then they'll be rescheduling and other stuff will be allowed to work, and eventually the dcache will be shrunk enough to continue to work
<neb> we would hope so, but the machine goes catatonic and then we get a soft lockup.
<neb> Machine doesn't appear to recover.
<josef> right, that's what the cond_resched would fix
<neb> This one? http://lxr.devel.redhat.com/RHEL5+2.6.18-53.1.14.el5/fs/dcache.c#L409
<josef> hm yeah, is that the kernel they are on?
<neb> yeah
<josef> well shit
<neb> double shit
<neb> ;-)
<neb> That is sort of why we were trying to come up with a way that it might be the BKL that NFS takes.
<josef> may want to run this by jlayton, as he knows the NFS side much better than i do
<neb> In all honesty, I don't think we had it clear enough in our minds to articulate until just now, but I guess we were thinking that even though cond_resched_lock drops the dcache_lock, it isn't dropping the BKL that NFS needs to progress.
<josef> well when the process is scheduled the BKL is dropped automatically
<neb> fair enough. It was just a hypothesis that we were starting to consider.
<josef> well now that i think about it it's all done on a counter
<josef> so if a bunch of processes are doing stuff with it then it may starve other stuff
<neb> In this case it's an 8-core machine with all CPUs doing exactly the same thing at practically the same time.
<josef> that is awesome
<josef> i'd like to take some time and rework the dcache reclamation stuff
<neb> HPC codes do that kind of thing all the time.
<neb> Here's the actual scenario: an HPC job computing on all 8 cores runs out of its time allocation, the job scheduler sends the session a killpg(SIGTERM), and so all processes do their final checkpoint before they die.
<neb> Can I have the TAM take this issue and send it up to you and jlayton?
<josef> yeah
<neb> I've been trying to come up with a portable reproducer for 3 weeks but have so far failed.
<neb> thanks.
<josef> np, make sure to hit jlayton, since i'm going on a week vacation starting tomorrow :)

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

http://lkml.org/lkml/2008/5/21/580 seems to be the final disposition of Chinner's patch.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Posting a link to Ben's email to rhkl:
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-June/msg00009.html

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

From: David Mair <dmair>
To: Kent Baxley <kbaxley>
Subject: Re: IT 178773
Date: Tue, 3 Jun 2008 14:58:12 -0400
Reply-To: David Mair <dmair>
User-Agent: KMail/1.9.9
Organization: Red Hat

Ack. If you already have Layton and Josef involved then (as if I need to say this) place that in the ticket. I have no problem with that at all.
This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Date problem reported: 5-6-08

Problem:

1) RHEL5.1, x86_64, with all compute nodes being diskless. Everything is done over NFS.

2) The nodes, over the last few months, have been running into a softlockup condition inside of dcache. When the systems get into this state, they're basically catatonic. The lengthy explanation can be found here:
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-June/msg00009.html

3) The problem was reproduced on stock 2.6.18-53.1.14 kernels as well as 2.6.18-92 based ones.

4) Ben has not been able to come up with a "portable" (i.e. something that we can use here) reproducer. The software that manifests the problem is a huge program that can't be shipped off-site. What we've got here is basically an 8-core machine with all CPUs doing exactly the same thing at practically the same time.

5) Ben and I have been touching base with Jeff Layton and Josef on this, and they appear to be ok with us sending it up.

What is needed from SEG: As stated, some direct conversations have taken place with engineering, and I got the OK from Dave Mair to send this on up to engineering. Thanks.

Issue escalated to Support Engineering Group by: kbaxley.
Category set to: Kernel
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

File uploaded: split-lru-old.patch

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 136542

File uploaded: split-lru-rhel5.patch

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773
it_file 136543

I tried to take Kentaro Makita's patch from http://lkml.org/lkml/2008/5/21/580 and backport it to RHEL5, but I failed because too much of the context had changed. I felt I'd likely screw something up trying to hand-apply every part of the patch, and I was afraid I might introduce an additional bug.
So as a starting point, I grabbed David Chinner's patch from http://lkml.org/lkml/2006/5/25/318 and had a lot more success applying it. It wasn't that difficult to sort out the few conflicts where the patch failed to apply. What I started out with is the patch called split-lru-old.patch and what I ended up with is split-lru-rhel5.patch, both attached. A kernel built from these booted just fine, but unfortunately the softlockup in the dcache still reproduced.

This event sent from IssueTracker by streeter [SEG - Kernel] issue 178773

Created attachment 308995 [details]
backport of dcache sb patch.
Well, this compiles; let me know if this works. This is the patch that is upstream. If it doesn't work, I'd like to see the softlockup messages.
Looks like we are going to need something more than that.

16:19 <grondo> Ok, hit dcache lockup right away with the new patch

What do we try next?

This event sent from IssueTracker by woodard issue 178773

Gonna need to see the softlockup messages.

Updating PM score.

Closing due to inactivity.

*** This bug has been marked as a duplicate of bug 526612 ***