113215 – iprune_sem

Summary: iprune_sem

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jim Paradis
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-01-09 20:08 UTC by Wendy Cheng
Modified:	2013-08-06 01:03 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-11-18 16:03:30 UTC
Target Upstream Version:
Embargoed:

Attachments
iprune semaphore patch (1.09 KB, patch) 2004-01-14 16:10 UTC, Wendy Cheng

Description Wendy Cheng 2004-01-09 20:08:33 UTC

Description of problem:
This is a request to backport the "iprune_sem" from 2.5.62/2.6.0
into AS2.1 as reported in IT#30898. Without the semaphore, the
system could crash as:

Oops: 0000
Kernel 2.4.9-e.12enterprise
CPU:    0
EIP:    0010:[<c015caea>]    Tainted: P
EFLAGS: 00010286
EIP is at clear_inode [kernel] 0x11a
eax: ba860e90   ebx: 00000000   ecx: e5d98e48   edx: d21edf80
esi: e5d98e40   edi: f094f3c8   ebp: 00000001   esp: d21edf38
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 18, stackpage=d21ed000)
Stack: f094f3c8 c015bdba d2106564 e7e6fe40 e5d98e40 d21edf80 c015cb85 e5d98e40
       00000000 00000000 c015d007 d21edf80 000002e0 000005bf 00000001 00000003
       000002dd e641c588 e4ffee48 e39dee48 d21ec000 00000008 0008e000 c01385b3
Call Trace: [<c015bdba>] destroy_inode [kernel] 0x2a
[<c015cb85>] dispose_list [kernel] 0x45
[<c015d007>] prune_icache [kernel] 0x2d7
[<c01385b3>] __kmem_cache_shrink_locked [kernel] 0x53
[<c015d061>] shrink_icache_memory [kernel] 0x21
[<c013d76c>] kswapd [kernel] 0x13c
[<c013d630>] kswapd [kernel] 0x0
[<c0105000>] stext [kernel] 0x0
[<c02f0018>] __kallsyms [kernel] 0x72e00
[<c0105000>] stext [kernel] 0x0
[<c0105836>] kernel_thread [kernel] 0x26
[<c013d630>] kswapd [kernel] 0x0
Code: 8b 40 30 85 c0 74 04 56 ff d0 58 8b 86 fc 00 00 00 85 c0 74
<0>Kernel panic: not continuing
<0>Rebooting in 120 seconds..


Additional info:
1. Patch will follow shortly.
2. Don Howard from sustaining engineering is handling this issue.

Comment 1 Arjan van de Ven 2004-01-09 20:12:22 UTC

Does this happen on untainted kernels too?

Comment 2 Wendy Cheng 2004-01-09 20:22:04 UTC

It will, but we have not been able to reliably recreate it.

Comment 3 Don Howard 2004-01-10 00:04:36 UTC

Wendy, can you point me to a case where this happens on an untainted 
kernel? 
 
I've looked at IT 26999, 31051, 30898 - They are all tainted. 
 
Veritas has a known issue with a similar footprint to 31051 and 
30898 where vx inodes end up being free()d via kernel inode reap 
code.  Veritas has pushed a patch upstream that corrects this.  It's 
included in e.31. 
 
I've been trying to wade through 26999 today to see if Veritas is
the cause there.  Blocker bug clerical chores are getting in the way,
though.

Comment 4 Wendy Cheng 2004-01-14 16:10:21 UTC

Created attachment 96970 [details]
iprune semaphore patch

Comment 5 Wendy Cheng 2004-01-14 16:13:53 UTC

The problem is in the base kernel; whether the kernel is tainted is not
relevant. If this is accepted, we may want to add a parallel semaphore
for the dcache list.

Comment 6 Arjan van de Ven 2004-01-14 16:22:04 UTC

Ehm, the VM won't destroy inodes with a non-zero usage count... and the
other code is guaranteed to have the usage count set, afaics.
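
A minimal user-space sketch of the rule Comment 6 relies on, in C. This is not the 2.4.9 kernel code: `struct toy_inode`, `toy_can_prune` and `toy_prune` are hypothetical names, and the only property modelled is that a pruner skips any inode whose usage count is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel inode: only the usage count matters. */
struct toy_inode {
    int i_count;   /* number of active users; 0 means unused */
    int freed;     /* set to 1 once the pruner has reclaimed it */
};

/* A pruner may reclaim an inode only when nobody holds a reference. */
static int toy_can_prune(const struct toy_inode *inode)
{
    return inode->i_count == 0;
}

/* Walk the table, reclaim every unused inode, return how many were reclaimed. */
static size_t toy_prune(struct toy_inode *table, size_t n)
{
    size_t reclaimed = 0;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].freed || !toy_can_prune(&table[i]))
            continue;          /* already gone, or still in use: skip */
        table[i].freed = 1;
        reclaimed++;
    }
    return reclaimed;
}
```

Under this assumption, an inode that some other code path is still using (count above zero) is never handed to the reclaim step, which is the guarantee the comment refers to.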

Comment 7 Arjan van de Ven 2004-01-14 18:36:39 UTC

Also, what is the point of the semaphore if the spinlock protects
exactly the same thing?

Comment 8 Wendy Cheng 2004-01-14 18:51:43 UTC

I thought about this (why not just extend the inode_lock's
coverage) but concluded that the new semaphore was probably added for
performance reasons. Note that the inode_lock is widely used, so you
want to release it as soon as possible, while the new semaphore is just
there to prevent the inode code and the cache-releasing code from
stepping on each other's toes.

Comment 9 Arjan van de Ven 2004-01-14 18:53:10 UTC

But if you look at your patch, you take the semaphore right before
you take the spinlock and drop it right after you drop the spinlock...
so what's the point?

Comment 10 Wendy Cheng 2004-01-14 19:19:38 UTC

First, it is not "MY" patch; I redid it on e.34 for testing. Second,
read the code carefully: there is a "dispose_list()" call between
the unlock and the up.

        spin_unlock(&inode_lock);

        dispose_list(&throw_away);
+       up(&iprune_sem);
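
The ordering described in Comments 9 and 10 can be traced with a small user-space sketch in C. It is only an illustration of the call order, not the attached patch: the `toy_*` functions and the `trace` buffer are hypothetical stand-ins that record each step instead of taking real locks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer: records the order in which the steps run. */
static char trace[64];

static void step(const char *name)
{
    /* Room is needed for the name, a ';' separator and the final NUL. */
    assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
    strcat(trace, name);
    strcat(trace, ";");
}

/* Stand-ins for the kernel primitives named in the comments. */
static void toy_down(void)         { step("down"); }
static void toy_spin_lock(void)    { step("lock"); }
static void toy_spin_unlock(void)  { step("unlock"); }
static void toy_dispose_list(void) { step("dispose"); }
static void toy_up(void)           { step("up"); }

/* Run the steps in the order the discussion describes and return the trace. */
static const char *toy_prune_order(void)
{
    trace[0] = '\0';
    toy_down();          /* down(&iprune_sem)         */
    toy_spin_lock();     /* spin_lock(&inode_lock)    */
    /* ... unused inodes would be moved to a private list here ... */
    toy_spin_unlock();   /* spin_unlock(&inode_lock)  */
    toy_dispose_list();  /* dispose_list(&throw_away) */
    toy_up();            /* up(&iprune_sem)           */
    return trace;
}
```

The trace shows the one place where the two locks differ in coverage: the dispose step runs after the spinlock is released but while the semaphore is still held.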

Comment 11 Arjan van de Ven 2004-01-14 19:27:49 UTC

Yes, there is: it disposes of a list of inodes that nothing can get to
anymore, since it's a local (stack variable) list of inodes.
Finding inodes means searching the real list, something you need the
inode lock for.
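
The reachability argument in Comment 11 can be sketched in user-space C. The names here (`struct toy_inode`, `global_list`, `toy_add`, `toy_find`, `toy_collect_unused`) are hypothetical, the list is a plain singly linked list rather than the kernel's list type, and locking is omitted; the sketch only shows that a lookup walking the global list cannot return a node that has been moved to a private list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked inode list; not the kernel's list_head. */
struct toy_inode {
    unsigned long ino;
    int i_count;               /* usage count; 0 means unused */
    struct toy_inode *next;
};

/* The "real" list that lookups search (inode_lock would guard it). */
static struct toy_inode *global_list;

/* Push an inode onto the front of the global list. */
static void toy_add(struct toy_inode *inode)
{
    assert(inode != NULL);
    inode->next = global_list;
    global_list = inode;
}

/* Lookup only ever walks the global list. */
static struct toy_inode *toy_find(unsigned long ino)
{
    for (struct toy_inode *p = global_list; p != NULL; p = p->next)
        if (p->ino == ino)
            return p;
    return NULL;
}

/* Move every unused inode from the global list to a caller-private list. */
static struct toy_inode *toy_collect_unused(void)
{
    struct toy_inode *throw_away = NULL;
    struct toy_inode **link = &global_list;

    while (*link != NULL) {
        struct toy_inode *cur = *link;
        if (cur->i_count == 0) {
            *link = cur->next;        /* unlink from the global list */
            cur->next = throw_away;   /* push onto the private list */
            throw_away = cur;
        } else {
            link = &cur->next;        /* still in use: leave it alone */
        }
    }
    return throw_away;
}
```

Once `toy_collect_unused` returns, `toy_find` can no longer reach the collected nodes, so in this model whatever is done to the private list afterwards cannot race with a lookup. Whether the real 2.4.9 code has other paths to those inodes is exactly what the thread is debating, and the sketch does not settle it.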

Comment 12 Wendy Cheng 2004-01-14 20:04:42 UTC

OK, my dear gatekeeper, this bugzilla is tracked by issue tracker.
Our ping-pong discussions would make the issue record extremely long
and hard to read. Let's move to an email exchange. After we agree on
something, we'll update it here. Make sense?

Comment 14 Wendy Cheng 2004-01-15 06:52:15 UTC

We're going to hold the discussion using issue tracker #30898.

Comment 15 Frank Hirtz 2004-01-16 21:25:09 UTC

 lsmod output from one of the affected hosts:

Module                  Size  Used by    Tainted: P  
ide-cd                 35328   0 (autoclean)
cdrom                  35520   0 (autoclean) [ide-cd]
sg                     35076   0 (autoclean)
gab                   238400   3
iscsi                  40704   0
nfs                    92192   3 (autoclean)
lockd                  61184   1 (autoclean) [nfs]
sunrpc                 86128   1 (autoclean) [nfs lockd]
llt                   111296   8
autofs                 13796   0 (autoclean) (unused)
openafs               560880   2
e100_2124k2            66968   1
tg3_12e3               49376   2
appletalk              29708   0 (autoclean)
ipx                    25492   0 (autoclean)
vxio                  657056   3 (autoclean)
vxspec                  4872   2
vxdmp                 121304   4
lpfcdd                298216   4
megaraid               28288   6
aic7xxx               127200   0
cpqarray               23936   0 (unused)
sd_mod                 13888  10
scsi_mod              125980   6 [sg iscsi lpfcdd megaraid aic7xxx sd_mod]
ext3                   74240   4
jbd                    55176   4 [ext3]

Comment 16 Wendy Cheng 2004-01-17 05:29:33 UTC

Haven't been able to find any hole within dispose_list() that would
cause this oops. The only merit identified so far for this semaphore is
the protection of inodes_stat.nr_inodes, which is used for statistics
reporting. The customer will be told that this patch doesn't pass code
review and that we no longer consider this a high-priority item unless
it can be recreated and/or a full dump is provided.

Comment 17 Jim Paradis 2004-11-18 16:03:30 UTC

Cleaning up old RHEL21 BZs.  Closing this one as WONTFIX.
