Bug 168453

Summary: Corruption of file->f_ep_lock
Product: [Fedora] Fedora
Reporter: David Woodhouse <dwmw2>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED UPSTREAM
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4
CC: pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: powerpc   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-03-04 23:37:56 UTC

Description David Woodhouse 2005-09-16 07:33:30 UTC
Description of problem:
One CPU of a dual G4 deadlocks, printing the following:
_spin_lock(c8cbf250) CPU#1 NIP c02bb740 holder: cpu 2305 pc 00000000 (lock 24000484)


This happens when spinlock debugging is enabled and the lock code decides it has
spent too long waiting for a given lock (see arch/ppc/lib/locks.c). Adding a
WARN_ON and making it break the spinlock gives the following:

Badness in _raw_spin_lock at arch/ppc/lib/locks.c:58
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00155a8] _raw_spin_lock+0x98/0xe0
 [c02bb740] _spin_lock+0x10/0x20
 [c00b5bec] sys_epoll_ctl+0x45c/0x610
 [c0004980] ret_from_syscall+0x0/0x44
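
The stuck-lock detection described above can be modelled in user space. This is a minimal single-threaded sketch, not the code in arch/ppc/lib/locks.c: the struct layout is inferred from the three fields the report prints, and `dbg_spinlock`, `dbg_trylock`, `dbg_spin_lock` and the `INIT_STUCK` value are hypothetical names and numbers chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical model of a debug spinlock: three words, matching the
 * fields printed in the report (lock, owner_pc, owner_cpu). */
struct dbg_spinlock {
    unsigned long lock;       /* 0 = free, non-zero = held */
    unsigned long owner_pc;   /* caller address of the last locker */
    unsigned long owner_cpu;  /* CPU number of the last locker */
};

#define INIT_STUCK 1000UL     /* illustrative spin budget, not the kernel's */

/* Try once to take the lock; returns 1 on success.
 * Single-threaded model, so no atomics are used. */
static int dbg_trylock(struct dbg_spinlock *l)
{
    if (l->lock != 0)
        return 0;
    l->lock = 1;
    return 1;
}

/* Spin until the lock is taken. When the spin budget runs out, report the
 * recorded holder and break the lock so the caller can proceed.
 * Returns how many times the lock had to be broken. */
static int dbg_spin_lock(struct dbg_spinlock *l, unsigned long cpu,
                         unsigned long pc)
{
    unsigned long stuck = INIT_STUCK;
    int broken = 0;

    while (!dbg_trylock(l)) {
        if (--stuck == 0) {
            printf("_spin_lock(%p) CPU#%lu holder: cpu %lu pc %08lX (lock %08lx)\n",
                   (void *)l, cpu, l->owner_cpu, l->owner_pc, l->lock);
            l->lock = 0;      /* break the lock */
            broken++;
            stuck = INIT_STUCK;
        }
    }
    l->owner_pc = pc;
    l->owner_cpu = cpu;
    return broken;
}
```

A lock word that was overwritten with garbage is never released by anyone, so in this model the waiter always exhausts its budget and has to break the lock once.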

It may not be the epoll code that's at fault; the lock could have been corrupted
already and only detected by sys_epoll_ctl. Currently booting with a sanity
check on file->f_ep_lock in fput() to find out.
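
The sanity check itself is not shown in this report. The following is a user-space sketch of what such a check might look like, assuming that a lock nobody holds has all three of its words at zero. `file_model`, `dbg_spinlock` and `check_f_ep_lock` are hypothetical stand-ins, not kernel definitions.

```c
#include <stdio.h>

/* Hypothetical three-word debug spinlock, as in the report's output. */
struct dbg_spinlock {
    unsigned long lock;
    unsigned long owner_pc;
    unsigned long owner_cpu;
};

/* Cut-down stand-in for struct file: only the fields this sketch needs. */
struct file_model {
    const void *f_op;
    struct dbg_spinlock f_ep_lock;
};

/* Returns 1 and prints a warning if f_ep_lock does not look like a free
 * lock, 0 otherwise. Assumption: a free lock has all three words zero. */
static int check_f_ep_lock(const struct file_model *f, int pid,
                           const char *comm)
{
    const struct dbg_spinlock *l = &f->f_ep_lock;

    if (l->lock == 0 && l->owner_pc == 0 && l->owner_cpu == 0)
        return 0;

    printf("File %p (fops %p) has corrupted f_ep_lock!\n",
           (const void *)f, f->f_op);
    printf("Pid %d, comm %s\n", pid, comm);
    printf("lock %lx, owner_pc %lx, owner_cpu %lx\n",
           l->lock, l->owner_pc, l->owner_cpu);
    return 1;
}
```

Calling a check like this from the release path catches the corruption on every file that is closed, rather than only on files that happen to go through sys_epoll_ctl.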

This is 2.6.12-1.1423_FC4smp

Comment 1 David Woodhouse 2005-09-16 08:20:51 UTC
Just one warning during boot:

File cf93a8a0 (fops d107c980) has corrupted f_epoll_lock!
Pid 1509, comm S30nscd
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44

d107c980 is in the ext3 module -- it's a regular file. Rebuilding with slab
debugging...

Comment 2 David Woodhouse 2005-09-16 09:04:59 UTC
More warnings, with files in different memory locations.
File cf93a8a0 (fops d107c980) has corrupted f_epoll_lock!
Pid 1509, comm S30nscd
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
NET: Registered protocol family 10
Disabled Privacy Extensions on device c0331364(lo)
IPv6 over IPv4 tunneling driver
eth0: Link down
eth0: Link is up at 100 Mbps, full-duplex.
eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
eth0: Link down
eth0: Link is up at 100 Mbps, full-duplex.
eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
File cc4c9800 (fops c031ea58) has corrupted f_epoll_lock!
Pid 4988, comm cc1
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File c4fd0be0 (fops d107c980) has corrupted f_epoll_lock!
Pid 8028, comm cc1
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File c4fd0280 (fops d107c980) has corrupted f_epoll_lock!
Pid 8350, comm cc1
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File c4fd06e0 (fops d107c980) has corrupted f_epoll_lock!
Pid 9560, comm sh
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File c4fd0f00 (fops d107c980) has corrupted f_epoll_lock!
Pid 9753, comm sh
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File caa6b0c0 (fops d107c980) has corrupted f_epoll_lock!
Pid 13145, comm modpost
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File c5f968e0 (fops d107c980) has corrupted f_epoll_lock!
Pid 14637, comm fixdep
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File cabed940 (fops d107c980) has corrupted f_epoll_lock!
Pid 15409, comm gcc
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File cc4c9260 (fops d107c980) has corrupted f_epoll_lock!
Pid 16456, comm sh
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File c6b1b260 (fops d107c980) has corrupted f_epoll_lock!
Pid 16933, comm fixdep
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File caff3e00 (fops d107c980) has corrupted f_epoll_lock!
Pid 19524, comm rm
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c007d110] filp_close+0x90/0xf0
 [c00ba8a8] load_elf_binary+0x988/0x1610
 [c0090fec] search_binary_handler+0xdc/0x330
 [c0091484] do_execve+0x244/0x280
 [c0008594] sys_execve+0x64/0xd0
 [c0004980] ret_from_syscall+0x0/0x44
File cafab640 (fops d107c980) has corrupted f_epoll_lock!
Pid 22140, comm fixdep
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44
File ce0f4ae0 (fops c031ea58) has corrupted f_epoll_lock!
Pid 22422, comm mv
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c007d110] filp_close+0x90/0xf0
 [c0004980] ret_from_syscall+0x0/0x44
File c9f1dd60 (fops d107c980) has corrupted f_epoll_lock!
Pid 23243, comm fixdep
Badness in fput at fs/file_table.c:116
Call trace:
 [c00059b8] check_bug_trap+0xa8/0x120
 [c0005c94] ProgramCheckException+0x264/0x4e0
 [c00050a8] ret_from_except_full+0x0/0x4c
 [c00810d0] fput+0xc0/0xf0
 [c008dea0] vfs_fstat+0x40/0x60
 [c008e35c] sys_fstat64+0x1c/0x50
 [c0004980] ret_from_syscall+0x0/0x44


Comment 3 David Woodhouse 2005-09-16 18:06:53 UTC
It's only f_ep_lock which is getting scribbled on, and it's very repeatable. The
owner_cpu field is almost always set to 0x901, and occasionally 0x501.
f_ep_links is always fine (usually an empty list), and f_mapping is also fine.

Slab debugging doesn't show anything interesting, but does make the problem seem
to occur slightly less frequently. Here are samples of what we find in f_ep_lock.

lock 20282484, owner_pc 0, owner_cpu 901
lock 24042884, owner_pc 0, owner_cpu 901
lock 28022484, owner_pc 0, owner_cpu 901
lock 24042084, owner_pc 0, owner_cpu 901
lock 20000484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24042084, owner_pc 0, owner_cpu 901
lock 24002484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24044284, owner_pc 0, owner_cpu 901
lock 24042284, owner_pc 0, owner_cpu 901
lock 22000424, owner_pc 0, owner_cpu 901
lock 28022484, owner_pc 0, owner_cpu 901
lock 28022484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24002484, owner_pc 0, owner_cpu 901
lock 24000424, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24022484, owner_pc 0, owner_cpu 901
lock 20000484, owner_pc 0, owner_cpu 901
lock 24022484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24022484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24022484, owner_pc 0, owner_cpu 901
lock 28022284, owner_pc 0, owner_cpu 901
lock 28022484, owner_pc 0, owner_cpu 901
lock 24000484, owner_pc 0, owner_cpu 901
lock 28022484, owner_pc 0, owner_cpu 901
lock 28000484, owner_pc 0, owner_cpu 901
lock 24022484, owner_pc 0, owner_cpu 901
lock 22042484, owner_pc 0, owner_cpu 901
lock 28202484, owner_pc 0, owner_cpu 501
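
The samples above can be tallied mechanically. This is a small sketch, assuming all three fields are printed in hexadecimal, as the 0x901/0x501 wording suggests; `sample`, `parse_sample` and `count_owner_cpu` are hypothetical helper names.

```c
#include <stdio.h>

/* One parsed sample line. */
struct sample {
    unsigned long lock;
    unsigned long owner_pc;
    unsigned long owner_cpu;
};

/* Parse a line of the form "lock 20282484, owner_pc 0, owner_cpu 901".
 * All fields are read as hexadecimal. Returns 1 on success, 0 otherwise. */
static int parse_sample(const char *line, struct sample *s)
{
    return sscanf(line, "lock %lx, owner_pc %lx, owner_cpu %lx",
                  &s->lock, &s->owner_pc, &s->owner_cpu) == 3;
}

/* Count how many of the n lines carry the given owner_cpu value. */
static int count_owner_cpu(const char **lines, int n, unsigned long want)
{
    struct sample s;
    int i, hits = 0;

    for (i = 0; i < n; i++)
        if (parse_sample(lines[i], &s) && s.owner_cpu == want)
            hits++;
    return hits;
}
```

Of the 34 samples listed, 33 have owner_cpu 901 and one has 501, and owner_pc is 0 in all of them. The "holder: cpu 2305" in the original deadlock message appears to be the same value printed in decimal, since 0x901 is 2305.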


Comment 4 Dave Jones 2005-09-28 09:49:41 UTC
ISTR you talking about this on IRC a week or so ago. Did you arrive at any
conclusion?


Comment 5 David Woodhouse 2005-09-28 10:02:06 UTC
Not yet. Paulus pointed out that it looks a lot like an exception frame, but we
haven't come up with any suggestions about why only three words of it would be
turning up at a consistent offset within various instances of 'struct file'.
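
One possible reading of the exception-frame observation, offered only as an unverified sketch and not as a conclusion reached in this bug: in the 32-bit PowerPC `struct pt_regs` of that era, `ccr`, `mq` and `trap` are, as far as I recall, three consecutive words. The samples would fit that shape if the lock word were a condition register value, owner_pc the unused `mq`, and owner_cpu a trap number with its low bit set (0x900 is the decrementer vector and 0x500 the external interrupt vector). The field order and the meaning of the low bit are assumptions to check against the kernel source; `frame_tail`, `trap_name` and `looks_like_frame_tail` are hypothetical names.

```c
/* Assumed tail of the 32-bit PowerPC exception frame; verify the field
 * order against the kernel headers before relying on it. */
struct frame_tail {
    unsigned long ccr;   /* condition register */
    unsigned long mq;    /* assumed unused on these CPUs, so 0 */
    unsigned long trap;  /* exception vector, low bit assumed to be a flag */
};

/* Name the exception vector encoded in a trap word, ignoring the low bit. */
static const char *trap_name(unsigned long trap)
{
    switch (trap & ~1UL) {
    case 0x500: return "external interrupt";
    case 0x900: return "decrementer";
    default:    return "unknown";
    }
}

/* Does a three-word f_ep_lock sample look like (ccr, mq, trap)?
 * Returns 1 if mq is zero and the trap word names a known vector. */
static int looks_like_frame_tail(unsigned long lock, unsigned long owner_pc,
                                 unsigned long owner_cpu)
{
    struct frame_tail t = { lock, owner_pc, owner_cpu };

    return t.mq == 0 && trap_name(t.trap)[0] != 'u';
}
```

Under these assumptions every sample reported so far would pass the test, which is consistent with the exception-frame idea but does not explain how those words reach 'struct file'.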

I haven't yet tried the rawhide kernel. There's been a lot of change there. 

Comment 6 Dave Jones 2005-09-30 06:23:20 UTC
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.


Comment 7 David Woodhouse 2005-10-19 12:52:45 UTC
It still happens with 2.6.13-1.1526_FC4, although it's a lot harder to reproduce
and diagnose because of other changes.

Oct 19 12:33:22 peach kernel: _spin_lock(c03d1474) CPU#0 NIP c02d0a80 holder: cpu 0 pc C02D0A80



Comment 8 Dave Jones 2005-11-10 19:22:55 UTC
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.


Comment 9 Dave Jones 2005-12-13 02:38:43 UTC
If this is still happening, it's probably something that's worth reporting in
upstream bugzilla for tracking.


Comment 10 Dave Jones 2006-02-03 05:28:00 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 11 Dave Jones 2006-03-04 23:37:56 UTC
If this is still happening, file it upstream, where it'll get more attention.