Bug 229831 - [GFS2] gfs2 panic's occasionally in gfs2_unlink while running bonnie++
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
5.1
All Linux
Priority: high   Severity: medium
Assigned To: Don Zickus
GFS Bugs
Duplicates: 224015
Reported: 2007-02-23 13:06 EST by Josef Bacik
Modified: 2007-11-30 17:07 EST (History)
3 users

See Also:
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 14:41:31 EST


Attachments
Test patch (441 bytes, patch)
2007-02-28 03:55 EST, Steve Whitehouse
Patch to fix incorrect lookup (423 bytes, patch)
2007-02-28 11:18 EST, Steve Whitehouse

Description Josef Bacik 2007-02-23 13:06:26 EST
If you run the bonnie++ stress tester on the latest gfs2 sources, sometimes
you will get a NULL pointer dereference in gfs2_holder_init.  This is the
backtrace from the panic:

BUG: unable to handle kernel NULL pointer dereference at virtual address 
00000018
 printing eip:
d0a5a75b
*pde = 00000000
Oops: 0002 [#1]
SMP 
Modules linked in: sg qla2xxx scsi_transport_fc autofs4 hidp rfcomm l2cap 
bluetooth lock_dlm gfs2 dlm configfs sunrpc iptable_filter ip_tables 
ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 video sbs 
i2c_ec button battery asus_acpi backlight ac parport_pc lp parport floppy 
pcspkr i2c_piix4 i2c_core tg3 cfi_probe gen_probe scb2_flash mtdcore chipreg 
ide_cd serio_raw cdrom dm_snapshot dm_zero dm_mirror dm_mod sd_mod scsi_mod 
ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<d0a5a75b>]    Not tainted VLI
EFLAGS: 00010246   (2.6.20 #2)
EIP is at gfs2_holder_init+0x34/0x3a [gfs2]
eax: 00000000   ebx: ca44eec0   ecx: 00000000   edx: 00000001
esi: cb2ea560   edi: cec920ac   ebp: cd55322c   esp: ca44ee88
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process bonnie++ (pid: 3012, ti=ca44e000 task=cb2ea560 task.ti=ca44e000)
Stack: ca44ee9c ce5e4324 d0a66b50 ca44eec0 cd83f000 ca44ee9c ca44ee9c c9f6c93c 
       cb2ea560 00000001 00000000 00000000 00000000 d0a66b37 ca44eec0 ca44eec0 
       00000000 cb2ea560 00000001 00000000 00000000 00000000 d0a66b50 ca44eee0 
Call Trace:
 [<d0a66b50>] gfs2_unlink+0x66/0x130 [gfs2]
 [<d0a66b37>] gfs2_unlink+0x4d/0x130 [gfs2]
 [<d0a66b50>] gfs2_unlink+0x66/0x130 [gfs2]
 [<d0a6bd87>] gfs2_rindex_hold+0x31/0x44e [gfs2]
 [<c0473ddb>] vfs_unlink+0x9f/0xd5
 [<c047579b>] do_unlinkat+0x8b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
Code: 5b 04 89 43 08 8b 74 24 08 89 73 20 64 8b 35 08 00 00 00 89 53 10 89 73 
0c 89 4b 14 c7 43 18 00 00 00 00 c7 43 1c 00 00 00 00 90 <ff> 40 18 5b 5e c3 
56 89 c6 53 83 ec 04 8b 98 80 00 00 00 8d 83 
EIP: [<d0a5a75b>] gfs2_holder_init+0x34/0x3a [gfs2] SS:ESP 0068:ca44ee88

I have a vmcore but crash doesn't like looking at it for some reason, so I'm 
rebuilding my kernel with -g so I can take a look at it.

To reproduce, just run bonnie++ on the filesystem.  You will need to have the
patch from bz 229080; otherwise bonnie++ won't get past one of its tests to get
to the deleting test.
Comment 1 Josef Bacik 2007-02-23 16:13:32 EST
OK, we are panicking here:

Feb 23 15:35:12 rh5cluster2 kernel: EIP: [<d0a4375b>] 
gfs2_holder_init+0x34/0x3a [gfs2] SS:ESP 0068:cefede88

0x00008753 <gfs2_holder_init+44>:       movl   $0x0,0x1c(%ebx)
0x0000875a <gfs2_holder_init+51>:       lock incl 0x18(%eax)
0x0000875e <gfs2_holder_init+55>:       pop    %ebx
0x0000875f <gfs2_holder_init+56>:       pop    %esi

void gfs2_glock_hold(struct gfs2_glock *gl) 
{   
        atomic_inc(&gl->gl_ref);
}   

and %eax is

eax: 00000000

So for some reason the gl is NULL when we get here.  I will look into why this
is happening.  I may have trouble tracking this down since crash isn't
reading the core and I'm kind of hobbling around with gdb and such.
Comment 2 Josef Bacik 2007-02-23 16:17:57 EST
Hmm, now that I'm thinking about this, I bet what is happening is that the
glock is going away for the inode because there is such a long delay between
the point where we create the file and when we subsequently unlink it, which is
why we are oopsing.  I'm going to write a test case to test this and see if
I'm right.
Comment 3 Josef Bacik 2007-02-23 16:51:21 EST
Well, I don't really need a test case: just create a lot of files in a
directory and then rm -rf the directory; it will panic.  Again, I'm going to
assume this is because we are removing glocks from the inodes in our
normal "hey, that's a lot of glocks hanging around, let's lose some of them"
action, and then they never get recreated when we go to do the unlink.  I'm
going to look and see what ocfs2 does.
Comment 4 Josef Bacik 2007-02-23 17:19:10 EST
OK, we are panicking when we try to grab the dentry->d_inode's i_gl.  So I'm
going to track down what is happening and see if I can fix it.
Comment 5 Josef Bacik 2007-02-23 18:47:34 EST
hmm so i'm wrong, because if we freed up the glock, the inode's i_gl would be
some invalid address, not null, and the only time i_gl is set to null is during
creation and clearing.  Usually we clear an inode by calling clear_inode(),
which sents the inodes i_flags to I_CLEAR, but i_flags was 0, so either we are
calling gfs2_clear_inode and then somewhere along the way i_flags is getting
reset to 0, or this inode was greated without a glock.  I will look into this
weekend, or more likely monday.
Comment 6 Steve Whitehouse 2007-02-26 09:03:29 EST
Well, it's certainly odd.  It looks like something I was trying to track down
a little while ago.  I did a number of tests to try to work out exactly how we
could be coming across the NULL glocks when the inode holds a refcount on its
own glock for the whole lifetime of the inode.

There was one thing, however, which I did look into but which never resulted in
any final conclusion.  I wondered whether the problem wasn't that the inode
itself was wrongly set up, but that we were looking up the wrong inode in the
first place.  Each "real" inode actually results in two in-core inode
structures.  One is the one which the VFS uses, and which contains the data
within its address space.  The other exists only to contain the metadata
address space and thus will have a NULL pointer for its i_gl etc. pointers.

This extra inode is a waste of space and needs to be removed at some stage in
the future. This will probably involve adding a struct address_space to a struct
glock.

In the meantime, I wonder whether it's possible for the lookup function to
return the wrong inode.  I've had debugging code looking for that before but
never found that it was actually possible.  It's certainly worth looking, in
this case, to see if that's what is happening, since that can be fixed in the
lookup function fairly easily.
Comment 7 Josef Bacik 2007-02-27 15:46:26 EST
OK, it looks like a problem in the lookup code.  I put some debug code in and
trapped it:

Hmm, this is going to be disasterous: c4f8702c
 [<d0a9d6d4>] gfs2_inode_lookup+0x1b4/0x1f1 [gfs2]
 [<d0a966e8>] gfs2_dir_search+0x62/0x6a [gfs2]
 [<d0a9d849>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d7db>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7737>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0473fac>] __lookup_hash+0xb1/0xe1
 [<c047576b>] do_unlinkat+0x5b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
dentry inode gl is null, panicing soon
inode state is 0
josef flag is 1
inode is c4f8702c
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018

So the "Hmm, this is going to be disasterous" message gets printed if we come
into lookup and inode->i_state doesn't have I_NEW and ip->i_gl is NULL.  Then
in gfs2_unlink I check to see whose i_gl is NULL and print some stuff out.  So
it looks like we sometimes screw the lookup up; I will look into this.
Comment 8 Josef Bacik 2007-02-27 15:48:42 EST
Also, I got this to happen before, but it screwed up my reproducer so I didn't
rm -rf fast enough to cause the panic.  It looks like it may be a problem with
gfs2_dir_search.  I'm going to trap one of the lookups that does the correct
thing and see how it gets there:

Hmm, this is going to be disasterous: c6af802c
 [<d0a9d6d4>] gfs2_inode_lookup+0x1b4/0x1f1 [gfs2]
 [<d0a966e8>] gfs2_dir_search+0x62/0x6a [gfs2]
 [<d0a9d849>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d7db>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7727>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0472f5a>] do_lookup+0xa3/0x140
 [<c0474c7a>] __link_path_walk+0x7d7/0xc2c
 [<c04bf418>] inode_has_perm+0x66/0x6e
 [<d0a9b063>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c0475113>] link_path_walk+0x44/0xb3
 [<c047540b>] do_path_lookup+0x172/0x1c2
 [<c04742fd>] getname+0x59/0xad
 [<c0475bca>] __user_walk_fd+0x2f/0x40
 [<c046fada>] vfs_lstat_fd+0x16/0x3d
 [<c046fb46>] sys_lstat64+0xf/0x23
 [<c044cd06>] audit_syscall_entry+0x111/0x143
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
Comment 9 Josef Bacik 2007-02-27 17:41:44 EST
You know, looking at the good runs, I'm starting to think that we may be
panicking when we remove the directory, not when removing one of the files.
Comment 10 Josef Bacik 2007-02-27 21:48:30 EST
OK, it looks like it happens on normal files, not just the directory.

OK, a good one:
 [<d0a9d6d1>] gfs2_inode_lookup+0x1b1/0x21f [gfs2]
 [<d0a9d877>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d809>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7792>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0473fac>] __lookup_hash+0xb1/0xe1
 [<c047576b>] do_unlinkat+0x5b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
Hmm, this is going to be disasterous: ccdfa0ac
 [<d0a9d702>] gfs2_inode_lookup+0x1e2/0x21f [gfs2]
 [<d0a9d877>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d809>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7792>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0473fac>] __lookup_hash+0xb1/0xe1
 [<c047576b>] do_unlinkat+0x5b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
dentry inode gl is null, panicing soon
inode state is 0
josef flag is 1
inode is ccdfa0ac
file name is output_32396

So I'm going to stop poking around and just start reading and see if I can
figure out what's going wrong.
Comment 11 Josef Bacik 2007-02-27 23:13:39 EST
OK, it looks like we are screwing up here:

        struct inode *inode = gfs2_iget(sb, inum);

gfs2_iget calls iget5_locked(), which calls ifind() to look for the inode in
the cache, where it is found and returned.  So basically we are finding an
inode without a glock.  Now to figure out why this is happening.
Comment 12 Steve Whitehouse 2007-02-28 03:55:13 EST
Created attachment 148899 [details]
Test patch

So it looks as if I was probably right in comment #6. Can you apply the
following patch and see if you can get the BUG_ON to trigger? It only needs to
trigger once to prove my theory, but I've never managed to get it to happen
myself and neither has anybody else that I've asked to try it so far.
Comment 13 Josef Bacik 2007-02-28 11:07:17 EST
I've been able to make the bug trip twice (I missed it the first time for some
reason).
Comment 14 Steve Whitehouse 2007-02-28 11:18:06 EST
Created attachment 148937 [details]
Patch to fix incorrect lookup

This should fix the bug. I'm glad to have finally been able to verify that this
can happen. Eventually I'd like to eliminate the "extra" inodes as they are
rather too large and not really what we ought to be using. For now though, this
will prevent any more being returned through lookup.
Comment 15 RHEL Product and Program Management 2007-03-06 06:24:01 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 16 Steve Whitehouse 2007-03-07 03:36:38 EST
*** Bug 224015 has been marked as a duplicate of this bug. ***
Comment 17 Kiersten (Kerri) Anderson 2007-03-09 13:43:15 EST
Devel ACK for 5.1.  Would like to get the patch incorporated into an early 5.1
cycle kernel build.
Comment 18 RHEL Product and Program Management 2007-03-09 18:54:24 EST
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 19 Don Zickus 2007-04-10 12:27:30 EDT
in 2.6.18-14.el5
Comment 21 Mike Gahagan 2007-09-05 17:40:29 EDT
I was able to run the bonnie++ command below two times without a panic. This is
the same bonnie++ config we run in RHTS.

./bonnie++ -u root -s 2020 -d /mnt/gfstest/ -q -n 2 -x 2
Comment 23 errata-xmlrpc 2007-11-07 14:41:31 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
