Bug 229831
| Field | Value |
|---|---|
| Summary | [GFS2] gfs2 panics occasionally in gfs2_unlink while running bonnie++ |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.1 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | high |
| Hardware | All |
| OS | Linux |
| Reporter | Josef Bacik <jbacik> |
| Assignee | Don Zickus <dzickus> |
| QA Contact | GFS Bugs <gfs-bugs> |
| CC | kanderso, rkenna, rpeterso |
| Doc Type | Bug Fix |
| Fixed In Version | RHBA-2007-0959 |
| Last Closed | 2007-11-07 19:41:31 UTC |
OK, we are panicking here:

```
EIP: [<d0a4375b>] gfs2_holder_init+0x34/0x3a [gfs2] SS:ESP 0068:cefede88
```

Disassembly around the fault:

```
0x00008753 <gfs2_holder_init+44>: movl $0x0,0x1c(%ebx)
0x0000875a <gfs2_holder_init+51>: lock incl 0x18(%eax)
0x0000875e <gfs2_holder_init+55>: pop %ebx
0x0000875f <gfs2_holder_init+56>: pop %esi
```

The faulting `lock incl 0x18(%eax)` is the inlined atomic increment from:

```c
void gfs2_glock_hold(struct gfs2_glock *gl)
{
	atomic_inc(&gl->gl_ref);
}
```

and `%eax` is:

```
eax: 00000000
```

So for some reason the glock (`gl`) is NULL when we get here. I will look into why this is happening. I may have trouble tracking it down, since crash isn't reading the core and I'm kind of hobbling around with gdb and such.
Hmm, now that I'm thinking about this, I bet what is happening is that the glock is going away for the inode because there is such a long delay between creating the file and subsequently unlinking it, which is why we are oopsing. I'm going to write a test case to check this and see if I'm right.

Well, I don't really need a test case: just create a lot of files in a directory and then rm -rf the directory, and it will panic. Again, I'm going to assume this is because we remove glocks from the inodes in our normal "hey, that's a lot of glocks hanging around, let's lose some of them" action, and then they never get recreated when we go to do the unlink. I'm going to look and see what ocfs2 does.

OK, we are panicking when we try to grab the dentry->d_inode's i_gl. So I'm going to track down what is happening and see if I can fix it.

Hmm, so I'm wrong: if we had freed the glock, the inode's i_gl would be some invalid address, not NULL, and the only time i_gl is set to NULL is during creation and clearing. Usually we clear an inode by calling clear_inode(), which sets the inode's i_flags to I_CLEAR, but i_flags was 0. So either we are calling gfs2_clear_inode and somewhere along the way i_flags is getting reset to 0, or this inode was created without a glock. I will look into it this weekend, or more likely Monday.

Well, it's certainly odd. It looks like something I was trying to track down a little while ago. I did a number of tests to try to work out exactly how we could be coming across the NULL glocks, given that the inode holds a refcount on its own glock for the whole lifetime of the inode. There was one thing I looked into that never resulted in any final conclusion: I wondered whether the problem wasn't that the inode itself was wrongly set up, but that we were looking up the wrong inode in the first place. Each "real" inode actually results in two incore inode structures.
One is the one the VFS uses, which contains the data within its address space. The other exists only to contain the metadata address space, and thus has a NULL pointer for its i_gl etc. This extra inode is a waste of space and needs to be removed at some stage in the future; that will probably involve adding a struct address_space to a struct glock. In the meantime, I wonder whether it's possible for the lookup function to return the wrong inode. I've had debugging code looking for that before but never found that it was actually possible. It's certainly worth checking in this case whether that's what is happening, since it can be fixed fairly easily in the lookup function.

OK, it looks like a problem in the lookup code. I put some debug output in and trapped it:

```
Hmm, this is going to be disasterous: c4f8702c
 [<d0a9d6d4>] gfs2_inode_lookup+0x1b4/0x1f1 [gfs2]
 [<d0a966e8>] gfs2_dir_search+0x62/0x6a [gfs2]
 [<d0a9d849>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d7db>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7737>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0473fac>] __lookup_hash+0xb1/0xe1
 [<c047576b>] do_unlinkat+0x5b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
dentry inode gl is null, panicing soon
inode state is 0
josef flag is 1
inode is c4f8702c
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
```

The "Hmm, this is going to be disasterous" line gets printed if we come into lookup and inode->i_state doesn't have I_NEW while ip->i_gl is NULL. Then in gfs2_unlink I check whose i_gl is NULL and print some things out. So it looks like we sometimes screw up the lookup; I will look into this. Also, I got this to happen before, but it screwed up my reproducer so I didn't rm -rf fast enough to cause the panic. It looks like it may be a problem with gfs2_dir_search.
I'm going to trap one of the lookups that does the correct thing and see how it gets there:

```
Hmm, this is going to be disasterous: c6af802c
 [<d0a9d6d4>] gfs2_inode_lookup+0x1b4/0x1f1 [gfs2]
 [<d0a966e8>] gfs2_dir_search+0x62/0x6a [gfs2]
 [<d0a9d849>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d7db>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7727>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0472f5a>] do_lookup+0xa3/0x140
 [<c0474c7a>] __link_path_walk+0x7d7/0xc2c
 [<c04bf418>] inode_has_perm+0x66/0x6e
 [<d0a9b063>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c0475113>] link_path_walk+0x44/0xb3
 [<c047540b>] do_path_lookup+0x172/0x1c2
 [<c04742fd>] getname+0x59/0xad
 [<c0475bca>] __user_walk_fd+0x2f/0x40
 [<c046fada>] vfs_lstat_fd+0x16/0x3d
 [<c046fb46>] sys_lstat64+0xf/0x23
 [<c044cd06>] audit_syscall_entry+0x111/0x143
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
```

You know, looking at the good runs, I'm starting to think that we may be panicking when we remove the directory, not when removing one of the files.
OK, it looks like it happens on normal files, not just the directory. Here's a good one:

```
 [<d0a9d6d1>] gfs2_inode_lookup+0x1b1/0x21f [gfs2]
 [<d0a9d877>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d809>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7792>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0473fac>] __lookup_hash+0xb1/0xe1
 [<c047576b>] do_unlinkat+0x5b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
```

and here's a bad one:

```
Hmm, this is going to be disasterous: ccdfa0ac
 [<d0a9d702>] gfs2_inode_lookup+0x1e2/0x21f [gfs2]
 [<d0a9d877>] gfs2_lookupi+0x138/0x166 [gfs2]
 [<d0a9d809>] gfs2_lookupi+0xca/0x166 [gfs2]
 [<d0aa7792>] gfs2_lookup+0x1a/0x47 [gfs2]
 [<c0473fac>] __lookup_hash+0xb1/0xe1
 [<c047576b>] do_unlinkat+0x5b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
dentry inode gl is null, panicing soon
inode state is 0
josef flag is 1
inode is ccdfa0ac
file name is output_32396
```

So I'm going to stop poking around, just start reading, and see if I can figure out what's going wrong. OK, it looks like we are screwing up here:
```c
struct inode *inode = gfs2_iget(sb, inum);
```

gfs2_iget calls iget5_locked(), which calls ifind() to look for the inode in the cache, where it is found and returned. So basically we are finding an inode without a glock; now to figure out why this is happening.
Created attachment 148899 [details]: Test patch

So it looks as if I was probably right in comment #6. Can you apply the following patch and see if you can get the BUG_ON to trigger? It only needs to trigger once to prove my theory, but I've never managed to make it happen myself, and neither has anybody else I've asked to try it so far.

I've been able to make the bug trip twice (I missed it the first time for some reason).

Created attachment 148937 [details]
Patch to fix incorrect lookup
This should fix the bug. I'm glad to have finally been able to verify that this
can happen. Eventually I'd like to eliminate the "extra" inodes as they are
rather too large and not really what we ought to be using. For now though, this
will prevent any more being returned through lookup.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

*** Bug 224015 has been marked as a duplicate of this bug. ***

Devel ACK for 5.1. Would like to get the patch incorporated into an early 5.1 cycle kernel build.

This request was evaluated by the Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.

In 2.6.18-14.el5 I was able to run the bonnie++ command below twice without a panic. This is the same bonnie++ config we run in RHTS.

```
./bonnie++ -u root -s 2020 -d /mnt/gfstest/ -q -n 2 -x 2
```

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html
If you run the bonnie++ stress tester on the latest gfs2 sources, you will sometimes get a NULL pointer dereference in gfs2_holder_init. This is the backtrace from the panic:

```
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
 printing eip: d0a5a75b
*pde = 00000000
Oops: 0002 [#1] SMP
Modules linked in: sg qla2xxx scsi_transport_fc autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs sunrpc iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 video sbs i2c_ec button battery asus_acpi backlight ac parport_pc lp parport floppy pcspkr i2c_piix4 i2c_core tg3 cfi_probe gen_probe scb2_flash mtdcore chipreg ide_cd serio_raw cdrom dm_snapshot dm_zero dm_mirror dm_mod sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<d0a5a75b>]    Not tainted VLI
EFLAGS: 00010246   (2.6.20 #2)
EIP is at gfs2_holder_init+0x34/0x3a [gfs2]
eax: 00000000   ebx: ca44eec0   ecx: 00000000   edx: 00000001
esi: cb2ea560   edi: cec920ac   ebp: cd55322c   esp: ca44ee88
ds: 007b   es: 007b   fs: 00d8   gs: 0033   ss: 0068
Process bonnie++ (pid: 3012, ti=ca44e000 task=cb2ea560 task.ti=ca44e000)
Stack: ca44ee9c ce5e4324 d0a66b50 ca44eec0 cd83f000 ca44ee9c ca44ee9c c9f6c93c
       cb2ea560 00000001 00000000 00000000 00000000 d0a66b37 ca44eec0 ca44eec0
       00000000 cb2ea560 00000001 00000000 00000000 00000000 d0a66b50 ca44eee0
Call Trace:
 [<d0a66b50>] gfs2_unlink+0x66/0x130 [gfs2]
 [<d0a66b37>] gfs2_unlink+0x4d/0x130 [gfs2]
 [<d0a66b50>] gfs2_unlink+0x66/0x130 [gfs2]
 [<d0a6bd87>] gfs2_rindex_hold+0x31/0x44e [gfs2]
 [<c0473ddb>] vfs_unlink+0x9f/0xd5
 [<c047579b>] do_unlinkat+0x8b/0x119
 [<c040794d>] do_syscall_trace+0x124/0x16b
 [<c0404e4c>] syscall_call+0x7/0xb
 =======================
Code: 5b 04 89 43 08 8b 74 24 08 89 73 20 64 8b 35 08 00 00 00 89 53 10 89 73 0c 89 4b 14 c7 43 18 00 00 00 00 c7 43 1c 00 00 00 00 90 <ff> 40 18 5b 5e c3 56 89 c6 53 83 ec 04 8b 98 80 00 00 00 8d 83
EIP: [<d0a5a75b>] gfs2_holder_init+0x34/0x3a [gfs2] SS:ESP 0068:ca44ee88
```

I have a vmcore, but crash doesn't like looking at it for some reason, so I'm rebuilding my kernel with -g so I can take a look at it.

To reproduce, just run bonnie++ on the filesystem. You will need the patch from bz 229080, otherwise bonnie won't get past one of its tests to reach the deleting test.