Bug 628795 - panic in ext3_get_inode_loc() due to corrupt inode orphan list

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 4 |
| Component | kernel |
| Version | 4.6.z |
| Hardware | All |
| OS | Linux |
| Priority | high |
| Severity | high |
| Status | CLOSED DUPLICATE |
| Target Milestone | rc |
| Reporter | Lachlan McIlroy <lmcilroy> |
| Assignee | Josef Bacik <jbacik> |
| QA Contact | Red Hat Kernel QE team <kernel-qe> |
| CC | esandeen, jbacik, vgaikwad |
| Doc Type | Bug Fix |
| Last Closed | 2010-09-09 15:40:04 UTC |

Looking at the vmcore:
KERNEL: /cores/20100719105748/work/vmlinux
DUMPFILE: /cores/20100719105748/work/2041314-ap15.1279420796.dmp
CPUS: 8
DATE: Sat Jul 17 22:39:56 2010
UPTIME: 41 days, 20:06:37
LOAD AVERAGE: 3.04, 3.29, 4.25
TASKS: 1448
NODENAME: <hostname>
RELEASE: 2.6.9-78.0.17.ELsmp
VERSION: #1 SMP Thu Mar 5 04:58:34 EST 2009
MACHINE: x86_64 (2792 Mhz)
MEMORY: 64 GB
PANIC: "vmdump"
PID: 955
COMMAND: "oracle"
TASK: 10bc9f297f0 [THREAD_INFO: 10777b58000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: nfs lockd iptable_filter ip_tables hangcheck_timer md5 ipv6 oracleasm(U) autofs4 i2c_dev i2c_core dm_mirror hw_random nfs_acl sunrpc egenera_nmi(U) egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod egenera_vmdump(U) egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U) egenera_virtual_bus(U) egenera_fs(U) ext3 jbd dm_mod
Pid: 955, comm: oracle Tainted: PF 2.6.9-78.0.17.ELsmp
RIP: 0010:[<ffffffffa002bfa2>] <ffffffffa002bfa2>{:ext3:ext3_get_inode_loc+61}
RSP: 0018:0000010777b59d18 EFLAGS: 00010216
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0a0d3e6465646f43
RDX: 0000000000000001 RSI: 0000010777b59da8 RDI: 7470697263736544
RBP: 0000010029865fd0 R08: 0000000000000000 R09: 0000000000000088
R10: 0000000000001000 R11: 000001092ffcfad8 R12: 0000010777b59da8
R13: 0000010777b59da8 R14: 0000010a2e407990 R15: 0000010029865fd0
FS: 0000002a970536e0(0000) GS:ffffffff8050d100(0000) knlGS:0000000008095440
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002a960728c8 CR3: 0000000000101000 CR4: 00000000000006e0
Process oracle (pid: 955, threadinfo 0000010777b58000, task 0000010bc9f297f0)
Stack: 0000010029865fd0 0000010e294bb620 0000010777b59da8 0000000000000000
0000010029865fd0 0000010777b59da8 0000010777b59da8 0000010a2e407990
0000010029865fd0 ffffffffa002c9cb
Call Trace:<ffffffffa002c9cb>{:ext3:ext3_reserve_inode_write+40}
<ffffffffa0031a8d>{:ext3:ext3_orphan_del+379} <ffffffffa00164c6>{:jbd:journal_start+223}
<ffffffffa002e73f>{:ext3:ext3_delete_inode+116} <ffffffffa002e6cb>{:ext3:ext3_delete_inode+0}
<ffffffff80195b0c>{generic_delete_inode+190} <ffffffff80192214>{dput+463}
<ffffffff8017cb16>{__fput+278} <ffffffff8017b645>{filp_close+103}
<ffffffff8013a39d>{put_files_struct+101} <ffffffff8013ac14>{do_exit+711}
<ffffffff8014533f>{sys_rt_sigaction+133} <ffffffff8013b754>{sys_exit_group+0}
<ffffffff801102f6>{system_call+126}
Code: 48 8b 87 60 02 00 00 48 63 80 88 00 00 00 48 39 c1 72 12 48
RIP <ffffffffa002bfa2>{:ext3:ext3_get_inode_loc+61} RSP <0000010777b59d18>
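As an aside, the bogus register values are not random bit patterns: read as little-endian bytes, RDI and RCX decode to the printable ASCII strings "Descript" and "Coded>" (plus a CRLF), which suggests that whatever clobbered this memory wrote text over it. A minimal user-space sketch to decode them, using the values copied from the oops above (the helper name is made up for illustration):

#include <stdio.h>
#include <stdint.h>

/* Print the bytes of a 64-bit register value in memory order.  On x86_64
 * the least significant byte sits at the lowest address, so this shows
 * what the "pointer" would look like if it were text. */
static void decode(const char *name, uint64_t val)
{
    const unsigned char *b = (const unsigned char *)&val;
    printf("%s 0x%016llx = \"", name, (unsigned long long)val);
    for (int i = 0; i < 8; i++)
        putchar((b[i] >= 0x20 && b[i] < 0x7f) ? b[i] : '.');
    printf("\"\n");
}

int main(void)
{
    decode("RDI", 0x7470697263736544ULL);  /* prints "Descript" */
    decode("RCX", 0x0a0d3e6465646f43ULL);  /* prints "Coded>.." */
    return 0;
}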
We've crashed at ext3_get_inode_loc+61, so let's look at that instruction:
crash> dis ext3_get_inode_loc
0xffffffffa002bf65 <ext3_get_inode_loc>: push %r15
0xffffffffa002bf67 <ext3_get_inode_loc+2>: push %r14
0xffffffffa002bf69 <ext3_get_inode_loc+4>: mov %rdi,%r14
0xffffffffa002bf6c <ext3_get_inode_loc+7>: push %r13
0xffffffffa002bf6e <ext3_get_inode_loc+9>: push %r12
0xffffffffa002bf70 <ext3_get_inode_loc+11>: push %rbp
0xffffffffa002bf71 <ext3_get_inode_loc+12>: push %rbx
0xffffffffa002bf72 <ext3_get_inode_loc+13>: mov %edx,%ebx
0xffffffffa002bf74 <ext3_get_inode_loc+15>: sub $0x18,%rsp
0xffffffffa002bf78 <ext3_get_inode_loc+19>: mov 0x30(%r14),%rcx
0xffffffffa002bf7c <ext3_get_inode_loc+23>: mov %rsi,0x10(%rsp)
0xffffffffa002bf81 <ext3_get_inode_loc+28>: mov 0x108(%rdi),%rdi
0xffffffffa002bf88 <ext3_get_inode_loc+35>: cmp $0x2,%rcx
0xffffffffa002bf8c <ext3_get_inode_loc+39>: setne %dl
0xffffffffa002bf8f <ext3_get_inode_loc+42>: xor %eax,%eax
0xffffffffa002bf91 <ext3_get_inode_loc+44>: cmp $0x8,%rcx
0xffffffffa002bf95 <ext3_get_inode_loc+48>: setne %al
0xffffffffa002bf98 <ext3_get_inode_loc+51>: test %eax,%edx
0xffffffffa002bf9a <ext3_get_inode_loc+53>: je 0xffffffffa002bfb5
0xffffffffa002bf9c <ext3_get_inode_loc+55>: cmp $0x7,%rcx
0xffffffffa002bfa0 <ext3_get_inode_loc+59>: je 0xffffffffa002bfb5
0xffffffffa002bfa2 <ext3_get_inode_loc+61>: mov 0x260(%rdi),%rax
We were dereferencing %rdi, and its value is 7470697263736544. That's definitely not a valid kernel address. Based on these instructions we are actually in this function:
static unsigned long ext3_get_inode_block(struct super_block *sb,
        unsigned long ino, struct ext3_iloc *iloc)
{
    unsigned long desc, group_desc, block_group;
    unsigned long offset, block;
    struct buffer_head *bh;
    struct ext3_group_desc * gdp;

    if ((ino != EXT3_ROOT_INO &&
         ino != EXT3_JOURNAL_INO &&
         ino != EXT3_RESIZE_INO &&
         ino < EXT3_FIRST_INO(sb)) ||
        ...

where EXT3_FIRST_INO() and EXT3_SB() are:

#define EXT3_FIRST_INO(s)    (EXT3_SB(s)->s_first_ino)

static inline struct ext3_sb_info * EXT3_SB(struct super_block *sb)
{
    return sb->s_fs_info;
}
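To make the failing pointer walk concrete, here is a minimal user-space model of what EXT3_FIRST_INO() has to do (the struct layouts below are simplified stand-ins, not the real kernel definitions): two dependent loads, the first of which goes straight through the super_block pointer - exactly the load that faulted above with a garbage %rdi.

#include <stdio.h>

/* Simplified stand-ins for the kernel structures involved.  Field names
 * mirror the real ones, but the layouts are illustrative only. */
struct ext3_sb_info { unsigned long s_first_ino; };
struct super_block  { void *s_fs_info; };              /* -> ext3_sb_info */
struct inode        { unsigned long i_ino; struct super_block *i_sb; };

/* EXT3_FIRST_INO(sb) boils down to two dependent loads: sb->s_fs_info,
 * then ->s_first_ino.  If sb itself is garbage (as RDI was in the oops),
 * the very first load takes a wild dereference and faults. */
static unsigned long first_ino(struct super_block *sb)
{
    return ((struct ext3_sb_info *)sb->s_fs_info)->s_first_ino;
}

int main(void)
{
    struct ext3_sb_info sbi = { .s_first_ino = 11 };
    struct super_block  sb  = { .s_fs_info = &sbi };
    struct inode        in  = { .i_ino = 2224091, .i_sb = &sb };

    printf("first ino = %lu\n", first_ino(in.i_sb));   /* fine: prints 11 */

    /* If in.i_sb were clobbered with something like 0x7470697263736544,
     * the same call would fault on the first load - the GPF seen above. */
    return 0;
}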
So it looks like we have a bad superblock pointer value. Let's see what we do have - the inode has been saved into %r14:
crash> inode 0000010a2e407990
struct inode struct: seek error: kernel virtual address: 10a2e407990 type: "gdb_readmem_callback"
Cannot access memory at address 0x10a2e407990
Well, that's no good either. The inode pointer was dereferenced successfully earlier in the assembly above, yet crash cannot read that address out of the dump now. It was already in %r14 when the function was called and has been saved onto the stack:
crash> bt -f
PID: 955 TASK: 10bc9f297f0 CPU: 0 COMMAND: "oracle"
...
#5 [10777b59c60] error_exit at ffffffff80110e1d
[exception RIP: ext3_get_inode_loc+61]
RIP: ffffffffa002bfa2 RSP: 0000010777b59d18 RFLAGS: 00010216
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0a0d3e6465646f43
RDX: 0000000000000001 RSI: 0000010777b59da8 RDI: 7470697263736544
RBP: 0000010029865fd0 R8: 0000000000000000 R9: 0000000000000088
R10: 0000000000001000 R11: 000001092ffcfad8 R12: 0000010777b59da8
R13: 0000010777b59da8 R14: 0000010a2e407990 R15: 0000010029865fd0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
10777b59c68: 0000010029865fd0 0000010a2e407990
10777b59c78: 0000010777b59da8 0000010777b59da8
10777b59c88: 0000010029865fd0 0000000000000001
10777b59c98: 000001092ffcfad8 0000000000001000
10777b59ca8: 0000000000000088 0000000000000000
10777b59cb8: 0000000000000001 0a0d3e6465646f43
10777b59cc8: 0000000000000001 0000010777b59da8
10777b59cd8: 7470697263736544 ffffffffffffffff
10777b59ce8: ffffffffa002bfa2 0000000000000010
10777b59cf8: 0000000000010216 0000010777b59d18
10777b59d08: 0000000000000018 0000000000000000
10777b59d18: 0000010029865fd0 0000010e294bb620
10777b59d28: 0000010777b59da8 0000000000000000
10777b59d38: 0000010029865fd0 0000010777b59da8
10777b59d48: 0000010777b59da8 0000010a2e407990 <----
10777b59d58: 0000010029865fd0 ffffffffa002c9cb
#6 [10777b59d60] ext3_reserve_inode_write at ffffffffa002c9cb
That's the same value. It was also saved onto the stack in ext3_reserve_inode_write():
#6 [10777b59d60] ext3_reserve_inode_write at ffffffffa002c9cb
10777b59d68: 0000010a2e407958 0000000000000000
10777b59d78: 0000010a2e407990 0000010777b59da8
10777b59d88: 0000010bbac212c0 ffffffffa0031a8d
^^^
crash> inode 0000010bbac212c0
struct inode {
i_hash = {
next = 0x0,
pprev = 0x10c11b76f10
},
i_list = {
next = 0x10bbac212d0,
prev = 0x10bbac212d0
},
i_dentry = {
next = 0x10bbac212e0,
prev = 0x10bbac212e0
},
i_ino = 2224091,
i_count = {
counter = 0
},
i_mode = 33200,
i_nlink = 0,
i_uid = 2735,
i_gid = 28145,
i_rdev = 0,
i_size = 0,
i_atime = {
tv_sec = 1275886442,
tv_nsec = 0
},
i_mtime = {
tv_sec = 1279420796,
tv_nsec = 0
},
i_ctime = {
tv_sec = 1279420796,
tv_nsec = 0
},
i_blkbits = 12,
i_blksize = 4096,
i_version = 1,
i_blocks = 0,
i_bytes = 0,
i_sock = 0 '\000',
i_lock = {
lock = 1,
magic = 3735899821
},
i_sem = {
count = {
counter = 1
},
sleepers = 0,
wait = {
lock = {
lock = 1,
magic = 3735899821
},
task_list = {
next = 0x10bbac21388,
prev = 0x10bbac21388
}
}
},
i_alloc_sem = {
activity = 0,
wait_lock = {
lock = 1,
magic = 3735899821
},
wait_list = {
next = 0x10bbac213a8,
prev = 0x10bbac213a8
}
},
i_op = 0xffffffffa0048d60,
i_fop = 0xffffffffa0048e00,
i_sb = 0x100081da400,
i_flock = 0x0,
i_mapping = 0x10bbac213e0,
i_data = {
host = 0x10bbac212c0,
page_tree = {
height = 0,
gfp_mask = 544,
rnode = 0x0
},
tree_lock = {
lock = 1,
magic = 3735899821
},
i_mmap_writable = 0,
i_mmap = {
prio_tree_node = 0x0,
index_bits = 1
},
i_mmap_nonlinear = {
next = 0x10bbac21418,
prev = 0x10bbac21418
},
i_mmap_lock = {
lock = 1,
magic = 3735899821
},
truncate_count = {
counter = 0
},
nrpages = 0,
writeback_index = 0,
a_ops = 0xffffffffa0048fa0,
flags = 210,
backing_dev_info = 0x10bf63a43d0,
private_lock = {
lock = 1,
magic = 3735899821
},
private_list = {
next = 0x10bbac21468,
prev = 0x10bbac21468
},
assoc_mapping = 0x0
},
i_dquot = {0x0, 0x0},
i_devices = {
next = 0x10bbac21490,
prev = 0x10bbac21490
},
i_pipe = 0x0,
i_bdev = 0x0,
i_cdev = 0x0,
i_cindex = 0,
i_generation = 3739034195,
i_dnotify_mask = 0,
i_dnotify = 0x0,
i_state = 23,
dirtied_when = 4375203442,
i_flags = 0,
i_writecount = {
counter = 0
},
i_security = 0x0,
u = {
generic_ip = 0x0
}
}
That's better. Strange that it has been corrupted on the stack.
Taking a closer look at ext3_orphan_del() there are two inodes being accessed. The primary inode passed into ext3_orphan_del() is okay. It's the previous inode on the orphan list that is corrupt and may be related to the bad orphan message:
EXT3-fs warning (device dm-0): ext3_orphan_get: bad orphan inode 17235! e2fsck was run?
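For context on why the previous orphan matters: the orphan chain is threaded through the inodes themselves (each orphan records the next orphan's inode number and the superblock records the head), so deleting an entry that is not at the head means reading and rewriting the predecessor's inode - which is exactly the ext3_reserve_inode_write() call in the trace above. A minimal toy model of that unlink step (the names here are made up, not the real ext3 code):

#include <stdio.h>

/* Toy model of an orphan chain: each orphan inode records the inode
 * number of the next orphan, and the superblock records the head. */
struct toy_inode {
    unsigned long ino;
    unsigned long next_orphan;   /* 0 = end of chain */
};

struct toy_sb {
    unsigned long last_orphan;   /* head of the chain */
};

/* Removing a victim that is NOT at the head requires rewriting the
 * predecessor's record to skip it.  If that predecessor entry is stale
 * or points at a freed/garbage inode, this is where things blow up. */
static void toy_orphan_del(struct toy_sb *sb, struct toy_inode *prev,
                           struct toy_inode *victim)
{
    if (sb->last_orphan == victim->ino)
        sb->last_orphan = victim->next_orphan;     /* head case: touch sb only */
    else
        prev->next_orphan = victim->next_orphan;   /* must dereference prev */
    victim->next_orphan = 0;
}

int main(void)
{
    struct toy_inode a = { .ino = 17235,   .next_orphan = 2224091 };
    struct toy_inode b = { .ino = 2224091, .next_orphan = 0 };
    struct toy_sb sb = { .last_orphan = a.ino };

    toy_orphan_del(&sb, &a, &b);   /* the predecessor (a) gets written */
    printf("head=%lu a.next=%lu\n", sb.last_orphan, a.next_orphan);
    return 0;
}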
This looks to be identical to https://bugzilla.redhat.com/show_bug.cgi?id=434878.

*nod* - and bug #434878, as you noted, can't be a dup of bug #196915 since it was hit on a kernel which already contains that fix.

Hrm, the only thing I can think of is the DIO problem. The nlink check in ext3_link() is protected by the i_sem, so there shouldn't be a race there. Is their app doing DIO? Can they test a patch to make sure it works?

Created attachment 446064 [details]
possible fix

Heh, so I finally got the core opened and it's oracle that's running, so it could very well be the DIO problem. Here is a RHEL4 version of the DIO+orphan patch; let me know if this fixes their problem.

Josef, is that the same patch I posted to 629143? I've confirmed that patch fixes BZ629143 using the test case from BZ588599, but I wasn't entirely convinced it was the same bug as this one since I was never able to get it to crash with the stack trace above - though it makes sense that it could. The customer's panic was a one-off and we currently have no reproducer for this bug. If we can get the patch into BZ629143 and you think it's the same thing, then we can dupe this bug to it.

Lachlan, aha, I knew I had seen that patch recently. Yes, it's the same thing: if we don't clean up the orphan entry when doing DIO and then remove the file, we'll get a corrupted list and it will blow up like this. I'll close this as a dupe of 629143, thanks.

*** This bug has been marked as a duplicate of bug 629143 ***
Description of problem:
Customer reported a panic due to a corrupted ext3 inode orphan list. The general protection fault (in ext3_get_inode_loc+61, reached via ext3_delete_inode -> ext3_orphan_del -> ext3_reserve_inode_write on the exit path of an oracle process) is the same oops quoted in full at the start of the analysis above.

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.0.17.ELsmp

How reproducible:
Not easily.

Steps to Reproduce:
There's a reproducer in BZ196915 that might work but I haven't had any luck with it so far.

Additional info:
Vmcore is available at megatron.gsslab.rdu.redhat.com:/cores/20100719105748/work

This bug looks a lot like: https://bugzilla.redhat.com/show_bug.cgi?id=196915
The fix for that bug is in the customer's kernel, although it may still be possible that a file is unlinked after ext3_link() checks for a zero i_nlink but before it increments i_nlink, so the race is still there. Or it could be a RHEL4 version of: https://bugzilla.redhat.com/show_bug.cgi?id=588599