Bug 628795

Summary:

panic in ext3_get_inode_loc() due to corrupt inode orphan list

Product:

Red Hat Enterprise Linux 4

Reporter:

Lachlan McIlroy <lmcilroy>

Component:

kernel

Assignee:

Josef Bacik <jbacik>

Status:

CLOSED DUPLICATE

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

high

Docs Contact:

Priority:

high

Version:

4.6.z

CC:

esandeen, jbacik, vgaikwad

Target Milestone:

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2010-09-09 15:40:04 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
possible fix	none

Description Lachlan McIlroy 2010-08-31 04:56:35 UTC

Description of problem:
Customer reported a panic due to a corrupted ext3 inode orphan list.

general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: nfs lockd iptable_filter ip_tables hangcheck_timer md5 ipv6 oracleasm(U) autofs4 i2c_dev i2c_core dm_mirror hw_random nfs_acl sunrpc egenera_nmi(U) egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod egenera_vmdump(U) egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U) egenera_virtual_bus(U) egenera_fs(U) ext3 jbd dm_mod
Pid: 955, comm: oracle Tainted: PF     2.6.9-78.0.17.ELsmp
RIP: 0010:[<ffffffffa002bfa2>] <ffffffffa002bfa2>{:ext3:ext3_get_inode_loc+61}
RSP: 0018:0000010777b59d18  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0a0d3e6465646f43
RDX: 0000000000000001 RSI: 0000010777b59da8 RDI: 7470697263736544
RBP: 0000010029865fd0 R08: 0000000000000000 R09: 0000000000000088
R10: 0000000000001000 R11: 000001092ffcfad8 R12: 0000010777b59da8
R13: 0000010777b59da8 R14: 0000010a2e407990 R15: 0000010029865fd0
FS:  0000002a970536e0(0000) GS:ffffffff8050d100(0000) knlGS:0000000008095440
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002a960728c8 CR3: 0000000000101000 CR4: 00000000000006e0
Process oracle (pid: 955, threadinfo 0000010777b58000, task 0000010bc9f297f0)
Stack: 0000010029865fd0 0000010e294bb620 0000010777b59da8 0000000000000000
      0000010029865fd0 0000010777b59da8 0000010777b59da8 0000010a2e407990
      0000010029865fd0 ffffffffa002c9cb
Call Trace:<ffffffffa002c9cb>{:ext3:ext3_reserve_inode_write+40}
      <ffffffffa0031a8d>{:ext3:ext3_orphan_del+379}
      <ffffffffa00164c6>{:jbd:journal_start+223}
      <ffffffffa002e73f>{:ext3:ext3_delete_inode+116}
      <ffffffffa002e6cb>{:ext3:ext3_delete_inode+0}
      <ffffffff80195b0c>{generic_delete_inode+190}
      <ffffffff80192214>{dput+463}
      <ffffffff8017cb16>{__fput+278}
      <ffffffff8017b645>{filp_close+103}
      <ffffffff8013a39d>{put_files_struct+101}
      <ffffffff8013ac14>{do_exit+711}
      <ffffffff8014533f>{sys_rt_sigaction+133}
      <ffffffff8013b754>{sys_exit_group+0}
      <ffffffff801102f6>{system_call+126}

Code: 48 8b 87 60 02 00 00 48 63 80 88 00 00 00 48 39 c1 72 12 48
RIP <ffffffffa002bfa2>{:ext3:ext3_get_inode_loc+61} RSP <0000010777b59d18>

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.0.17.ELsmp

How reproducible:
Not easily.

Steps to Reproduce:
There's a reproducer in BZ196915 that might work but I haven't had any luck so far.
  
Additional info:
Vmcore is available at megatron.gsslab.rdu.redhat.com:/cores/20100719105748/work

This bug looks a lot like:
https://bugzilla.redhat.com/show_bug.cgi?id=196915
The fix for that bug is in the customer's kernel although it may still be possible that a file is unlinked after ext3_link() checks for a zero i_nlink but before it increments i_nlink so the race is still there.

Or it could be a RHEL4 version of:
https://bugzilla.redhat.com/show_bug.cgi?id=588599

Comment 1 Lachlan McIlroy 2010-08-31 05:00:43 UTC

Looking at the vmcore:

     KERNEL: /cores/20100719105748/work/vmlinux
   DUMPFILE: /cores/20100719105748/work/2041314-ap15.1279420796.dmp
       CPUS: 8
       DATE: Sat Jul 17 22:39:56 2010
     UPTIME: 41 days, 20:06:37
LOAD AVERAGE: 3.04, 3.29, 4.25
      TASKS: 1448
   NODENAME: <hostname>
    RELEASE: 2.6.9-78.0.17.ELsmp
    VERSION: #1 SMP Thu Mar 5 04:58:34 EST 2009
    MACHINE: x86_64  (2792 Mhz)
     MEMORY: 64 GB
      PANIC: "vmdump"
        PID: 955
    COMMAND: "oracle"
       TASK: 10bc9f297f0  [THREAD_INFO: 10777b58000]
        CPU: 0
      STATE: TASK_RUNNING (PANIC)

general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: nfs lockd iptable_filter ip_tables hangcheck_timer md5 ipv6 oracleasm(U) autofs4 i2c_dev i2c_core dm_mirror hw_random nfs_acl sunrpc egenera_nmi(U) egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod egenera_vmdump(U) egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U) egenera_virtual_bus(U) egenera_fs(U) ext3 jbd dm_mod
Pid: 955, comm: oracle Tainted: PF     2.6.9-78.0.17.ELsmp
RIP: 0010:[<ffffffffa002bfa2>] <ffffffffa002bfa2>{:ext3:ext3_get_inode_loc+61}
RSP: 0018:0000010777b59d18  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0a0d3e6465646f43
RDX: 0000000000000001 RSI: 0000010777b59da8 RDI: 7470697263736544
RBP: 0000010029865fd0 R08: 0000000000000000 R09: 0000000000000088
R10: 0000000000001000 R11: 000001092ffcfad8 R12: 0000010777b59da8
R13: 0000010777b59da8 R14: 0000010a2e407990 R15: 0000010029865fd0
FS:  0000002a970536e0(0000) GS:ffffffff8050d100(0000) knlGS:0000000008095440
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002a960728c8 CR3: 0000000000101000 CR4: 00000000000006e0
Process oracle (pid: 955, threadinfo 0000010777b58000, task 0000010bc9f297f0)
Stack: 0000010029865fd0 0000010e294bb620 0000010777b59da8 0000000000000000
      0000010029865fd0 0000010777b59da8 0000010777b59da8 0000010a2e407990
      0000010029865fd0 ffffffffa002c9cb
Call Trace:<ffffffffa002c9cb>{:ext3:ext3_reserve_inode_write+40}
      <ffffffffa0031a8d>{:ext3:ext3_orphan_del+379} <ffffffffa00164c6>{:jbd:journal_start+223}
      <ffffffffa002e73f>{:ext3:ext3_delete_inode+116} <ffffffffa002e6cb>{:ext3:ext3_delete_inode+0}
      <ffffffff80195b0c>{generic_delete_inode+190} <ffffffff80192214>{dput+463}
      <ffffffff8017cb16>{__fput+278} <ffffffff8017b645>{filp_close+103}
      <ffffffff8013a39d>{put_files_struct+101} <ffffffff8013ac14>{do_exit+711}
      <ffffffff8014533f>{sys_rt_sigaction+133} <ffffffff8013b754>{sys_exit_group+0}
      <ffffffff801102f6>{system_call+126}

Code: 48 8b 87 60 02 00 00 48 63 80 88 00 00 00 48 39 c1 72 12 48
RIP <ffffffffa002bfa2>{:ext3:ext3_get_inode_loc+61} RSP <0000010777b59d18>

We've crashed at ext3_get_inode_loc+61, so let's look at that instruction:

crash> dis ext3_get_inode_loc
0xffffffffa002bf65 <ext3_get_inode_loc>:        push   %r15
0xffffffffa002bf67 <ext3_get_inode_loc+2>:      push   %r14
0xffffffffa002bf69 <ext3_get_inode_loc+4>:      mov    %rdi,%r14
0xffffffffa002bf6c <ext3_get_inode_loc+7>:      push   %r13
0xffffffffa002bf6e <ext3_get_inode_loc+9>:      push   %r12
0xffffffffa002bf70 <ext3_get_inode_loc+11>:     push   %rbp
0xffffffffa002bf71 <ext3_get_inode_loc+12>:     push   %rbx
0xffffffffa002bf72 <ext3_get_inode_loc+13>:     mov    %edx,%ebx
0xffffffffa002bf74 <ext3_get_inode_loc+15>:     sub    $0x18,%rsp
0xffffffffa002bf78 <ext3_get_inode_loc+19>:     mov    0x30(%r14),%rcx
0xffffffffa002bf7c <ext3_get_inode_loc+23>:     mov    %rsi,0x10(%rsp)
0xffffffffa002bf81 <ext3_get_inode_loc+28>:     mov    0x108(%rdi),%rdi
0xffffffffa002bf88 <ext3_get_inode_loc+35>:     cmp    $0x2,%rcx
0xffffffffa002bf8c <ext3_get_inode_loc+39>:     setne  %dl
0xffffffffa002bf8f <ext3_get_inode_loc+42>:     xor    %eax,%eax
0xffffffffa002bf91 <ext3_get_inode_loc+44>:     cmp    $0x8,%rcx
0xffffffffa002bf95 <ext3_get_inode_loc+48>:     setne  %al
0xffffffffa002bf98 <ext3_get_inode_loc+51>:     test   %eax,%edx
0xffffffffa002bf9a <ext3_get_inode_loc+53>:     je     0xffffffffa002bfb5
0xffffffffa002bf9c <ext3_get_inode_loc+55>:     cmp    $0x7,%rcx
0xffffffffa002bfa0 <ext3_get_inode_loc+59>:     je     0xffffffffa002bfb5
0xffffffffa002bfa2 <ext3_get_inode_loc+61>:     mov    0x260(%rdi),%rax

We were dereferencing %rdi and it's value is 7470697263736544.  That's definitely not a valid kernel address.  Based on these instructions we are actually in this function:

static unsigned long ext3_get_inode_block(struct super_block *sb,
unsigned long ino, struct ext3_iloc *iloc)
{
unsigned long desc, group_desc, block_group;
unsigned long offset, block;
struct buffer_head *bh;
struct ext3_group_desc * gdp;


if ((ino != EXT3_ROOT_INO &&
ino != EXT3_JOURNAL_INO &&
ino != EXT3_RESIZE_INO &&
ino < EXT3_FIRST_INO(sb)) ||

#define EXT3_FIRST_INO(s) (EXT3_SB(s)->s_first_ino)

static inline struct ext3_sb_info * EXT3_SB(struct super_block *sb)
{
return sb->s_fs_info;
}

So it looks like we have a bad superblock pointer value.  Let's see what we do have - the inode has been saved into %r14:

crash> inode 0000010a2e407990
struct inode struct: seek error: kernel virtual address: 10a2e407990  type: "gdb_readmem_callback"
Cannot access memory at address 0x10a2e407990

Well that's no good either.  The inode pointer has been successfully dereferenced earlier in the assembly above but for some reason it's no good now.  It was already in %r14 when the function was called and has been saved onto the stack:

crash> bt -f
PID: 955    TASK: 10bc9f297f0       CPU: 0   COMMAND: "oracle"
...
#5 [10777b59c60] error_exit at ffffffff80110e1d
   [exception RIP: ext3_get_inode_loc+61]
   RIP: ffffffffa002bfa2  RSP: 0000010777b59d18  RFLAGS: 00010216
   RAX: 0000000000000001  RBX: 0000000000000001  RCX: 0a0d3e6465646f43
   RDX: 0000000000000001  RSI: 0000010777b59da8  RDI: 7470697263736544
   RBP: 0000010029865fd0   R8: 0000000000000000   R9: 0000000000000088
   R10: 0000000000001000  R11: 000001092ffcfad8  R12: 0000010777b59da8
   R13: 0000010777b59da8  R14: 0000010a2e407990  R15: 0000010029865fd0
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   10777b59c68: 0000010029865fd0 0000010a2e407990
   10777b59c78: 0000010777b59da8 0000010777b59da8
   10777b59c88: 0000010029865fd0 0000000000000001
   10777b59c98: 000001092ffcfad8 0000000000001000
   10777b59ca8: 0000000000000088 0000000000000000
   10777b59cb8: 0000000000000001 0a0d3e6465646f43
   10777b59cc8: 0000000000000001 0000010777b59da8
   10777b59cd8: 7470697263736544 ffffffffffffffff
   10777b59ce8: ffffffffa002bfa2 0000000000000010
   10777b59cf8: 0000000000010216 0000010777b59d18
   10777b59d08: 0000000000000018 0000000000000000
   10777b59d18: 0000010029865fd0 0000010e294bb620
   10777b59d28: 0000010777b59da8 0000000000000000
   10777b59d38: 0000010029865fd0 0000010777b59da8
   10777b59d48: 0000010777b59da8 0000010a2e407990 <----
   10777b59d58: 0000010029865fd0 ffffffffa002c9cb
#6 [10777b59d60] ext3_reserve_inode_write at ffffffffa002c9cb

That's the same value.  It was also saved onto the stack in ext3_reserve_inode_write():

#6 [10777b59d60] ext3_reserve_inode_write at ffffffffa002c9cb
   10777b59d68: 0000010a2e407958 0000000000000000
   10777b59d78: 0000010a2e407990 0000010777b59da8
   10777b59d88: 0000010bbac212c0 ffffffffa0031a8d
                ^^^

crash> inode 0000010bbac212c0
struct inode {
 i_hash = {
   next = 0x0,
   pprev = 0x10c11b76f10
 },
 i_list = {
   next = 0x10bbac212d0,
   prev = 0x10bbac212d0
 },
 i_dentry = {
   next = 0x10bbac212e0,
   prev = 0x10bbac212e0
 },
 i_ino = 2224091,
 i_count = {
   counter = 0
 },
 i_mode = 33200,
 i_nlink = 0,
 i_uid = 2735,
 i_gid = 28145,
 i_rdev = 0,
 i_size = 0,
 i_atime = {
   tv_sec = 1275886442,
   tv_nsec = 0
 },
 i_mtime = {
   tv_sec = 1279420796,
   tv_nsec = 0
 },
 i_ctime = {
   tv_sec = 1279420796,
   tv_nsec = 0
 },
 i_blkbits = 12,
 i_blksize = 4096,
 i_version = 1,
 i_blocks = 0,
 i_bytes = 0,
 i_sock = 0 '\000',
 i_lock = {
   lock = 1,
   magic = 3735899821
 },
 i_sem = {
   count = {
     counter = 1
   },
   sleepers = 0,
   wait = {
     lock = {
       lock = 1,
       magic = 3735899821
     },
     task_list = {
       next = 0x10bbac21388,
       prev = 0x10bbac21388
     }
   }
 },
 i_alloc_sem = {
   activity = 0,
   wait_lock = {
     lock = 1,
     magic = 3735899821
   },
   wait_list = {
     next = 0x10bbac213a8,
     prev = 0x10bbac213a8
   }
 },
 i_op = 0xffffffffa0048d60,
 i_fop = 0xffffffffa0048e00,
 i_sb = 0x100081da400,
 i_flock = 0x0,
 i_mapping = 0x10bbac213e0,
 i_data = {
   host = 0x10bbac212c0,
   page_tree = {
     height = 0,
     gfp_mask = 544,
     rnode = 0x0
   },
   tree_lock = {
     lock = 1,
     magic = 3735899821
   },
   i_mmap_writable = 0,
   i_mmap = {
     prio_tree_node = 0x0,
     index_bits = 1
   },
   i_mmap_nonlinear = {
     next = 0x10bbac21418,
     prev = 0x10bbac21418
   },
   i_mmap_lock = {
     lock = 1,
     magic = 3735899821
   },
   truncate_count = {
     counter = 0
   },
   nrpages = 0,
   writeback_index = 0,
   a_ops = 0xffffffffa0048fa0,
   flags = 210,
   backing_dev_info = 0x10bf63a43d0,
   private_lock = {
     lock = 1,
     magic = 3735899821
   },
   private_list = {
     next = 0x10bbac21468,
     prev = 0x10bbac21468
   },
   assoc_mapping = 0x0
 },
 i_dquot = {0x0, 0x0},
 i_devices = {
   next = 0x10bbac21490,
   prev = 0x10bbac21490
 },
 i_pipe = 0x0,
 i_bdev = 0x0,
 i_cdev = 0x0,
 i_cindex = 0,
 i_generation = 3739034195,
 i_dnotify_mask = 0,
 i_dnotify = 0x0,
 i_state = 23,
 dirtied_when = 4375203442,
 i_flags = 0,
 i_writecount = {
   counter = 0
 },
 i_security = 0x0,
 u = {
   generic_ip = 0x0
 }
}

That's better.  Strange that it has been corrupted on the stack.

Taking a closer look at ext3_orphan_del() there are two inodes being accessed.  The primary inode passed into ext3_orphan_del() is okay.  It's the previous inode on the orphan list that is corrupt and may be related to the bad orphan message:

EXT3-fs warning (device dm-0): ext3_orphan_get: bad orphan inode 17235!  e2fsck was run?

Comment 3 Lachlan McIlroy 2010-09-06 07:02:03 UTC

This looks to be identical to https://bugzilla.redhat.com/show_bug.cgi?id=434878.

Comment 4 Eric Sandeen 2010-09-08 15:41:54 UTC

*nod* and bug #434878 as you noted can't be a dup of bug #196915 since it was hit on a kernel which contains that fix already....

Comment 5 Josef Bacik 2010-09-08 18:23:08 UTC

Hrm well the only thing I can think of is the DIO problem.  The nlink check in ext3_link is protected by the i_sem so there shouldn't be a race.  Is their app doing DIO?  Can they test a patch to make sure it works?

Comment 6 Josef Bacik 2010-09-08 18:34:33 UTC

Created attachment 446064 [details]
possible fix

Heh so I finally get the core opened and it's oracle thats running, so it could very well be the DIO problem.  Here is a RHEL4 version of the DIO+orphan patch, let me know if this fixes their problem.

Comment 7 Lachlan McIlroy 2010-09-09 01:39:35 UTC

Josef, is that the same patch I posted to 629143?  I've confirmed that patch fixes BZ629143 using the test case from BZ588599 but I wasn't entirely convinced it was the same bug as this one since I was never able to get it to crash with the stack trace above - but it makes sense that it could.  The customer's panic was a one-off and we currently have no reproducer for this bug.  If we can get the patch into BZ629143 and you think it's the same thing then we can dupe this bug to it.

Comment 8 Josef Bacik 2010-09-09 15:40:04 UTC

Lachlan,

Aha, I knew I had seen that patch recently.  Yes it's the same thing, if we don't clean up the orphan entry when doing DIO and then remove the file, we'll get a corrupted list and it will blow up like this.  I'll close this as a dupe of 629143, thanks.

*** This bug has been marked as a duplicate of bug 629143 ***