Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 674137

Summary:

[RHEL6.1] panic in __d_lookup() during bootup

Product:

Red Hat Enterprise Linux 6

Reporter:

Jeff Burke <jburke>

Component:

kernel

Assignee:

Josef Bacik <jbacik>

Status:

CLOSED NOTABUG

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

6.1

CC:

arozansk, aviro, eparis, esandeen, jbacik, jlayton, jstancek, pbunyan

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2011-03-11 12:59:25 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

676037

Attachments:

Description	Flags
Console log	none

Description Jeff Burke 2011-01-31 18:40:04 UTC

Created attachment 476250 [details]
Console log

Version-Release number of selected component (if applicable):
2.6.32-109.el6

How reproducible:
*Unknown* Happened once on bootup while testing the new kernel
  
Actual results:
Starting portreserve: [  OK  ]   
Starting system logger: [  OK  ]   
Enabling ondemand cpu frequency scaling: [  OK  ]   
Starting irqbalance: [  OK  ]   
Starting rpcbind: [  OK  ]   
Starting NFS statd: [  OK  ]   
Starting mdmonitor: [  OK  ]   
Starting RPC idmapd: RPC: Registered udp transport module. 
RPC: Registered tcp transport module. 
RPC: Registered tcp NFSv4.1 backchannel transport module. 
[  OK  ]   
No kdump initial ramdisk found.[WARNING]   
Rebuilding /boot/initrd-2.6.32-109.el6.i686kdump.img  
BUG: unable to handle kernel paging request at b32a795a 
IP: [<c05398d6>] __d_lookup+0x66/0x100 
*pdpt = 0000000032fd9001 *pde = 0000000000000000  
Oops: 0000 [#1] SMP  
last sysfs file: /sys/devices/virtual/block/dm-2/dev 
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log i2c_i801 sg iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom firewire_ohci firewire_core crc_itu_t ahci ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: microcode] 
 
Pid: 1565, comm: mkdumprd Not tainted (2.6.32-109.el6.i686 #1) To be filled by O.E.M. 
EIP: 0060:[<c05398d6>] EFLAGS: 00010286 CPU: 1 
EIP is at __d_lookup+0x66/0x100 
EAX: c1d04828 EBX: b32a795a ECX: 00000011 EDX: 041ff71e 
ESI: f4659e78 EDI: f4c016c0 EBP: 00000bce ESP: f4659e1c 
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 
Process mkdumprd (pid: 1565, ti=f4658000 task=f4be8570 task.ti=f4658000) 
Stack: 
 f4c016c0 00c83e68 00000004 f4659e78 f4659e87 00000022 f4659e78 f4c016c0 
<0> 00000bce c0539f7a f4659e87 00000000 f7037980 c057597b f4659e87 0000000d 
<0> c095486a 00000bce cc7a7f17 00000bce f70f8680 f70f8680 00000bce 00c83e68 
Call Trace: 
 [<c0539f7a>] ? d_lookup+0x2a/0x40 
 [<c057597b>] ? proc_flush_task+0x8b/0x250 
 [<c04542b9>] ? release_task+0x39/0x480 
 [<c0455127>] ? wait_consider_task+0x8b7/0xbb0 
 [<c04554f4>] ? do_wait+0xd4/0x1f0 
 [<c045569b>] ? sys_wait4+0x8b/0xe0 
 [<c0453d60>] ? child_wait_callback+0x0/0xa0 
 [<c0455717>] ? sys_waitpid+0x27/0x30 
 [<c0409a7f>] ? sysenter_do_call+0x12/0x28 

Additional info:

Comment 3 Jeff Burke 2011-02-01 01:44:28 UTC

BUG: unable to handle kernel paging request at b32a795a
IP: [<c05398d6>] __d_lookup+0x66/0x100
*pdpt = 00000000325da001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/dm-2/dev
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log sg i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_c]

Pid: 3021, comm: mkdumprd Not tainted (2.6.32-109.el6.i686 #1) To be filled by O.E.M.
EIP: 0060:[<c05398d6>] EFLAGS: 00010286 CPU: 0
EIP is at __d_lookup+0x66/0x100
EAX: c1d04828 EBX: b32a795a ECX: 00000011 EDX: 041ff71e
ESI: f26e7e78 EDI: f4c016c0 EBP: 00000bce ESP: f26e7e1c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process mkdumprd (pid: 3021, ti=f26e6000 task=f4ba5570 task.ti=f26e6000)
Stack:
 f4c016c0 00c83e68 00000004 f26e7e78 f26e7e87 00000020 f26e7e78 f4c016c0
<0> 00000bce c0539f7a f26e7e87 00000000 f7037980 c057597b f26e7e87 0000000d
<0> c095486a 00000bce 0a57b2e6 00000bce f441be00 f441be00 00000bce 00c83e68
Call Trace:
 [<c0539f7a>] ? d_lookup+0x2a/0x40
 [<c057597b>] ? proc_flush_task+0x8b/0x250
 [<c04542b9>] ? release_task+0x39/0x480
 [<c0455127>] ? wait_consider_task+0x8b7/0xbb0
 [<c04554f4>] ? do_wait+0xd4/0x1f0
 [<c045569b>] ? sys_wait4+0x8b/0xe0
 [<c0453d60>] ? child_wait_callback+0x0/0xa0
 [<c0455717>] ? sys_waitpid+0x27/0x30
 [<c0409a7f>] ? sysenter_do_call+0x12/0x28
Code: 10 89 d0 35 01 00 37 9e d3 e8 31 d0 23 05 54 65 a5 c0 c1 e0 02 03 05 5c 65 a5 c0 8b 18 85 db 75 0a eb 64 66 90 8b 1b 85 db 74 5c <8b> 03 0f 18 00 90 8d 6b e 
EIP: [<c05398d6>] __d_lookup+0x66/0x100 SS:ESP 0068:f26e7e1c
CR2: 00000000b32a795a
---[ end trace 33f76c7bbdd68b7c ]---
Kernel panic - not syncing: Fatal exception
Pid: 3021, comm: mkdumprd Tainted: G      D    ----------------  2.6.32-109.el6.i686 #1
Call Trace:
 [<c081cf57>] ? panic+0x42/0xf7
 [<c0820d5c>] ? oops_end+0xbc/0xd0
 [<c04321f2>] ? no_context+0xc2/0x190
 [<c043246b>] ? bad_area+0x3b/0x50
 [<c043290e>] ? __do_page_fault+0x34e/0x420
 [<c05eca67>] ? cpumask_next_and+0x17/0x30
 [<c08227fa>] ? do_page_fault+0x2a/0x90
 [<c08227d0>] ? do_page_fault+0x0/0x90
 [<c0820147>] ? error_code+0x73/0x78
 [<c05398d6>] ? __d_lookup+0x66/0x100
 [<c0539f7a>] ? d_lookup+0x2a/0x40
 [<c057597b>] ? proc_flush_task+0x8b/0x250
 [<c04542b9>] ? release_task+0x39/0x480
 [<c0455127>] ? wait_consider_task+0x8b7/0xbb0
 [<c04554f4>] ? do_wait+0xd4/0x1f0
 [<c045569b>] ? sys_wait4+0x8b/0xe0
 [<c0453d60>] ? child_wait_callback+0x0/0xa0
 [<c0455717>] ? sys_waitpid+0x27/0x30
 [<c0409a7f>] ? sysenter_do_call+0x12/0x28
panic occurred, switching back to text console

Comment 4 RHEL Program Management 2011-02-01 06:22:10 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 6 RHEL Program Management 2011-02-01 19:13:05 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 8 Jeff Layton 2011-02-04 13:02:51 UTC

FWIW, it died in a prefetch. Possibly the one in the hlist_for_each_entry_rcu macro?

(gdb) list *(__d_lookup+0x66)
0xc05398d6 is in __d_lookup (/usr/src/debug/kernel-2.6.32-109.el6/linux-2.6.32-109.el6.i686/arch/x86/include/asm/processor.h:864).
859	 * It's not worth to care about 3dnow prefetches for the K6
860	 * because they are microcoded there and very slow.
861	 */
862	static inline void prefetch(const void *x)
863	{
864		alternative_input(BASE_PREFETCH,
865				  "prefetchnta (%1)",
866				  X86_FEATURE_XMM,
867				  "r" (x));
868	}

Comment 10 Josef Bacik 2011-02-09 02:04:40 UTC

Is there anyway memtest86 can be run on these boxes?  I have been trying to build a kernel on this box all freaking day and it keeps panicing with "bad page state", and then when I finally got it to build I tried to install the 109 kernel to verify the problem still happened and the damn thing paniced again in a weird way, so I'd like to make sure it's not a hardware issue.

Comment 12 Josef Bacik 2011-02-09 21:43:47 UTC

So I hit the bad page state BUG() on the other box as well, building a kernel in brew with prefetch turned off to test.

Comment 13 Josef Bacik 2011-02-10 22:30:52 UTC

Ok so I finally got a freaking kernel to build on -01 with prefetching disabled, and now it seems to be acting fine.  I'm going to run stuff on it all night since it was the more stable box anyway, but it seems that may be key.  Tomorrow I'll try to get the kernel onto -02 and make sure that is actually it.

Comment 14 Josef Bacik 2011-02-11 14:48:10 UTC

Hmm so -01 ran all night with no problems, so I spent a while this morning putting sync's into the various parts of the makefile so i could get the kernel to install on -02, got it installed and it's still blowing up on -02.  So it's not prefetch.

Comment 22 Josef Bacik 2011-02-18 16:19:47 UTC

Btw panics from c#17 and c#18 are from intel-piketon-01 on a build from the rhel6 git kernel with prefetch disabled, and then c#19 is intel-piketon-tpm-02 with the same kernel, and then the last 2 are from intel-piketon-tpm-02 with the ga rhel6 kernel.

Comment 26 Josef Bacik 2011-02-22 19:56:59 UTC

Provisioned -01 with x86_64 and haven't experienced any instability, so it seems to just be a 32bit problem.

Comment 27 Josef Bacik 2011-02-22 21:15:47 UTC

Ok so if i boot with mem=1024M the panic still happens, but if I boot with mem=768M the box runs fine.  Course the address that we're panicing on isn't a vmalloc address so I'm not sure if its a coincidence or not.

Comment 28 Josef Bacik 2011-02-22 21:56:06 UTC

BUG: unable to handle kernel paging request at b32a79d7
vmalloc: no
IP: [<c053c77e>] find_inode_fast+0x1e/0x60
*pdpt = 0000000031bcc001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:ff/0000:ff:02.3/irq
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log sg iTCO_wdt iTCO_vendor_support i2c_i801 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif firewire_ohci firewire_core crc_itu_t ahci ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: microcode]

Pid: 27113, comm: stat Not tainted (2.6.32 #94) To be filled by O.E.M.
EIP: 0060:[<c053c77e>] EFLAGS: 00010286 CPU: 3
EIP is at find_inode_fast+0x1e/0x60
EAX: b32a79d7 EBX: f4593e00 ECX: 00100a79 EDX: c1d04830
ESI: c1d04830 EDI: 00100a79 EBP: 00100a79 ESP: f10c3d94
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process stat (pid: 27113, ti=f10c2000 task=f4aae030 task.ti=f10c2000)
Stack:
 f4593e00 c1d04830 00100a79 c053d5aa 00000001 ecc2edf8 c1d04830 f4593e00
<0> c053d7c4 ecc2ee54 f10c3e38 ecc2edf8 ed251adc 00100a79 f4593e00 fa5a5f20
<0> c05a3cd3 00100000 f10c3de8 f10c3de8 00000657 00000001 00000000 00000000
Call Trace:
 [<c053d5aa>] ? ifind_fast+0x2a/0x90
 [<c053d7c4>] ? iget_locked+0x34/0x130
 [<fa5a5f20>] ? ext4_iget+0x20/0x760 [ext4]
 [<c05a3cd3>] ? inode_has_perm+0x63/0xc0
 [<fa5adc17>] ? ext4_lookup+0x87/0x110 [ext4]
 [<c053b408>] ? d_alloc+0xe8/0x150
 [<c0532752>] ? do_lookup+0x152/0x1c0
 [<c0532bbb>] ? __link_path_walk+0x3fb/0x6b0
 [<c0533451>] ? path_walk+0x51/0xc0
 [<c05335d9>] ? do_path_lookup+0x59/0x90
 [<c053371d>] ? user_path_at+0x3d/0x80
 [<c04328ca>] ? __do_page_fault+0x1aa/0x420
 [<c052b687>] ? vfs_fstatat+0x37/0x70
 [<c052b718>] ? vfs_lstat+0x18/0x20
 [<c052b72f>] ? sys_lstat64+0xf/0x30
 [<c04adc0c>] ? audit_syscall_entry+0x21c/0x240
 [<c08256aa>] ? do_page_fault+0x2a/0x90
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
Code: 83 c4 24 c3 89 f6 8d bc 27 00 00 00 00 57 89 cf 56 89 d6 53 89 c3 8b 06 85 c0 75 0f eb 37 8d b4 26 00 00 00 00 85 d2 74 2c 89 d0 <8b> 10 0f 18 02 90 39 78 20 89 c1 75 ed 39 98 a4 00 00 00 75 e5 
EIP: [<c053c77e>] find_inode_fast+0x1e/0x60 SS:ESP 0068:f10c3d94
CR2: 00000000b32a79d7

/root/rhel6/fs/inode.c: 575
0xc053c760 <find_inode_fast>:   push   %edi
0xc053c761 <find_inode_fast+1>: mov    %ecx,%edi
0xc053c763 <find_inode_fast+3>: push   %esi
0xc053c764 <find_inode_fast+4>: mov    %edx,%esi
0xc053c766 <find_inode_fast+6>: push   %ebx
0xc053c767 <find_inode_fast+7>: mov    %eax,%ebx
/root/rhel6/fs/inode.c: 580
0xc053c769 <find_inode_fast+9>: mov    (%esi),%eax
0xc053c76b <find_inode_fast+11>:        test   %eax,%eax
0xc053c76d <find_inode_fast+13>:        jne    0xc053c77e <find_inode_fast+30>
0xc053c76f <find_inode_fast+15>:        jmp    0xc053c7a8 <find_inode_fast+72>
0xc053c771 <find_inode_fast+17>:        lea    0x0(%esi,%eiz,1),%esi
0xc053c778 <find_inode_fast+24>:        test   %edx,%edx
0xc053c77a <find_inode_fast+26>:        je     0xc053c7a8 <find_inode_fast+72>
0xc053c77c <find_inode_fast+28>:        mov    %edx,%eax
0xc053c77e <find_inode_fast+30>:        mov    (%eax),%edx
0xc053c780 <find_inode_fast+32>:        prefetchnta (%edx)
0xc053c783 <find_inode_fast+35>:        nop
/root/rhel6/fs/inode.c: 581
0xc053c784 <find_inode_fast+36>:        cmp    %edi,0x20(%eax)
/root/rhel6/fs/inode.c: 580
0xc053c787 <find_inode_fast+39>:        mov    %eax,%ecx
/root/rhel6/fs/inode.c: 581
0xc053c789 <find_inode_fast+41>:        jne    0xc053c778 <find_inode_fast+24>
/root/rhel6/fs/inode.c: 583
0xc053c78b <find_inode_fast+43>:        cmp    %ebx,0xa4(%eax)
0xc053c791 <find_inode_fast+49>:        jne    0xc053c778 <find_inode_fast+24>
/root/rhel6/fs/inode.c: 585
0xc053c793 <find_inode_fast+51>:        testb  $0x70,0x140(%eax)
0xc053c79a <find_inode_fast+58>:        je     0xc053c7a3 <find_inode_fast+67>
/root/rhel6/fs/inode.c: 586
0xc053c79c <find_inode_fast+60>:        call   0xc053c6c0 <__wait_on_freeing_inode>
/root/rhel6/fs/inode.c: 587
0xc053c7a1 <find_inode_fast+65>:        jmp    0xc053c769 <find_inode_fast+9>
/root/rhel6/fs/inode.c: 591
0xc053c7a3 <find_inode_fast+67>:        test   %ecx,%ecx
0xc053c7a5 <find_inode_fast+69>:        jne    0xc053c7aa <find_inode_fast+74>
0xc053c7a7 <find_inode_fast+71>:        nop
0xc053c7a8 <find_inode_fast+72>:        xor    %ecx,%ecx
/root/rhel6/fs/inode.c: 592
0xc053c7aa <find_inode_fast+74>:        mov    %ecx,%eax
0xc053c7ac <find_inode_fast+76>:        pop    %ebx
0xc053c7ad <find_inode_fast+77>:        pop    %esi
0xc053c7ae <find_inode_fast+78>:        pop    %edi
0xc053c7af <find_inode_fast+79>:        nop
0xc053c7b0 <find_inode_fast+80>:        ret
0xc053c7b1 <find_inode_fast+81>:        jmp    0xc053c7c0 <iunique>

so %ebx is the sb, %edi is the ino and %edx is the hlist head, so that gives us

sb=f4593e00
ino=00100a79
head=c1d04830

crash> struct hlist_head c1d04830
struct hlist_head {
  first = 0xb32a79d7
}

ok so thats where we got our address, it's the first entry in the list so we know this isn't a prefetch problem.  So we get head from iget_locked


struct inode *iget_locked(struct super_block *sb, unsigned long ino)
{
        struct hlist_head *head = inode_hashtable + hash(sb, ino);
        struct inode *inode;

        inode = ifind_fast(sb, head, ino);
        if (inode)
                return inode;


crash> p inode_hashtable
inode_hashtable = $8 = (struct hlist_head *) 0xc1cf7000

and hash is

static unsigned long hash(struct super_block *sb, unsigned long hashval)
{
        unsigned long tmp;

        tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
                        L1_CACHE_BYTES;
        tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
        return tmp & I_HASHMASK;
}

so that means hash() would have had to returned

0xd830

for this to work out right, but I plugged this into a program and it's not what I got, I got

0x360c

which turns out to be

0xc1d07e3c

which looks valid

crash> struct hlist_head 0xc1d07e3c
struct hlist_head {
  first = 0xef6a2adc
}
crash> kmem ef6a2adc
CACHE    NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
f4b26480 ext4_inode_cache         648      57386     57396   9566     4k
SLAB      MEMORY    TOTAL  ALLOCATED  FREE
ef6a2000  ef6a2034      6          6     0
FREE / [ALLOCATED]
  [ef6a2a54]

  PAGE    PHYSICAL   MAPPING    INDEX CNT FLAGS
ca5ed440  2f6a2000         0         0  1 2800080

so the question is, how the hell did hash() screw up?

Comment 29 Josef Bacik 2011-02-22 22:13:01 UTC

Maybe my test program is wrong, but I don't think so


#include <stdio.h>

#define GOLDEN_RATIO_PRIME 0x9e370001UL
#define L1_CACHE_BYTES (1 << 5)
/* Got these from crash */
#define I_HASHBITS 16
#define I_HASHMASK 65535

int main(int argc, char **argv)
{
        unsigned long tmp;
        unsigned long foo;
        unsigned long hashval = 0x00100a79;
        unsigned long sb = 0xf4593e00;

        tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
                L1_CACHE_BYTES;
        tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);

        foo = tmp & I_HASHMASK;

        printf("foo is 0x%lx\n", foo);

        return 0;
}

crash> p i_hash_shift
i_hash_shift = $14 = 16

crash> p i_hash_mask
i_hash_mask = $15 = 65535

Comment 30 Josef Bacik 2011-02-23 15:29:20 UTC

Ok got a bunch of panics last night (yay updatedb), inode_hashtable is always 0xc1cf7000

sb=f4526c00
ino=00180139
head=c1d04828
hash=0xd828
what hash should be=0x360a
what head should have been=0xc1cfa60a

sb=f4add600
ino=00160d3f
head=c1d04828
hash=0xd828
what hash should be=0x360a
what head should have been=0xc1cfa60a

sb=f4b2fa00
ino=001a08bb
head=c1d04830
hash=0xd830
what hash should have be=0x360c
what head should have been=0xc1cfa60c

sb=f4514c00
ino=00141e2a
head=c1d04828
hash=0xd828
what hash should have been=0x360a
what head should have been=0xc1cfa60a

sb=f449e400
ino=00123713
head=c1d04828
hash=0xd828
what hash should have been=0x360a
what head should have been=0xc1cfa60a

sb=f44b1600
ino=00280a9c
head=c1d04830
hash=0xd830
what hash should have been=0x360c
what head should have been=0xc1cfa60c

sb=f448c800
ino=002c06c6
head=c1d04820
hash=0xd820
what hash should have been=0x3608
what head should have been=0xc1cfa608

Since it's pretty consistently the same wrong thing I'm going to put a printk in there to try and catch what's going wrong, see if doing hash() twice in a row will ever come out with something different.

Comment 31 Josef Bacik 2011-02-23 17:22:29 UTC

Well I feel silly, I forget this is pointer math not normal math, thats why the head is coming out that way.  So surprise surprise the hash() stuff isn't wrong, we're just ending up with a bogus address in our hash list.  Now to figure out who is adding that address to the hash list.

Comment 32 Josef Bacik 2011-02-25 16:42:45 UTC

So I changed the inode hashtable stuff to be setup after the mm stuff is setup, so instead of allocating from bootmem we allocate from slab later on in the boot process, and now the problem doesn't happen anymore.  This would explain why 64bit works fine, since it's NUMA aware we allocate later on and not from bootmem as well.  Now to figure out why this makes a difference.

Comment 33 Josef Bacik 2011-02-25 22:24:16 UTC

Switched everything in inode.c to use hlist_del so I could poison the lists and I got this

BUG: unable to handle kernel paging request at 00200200
vmalloc: no
IP: [<c053d662>] generic_delete_inode+0x202/0x240
*pdpt = 0000000032a23001 *pde = 00000001353de067 
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/dm-1/dm/name
Modules linked in: dm_mirror dm_region_hash dm_log i2c_i801 iTCO_wdt iTCO_vendor_support sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif firewire_ohci firewire_core crc_itu_t ahci ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: scsi_wait_scan]

Pid: 930, comm: mount Tainted: G        W  ----------------  (2.6.32 #113) To be filled by O.E.M.
EIP: 0060:[<c053d662>] EFLAGS: 00010206 CPU: 0
EIP is at generic_delete_inode+0x202/0x240
EAX: 00200200 EBX: f24a334c ECX: 00000000 EDX: 00000b0b
ESI: c05738d0 EDI: c0bc5be0 EBP: c1d09000 ESP: f2b81ddc
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process mount (pid: 930, ti=f2b80000 task=f2a77570 task.ti=f2b80000)
Stack:
 00003218 f24a334c f4f2aa5c f2b81e1c 00000000 c053c624 f4f2aa5c c0539da2
<0> f703ec00 f4f2aa5c c053a063 f2b81e24 f4f2aa64 f703ec88 00000002 f2b81e50
<0> f2b81e1c f2b81e1c f2b81e24 f2b81e24 f4f2aa5c f4deeeb8 f4deeeb8 f4deee7c
Call Trace:
 [<c053c624>] ? iput+0x44/0x50
 [<c0539da2>] ? d_kill+0x22/0x50
 [<c053a063>] ? __shrink_dcache_sb+0x293/0x2e0
 [<c053a378>] ? shrink_dcache_parent+0x108/0x120
 [<c0577856>] ? proc_flush_task+0x96/0x250
 [<c04550f9>] ? release_task+0x39/0x480
 [<c0455f67>] ? wait_consider_task+0x8b7/0xbb0
 [<c0456334>] ? do_wait+0xd4/0x1f0
 [<c04564db>] ? sys_wait4+0x8b/0xe0
 [<c0454ba0>] ? child_wait_callback+0x0/0xa0
 [<c0409adf>] ? sysenter_do_call+0x12/0x28

So this is a little weird since I do a

if (!inode_unhashed(&inode->i_hash))
    hlist_del(&inode->i_hash);

and inode_unhashed just does

return (!n->pprev || (n->pprev == LIST_POISON2));

so pprev would have had to change between the check and the deletion.  How this is happening boggles my mind since all accesses to i_hash are serialized by the inode_lock.

Comment 34 Josef Bacik 2011-02-25 22:42:38 UTC

------------[ cut here ]------------
kernel BUG at fs/inode.c:658!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: 
Modules linked in:

Pid: 78, comm: mount Tainted: G        W  ----------------  (2.6.32 #115) To be filled by O.E.M.
EIP: 0060:[<c053de26>] EFLAGS: 00010206 CPU: 1
EIP is at iget_locked+0x1c6/0x1e0
EAX: 5a5a5a5a EBX: f4e30dd4 ECX: f4c03da4 EDX: f703ec74
ESI: c1cff918 EDI: f703ec00 EBP: f0000000 ESP: f4aaddec
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process mount (pid: 78, ti=f4aac000 task=f498e030 task.ti=f4aac000)
Stack:
 00000000 00000000 f703f840 f703f840 f703f89e f4e2e954 c0573939 f703f840
<0> f0000000 c0578c5a f4c03d94 f4aadefc f4e2e954 f4c03d94 f4aade78 c0573ce9
<0> c0836d00 f4e2e954 f4aadefc c0532752 f4c03d94 f49b0c00 f4c016c0 f4c03e0c
Call Trace:
 [<c0573939>] ? proc_get_inode+0x9/0xd0
 [<c0578c5a>] ? proc_lookup_de+0x6a/0xd0
 [<c0573ce9>] ? proc_root_lookup+0x19/0x50
 [<c0532752>] ? do_lookup+0x152/0x1c0
 [<c0532bbb>] ? __link_path_walk+0x3fb/0x6b0
 [<c0533451>] ? path_walk+0x51/0xc0
 [<c05335d9>] ? do_path_lookup+0x59/0x90
 [<c05340b4>] ? do_filp_open+0xc4/0xb30
 [<c0501f92>] ? handle_mm_fault+0x132/0x1d0
 [<c0523fb8>] ? do_sys_open+0x58/0x130
 [<c052410c>] ? sys_open+0x2c/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28

after poisoning proc inodes on free.

Comment 35 Josef Bacik 2011-03-03 15:24:55 UTC

c1d04800:  eed770fc c0bc5ca0 eede8614 c0bc5ca0   .p...\.......\..
c1d04810:  00000000 5a5a5a5a 00000000 5a5a5a5a   ....ZZZZ....ZZZZ
c1d04820:  b32a78dd 00000000 b32a795a 00000000   .x*.....Zy*.....
c1d04830:  b32a79d7 00000000 00000000 5a5a5a5a   .y*.........ZZZZ

this is with my making the hashtable have

struct inode_hlist_head {
    struct hlist_head head;
    void *ptr;
}

and then head is init'ed with INIT_HLIST_HEAD and ptr is initted with 0x5a5a5a5a.  For every hlist_del/hlist_add i set the ->ptr = inode->i_op so I can figure out which inode corrupted us, and then do a check if ->ptr == NULL and set it to inode if it is, and this chunk is still getting set to NULL.  So color me confused, I have no idea whats going on, but we're definitely getting 3 invalid entries in the hashtable, all in the same area.

Comment 38 Josef Bacik 2011-03-11 12:59:25 UTC

Couldn't reproduce the issue after Paul updated the bios on the box, I'm going to call this a BIOS problem and call it a day.