Bug 674137
Summary: | [RHEL6.1] panic in __d_lookup() during bootup | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> | ||||
Component: | kernel | Assignee: | Josef Bacik <jbacik> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.1 | CC: | arozansk, aviro, eparis, esandeen, jbacik, jlayton, jstancek, pbunyan | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-03-11 12:59:25 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 676037 | ||||||
Attachments: |
|
BUG: unable to handle kernel paging request at b32a795a IP: [<c05398d6>] __d_lookup+0x66/0x100 *pdpt = 00000000325da001 *pde = 0000000000000000 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/virtual/block/dm-2/dev Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log sg i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_c] Pid: 3021, comm: mkdumprd Not tainted (2.6.32-109.el6.i686 #1) To be filled by O.E.M. EIP: 0060:[<c05398d6>] EFLAGS: 00010286 CPU: 0 EIP is at __d_lookup+0x66/0x100 EAX: c1d04828 EBX: b32a795a ECX: 00000011 EDX: 041ff71e ESI: f26e7e78 EDI: f4c016c0 EBP: 00000bce ESP: f26e7e1c DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mkdumprd (pid: 3021, ti=f26e6000 task=f4ba5570 task.ti=f26e6000) Stack: f4c016c0 00c83e68 00000004 f26e7e78 f26e7e87 00000020 f26e7e78 f4c016c0 <0> 00000bce c0539f7a f26e7e87 00000000 f7037980 c057597b f26e7e87 0000000d <0> c095486a 00000bce 0a57b2e6 00000bce f441be00 f441be00 00000bce 00c83e68 Call Trace: [<c0539f7a>] ? d_lookup+0x2a/0x40 [<c057597b>] ? proc_flush_task+0x8b/0x250 [<c04542b9>] ? release_task+0x39/0x480 [<c0455127>] ? wait_consider_task+0x8b7/0xbb0 [<c04554f4>] ? do_wait+0xd4/0x1f0 [<c045569b>] ? sys_wait4+0x8b/0xe0 [<c0453d60>] ? child_wait_callback+0x0/0xa0 [<c0455717>] ? sys_waitpid+0x27/0x30 [<c0409a7f>] ? sysenter_do_call+0x12/0x28 Code: 10 89 d0 35 01 00 37 9e d3 e8 31 d0 23 05 54 65 a5 c0 c1 e0 02 03 05 5c 65 a5 c0 8b 18 85 db 75 0a eb 64 66 90 8b 1b 85 db 74 5c <8b> 03 0f 18 00 90 8d 6b e EIP: [<c05398d6>] __d_lookup+0x66/0x100 SS:ESP 0068:f26e7e1c CR2: 00000000b32a795a ---[ end trace 33f76c7bbdd68b7c ]--- Kernel panic - not syncing: Fatal exception Pid: 3021, comm: mkdumprd Tainted: G D ---------------- 2.6.32-109.el6.i686 #1 Call Trace: [<c081cf57>] ? panic+0x42/0xf7 [<c0820d5c>] ? oops_end+0xbc/0xd0 [<c04321f2>] ? no_context+0xc2/0x190 [<c043246b>] ? bad_area+0x3b/0x50 [<c043290e>] ? __do_page_fault+0x34e/0x420 [<c05eca67>] ? cpumask_next_and+0x17/0x30 [<c08227fa>] ? do_page_fault+0x2a/0x90 [<c08227d0>] ? do_page_fault+0x0/0x90 [<c0820147>] ? error_code+0x73/0x78 [<c05398d6>] ? __d_lookup+0x66/0x100 [<c0539f7a>] ? d_lookup+0x2a/0x40 [<c057597b>] ? proc_flush_task+0x8b/0x250 [<c04542b9>] ? release_task+0x39/0x480 [<c0455127>] ? wait_consider_task+0x8b7/0xbb0 [<c04554f4>] ? do_wait+0xd4/0x1f0 [<c045569b>] ? sys_wait4+0x8b/0xe0 [<c0453d60>] ? child_wait_callback+0x0/0xa0 [<c0455717>] ? sys_waitpid+0x27/0x30 [<c0409a7f>] ? sysenter_do_call+0x12/0x28 panic occurred, switching back to text console This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release. FWIW, it died in a prefetch. Possibly the one in the hlist_for_each_entry_rcu macro? (gdb) list *(__d_lookup+0x66) 0xc05398d6 is in __d_lookup (/usr/src/debug/kernel-2.6.32-109.el6/linux-2.6.32-109.el6.i686/arch/x86/include/asm/processor.h:864). 859 * It's not worth to care about 3dnow prefetches for the K6 860 * because they are microcoded there and very slow. 861 */ 862 static inline void prefetch(const void *x) 863 { 864 alternative_input(BASE_PREFETCH, 865 "prefetchnta (%1)", 866 X86_FEATURE_XMM, 867 "r" (x)); 868 } Is there anyway memtest86 can be run on these boxes? I have been trying to build a kernel on this box all freaking day and it keeps panicing with "bad page state", and then when I finally got it to build I tried to install the 109 kernel to verify the problem still happened and the damn thing paniced again in a weird way, so I'd like to make sure it's not a hardware issue. So I hit the bad page state BUG() on the other box as well, building a kernel in brew with prefetch turned off to test. Ok so I finally got a freaking kernel to build on -01 with prefetching disabled, and now it seems to be acting fine. I'm going to run stuff on it all night since it was the more stable box anyway, but it seems that may be key. Tomorrow I'll try to get the kernel onto -02 and make sure that is actually it. Hmm so -01 ran all night with no problems, so I spent a while this morning putting sync's into the various parts of the makefile so i could get the kernel to install on -02, got it installed and it's still blowing up on -02. So it's not prefetch. Btw panics from c#17 and c#18 are from intel-piketon-01 on a build from the rhel6 git kernel with prefetch disabled, and then c#19 is intel-piketon-tpm-02 with the same kernel, and then the last 2 are from intel-piketon-tpm-02 with the ga rhel6 kernel. Provisioned -01 with x86_64 and haven't experienced any instability, so it seems to just be a 32bit problem. Ok so if i boot with mem=1024M the panic still happens, but if I boot with mem=768M the box runs fine. Course the address that we're panicing on isn't a vmalloc address so I'm not sure if its a coincidence or not. BUG: unable to handle kernel paging request at b32a79d7 vmalloc: no IP: [<c053c77e>] find_inode_fast+0x1e/0x60 *pdpt = 0000000031bcc001 *pde = 0000000000000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci0000:ff/0000:ff:02.3/irq Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log sg iTCO_wdt iTCO_vendor_support i2c_i801 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif firewire_ohci firewire_core crc_itu_t ahci ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: microcode] Pid: 27113, comm: stat Not tainted (2.6.32 #94) To be filled by O.E.M. EIP: 0060:[<c053c77e>] EFLAGS: 00010286 CPU: 3 EIP is at find_inode_fast+0x1e/0x60 EAX: b32a79d7 EBX: f4593e00 ECX: 00100a79 EDX: c1d04830 ESI: c1d04830 EDI: 00100a79 EBP: 00100a79 ESP: f10c3d94 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process stat (pid: 27113, ti=f10c2000 task=f4aae030 task.ti=f10c2000) Stack: f4593e00 c1d04830 00100a79 c053d5aa 00000001 ecc2edf8 c1d04830 f4593e00 <0> c053d7c4 ecc2ee54 f10c3e38 ecc2edf8 ed251adc 00100a79 f4593e00 fa5a5f20 <0> c05a3cd3 00100000 f10c3de8 f10c3de8 00000657 00000001 00000000 00000000 Call Trace: [<c053d5aa>] ? ifind_fast+0x2a/0x90 [<c053d7c4>] ? iget_locked+0x34/0x130 [<fa5a5f20>] ? ext4_iget+0x20/0x760 [ext4] [<c05a3cd3>] ? inode_has_perm+0x63/0xc0 [<fa5adc17>] ? ext4_lookup+0x87/0x110 [ext4] [<c053b408>] ? d_alloc+0xe8/0x150 [<c0532752>] ? do_lookup+0x152/0x1c0 [<c0532bbb>] ? __link_path_walk+0x3fb/0x6b0 [<c0533451>] ? path_walk+0x51/0xc0 [<c05335d9>] ? do_path_lookup+0x59/0x90 [<c053371d>] ? user_path_at+0x3d/0x80 [<c04328ca>] ? __do_page_fault+0x1aa/0x420 [<c052b687>] ? vfs_fstatat+0x37/0x70 [<c052b718>] ? vfs_lstat+0x18/0x20 [<c052b72f>] ? sys_lstat64+0xf/0x30 [<c04adc0c>] ? audit_syscall_entry+0x21c/0x240 [<c08256aa>] ? do_page_fault+0x2a/0x90 [<c0409adf>] ? sysenter_do_call+0x12/0x28 Code: 83 c4 24 c3 89 f6 8d bc 27 00 00 00 00 57 89 cf 56 89 d6 53 89 c3 8b 06 85 c0 75 0f eb 37 8d b4 26 00 00 00 00 85 d2 74 2c 89 d0 <8b> 10 0f 18 02 90 39 78 20 89 c1 75 ed 39 98 a4 00 00 00 75 e5 EIP: [<c053c77e>] find_inode_fast+0x1e/0x60 SS:ESP 0068:f10c3d94 CR2: 00000000b32a79d7 /root/rhel6/fs/inode.c: 575 0xc053c760 <find_inode_fast>: push %edi 0xc053c761 <find_inode_fast+1>: mov %ecx,%edi 0xc053c763 <find_inode_fast+3>: push %esi 0xc053c764 <find_inode_fast+4>: mov %edx,%esi 0xc053c766 <find_inode_fast+6>: push %ebx 0xc053c767 <find_inode_fast+7>: mov %eax,%ebx /root/rhel6/fs/inode.c: 580 0xc053c769 <find_inode_fast+9>: mov (%esi),%eax 0xc053c76b <find_inode_fast+11>: test %eax,%eax 0xc053c76d <find_inode_fast+13>: jne 0xc053c77e <find_inode_fast+30> 0xc053c76f <find_inode_fast+15>: jmp 0xc053c7a8 <find_inode_fast+72> 0xc053c771 <find_inode_fast+17>: lea 0x0(%esi,%eiz,1),%esi 0xc053c778 <find_inode_fast+24>: test %edx,%edx 0xc053c77a <find_inode_fast+26>: je 0xc053c7a8 <find_inode_fast+72> 0xc053c77c <find_inode_fast+28>: mov %edx,%eax 0xc053c77e <find_inode_fast+30>: mov (%eax),%edx 0xc053c780 <find_inode_fast+32>: prefetchnta (%edx) 0xc053c783 <find_inode_fast+35>: nop /root/rhel6/fs/inode.c: 581 0xc053c784 <find_inode_fast+36>: cmp %edi,0x20(%eax) /root/rhel6/fs/inode.c: 580 0xc053c787 <find_inode_fast+39>: mov %eax,%ecx /root/rhel6/fs/inode.c: 581 0xc053c789 <find_inode_fast+41>: jne 0xc053c778 <find_inode_fast+24> /root/rhel6/fs/inode.c: 583 0xc053c78b <find_inode_fast+43>: cmp %ebx,0xa4(%eax) 0xc053c791 <find_inode_fast+49>: jne 0xc053c778 <find_inode_fast+24> /root/rhel6/fs/inode.c: 585 0xc053c793 <find_inode_fast+51>: testb $0x70,0x140(%eax) 0xc053c79a <find_inode_fast+58>: je 0xc053c7a3 <find_inode_fast+67> /root/rhel6/fs/inode.c: 586 0xc053c79c <find_inode_fast+60>: call 0xc053c6c0 <__wait_on_freeing_inode> /root/rhel6/fs/inode.c: 587 0xc053c7a1 <find_inode_fast+65>: jmp 0xc053c769 <find_inode_fast+9> /root/rhel6/fs/inode.c: 591 0xc053c7a3 <find_inode_fast+67>: test %ecx,%ecx 0xc053c7a5 <find_inode_fast+69>: jne 0xc053c7aa <find_inode_fast+74> 0xc053c7a7 <find_inode_fast+71>: nop 0xc053c7a8 <find_inode_fast+72>: xor %ecx,%ecx /root/rhel6/fs/inode.c: 592 0xc053c7aa <find_inode_fast+74>: mov %ecx,%eax 0xc053c7ac <find_inode_fast+76>: pop %ebx 0xc053c7ad <find_inode_fast+77>: pop %esi 0xc053c7ae <find_inode_fast+78>: pop %edi 0xc053c7af <find_inode_fast+79>: nop 0xc053c7b0 <find_inode_fast+80>: ret 0xc053c7b1 <find_inode_fast+81>: jmp 0xc053c7c0 <iunique> so %ebx is the sb, %edi is the ino and %edx is the hlist head, so that gives us sb=f4593e00 ino=00100a79 head=c1d04830 crash> struct hlist_head c1d04830 struct hlist_head { first = 0xb32a79d7 } ok so thats where we got our address, it's the first entry in the list so we know this isn't a prefetch problem. So we get head from iget_locked struct inode *iget_locked(struct super_block *sb, unsigned long ino) { struct hlist_head *head = inode_hashtable + hash(sb, ino); struct inode *inode; inode = ifind_fast(sb, head, ino); if (inode) return inode; crash> p inode_hashtable inode_hashtable = $8 = (struct hlist_head *) 0xc1cf7000 and hash is static unsigned long hash(struct super_block *sb, unsigned long hashval) { unsigned long tmp; tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES; tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS); return tmp & I_HASHMASK; } so that means hash() would have had to returned 0xd830 for this to work out right, but I plugged this into a program and it's not what I got, I got 0x360c which turns out to be 0xc1d07e3c which looks valid crash> struct hlist_head 0xc1d07e3c struct hlist_head { first = 0xef6a2adc } crash> kmem ef6a2adc CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE f4b26480 ext4_inode_cache 648 57386 57396 9566 4k SLAB MEMORY TOTAL ALLOCATED FREE ef6a2000 ef6a2034 6 6 0 FREE / [ALLOCATED] [ef6a2a54] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ca5ed440 2f6a2000 0 0 1 2800080 so the question is, how the hell did hash() screw up? Maybe my test program is wrong, but I don't think so #include <stdio.h> #define GOLDEN_RATIO_PRIME 0x9e370001UL #define L1_CACHE_BYTES (1 << 5) /* Got these from crash */ #define I_HASHBITS 16 #define I_HASHMASK 65535 int main(int argc, char **argv) { unsigned long tmp; unsigned long foo; unsigned long hashval = 0x00100a79; unsigned long sb = 0xf4593e00; tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES; tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS); foo = tmp & I_HASHMASK; printf("foo is 0x%lx\n", foo); return 0; } crash> p i_hash_shift i_hash_shift = $14 = 16 crash> p i_hash_mask i_hash_mask = $15 = 65535 Ok got a bunch of panics last night (yay updatedb), inode_hashtable is always 0xc1cf7000 sb=f4526c00 ino=00180139 head=c1d04828 hash=0xd828 what hash should be=0x360a what head should have been=0xc1cfa60a sb=f4add600 ino=00160d3f head=c1d04828 hash=0xd828 what hash should be=0x360a what head should have been=0xc1cfa60a sb=f4b2fa00 ino=001a08bb head=c1d04830 hash=0xd830 what hash should have be=0x360c what head should have been=0xc1cfa60c sb=f4514c00 ino=00141e2a head=c1d04828 hash=0xd828 what hash should have been=0x360a what head should have been=0xc1cfa60a sb=f449e400 ino=00123713 head=c1d04828 hash=0xd828 what hash should have been=0x360a what head should have been=0xc1cfa60a sb=f44b1600 ino=00280a9c head=c1d04830 hash=0xd830 what hash should have been=0x360c what head should have been=0xc1cfa60c sb=f448c800 ino=002c06c6 head=c1d04820 hash=0xd820 what hash should have been=0x3608 what head should have been=0xc1cfa608 Since it's pretty consistently the same wrong thing I'm going to put a printk in there to try and catch what's going wrong, see if doing hash() twice in a row will ever come out with something different. Well I feel silly, I forget this is pointer math not normal math, thats why the head is coming out that way. So surprise surprise the hash() stuff isn't wrong, we're just ending up with a bogus address in our hash list. Now to figure out who is adding that address to the hash list. So I changed the inode hashtable stuff to be setup after the mm stuff is setup, so instead of allocating from bootmem we allocate from slab later on in the boot process, and now the problem doesn't happen anymore. This would explain why 64bit works fine, since it's NUMA aware we allocate later on and not from bootmem as well. Now to figure out why this makes a difference. Switched everything in inode.c to use hlist_del so I could poison the lists and I got this BUG: unable to handle kernel paging request at 00200200 vmalloc: no IP: [<c053d662>] generic_delete_inode+0x202/0x240 *pdpt = 0000000032a23001 *pde = 00000001353de067 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/virtual/block/dm-1/dm/name Modules linked in: dm_mirror dm_region_hash dm_log i2c_i801 iTCO_wdt iTCO_vendor_support sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif firewire_ohci firewire_core crc_itu_t ahci ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: scsi_wait_scan] Pid: 930, comm: mount Tainted: G W ---------------- (2.6.32 #113) To be filled by O.E.M. EIP: 0060:[<c053d662>] EFLAGS: 00010206 CPU: 0 EIP is at generic_delete_inode+0x202/0x240 EAX: 00200200 EBX: f24a334c ECX: 00000000 EDX: 00000b0b ESI: c05738d0 EDI: c0bc5be0 EBP: c1d09000 ESP: f2b81ddc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mount (pid: 930, ti=f2b80000 task=f2a77570 task.ti=f2b80000) Stack: 00003218 f24a334c f4f2aa5c f2b81e1c 00000000 c053c624 f4f2aa5c c0539da2 <0> f703ec00 f4f2aa5c c053a063 f2b81e24 f4f2aa64 f703ec88 00000002 f2b81e50 <0> f2b81e1c f2b81e1c f2b81e24 f2b81e24 f4f2aa5c f4deeeb8 f4deeeb8 f4deee7c Call Trace: [<c053c624>] ? iput+0x44/0x50 [<c0539da2>] ? d_kill+0x22/0x50 [<c053a063>] ? __shrink_dcache_sb+0x293/0x2e0 [<c053a378>] ? shrink_dcache_parent+0x108/0x120 [<c0577856>] ? proc_flush_task+0x96/0x250 [<c04550f9>] ? release_task+0x39/0x480 [<c0455f67>] ? wait_consider_task+0x8b7/0xbb0 [<c0456334>] ? do_wait+0xd4/0x1f0 [<c04564db>] ? sys_wait4+0x8b/0xe0 [<c0454ba0>] ? child_wait_callback+0x0/0xa0 [<c0409adf>] ? sysenter_do_call+0x12/0x28 So this is a little weird since I do a if (!inode_unhashed(&inode->i_hash)) hlist_del(&inode->i_hash); and inode_unhashed just does return (!n->pprev || (n->pprev == LIST_POISON2)); so pprev would have had to change between the check and the deletion. How this is happening boggles my mind since all accesses to i_hash are serialized by the inode_lock. ------------[ cut here ]------------ kernel BUG at fs/inode.c:658! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC last sysfs file: Modules linked in: Pid: 78, comm: mount Tainted: G W ---------------- (2.6.32 #115) To be filled by O.E.M. EIP: 0060:[<c053de26>] EFLAGS: 00010206 CPU: 1 EIP is at iget_locked+0x1c6/0x1e0 EAX: 5a5a5a5a EBX: f4e30dd4 ECX: f4c03da4 EDX: f703ec74 ESI: c1cff918 EDI: f703ec00 EBP: f0000000 ESP: f4aaddec DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mount (pid: 78, ti=f4aac000 task=f498e030 task.ti=f4aac000) Stack: 00000000 00000000 f703f840 f703f840 f703f89e f4e2e954 c0573939 f703f840 <0> f0000000 c0578c5a f4c03d94 f4aadefc f4e2e954 f4c03d94 f4aade78 c0573ce9 <0> c0836d00 f4e2e954 f4aadefc c0532752 f4c03d94 f49b0c00 f4c016c0 f4c03e0c Call Trace: [<c0573939>] ? proc_get_inode+0x9/0xd0 [<c0578c5a>] ? proc_lookup_de+0x6a/0xd0 [<c0573ce9>] ? proc_root_lookup+0x19/0x50 [<c0532752>] ? do_lookup+0x152/0x1c0 [<c0532bbb>] ? __link_path_walk+0x3fb/0x6b0 [<c0533451>] ? path_walk+0x51/0xc0 [<c05335d9>] ? do_path_lookup+0x59/0x90 [<c05340b4>] ? do_filp_open+0xc4/0xb30 [<c0501f92>] ? handle_mm_fault+0x132/0x1d0 [<c0523fb8>] ? do_sys_open+0x58/0x130 [<c052410c>] ? sys_open+0x2c/0x40 [<c0409adf>] ? sysenter_do_call+0x12/0x28 after poisoning proc inodes on free. c1d04800: eed770fc c0bc5ca0 eede8614 c0bc5ca0 .p...\.......\.. c1d04810: 00000000 5a5a5a5a 00000000 5a5a5a5a ....ZZZZ....ZZZZ c1d04820: b32a78dd 00000000 b32a795a 00000000 .x*.....Zy*..... c1d04830: b32a79d7 00000000 00000000 5a5a5a5a .y*.........ZZZZ this is with my making the hashtable have struct inode_hlist_head { struct hlist_head head; void *ptr; } and then head is init'ed with INIT_HLIST_HEAD and ptr is initted with 0x5a5a5a5a. For every hlist_del/hlist_add i set the ->ptr = inode->i_op so I can figure out which inode corrupted us, and then do a check if ->ptr == NULL and set it to inode if it is, and this chunk is still getting set to NULL. So color me confused, I have no idea whats going on, but we're definitely getting 3 invalid entries in the hashtable, all in the same area. Couldn't reproduce the issue after Paul updated the bios on the box, I'm going to call this a BIOS problem and call it a day. |
Created attachment 476250 [details] Console log Version-Release number of selected component (if applicable): 2.6.32-109.el6 How reproducible: *Unknown* Happened once on bootup while testing the new kernel Actual results: Starting portreserve: [ OK ] Starting system logger: [ OK ] Enabling ondemand cpu frequency scaling: [ OK ] Starting irqbalance: [ OK ] Starting rpcbind: [ OK ] Starting NFS statd: [ OK ] Starting mdmonitor: [ OK ] Starting RPC idmapd: RPC: Registered udp transport module. RPC: Registered tcp transport module. RPC: Registered tcp NFSv4.1 backchannel transport module. [ OK ] No kdump initial ramdisk found.[WARNING] Rebuilding /boot/initrd-2.6.32-109.el6.i686kdump.img BUG: unable to handle kernel paging request at b32a795a IP: [<c05398d6>] __d_lookup+0x66/0x100 *pdpt = 0000000032fd9001 *pde = 0000000000000000 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/virtual/block/dm-2/dev Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log i2c_i801 sg iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom firewire_ohci firewire_core crc_itu_t ahci ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: microcode] Pid: 1565, comm: mkdumprd Not tainted (2.6.32-109.el6.i686 #1) To be filled by O.E.M. EIP: 0060:[<c05398d6>] EFLAGS: 00010286 CPU: 1 EIP is at __d_lookup+0x66/0x100 EAX: c1d04828 EBX: b32a795a ECX: 00000011 EDX: 041ff71e ESI: f4659e78 EDI: f4c016c0 EBP: 00000bce ESP: f4659e1c DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mkdumprd (pid: 1565, ti=f4658000 task=f4be8570 task.ti=f4658000) Stack: f4c016c0 00c83e68 00000004 f4659e78 f4659e87 00000022 f4659e78 f4c016c0 <0> 00000bce c0539f7a f4659e87 00000000 f7037980 c057597b f4659e87 0000000d <0> c095486a 00000bce cc7a7f17 00000bce f70f8680 f70f8680 00000bce 00c83e68 Call Trace: [<c0539f7a>] ? d_lookup+0x2a/0x40 [<c057597b>] ? proc_flush_task+0x8b/0x250 [<c04542b9>] ? release_task+0x39/0x480 [<c0455127>] ? wait_consider_task+0x8b7/0xbb0 [<c04554f4>] ? do_wait+0xd4/0x1f0 [<c045569b>] ? sys_wait4+0x8b/0xe0 [<c0453d60>] ? child_wait_callback+0x0/0xa0 [<c0455717>] ? sys_waitpid+0x27/0x30 [<c0409a7f>] ? sysenter_do_call+0x12/0x28 Additional info: