| Summary: | kernel panic during LTP filesystem test run on ext3 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mike Gahagan <mgahagan> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.2 | CC: | dchinner, eguan, esandeen, jburke, jstancek, lczerner, pbunyan, rwheeler |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | 6.2 | ||
| Hardware: | i686 | ||
| OS: | Linux | ||
| Doc Type: | Bug Fix | Last Closed: | 2011-11-02 17:07:59 UTC |
Description
Mike Gahagan
2011-10-14 18:14:19 UTC
Ok, so list corruption.
By the time of the oops, it was also tainted with:
6: 'B' if a page-release function has found a bad page reference or
some unexpected page flags.
...
10: 'W' if a warning has previously been issued by the kernel.
(Though some warnings may set more specific taint flags.)
This seems to be the first error encountered:
------------[ cut here ]------------
WARNING: at lib/list_debug.c:26 __list_add+0x54/0xb0() (Not tainted)
Hardware name: SandyBridge Platform
list_add corruption. next->prev should be prev (df98a59c), but was df9885fc. (next=df9885fc).
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq mperf ipv6 microcode i2c_i801 sg iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext3 jbd mbcache firewire_ohci firewire_core crc_itu_t sr_mod cdrom sd_mod crc_t10dif ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 28220, comm: ps Not tainted 2.6.32-207.el6.i686 #1
Call Trace:
[<c0454b41>] ? warn_slowpath_common+0x81/0xc0
[<c0604244>] ? __list_add+0x54/0xb0
[<c0604244>] ? __list_add+0x54/0xb0
[<c0454c13>] ? warn_slowpath_fmt+0x33/0x40
[<c0604244>] ? __list_add+0x54/0xb0
[<c053eb0a>] ? __d_instantiate+0x2a/0xd0
[<c053ebd9>] ? d_instantiate+0x29/0x50
[<c057a84f>] ? proc_pident_instantiate+0x5f/0x90
[<c057a995>] ? proc_pident_lookup+0x75/0xb0
[<c057aa24>] ? proc_tgid_base_lookup+0x14/0x20
[<c05367a2>] ? do_lookup+0x122/0x180
[<c0536eb3>] ? __link_path_walk+0x5e3/0xd60
[<c051cd40>] ? kmem_cache_alloc_notrace+0xa0/0xb0
[<c05adb32>] ? selinux_file_alloc_security+0x42/0xc0
[<c0537841>] ? path_walk+0x51/0xc0
[<c05379c9>] ? do_path_lookup+0x59/0x90
[<c053871c>] ? do_filp_open+0xdc/0xb00
[<c052eb6e>] ? cp_new_stat64+0xee/0x100
[<c0527fb8>] ? do_sys_open+0x58/0x130
[<c04afb0c>] ? audit_syscall_entry+0x21c/0x240
[<c052810c>] ? sys_open+0x2c/0x40
[<c0409a9f>] ? sysenter_do_call+0x12/0x28
---[ end trace 6a7cb877a54a826f ]---
so "ps" encountered list corruption somewhere in the proc filesystem guts ...
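For readers unfamiliar with this warning, here is a minimal sketch of the kind of consistency check that produces it. It is a simplified Python model, not the kernel source: `Node` and `checked_list_add` are hypothetical names, and the assumption is that the debug version of `__list_add` verifies both neighbour links before inserting and carries on after warning.

```python
class Node:
    """Hypothetical stand-in for a kernel list_head: an empty list points at itself."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def checked_list_add(new, prev, nxt):
    """Insert `new` between `prev` and `nxt`, reporting broken neighbour links first."""
    problems = []
    if nxt.prev is not prev:
        problems.append(
            f"next->prev should be prev ({prev.name}), but was {nxt.prev.name}. (next={nxt.name})"
        )
    if prev.next is not nxt:
        problems.append(
            f"prev->next should be next ({nxt.name}), but was {prev.next.name}. (prev={prev.name})"
        )
    # Assumption: like a warn-and-continue check, the entry is still linked in.
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new
    return problems


head = Node("head")
a = Node("a")
assert checked_list_add(a, head, head.next) == []  # healthy list: nothing to report

a.prev = a  # simulate a stray write: a.prev now points at a itself instead of head
b = Node("b")
print(checked_list_add(b, head, head.next))
```

The simulated corruption makes `next->prev` equal to `next`, which is the same shape as the reported values (`but was df9885fc`, `next=df9885fc`). The sketch says nothing about what actually overwrote the pointer on this machine.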
list_add corruption. next->prev should be prev (df98a59c), but was df9885fc. (next=df9885fc).

should be df98a59c (11011111100110001010010110011100)
but was   df9885fc (11011111100110001000010111111100)

More than just a bit flip, I guess, but based on the other bug, please do test memory on this box.

Thanks,
-Eric

I installed memtest86+ on this box, but when I tried to boot into it I never got any output on the remote console, so I don't know whether memtest86+ failed to run or our remote console isn't passing anything back to the client. I did notice the system was complaining a lot about single-bit errors, and it panicked on shutdown, so bad memory or some other hardware issue is highly likely here.

The system this happened on was having hardware issues and is now being repaired. I think we can safely close this bug, as I never saw this on anything else and saw no ext3-related issues with Snapshot 4.
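The bit-level comparison of the two pointer values from the warning can be reproduced with a short script. This is only an illustrative sketch; `differing_bits` is a hypothetical helper name, and the two constants are the values printed in the oops.

```python
def differing_bits(expected, actual, width=32):
    """Return the XOR of two values and the positions (bit 0 = least significant) that differ."""
    diff = expected ^ actual
    positions = [bit for bit in range(width) if (diff >> bit) & 1]
    return diff, positions


expected = 0xDF98A59C  # value next->prev should have held (prev)
actual = 0xDF9885FC    # value actually found in next->prev

diff, positions = differing_bits(expected, actual)
print(f"expected = {expected:032b}")
print(f"actual   = {actual:032b}")
print(f"xor = {diff:#010x}, differing bit positions = {positions}")
# last line prints: xor = 0x00002060, differing bit positions = [5, 6, 13]
```

Three bits differ, which matches the "more than just a bit flip" reading. That alone does not rule bad memory in or out.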