Bug 180968 - Data corruption in ext3 FS when running hazard (corrupt inodes)
Summary: Data corruption in ext3 FS when running hazard (corrupt inodes)
Keywords:
Status: CLOSED DUPLICATE of bug 175216
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix
Reported: 2006-02-10 21:09 UTC by James Smart
Modified: 2007-11-30 22:07 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-13 20:17:15 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System: Red Hat Product Errata
ID: RHSA-2006:0437
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8
Last Updated: 2006-07-20 13:11:00 UTC

Description James Smart 2006-02-10 21:09:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
While tracking down a related ext3 issue, we have so far not seen any data-integrity problem on RHEL3. We have, however, seen a panic in ext3 code running on RHEL3, and HP has reported the same panic on RHEL4. The stack traces of both panics follow: PANIC 1 was reported by HP on RHEL4, and PANIC 2 was seen in Emulex shift-left testing on RHEL3 U6.

Both panics occur in iput(), reached from prune_dcache(), and both follow a
"VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice
day..."
error message on the console. It looks like the inode data structure is
corrupted.
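The pattern described above (a busy-inodes warning at unmount, followed by an oops whose call trace runs through prune_dcache()) can be checked mechanically when sifting through many test-run logs. Below is a minimal Python sketch; the name matches_signature and its line-by-line matching rules are assumptions made for illustration, not part of any Red Hat or Emulex tooling. It only flags logs that show the same sequence of messages; it does not prove that two crashes share a root cause.

```python
import re

# Substring of the warning the VFS prints when an unmount leaves inodes in use.
BUSY_INODES = "VFS: Busy inodes after unmount"
# Oops header in either the 2.6 form ("Oops: 0000 [#1]") or the 2.4 form ("Oops: 0000").
OOPS = re.compile(r"Oops: [0-9a-f]{4}")


def matches_signature(log_lines):
    """Return True if a busy-inodes warning is followed, later in the log,
    by an oops whose subsequent lines mention prune_dcache."""
    seen_busy = False
    in_oops = False
    for line in log_lines:
        if BUSY_INODES in line:
            seen_busy = True
        elif seen_busy and OOPS.search(line):
            in_oops = True
        elif in_oops and "prune_dcache" in line:
            return True
    return False


# Abridged lines from PANIC 2 in this report.
panic2 = [
    "VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...",
    "Unable to handle kernel NULL pointer dereference at virtual address 00000017",
    "Oops: 0000",
    "Call Trace:   [<c017e2c0>] dput [kernel] 0x30 (0xc2235f70)",
    "[<c017e71f>] prune_dcache [kernel] 0xdf (0xc2235f84)",
]
print(matches_signature(panic2))  # True
```

Because the check only needs substrings, it works the same whether or not each line carries a syslog prefix such as "Feb  1 13:54:00 trogdor kernel:".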

SCSI HBA drivers do not access inode data structures; these are used
only by the file system layer.
Both log files also contain error messages reporting "fsck - made
repairs".


=========PANIC 1 =====
Feb  1 13:53:53 trogdor kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Feb  1 13:54:00 trogdor kernel: Unable to handle kernel paging request at virtual address 00005682
Feb  1 13:54:00 trogdor kernel:  printing eip:
Feb  1 13:54:00 trogdor kernel: c0170496
Feb  1 13:54:00 trogdor kernel: *pde = 2f377001
Feb  1 13:54:00 trogdor kernel: Oops: 0000 [#1]
Feb  1 13:54:00 trogdor kernel: SMP
Feb  1 13:54:00 trogdor kernel: Modules linked in: sg parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc lpfcdfc(U) dm_multipath button battery ac md5 ipv6 uhci_hcd ehci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc(U) scsi_transport_fc mptscsih mptbase sd_mod scsi_mod
Feb  1 13:54:00 trogdor kernel: CPU:    0
Feb  1 13:54:01 trogdor kernel: EIP:    0060:[<c0170496>]    Not tainted VLI
Feb  1 13:54:01 trogdor kernel: EFLAGS: 00010202   (2.6.9-22.ELsmp)
Feb  1 13:54:01 trogdor kernel: EIP is at iput+0x25/0x61
Feb  1 13:54:01 trogdor kernel: eax: 0000566e   ebx: c3dbac28   ecx: c3dbac00   edx: c038df80
Feb  1 13:54:01 trogdor kernel: esi: f765b214   edi: f765b21c   ebp: 0000001c   esp: efdcccf0
Feb  1 13:54:01 trogdor kernel: ds: 007b   es: 007b   ss: 0068
Feb  1 13:54:01 trogdor kernel: Process diskfs (pid: 26271, threadinfo=efdcc000 task=eda5d330)
Feb  1 13:54:01 trogdor kernel: Stack: c3dbac28 c016e0c5 00000000 00000084 00000000 f7ffe9c0 c016e443 c0148718
Feb  1 13:54:01 trogdor kernel:        00135c40 00000000 00000005 00000000 00035128 000001d2 000000f4 c0328b20
Feb  1 13:54:01 trogdor kernel:        000001d2 00000007 efdccdb0 c01496fc 00035128 00000018 000000c3 00000000
Feb  1 13:54:01 trogdor kernel: Call Trace:
Feb  1 13:54:01 trogdor kernel:  [<c016e0c5>] prune_dcache+0x154/0x19a
Feb  1 13:54:01 trogdor kernel:  [<c016e443>] shrink_dcache_memory+0x14/0x2b
Feb  1 13:54:01 trogdor kernel:  [<c0148718>] shrink_slab+0xf8/0x161
Feb  1 13:54:01 trogdor kernel:  [<c01496fc>] try_to_free_pages+0xd1/0x1a7
Feb  1 13:54:01 trogdor kernel:  [<c0143276>] __alloc_pages+0x1fe/0x2f7
Feb  1 13:54:01 trogdor kernel:  [<c0145767>] do_page_cache_readahead+0xe7/0x158
Feb  1 13:54:01 trogdor kernel:  [<c0145909>] page_cache_readahead+0x131/0x19e
Feb  1 13:54:01 trogdor kernel:  [<c013fd47>] do_generic_mapping_read+0xfa/0x3ae
Feb  1 13:54:01 trogdor kernel:  [<c0140263>] __generic_file_aio_read+0x19f/0x1bd
Feb  1 13:54:01 trogdor kernel:  [<c013fffb>] file_read_actor+0x0/0xc9
Feb  1 13:54:01 trogdor kernel:  [<c01402c1>] generic_file_aio_read+0x40/0x47
Feb  1 13:54:01 trogdor kernel:  [<c0159b79>] do_sync_read+0x97/0xc9
Feb  1 13:54:01 trogdor kernel:  [<c016501f>] permission+0x4a/0x4f
Feb  1 13:54:01 trogdor kernel:  [<c02cf6e3>] __cond_resched+0x14/0x39
Feb  1 13:54:01 trogdor kernel:  [<c0159281>] dentry_open+0xf0/0x1a5
Feb  1 13:54:01 trogdor kernel:  [<c011ffb1>] autoremove_wake_function+0x0/0x2d
Feb  1 13:54:01 trogdor kernel:  [<c0159c61>] vfs_read+0xb6/0xe2
Feb  1 13:54:02 trogdor kernel:  [<c0159e74>] sys_read+0x3c/0x62
Feb  1 13:54:02 trogdor kernel:  [<c02d10cf>] syscall_call+0x7/0xb
Feb  1 13:54:02 trogdor kernel:  [<c02d007b>] __lock_text_end+0x11a/0x100f
Feb  1 13:54:02 trogdor kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 54 04 f6 89 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba f0 9d 32 c0 e8 66

=============== PANIC 2 ==============
VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Unable to handle kernel NULL pointer dereference at virtual address 00000017
 printing eip:
c01815f7
*pde = 3086a001
*pte = 00000000
Oops: 0000
ide-cd loop lvm-mod st sr_mod cdrom audit usbserial lp parport 8021q netconsole autofs4 nfs lockd sunrpc tg3 sg microcode lpfcdfc keybdev mousedev hid input u
CPU:    0
EIP:    0060:[<c01815f7>]    Tainted: GF
EFLAGS: 00010286

EIP is at iput [kernel] 0x37 (2.4.21-37.ELsmp/i686)
eax: ffffffff   ebx: db49cd80   ecx: db49cd90   edx: db49cd90
esi: ffffffff   edi: c2db2000   ebp: 00000443   esp: c2235f6c
ds: 0068   es: 0068   ss: 0068
Process kswapd (pid: 7, stackpage=c2235000)
Stack: dff55180 c017e2c0 f33f8280 f58dde18 f58dde00 db49cd80 c017e71f db49cd80
       cf14cf80 c03a7b80 00000873 00000001 00000040 c017eb98 0000049e 00000001
       c0157180 00000006 000001d0 00000014 00004d09 00000000 000000ff 00000000
Call Trace:   [<c017e2c0>] dput [kernel] 0x30 (0xc2235f70)
[<c017e71f>] prune_dcache [kernel] 0xdf (0xc2235f84)
[<c017eb98>] shrink_dcache_memory [kernel] 0x68 (0xc2235fa0)
[<c0157180>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xc2235fac)
[<c0157348>] kswapd [kernel] 0x68 (0xc2235fd0)
[<c01572e0>] kswapd [kernel] 0x0 (0xc2235fe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc2235ff0)

Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 9c c5 3a c0 8d
=================
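The two traces come from different kernels and print frames differently: the RHEL4 2.6.9 kernel uses function+offset/size, while the RHEL3 2.4.21 kernel prints function [module] offset. When comparing crashes across both releases (for example, to confirm that both pass through prune_dcache()), it can help to normalize the frames first. A minimal Python sketch follows; parse_trace and its regular expression are hypothetical helpers written only against the two formats visible in this report, and other kernel builds may format traces differently.

```python
import re

# One call-trace frame in either of the two formats shown in this report:
#   2.6 (RHEL4):  [<c016e0c5>] prune_dcache+0x154/0x19a            -> func+0xOFFSET/0xSIZE
#   2.4 (RHEL3):  [<c017e71f>] prune_dcache [kernel] 0xdf (0x...)  -> func [module] 0xOFFSET
FRAME = re.compile(
    r"\[<[0-9a-f]{8}>\]\s+(?P<func>\w+)"
    r"(?:\+0x(?P<off26>[0-9a-f]+)/0x[0-9a-f]+"
    r"|\s+\[[^\]]+\]\s+0x(?P<off24>[0-9a-f]+))"
)


def parse_trace(text):
    """Return (function, offset) pairs, in order, for each frame found in text."""
    frames = []
    for m in FRAME.finditer(text):
        # Exactly one of the two offset groups matched; "0" is still a non-empty string.
        offset = m.group("off26") or m.group("off24")
        frames.append((m.group("func"), int(offset, 16)))
    return frames


trace = """[<c016e0c5>] prune_dcache+0x154/0x19a
[<c017e71f>] prune_dcache [kernel] 0xdf (0xc2235f84)"""
print(parse_trace(trace))  # [('prune_dcache', 340), ('prune_dcache', 223)]
```

Lines such as "EIP:    0060:[<c01815f7>]    Tainted: GF" contain a bracketed address but no frame, and the pattern deliberately does not match them.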



Version-Release number of selected component (if applicable):
RHEL3 U6. Also seen on RHEL4.

How reproducible:
Sometimes

Steps to Reproduce:
1. Run HP Hazard tests (I'll try to get more detail)
  

Additional info:

Comment 1 Phil Knirsch 2006-02-13 10:24:20 UTC
Reassigning to kernel, as the "filesystem" component refers to the filesystem
package we ship, which is the directory skeleton for our distribution.

Read ya, Phil

Comment 2 Ernie Petrides 2006-02-13 20:17:15 UTC

*** This bug has been marked as a duplicate of 175216 ***

Comment 3 Ernie Petrides 2006-02-18 00:26:15 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.2.EL).
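Comment 3 places the fix in kernel version 2.4.21-40.2.EL, while PANIC 2 was taken on 2.4.21-37.ELsmp. One way to tell whether a given RHEL3 kernel release string is at or beyond that build is sketched below in Python. It is a simplified comparison that assumes the VERSION-RELEASE.EL[variant] naming seen in this report and assumes later builds in the same stream carry the fix; it is not rpm's full version-comparison algorithm, and release_key and includes_fix are hypothetical names.

```python
import re

# Build in which the fix was committed, per Comment 3.
FIXED_IN = "2.4.21-40.2.EL"

# Matches RHEL3-style kernel releases such as "2.4.21-37.ELsmp" or "2.4.21-40.2.EL".
RELEASE = re.compile(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.EL\w*")


def release_key(version):
    """Convert '2.4.21-37.ELsmp' to ((2, 4, 21), (37,)) so tuples compare numerically.

    Simplified on purpose: the variant suffix (smp, hugemem, ...) is ignored, and
    this is not a reimplementation of rpm's version comparison."""
    m = RELEASE.fullmatch(version)
    if m is None:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    upstream, release = m.groups()
    return (tuple(int(x) for x in upstream.split(".")),
            tuple(int(x) for x in release.split(".")))


def includes_fix(version):
    """True if version is the same as, or newer than, FIXED_IN."""
    return release_key(version) >= release_key(FIXED_IN)


print(includes_fix("2.4.21-37.ELsmp"))  # False (the kernel PANIC 2 was seen on)
print(includes_fix("2.4.21-40.2.EL"))   # True
```

Tuple comparison treats 40 as older than 40.2, so a hypothetical 2.4.21-40.ELsmp build is reported as not including the fix.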


Comment 4 Ernie Petrides 2006-04-28 21:45:10 UTC
Adding a couple dozen bugs to CanFix list so I can complete the stupid advisory.

