213802 – BUG at fs/ext3/namei.c:383

Summary: BUG at fs/ext3/namei.c:383

Keywords:
Status:	CLOSED DUPLICATE of bug 236464
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-11-03 01:06 UTC by Sertaç Ö. Yıldız
Modified:	2007-11-30 22:11 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-08-10 16:51:13 UTC
Type:	---
Embargoed:
Dependent Products:

Description Sertaç Ö. Yıldız 2006-11-03 01:06:08 UTC

Description of problem:
I got the kernel error below when trying to access my home directory (or
unmount the /home partition) after boot. fsck reported the filesystem as
clean, but a forced check found an unclean HTREE index. Cleaning that
index fixed the problem.

------------[ cut here ]------------
kernel BUG at fs/ext3/namei.c:383!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:1c.1/0000:03:00.0/cmd
Modules linked in: i915 drm ipv6 cpufreq_ondemand dm_mirror dm_mod video sbs
i2c_ec button battery ac parport_pc lp parport omnibook(U) backlight
snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm ipw3945(U) joydev sg ide_cd
ohci1394 snd_timer snd ieee80211 pcspkr ieee1394 soundcore i2c_i801
ieee80211_crypt snd_page_alloc e100 i2c_core mii sdhci cdrom mmc_core serio_raw
ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f8bb0c6a>]    Not tainted VLI
EFLAGS: 00010296   (2.6.18-1.2798.fc6 #1) 
EIP is at dx_probe+0x174/0x2d8 [ext3]
eax: 00000081   ebx: f00cad5c   ecx: c067e1d0   edx: 00000082
esi: efc4e018   edi: 00000000   ebp: f07e95e8   esp: c1951d04
ds: 007b   es: 007b   ss: 0068
Process login (pid: 2127, ti=c1951000 task=f762a7a0 task.ti=c1951000)
Stack: f8bbc0ea f8bbb1e8 f8bbc0da 0000017f f8bbc270 0151c5af c196ef00 f00cad5c 
       c1951d68 3f67cb34 f07bb1ac f00cad5c f07bb1ac f00caddc f8bb1706 c1951dc0 
       c1951de8 00000002 c1951d7c c04718f0 c1951e14 f07bb1ac 0000000a f74d8800 
Call Trace:
 [<f8bb1706>] ext3_find_entry+0xc6/0x57c [ext3]
 [<f8bb2fba>] ext3_lookup+0x27/0xc4 [ext3]
 [<c047ac2e>] do_lookup+0xb2/0x15a
 [<c047ca3c>] __link_path_walk+0x8b0/0xd76
 [<c047cf4b>] link_path_walk+0x49/0xbd
 [<c047d328>] do_path_lookup+0x21a/0x26b
 [<c047daed>] __user_walk_fd+0x2f/0x40
 [<c046e25c>] sys_faccessat+0x96/0x129
 [<c046e30e>] sys_access+0x1f/0x23
 [<c0404013>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 =======================
Code: 44 24 10 70 c2 bb f8 c7 44 24 0c 7f 01 00 00 c7 44 24 08 da c0 bb f8 c7 44
24 04 e8 b1 bb f8 c7 04 24 
ea c0 bb f8 e8 24 4b 87 c7 <0f> 0b 7f 01 da c0 bb f8 8b 44 24 3c 89 44 24 20 66
8b 46 02 66 
EIP: [<f8bb0c6a>] dx_probe+0x174/0x2d8 [ext3] SS:ESP 0068:c1951d04

Comment 1 Eric Sandeen 2007-01-15 16:59:51 UTC

I believe it was this assert that tripped in dx_probe:

        assert(dx_get_limit(entries) == dx_root_limit(dir,
                                                      root->info.info_length));

I'll need to find some time to try to recreate this one. Corrupted filesystems
should be handled gracefully and not cause an oops.

Thanks for the report,
-Eric
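
For readers following along, here is a rough userspace model of the invariant that assert checks: the entry limit stored in the index root block should equal the number of index slots that fit in the block after the "." and ".." records and the info header. This is only a sketch. The model_* names are hypothetical, the sizes (8-byte record header, 4-byte padding, 8-byte index slots) are my understanding of the ext3 on-disk layout rather than something stated in this report, and the check returns an error code instead of asserting.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: every model_* name is hypothetical, not kernel API. */

/* One index slot: a 32-bit hash followed by a 32-bit block number (assumed). */
struct model_dx_entry {
    uint32_t hash;
    uint32_t block;
};

/* Directory record size for an n-byte name: 8-byte header, padded to 4 bytes. */
static unsigned model_dir_rec_len(unsigned name_len)
{
    return (name_len + 8 + 3) & ~3u;
}

/* Index slots that fit in the root block after ".", ".." and the info header. */
static unsigned model_dx_root_limit(unsigned block_size, unsigned info_length)
{
    unsigned overhead = model_dir_rec_len(1) + model_dir_rec_len(2) + info_length;

    if (block_size <= overhead)
        return 0; /* nonsensical geometry: no room for any slot */
    return (unsigned)((block_size - overhead) / sizeof(struct model_dx_entry));
}

/* Returns 0 if the stored limit matches the computed one, -1 otherwise. */
static int model_check_root_limit(unsigned stored_limit, unsigned block_size,
                                  unsigned info_length)
{
    return stored_limit == model_dx_root_limit(block_size, info_length) ? 0 : -1;
}

/* Small self-check; call it from a test harness or from main(). */
static void model_self_check(void)
{
    assert(sizeof(struct model_dx_entry) == 8);
    assert(model_check_root_limit(model_dx_root_limit(4096, 8), 4096, 8) == 0);
}
```

With a 4096-byte block and an 8-byte info header this model yields 508 slots, so any other stored limit would be reported as a mismatch rather than trigger a crash.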

Comment 2 Sertaç Ö. Yıldız 2007-04-02 23:10:08 UTC

I just ran into this again with kernel 2.6.20-1.2933.fc6PAE. Anything more I can
provide if this happens again?


Assertion failure in dx_probe() at fs/ext3/namei.c:384: "dx_get_limit(entries)
== dx_root_limit(dir, root->info.info_length)"
------------[ cut here ]------------
kernel BUG at fs/ext3/namei.c:384!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:1c.1/0000:03:00.0/cmd
Modules linked in: ndiswrapper(U) cpufreq_ondemand dm_mirror dm_mod video sbs
i2c_ec dock button battery ac ipv6 parport_pc lp parport omnibook(U) backlight
snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
joydev snd_seq_device sr_mod snd_pcm_oss cdrom snd_mixer_oss ipw3945(U) snd_pcm
snd_timer snd sg pcspkr iTCO_wdt soundcore ieee80211 i2c_i801 snd_page_alloc
ohci1394 iTCO_vendor_support ieee80211_crypt i2c_core ieee1394 e100 mii
tifm_7xx1 sdhci tifm_core mmc_core serio_raw ata_piix libata sd_mod scsi_mod
ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    1
EIP:    0060:[<f8b8893c>]    Tainted: P      VLI
EFLAGS: 00010292   (2.6.20-1.2933.fc6PAE #1)
EIP is at dx_probe+0x175/0x2c7 [ext3]
eax: 00000081   ebx: f52e0018   ecx: c0702090   edx: 00000082
esi: f6d0160c   edi: ec206c88   ebp: 00000000   esp: f5a9dcd8
ds: 007b   es: 007b   ss: 0068
Process sh (pid: 2947, ti=f5a9d000 task=f71046f0 task.ti=f5a9d000)
Stack: f8b93703 f8b927ab f8b936f3 00000180 f8b93877 0000005c c048f1d0 f6d0160c 
       c0479ad9 9f46c4da f5b00c58 f6d0160c f5b00c58 f6d0168c f8b892d3 f5a9dd94 
       f5a9ddbc 00000002 00000002 ffffffff f5a9dde8 f5b00c58 00000003 f7767800 
Call Trace:
 [<c048f1d0>] __getblk+0x3b/0x289
 [<c0479ad9>] do_lookup+0x4f/0x143
 [<f8b892d3>] ext3_find_entry+0xc6/0x57e [ext3]
 [<c045ac18>] __alloc_pages+0x68/0x2aa
 [<f8b8ac0c>] ext3_lookup+0x27/0xc7 [ext3]
 [<c0482738>] d_alloc+0x171/0x17c
 [<c0479b30>] do_lookup+0xa6/0x143
 [<c047b34f>] __link_path_walk+0x2fa/0xc16
 [<c0621dcc>] _read_unlock_irq+0x5/0x7
 [<c047bcaf>] link_path_walk+0x44/0xb3
 [<c0621dc5>] _spin_unlock_irq+0x5/0x7
 [<c04ee880>] copy_to_user+0x3c/0x50
 [<c047bfbe>] do_path_lookup+0x17a/0x1ca
 [<c047c911>] __path_lookup_intent_open+0x45/0x75
 [<c047c9b0>] path_lookup_open+0x20/0x25
 [<c0477030>] open_exec+0x25/0xa5
 [<c0621dc5>] _spin_unlock_irq+0x5/0x7
 [<c04ee880>] copy_to_user+0x3c/0x50
 [<c047833c>] do_execve+0x35/0x1f0
 [<c040218a>] sys_execve+0x2f/0x4f
 [<c0403eee>] sysenter_past_esp+0x5f/0x85
 [<c0620033>] __sched_text_start+0x253/0xa21
 =======================
Code: 44 24 10 77 38 b9 f8 c7 44 24 0c 80 01 00 00 c7 44 24 08 f3 36 b9 f8 c7 44
24 04 ab 27 b9 f8 c7 04 24 03 37 b9 f8 e8 89 f6 89 c7 <0f> 0b eb fe 8b 44 24 3c
89 44 24 20 66 8b 53 02 66 85 d2 74 08 
EIP: [<f8b8893c>] dx_probe+0x175/0x2c7 [ext3] SS:ESP 0068:f5a9dcd8


BTW, should I also file this for e2fsprogs? fsck marks the filesystem clean
unless I force it.

Comment 3 Eric Sandeen 2007-04-02 23:13:40 UTC

Have you already fsck'd the filesystem?  If not, and if I could get a metadata
image of the fs with e2image, I could probably recreate it & fix it up.

-Eric

Comment 4 Sertaç Ö. Yıldız 2007-04-02 23:31:45 UTC

Yes, I've forced the check. I'll try to get an image next time.

Comment 5 Eric Sandeen 2007-04-16 22:08:22 UTC

If nothing else, even the output from fsck would offer some clues.

Comment 6 Sertaç Ö. Yıldız 2007-05-02 11:29:36 UTC

I've reproduced this issue by:
* mounting the ext3 partition from Windows with the http://www.fs-driver.org/ driver.
* opening a file from this mounted partition with vim.

I have the image generated with e2image's '-s' option. It's around 16M bzip2'ed.
Will that be useful? If so, will bugzilla accept a file that big?

Comment 7 Eric Sandeen 2007-05-02 14:39:11 UTC

re: comment #6, I don't think I can support bugs in windows drivers.  :)  If you
must reproduce it through a windows DLL, I don't think that will help us
identify any linux kernel bug, I'm afraid.

I suppose it may be worth seeing if the linux kernel can be any more robust in
the face of this type of corruption, but it seems that the root cause, at least
in your most recent case, is likely a bug in the windows driver.

I'm not sure if you can attach 16M... is there any other place you can put it?

Comment 8 Eric Sandeen 2007-05-02 14:51:42 UTC

Was the original bug report for a filesystem that also had been mounted with
this windows driver?

Comment 9 Sertaç Ö. Yıldız 2007-05-02 16:46:01 UTC

(In reply to comment #7)
> re: comment #6, I don't think I can support bugs in windows drivers.  :)

I can't see a support request for the windows driver here.

>  If you
> must reproduce it through a windows DLL, I don't think that will help us
> identify any linux kernel bug, I'm afraid.

The message in $subject _is_ from the Linux kernel. And if you can reproduce it
within Linux, fine with me :)

> I suppose it may be worth seeing if the linux kernel can be any more robust in
> the face of this type of corruption, but it seems that the root cause, at least
> in your most recent case, is likely a bug in the windows driver.

The cause (at least in the recent case) might be vim, the Windows ext2 driver or
the Windows OS itself. But they continue to work without error. The effect,
however, is that the Linux kernel and fs tools cannot handle this.

BTW, I don't see the bug reports I file as support requests. If I did, I
wouldn't be using rawhide to begin with.

And I wouldn't file this bug report if there weren't two similar reports closed
with WORKSFORME, because I didn't think I would reproduce a filesystem problem.

Comment 10 Eric Sandeen 2007-05-02 17:35:00 UTC

I know you're not asking for windows support, or necessarily support of any
kind.... my only point is, so far I think that you have demonstrated a bug in
the *windows* ext2 driver, which corrupts the filesystem, and linux is
discovering it, and tripping an ASSERT.  At best, perhaps linux could handle it
more gracefully.

Without evidence to the contrary, I believe that you have a corrupted ext
filesystem due to a buggy windows fs driver, and when you mount it under Linux,
Linux discovers the corruption and ungracefully BUGs.

Which other 2 similar reports were closed WORKSFORME?

Thanks,
-Eric

Comment 11 Eric Sandeen 2007-05-02 17:50:51 UTC

Also, I would still be interested in the e2image if you can provide it, in an
effort to make the kernel more robust in the face of this apparent corruption,
and to see how e2fsck behaves.

And, just to clarify - for the original report, had that filesystem been
previously mounted with the windows driver?

Thanks,

-Eric

Comment 12 Sertaç Ö. Yıldız 2007-05-02 18:58:11 UTC

(In reply to comment #10)
> so far I think that you have demonstrated a bug in
> the *windows* ext2 driver, which corrupts the filesystem, and linux is
> discovering it, and tripping an ASSERT.  At best, perhaps linux could handle it
> more gracefully.

I thought these BUG messages were for internal kernel bugs that shouldn't
happen, not for bugs it discovered.

> Which other 2 similar reports were closed WORKSFORME?

https://bugzilla.redhat.com/bugzilla/buglist.cgi?product=Fedora+Core&component=kernel&bug_status=CLOSED&short_desc_type=allwordssubstr&short_desc=namei.c&long_desc_type=allwordssubstr&long_desc=

(In reply to comment #11)
> had that filesystem been previously mounted with the windows driver?

I didn't have Windows installed when I filed the original report. I did the
partitioning with an Ubuntu CD and had Ubuntu and Fedora sharing the home
partition, if it matters.

I don't recall if I had Windows installed at the time of the second BUG.

Comment 13 Sertaç Ö. Yıldız 2007-05-03 14:59:26 UTC

Uploaded the image here:
http://rapidshare.com/files/29270018/sda6.e2i.bz2.gz.html

Comment 14 Eric Sandeen 2007-08-09 22:33:05 UTC

OK, so this is a known problem with that Windows driver: it corrupts htree
directories.  The corruption could come from other problems as well, of course.

I've submitted a patch upstream to try to recover from this error rather than
BUG() & bring down the kernel.

e2fsck does seem to properly clean it up when forced... I'll look further into
why it needs to be forced to check.

I have about 3 of these bugs, 2 will fall victim to the dup-machine :)  But it's
all the same issue...

Thanks,
-Eric
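
The upstream patch itself is not included in this report, so the following is only a userspace sketch of the general idea described above: when the index looks inconsistent, warn and return an error to the caller instead of aborting, and let the caller fall back to a plain linear scan of the directory. All model_* names and the struct layout are hypothetical, and the fallback behaviour is an assumption for illustration, not a description of the actual kernel change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only: every model_* name here is hypothetical. */

enum model_status { MODEL_OK = 0, MODEL_BAD_INDEX = -1 };

struct model_dir {
    unsigned stored_limit;    /* limit field read from the index root block */
    unsigned expected_limit;  /* limit computed from the block geometry */
    const char **names;       /* directory entries, in on-disk order */
    size_t count;             /* number of entries in names */
};

/* Validate the index; on mismatch, warn and report an error (no abort). */
static enum model_status model_probe(const struct model_dir *dir)
{
    if (dir->stored_limit != dir->expected_limit) {
        fprintf(stderr, "warning: index limit %u != expected %u\n",
                dir->stored_limit, dir->expected_limit);
        return MODEL_BAD_INDEX;
    }
    return MODEL_OK;
}

/* Slow path: scan every entry in order. Returns its position or -1. */
static long model_linear_find(const struct model_dir *dir, const char *name)
{
    for (size_t i = 0; i < dir->count; i++)
        if (strcmp(dir->names[i], name) == 0)
            return (long)i;
    return -1;
}

/*
 * Look up a name. *used_fallback is set to 1 when the index was rejected.
 * This model has no real hash tree, so both paths end in the linear scan;
 * the point is only that a bad index produces a warning and a flag, and
 * the lookup still completes.
 */
static long model_find_entry(const struct model_dir *dir, const char *name,
                             int *used_fallback)
{
    *used_fallback = (model_probe(dir) != MODEL_OK);
    return model_linear_find(dir, name);
}
```

Calling model_find_entry() on a directory whose stored limit disagrees with the expected one still finds the entry and sets the fallback flag, which is the "handle it gracefully" behaviour asked for earlier in this thread.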

Comment 15 Eric Sandeen 2007-08-10 16:51:13 UTC

Ok, even though you got here first :) I'm duping this bug to another with a bit
more info... thanks for the report, the kernel code will be more robust soon.

*** This bug has been marked as a duplicate of 236464 ***
