Description of problem:
Got a segmentation fault with kernel stack trace while accessing a directory.

Version-Release number of selected component (if applicable):
2.6.20-1.2933.fc6

How reproducible:
Not reproducible every time.

Steps to Reproduce:
1.
2.
3.

Actual results:
Got following in dmesg:

-- begin dump --
Assertion failure in dx_probe() at fs/ext3/namei.c:384: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
------------[ cut here ]------------
kernel BUG at fs/ext3/namei.c:384!
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/eth0/statistics/collisions
CPU 1
Modules linked in: i915 drm ppdev autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables acpi_cpufreq nls_utf8 ntfs(U) dm_mirror dm_multipath dm_mod video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 lp snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sg snd_timer snd pcspkr soundcore ide_cd snd_page_alloc parport_pc serio_raw cdrom parport i2c_i801 i2c_core shpchp e100 mii iTCO_wdt iTCO_vendor_support ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3539, comm: ls Not tainted 2.6.20-1.2925.fc6 #1
RIP: 0010:[<ffffffff8803d139>] [<ffffffff8803d139>] :ext3:dx_probe+0x141/0x278
RSP: 0018:ffff81001c661d28 EFLAGS: 00010282
RAX: 0000000000000081 RBX: ffff810017716000 RCX: ffffffff8057dc58
RDX: ffffffff8057dc58 RSI: 0000000000000000 RDI: ffffffff8057dc40
RBP: 0000000000000000 R08: ffffffff8057dc58 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100102fa8e8
R13: 0000000000000000 R14: ffff81000524cc90 R15: ffff81001c661dc4
FS: 00002aaaaaad5bd0(0000) GS:ffff81001f4a4ac0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031cea90690 CR3: 0000000019b5d000 CR4: 00000000000006e0
Process ls (pid: 3539, threadinfo ffff81001c660000, task ffff810017436800)
Stack: ffff81001c661d78 ffff81000d595500 ffff8100176d51c0 ffff81000524cc90
 0000000000000000 ffff8100176d51c0 ffff81001c661f38 ffffffff8803e3c4
 ffff81000d595528 000000000524cc90 0000000000000000 ffff81000447faf8
Call Trace:
 [<ffffffff8803e3c4>] :ext3:ext3_htree_fill_tree+0xae/0x1cb
 [<ffffffff802257a3>] filldir+0x0/0xb7
 [<ffffffff88036e3c>] :ext3:ext3_readdir+0x1a3/0x4df
 [<ffffffff802257a3>] filldir+0x0/0xb7
 [<ffffffff80318e54>] file_has_perm+0x94/0xa3
 [<ffffffff802257a3>] filldir+0x0/0xb7
 [<ffffffff80233e1d>] vfs_readdir+0x77/0xa9
 [<ffffffff802373c5>] sys_getdents+0x75/0xbd
 [<ffffffff8022dbc9>] sys_fcntl+0x2da/0x2e6
 [<ffffffff8025911e>] system_call+0x7e/0x83
Code: 0f 0b eb fe 48 8b 1c 24 66 8b 56 02 66 85 d2 74 08 66 39 16
RIP [<ffffffff8803d139>] :ext3:dx_probe+0x141/0x278
 RSP <ffff81001c661d28>
-- end dump --

Expected results:
No segmentation fault should occur.

Additional info:
The previous stack trace is with old kernel version. This one is from the latest version.

-- begin dump --
Apr 14 20:31:23 dkshukla-desktop kernel: Assertion failure in dx_probe() at fs/ext3/namei.c:384: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
Apr 14 20:31:23 dkshukla-desktop kernel: ------------[ cut here ]------------
Apr 14 20:31:23 dkshukla-desktop kernel: kernel BUG at fs/ext3/namei.c:384!
Apr 14 20:31:23 dkshukla-desktop kernel: invalid opcode: 0000 [1] SMP
Apr 14 20:31:23 dkshukla-desktop kernel: last sysfs file: /class/net/eth0/statistics/collisions
Apr 14 20:31:23 dkshukla-desktop kernel: CPU 1
Apr 14 20:31:23 dkshukla-desktop kernel: Modules linked in: i915 drm ppdev autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables acpi_cpufreq nls_utf8 ntfs(U) dm_mirror dm_multipath dm_mod video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 lp snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sg snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc shpchp e100 parport_pc iTCO_wdt iTCO_vendor_support serio_raw mii parport pcspkr i2c_i801 i2c_core ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Apr 14 20:31:23 dkshukla-desktop kernel: Pid: 3263, comm: beagled Not tainted 2.6.20-1.2933.fc6 #1
Apr 14 20:31:23 dkshukla-desktop kernel: RIP: 0010:[<ffffffff8803d13d>] [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Apr 14 20:31:23 dkshukla-desktop kernel: RSP: 0018:ffff810009413d28 EFLAGS: 00010282
Apr 14 20:31:23 dkshukla-desktop kernel: RAX: 0000000000000081 RBX: ffff81000f2b2000 RCX: ffffffff8057fc58
Apr 14 20:31:23 dkshukla-desktop kernel: RDX: ffffffff8057fc58 RSI: 0000000000000000 RDI: ffffffff8057fc40
Apr 14 20:31:23 dkshukla-desktop kernel: RBP: 0000000000000000 R08: ffffffff8057fc58 R09: 0000000000000000
Apr 14 20:31:23 dkshukla-desktop kernel: R10: 0000000000000000 R11: ffffffff805d3000 R12: ffff81001cd2d610
Apr 14 20:31:23 dkshukla-desktop kernel: R13: 0000000000000000 R14: ffff810003460c90 R15: ffff810009413dc4
Apr 14 20:31:23 dkshukla-desktop kernel: FS: 0000000040726940(0063) GS:ffff81001f4a4ac0(0000) knlGS:0000000000000000
Apr 14 20:31:23 dkshukla-desktop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 14 20:31:23 dkshukla-desktop kernel: CR2: 0000000001affb58 CR3: 0000000009dd0000 CR4: 00000000000006e0
Apr 14 20:31:23 dkshukla-desktop kernel: Process beagled (pid: 3263, threadinfo ffff810009412000, task ffff8100104d5840)
Apr 14 20:31:23 dkshukla-desktop kernel: Stack: ffff810009413d78 ffff810012084d40 ffff8100177c5d00 ffff810003460c90
Apr 14 20:31:23 dkshukla-desktop kernel: 0000000000000000 ffff8100177c5d00 ffff810009413f38 ffffffff8803e3c8
Apr 14 20:31:23 dkshukla-desktop kernel: ffff810012084d68 0000000080319f33 0000000000000000 ffff81000cd98198
Apr 14 20:31:23 dkshukla-desktop kernel: Call Trace:
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff8803e3c8>] :ext3:ext3_htree_fill_tree+0xae/0x1cb
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff802257f0>] filldir+0x0/0xb7
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff88036e3c>] :ext3:ext3_readdir+0x1a3/0x4df
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff802257f0>] filldir+0x0/0xb7
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff80319fd4>] file_has_perm+0x94/0xa3
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff802257f0>] filldir+0x0/0xb7
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff80233eb0>] vfs_readdir+0x77/0xa9
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff80237458>] sys_getdents+0x75/0xbd
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff8021d950>] sys_close+0x93/0xd1
Apr 14 20:31:23 dkshukla-desktop kernel: [<ffffffff8025a11e>] system_call+0x7e/0x83
Apr 14 20:31:23 dkshukla-desktop kernel:
Apr 14 20:31:23 dkshukla-desktop kernel:
Apr 14 20:31:23 dkshukla-desktop kernel: Code: 0f 0b eb fe 48 8b 1c 24 66 8b 56 02 66 85 d2 74 08 66 39 16
Apr 14 20:31:23 dkshukla-desktop kernel: RIP [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Apr 14 20:31:23 dkshukla-desktop kernel: RSP <ffff810009413d28>
-- end dump --
Was fsck run on this filesystem after the error happened?
Sorry for this late reply. I did an fsck on the filesystem, and after that no kernel panics. Thanks
Another oops from filesystem corruption.
Just for future reference, saving at least the output of fsck will often help. Making an e2image would be even better. Either is easier than trying to work backwards from the oops. Thanks, -Eric
Well, it's not hard to reproduce, but the real issue here is that we have an assert on contents of bh->b_data we'd just read from disk. This is the first time we are looking at the entries->limit and if it's corrupt on-disk the assert will trigger, of course. IOW, it's a bullshit test in bull^Wdaniel's code. Proper reaction is to give a warning and bail out as it's done for tests just above.
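To illustrate the point above, here is a minimal user-space sketch of "warn and bail out" in place of an assertion on data just read from disk. It is not the kernel code and not the patch that was eventually sent; `struct dx_header_sketch`, `expected_root_limit()` and `check_root_limit()` are hypothetical names, and the layout constants (24 bytes for the two fake directory entries, 8 bytes per index entry) are assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the limit/count header that
 * sits at the start of the index entries in an htree root block. */
struct dx_header_sketch {
    uint16_t limit;   /* maximum number of entries, as stored on disk */
    uint16_t count;   /* entries currently in use */
};

/* Assumed layout: the root block starts with two fake dirents ("." and
 * "..") of 12 bytes each, followed by info_length bytes of root info;
 * the rest of the block holds 8-byte index entries. */
static unsigned expected_root_limit(unsigned block_size, unsigned info_length)
{
    const unsigned fake_dirents = 24;
    const unsigned entry_size = 8;

    if (block_size < fake_dirents + info_length)
        return 0;   /* nonsensical geometry, no room for entries */
    return (block_size - fake_dirents - info_length) / entry_size;
}

/* Validate the on-disk limit. A mismatch means the directory is
 * corrupt, which is an I/O-level error and not a programming bug, so
 * warn and return failure to the caller instead of asserting. */
static int check_root_limit(const struct dx_header_sketch *hdr,
                            unsigned block_size, unsigned info_length)
{
    unsigned expected = expected_root_limit(block_size, info_length);

    if (hdr->limit != expected) {
        fprintf(stderr,
                "warning: corrupt htree root: limit %u, expected %u\n",
                (unsigned)hdr->limit, expected);
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 return the same way as the other failed sanity checks on the root block: give up on the index for that directory and return an error, leaving the repair to fsck.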
Hm, good point. I was thinking in terms of "how did we get here" but yeah, now that we've gotten here, no need to bring down the machine. I'll look into fixing that up. Thanks Al.
Sent a patch upstream for this, http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg03014.html -Eric
Out of curiosity, did you happen to use a windows ext3 driver to access this filesystem?
Yes, I've used Windows Ext2 IFS ( available at http://fs-driver.org/ ).
FYI, after perusing several of these bugs, it is clear to me that that driver corrupts htree directories...
*** Bug 246398 has been marked as a duplicate of this bug. ***
*** Bug 213802 has been marked as a duplicate of this bug. ***
slightly more descriptive summary...
fwiw, this was what i had with bug #213802:

# umount /dev/sda6
umount: /home: device is busy
umount: /home: device is busy
# fsck -v /dev/sda6
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/dev/sda6 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

HOME: clean, 63289/3571712 files, 6439415/7134860 blocks
# umount /home
umount: /home: device is busy
umount: /home: device is busy
# fsck -fv /dev/sda6
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/dev/sda6 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

Pass 1: Checking inodes, blocks, and sizes
HTREE directory inode 3063809 has an invalid root node.
Clear HTree index<y>? yes

Pass 2: Checking directory structure
...

need to file a bug against e2fsprogs?
re: comment #15, Sertaç, what bug would you file against e2fsprogs? I'm not sure what your comment shows as a problem. Thanks, -Eric
The patch to resolve this is queued for 2.6.23, as well as 2.6.22.8.
Fixed in 2.6.22.9, available for fc6 now. Thanks, -Eric