Bug 236464

Summary: Kernel crashes while accessing a directory, assertion failure in dx_probe
Product: [Fedora] Fedora Reporter: Ashish Shukla <wahjava>
Component: kernel Assignee: Eric Sandeen <esandeen>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 6 CC: esandeen, javier.miguel, sertacyildiz, sgrubb
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.22.9 Doc Type: Bug Fix
Last Closed: 2007-11-30 05:15:45 UTC Type: ---

Description Ashish Shukla 2007-04-14 16:05:35 UTC
Description of problem:
Got a segmentation fault with a kernel stack trace while accessing a directory.

Version-Release number of selected component (if applicable):
2.6.20-1.2933.fc6

How reproducible:
Not reproducible every time.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Got the following in dmesg:
-- begin dump --
Assertion failure in dx_probe() at fs/ext3/namei.c:384: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
------------[ cut here ]------------
kernel BUG at fs/ext3/namei.c:384!
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/eth0/statistics/collisions
CPU 1
Modules linked in: i915 drm ppdev autofs4 hidp l2cap bluetooth sunrpc
nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables acpi_cpufreq nls_utf8 ntfs(U) dm_mirror dm_multipath dm_mod video sbs
i2c_ec dock button battery asus_acpi backlight ac ipv6 lp snd_hda_intel
snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sg snd_timer snd pcspkr
soundcore ide_cd snd_page_alloc parport_pc serio_raw cdrom parport i2c_i801
i2c_core shpchp e100 mii iTCO_wdt iTCO_vendor_support ata_piix libata sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3539, comm: ls Not tainted 2.6.20-1.2925.fc6 #1
RIP: 0010:[<ffffffff8803d139>]  [<ffffffff8803d139>] :ext3:dx_probe+0x141/0x278
RSP: 0018:ffff81001c661d28  EFLAGS: 00010282
RAX: 0000000000000081 RBX: ffff810017716000 RCX: ffffffff8057dc58
RDX: ffffffff8057dc58 RSI: 0000000000000000 RDI: ffffffff8057dc40
RBP: 0000000000000000 R08: ffffffff8057dc58 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100102fa8e8
R13: 0000000000000000 R14: ffff81000524cc90 R15: ffff81001c661dc4
FS:  00002aaaaaad5bd0(0000) GS:ffff81001f4a4ac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031cea90690 CR3: 0000000019b5d000 CR4: 00000000000006e0
Process ls (pid: 3539, threadinfo ffff81001c660000, task ffff810017436800)
Stack:  ffff81001c661d78 ffff81000d595500 ffff8100176d51c0 ffff81000524cc90
 0000000000000000 ffff8100176d51c0 ffff81001c661f38 ffffffff8803e3c4
 ffff81000d595528 000000000524cc90 0000000000000000 ffff81000447faf8
Call Trace:
 [<ffffffff8803e3c4>] :ext3:ext3_htree_fill_tree+0xae/0x1cb
 [<ffffffff802257a3>] filldir+0x0/0xb7
 [<ffffffff88036e3c>] :ext3:ext3_readdir+0x1a3/0x4df
 [<ffffffff802257a3>] filldir+0x0/0xb7
 [<ffffffff80318e54>] file_has_perm+0x94/0xa3
 [<ffffffff802257a3>] filldir+0x0/0xb7
 [<ffffffff80233e1d>] vfs_readdir+0x77/0xa9
 [<ffffffff802373c5>] sys_getdents+0x75/0xbd
 [<ffffffff8022dbc9>] sys_fcntl+0x2da/0x2e6
 [<ffffffff8025911e>] system_call+0x7e/0x83


Code: 0f 0b eb fe 48 8b 1c 24 66 8b 56 02 66 85 d2 74 08 66 39 16
RIP  [<ffffffff8803d139>] :ext3:dx_probe+0x141/0x278
 RSP <ffff81001c661d28>
-- end dump --

Expected results:
No segmentation fault should occur.

Additional info:

Comment 1 Ashish Shukla 2007-04-14 16:10:17 UTC
The previous stack trace is with old kernel version. This one is from the latest
version.
-- begin dump --
Apr 14 20:31:23 dkshukla-desktop kernel: Assertion failure in dx_probe() at fs/ext3/namei.c:384: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
Apr 14 20:31:23 dkshukla-desktop kernel: ------------[ cut here ]------------
Apr 14 20:31:23 dkshukla-desktop kernel: kernel BUG at fs/ext3/namei.c:384!
Apr 14 20:31:23 dkshukla-desktop kernel: invalid opcode: 0000 [1] SMP
Apr 14 20:31:23 dkshukla-desktop kernel: last sysfs file: /class/net/eth0/statistics/collisions
Apr 14 20:31:23 dkshukla-desktop kernel: CPU 1
Apr 14 20:31:23 dkshukla-desktop kernel: Modules linked in: i915 drm ppdev autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables acpi_cpufreq nls_utf8 ntfs(U) dm_mirror dm_multipath dm_mod video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 lp snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sg snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc shpchp e100 parport_pc iTCO_wdt iTCO_vendor_support serio_raw mii parport pcspkr i2c_i801 i2c_core ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Apr 14 20:31:23 dkshukla-desktop kernel: Pid: 3263, comm: beagled Not tainted 2.6.20-1.2933.fc6 #1
Apr 14 20:31:23 dkshukla-desktop kernel: RIP: 0010:[<ffffffff8803d13d>]  [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Apr 14 20:31:23 dkshukla-desktop kernel: RSP: 0018:ffff810009413d28  EFLAGS: 00010282
Apr 14 20:31:23 dkshukla-desktop kernel: RAX: 0000000000000081 RBX: ffff81000f2b2000 RCX: ffffffff8057fc58
Apr 14 20:31:23 dkshukla-desktop kernel: RDX: ffffffff8057fc58 RSI: 0000000000000000 RDI: ffffffff8057fc40
Apr 14 20:31:23 dkshukla-desktop kernel: RBP: 0000000000000000 R08: ffffffff8057fc58 R09: 0000000000000000
Apr 14 20:31:23 dkshukla-desktop kernel: R10: 0000000000000000 R11: ffffffff805d3000 R12: ffff81001cd2d610
Apr 14 20:31:23 dkshukla-desktop kernel: R13: 0000000000000000 R14: ffff810003460c90 R15: ffff810009413dc4
Apr 14 20:31:23 dkshukla-desktop kernel: FS:  0000000040726940(0063) GS:ffff81001f4a4ac0(0000) knlGS:0000000000000000
Apr 14 20:31:23 dkshukla-desktop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 14 20:31:23 dkshukla-desktop kernel: CR2: 0000000001affb58 CR3: 0000000009dd0000 CR4: 00000000000006e0
Apr 14 20:31:23 dkshukla-desktop kernel: Process beagled (pid: 3263, threadinfo ffff810009412000, task ffff8100104d5840)
Apr 14 20:31:23 dkshukla-desktop kernel: Stack:  ffff810009413d78 ffff810012084d40 ffff8100177c5d00 ffff810003460c90
Apr 14 20:31:23 dkshukla-desktop kernel:  0000000000000000 ffff8100177c5d00 ffff810009413f38 ffffffff8803e3c8
Apr 14 20:31:23 dkshukla-desktop kernel:  ffff810012084d68 0000000080319f33 0000000000000000 ffff81000cd98198
Apr 14 20:31:23 dkshukla-desktop kernel: Call Trace:
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff8803e3c8>] :ext3:ext3_htree_fill_tree+0xae/0x1cb
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff802257f0>] filldir+0x0/0xb7
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff88036e3c>] :ext3:ext3_readdir+0x1a3/0x4df
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff802257f0>] filldir+0x0/0xb7
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff80319fd4>] file_has_perm+0x94/0xa3
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff802257f0>] filldir+0x0/0xb7
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff80233eb0>] vfs_readdir+0x77/0xa9
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff80237458>] sys_getdents+0x75/0xbd
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff8021d950>] sys_close+0x93/0xd1
Apr 14 20:31:23 dkshukla-desktop kernel:  [<ffffffff8025a11e>] system_call+0x7e/0x83
Apr 14 20:31:23 dkshukla-desktop kernel:
Apr 14 20:31:23 dkshukla-desktop kernel:
Apr 14 20:31:23 dkshukla-desktop kernel: Code: 0f 0b eb fe 48 8b 1c 24 66 8b 56 02 66 85 d2 74 08 66 39 16
Apr 14 20:31:23 dkshukla-desktop kernel: RIP  [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Apr 14 20:31:23 dkshukla-desktop kernel:  RSP <ffff810009413d28>
-- end dump --

Comment 2 Chuck Ebbert 2007-04-16 23:10:18 UTC
Was fsck run on this filesystem after the error happened?


Comment 3 Ashish Shukla 2007-04-29 06:30:05 UTC
Sorry for the late reply. I ran fsck on the filesystem, and there have been no
kernel panics since.

Thanks

Comment 4 Chuck Ebbert 2007-04-30 22:49:35 UTC
Another oops from filesystem corruption.

Comment 5 Eric Sandeen 2007-05-01 01:09:55 UTC
Just for future reference: saving the output of fsck will often help, and
making an e2image would be even better. Either is easier than trying to work
backwards from the oops.

Thanks,
-Eric

Comment 6 Alexander Viro 2007-08-08 02:12:27 UTC
Well, it's not hard to reproduce, but the real issue here is that
we have an assert on contents of bh->b_data we'd just read from
disk.  This is the first time we are looking at the entries->limit
and if it's corrupt on-disk the assert will trigger, of course.

IOW, it's a bullshit test in bull^Wdaniel's code.  Proper reaction
is to give a warning and bail out as it's done for tests just above.

Comment 7 Eric Sandeen 2007-08-08 02:46:01 UTC
Hm, good point.  I was thinking in terms of "how did we get here" but yeah, now
that we've gotten here, no need to bring down the machine.  I'll look into
fixing that up.

Thanks Al.

Comment 8 Eric Sandeen 2007-08-09 21:46:18 UTC
Sent a patch upstream for this,
http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg03014.html

-Eric

Comment 9 Eric Sandeen 2007-08-09 21:56:30 UTC
Out of curiosity, did you happen to use a windows ext3 driver to access this
filesystem?

Comment 10 Ashish Shukla 2007-08-10 04:24:46 UTC
Yes, I've used Windows Ext2 IFS (available at http://fs-driver.org/).

Comment 11 Eric Sandeen 2007-08-10 04:58:00 UTC
FYI, after perusing several of these bugs, it is clear to me that that driver
corrupts htree directories...

Comment 12 Eric Sandeen 2007-08-10 16:16:43 UTC
*** Bug 246398 has been marked as a duplicate of this bug. ***

Comment 13 Eric Sandeen 2007-08-10 16:51:18 UTC
*** Bug 213802 has been marked as a duplicate of this bug. ***

Comment 14 Eric Sandeen 2007-08-10 16:53:44 UTC
slightly more descriptive summary...

Comment 15 Sertaç Ö. Yıldız 2007-08-10 19:31:19 UTC
FWIW, this is what I had with bug #213802:

# umount /dev/sda6
umount: /home: device is busy
umount: /home: device is busy
# fsck -v /dev/sda6
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/dev/sda6 is mounted.

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

HOME: clean, 63289/3571712 files, 6439415/7134860 blocks
# umount /home
umount: /home: device is busy
umount: /home: device is busy
# fsck -fv /dev/sda6
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/dev/sda6 is mounted.

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

Pass 1: Checking inodes, blocks, and sizes
HTREE directory inode 3063809 has an invalid root node.
Clear HTree index<y>? yes

Pass 2: Checking directory structure
...

need to file a bug against e2fsprogs?

Comment 16 Eric Sandeen 2007-09-18 17:31:31 UTC
re: comment #15, Sertaç, what bug would you file against e2fsprogs?  I'm not
sure what your comment shows as a problem.

Thanks,
-Eric

Comment 17 Eric Sandeen 2007-09-24 21:45:36 UTC
The patch to resolve this is queued for 2.6.23, as well as 2.6.22.8.

Comment 18 Eric Sandeen 2007-11-30 05:15:45 UTC
Fixed in 2.6.22.9, available for fc6 now.

Thanks,
-Eric