Bug 246398 - Assertion failure in dx_probe() at fs/ext3/namei.c
Summary: Assertion failure in dx_probe() at fs/ext3/namei.c
Keywords:
Status: CLOSED DUPLICATE of bug 236464
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL: http://gufete.net
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-07-01 19:51 UTC by Javier de Miguel Rodríguez
Modified: 2007-11-30 22:12 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-10 16:16:37 UTC
Type: ---
Embargoed:


Description Javier de Miguel Rodríguez 2007-07-01 19:51:59 UTC
Description of problem:

When I try to upgrade the kernel (yum or rpm -Uvh) I get the following error:

Jul  1 21:40:15 amd64 kernel: Assertion failure in dx_probe() at
fs/ext3/namei.c:384: "dx_get_limit(entries) == dx_root_limit(dir,
root->info.info_length)"
Jul  1 21:40:15 amd64 kernel: ------------[ cut here ]------------
Jul  1 21:40:15 amd64 kernel: kernel BUG at fs/ext3/namei.c:384!
Jul  1 21:40:15 amd64 kernel: invalid opcode: 0000 [1] SMP
Jul  1 21:40:15 amd64 kernel: last sysfs file: /devices/pci0000:00/0000:00:18.3/name
Jul  1 21:40:15 amd64 kernel: CPU 0
Jul  1 21:40:15 amd64 kernel: Modules linked in: autofs4 hidp rfcomm l2cap
bluetooth sunrpc ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables
cpufreq_ondemand dm_mirror dm_multipath dm_mod raid1 video sbs i2c_ec dock
button battery asus_acpi backlight ac ipv6 parport_pc lp parport loop snd_bt87x
bt878 tuner tvaudio bttv snd_intel8x0 snd_ac97_codec video_buf ide_cd ir_common
compat_ioctl32 ac97_bus i2c_algo_bit snd_mpu401 cdrom snd_mpu401_uart
snd_seq_dummy snd_seq_oss snd_rawmidi btcx_risc tveeprom snd_seq_midi_event
snd_seq skge snd_seq_device videodev v4l2_common v4l1_compat snd_pcm_oss
ohci1394 snd_mixer_oss snd_pcm ieee1394 i2c_nforce2 floppy shpchp ns558
snd_timer pcspkr gameport i2c_core forcedeth snd k8temp k8_edac hwmon soundcore
edac_mc snd_page_alloc sata_nv libata sd_mod scsi_mod raid456 xor raid0 ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
Jul  1 21:40:15 amd64 kernel: Pid: 3169, comm: rpmv Not tainted 2.6.20-1.2933.fc6 #1
Jul  1 21:40:15 amd64 kernel: RIP: 0010:[<ffffffff8803d13d>] 
[<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Jul  1 21:40:15 amd64 kernel: RSP: 0018:ffff81005a9f3a58  EFLAGS: 00010282
Jul  1 21:40:15 amd64 kernel: RAX: 0000000000000081 RBX: ffff810052705000 RCX:
ffffffff8057fc58
Jul  1 21:40:15 amd64 kernel: RDX: ffffffff8057fc58 RSI: 0000000000000000 RDI:
ffffffff8057fc40
Jul  1 21:40:15 amd64 kernel: RBP: 0000000000000000 R08: ffffffff8057fc58 R09:
00000000ffffffff
Jul  1 21:40:15 amd64 kernel: R10: 0000000000000000 R11: 0000000000000000 R12:
ffff810058718818
Jul  1 21:40:15 amd64 kernel: R13: 00000000d393763a R14: ffff81007b38d178 R15:
ffff81005a9f3bb4
Jul  1 21:40:15 amd64 kernel: FS:  00002aaaaaf24dd0(0000)
GS:ffffffff805d3000(0000) knlGS:00000000f7fc66c0
Jul  1 21:40:15 amd64 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul  1 21:40:15 amd64 kernel: CR2: 000000000074d000 CR3: 000000005aedb000 CR4:
00000000000006e0
Jul  1 21:40:15 amd64 kernel: Process rpmv (pid: 3169, threadinfo
ffff81005a9f2000, task ffff81005f170080)
Jul  1 21:40:15 amd64 kernel: Stack:  ffff81005a9f3b58 ffff81005270bbd8
ffff81007b38d178 ffff81007b38d178
Jul  1 21:40:15 amd64 kernel:  ffff81005270bbd8 ffff81007b38d178
ffff81007b38d178 ffffffff8803db37
Jul  1 21:40:15 amd64 kernel:  ffff81005a9f3c08 ffff81005270bbd8
0000001c00000000 ffff81007e357400
Jul  1 21:40:15 amd64 kernel: Call Trace:
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8803db37>] :ext3:ext3_find_entry+0xd7/0x5a8
Jul  1 21:40:15 amd64 kernel:  [<ffffffff88025da0>]
:jbd:journal_cancel_revoke+0x8f/0xb6
Jul  1 21:40:15 amd64 kernel:  [<ffffffff88021c3a>]
:jbd:do_get_write_access+0x4d5/0x507
Jul  1 21:40:15 amd64 kernel:  [<ffffffff880212a5>]
:jbd:__journal_file_buffer+0x125/0x21c
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80260ab5>] _read_unlock_irq+0x9/0xc
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80212a77>]
__do_page_cache_readahead+0xe5/0x1ee
Jul  1 21:40:15 amd64 kernel:  [<ffffffff88044896>]
:ext3:__ext3_journal_dirty_metadata+0x1e/0x46
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80260aa9>] _spin_unlock_irq+0x9/0xc
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8803f70b>] :ext3:ext3_lookup+0x31/0xf6
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80222709>] d_alloc+0x1a4/0x1af
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8020cb06>] do_lookup+0xc4/0x1ae
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8025de37>]
copy_user_generic_string+0x17/0x40
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80209c72>] __link_path_walk+0x903/0xdb0
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8020e74f>] link_path_walk+0x55/0xd7
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80297f09>]
autoremove_wake_function+0x0/0x2e
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8020cccb>] file_read_actor+0x0/0x166
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8020c8d4>] do_path_lookup+0x1b5/0x217
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80212380>] getname+0x152/0x1b8
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80223786>] __user_walk_fd+0x37/0x4c
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8023dbd3>] vfs_lstat_fd+0x18/0x47
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80297f09>]
autoremove_wake_function+0x0/0x2e
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8020cccb>] file_read_actor+0x0/0x166
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8022a49a>] sys_newlstat+0x19/0x31
Jul  1 21:40:15 amd64 kernel:  [<ffffffff80260aa9>] _spin_unlock_irq+0x9/0xc
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8021d634>] sigprocmask+0xba/0xc1
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8022f53b>] sys_rt_sigprocmask+0x50/0xcf
Jul  1 21:40:15 amd64 kernel:  [<ffffffff8025a11e>] system_call+0x7e/0x83
Jul  1 21:40:15 amd64 kernel:
Jul  1 21:40:15 amd64 kernel:
Jul  1 21:40:15 amd64 kernel: Code: 0f 0b eb fe 48 8b 1c 24 66 8b 56 02 66 85 d2
74 08 66 39 16
Jul  1 21:40:15 amd64 kernel: RIP  [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Jul  1 21:40:15 amd64 kernel:  RSP <ffff81005a9f3a58>
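
Note: the failed assertion compares the limit stored in the directory's root index block with the value the kernel computes from the block size. The following is only a rough sketch of that arithmetic in Python, assuming the usual ext3 layout (12-byte "." and ".." entries, an 8-byte dx_root_info, 8-byte index entries). The function names mirror the kernel helpers, but this is an illustration, not the kernel code.

```python
DX_ENTRY_SIZE = 8  # assumed: 32-bit hash + 32-bit block number per index entry


def ext3_dir_rec_len(name_len: int) -> int:
    """Size of a directory entry: 8-byte header plus name, rounded up to 4 bytes."""
    return (name_len + 8 + 3) & ~3


def dx_root_limit(block_size: int, info_length: int = 8) -> int:
    """Index entries that fit in the root block after ".", ".." and dx_root_info."""
    entry_space = (block_size
                   - ext3_dir_rec_len(1)   # "."
                   - ext3_dir_rec_len(2)   # ".."
                   - info_length)
    return entry_space // DX_ENTRY_SIZE


for block_size in (1024, 2048, 4096):
    print(block_size, dx_root_limit(block_size))
```

If the limit stored on disk differs from this computed value, the check fails. In this kernel that is a BUG rather than a recoverable error.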


Version-Release number of selected component (if applicable):

Fedora Core 7, (upgraded from fc6->fc7 via yum)

Linux amd64.casa.net 2.6.20-1.2933.fc6 #1 SMP Mon Mar 19 11:00:19 EDT 2007
x86_64 x86_64 x86_64 GNU/Linux



How reproducible:

Always. Just try to upgrade the kernel.

Steps to Reproduce:
1. rpm -Uvh /var/cache/yum/updates/packages/kernel-2.6.21-1.3228.fc7.x86_64.rpm
2. I get "Assertion failure in dx_probe() at fs/ext3/namei.c" in
/var/log/messages, so I cannot upgrade the kernel
  
Actual results:
No kernel upgrade

Expected results:

Kernel upgrade

Additional info:

If I try to run sysreport I get the same "Assertion failure in
dx_probe() at fs/ext3/namei.c" error. Maybe a kernel bug? My partitions are as follows:

/dev/md1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md2 on /home type ext3 (rw)
/dev/md3 on /home/javier/externo type ext3 (rw,noexec,nosuid,nodev)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Jan  4 19:32:16 2006
     Raid Level : raid1
     Array Size : 152512 (148.96 MiB 156.17 MB)
  Used Dev Size : 152512 (148.96 MiB 156.17 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jul  1 23:38:28 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : eb667dca:bb92f200:cb05246c:83071bca
         Events : 0.216390

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1      22        1        1      active sync   /dev/hdc1

mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Wed Jan  4 19:32:01 2006
     Raid Level : raid0
     Array Size : 25607168 (24.42 GiB 26.22 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Jan  4 19:32:01 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           UUID : 8b458a37:35595c35:dcff4be4:1b257187
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1      22        3        1      active sync   /dev/hdc3

mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.03
  Creation Time : Wed Jan  4 19:32:17 2006
     Raid Level : raid1
     Array Size : 40957632 (39.06 GiB 41.94 GB)
  Used Dev Size : 40957632 (39.06 GiB 41.94 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Jul  1 21:50:41 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 732141c0:906eba59:d77960cd:10d159fd
         Events : 0.46450

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1      22        2        1      active sync   /dev/hdc2

fdisk -l /dev/hda

Disk /dev/hda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          19      152586   fd  Linux raid autodetect
/dev/hda2              20        5118    40957717+  fd  Linux raid autodetect
/dev/hda3            5119        6712    12803805   fd  Linux raid autodetect
/dev/hda4            6713       14593    63304132+   5  Extended
/dev/hda5            6713       14593    63304101   fd  Linux raid autodetect

fdisk -l /dev/hdb


Disk /dev/hdb: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1               1          65      522081   82  Linux swap / Solaris
/dev/hdb2   *          66        9728    77618047+   7  HPFS/NTFS

fdisk -l /dev/hdc

Disk /dev/hdc: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1   *           1          19      152586   fd  Linux raid autodetect
/dev/hdc2              20        5756    46082452+  fd  Linux raid autodetect
/dev/hdc3            5757        7350    12803805   fd  Linux raid autodetect
/dev/hdc4            7351       14593    58179397+   5  Extended
/dev/hdc5            7351       14593    58179366   fd  Linux raid autodetect

If you need additional info, please contact me

Comment 1 Javier de Miguel Rodríguez 2007-07-01 20:00:21 UTC
I also use Windows XP, and I access the Linux partitions from Windows (Ext2IFS
driver). Googling, I found something similar reported for Ubuntu:

https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/109177


The problem is likely to be related to the use of an ext2 filesystem driver in
Windows with an ext3 filesystem using the dir_index option. The ext2 filesystem
drivers available for Windows seem to corrupt the directory indices when -O
dir_index is enabled (which is a bug in those drivers). However, the ext3 driver
in the kernel should detect the corruption and rebuild the directory index
(which is the bug being reported here). This is just a hypothesis, but is
supported by evidence from past usage with past kernels on other systems, and
the disappearance of problems when -O dir_index was disabled using tune2fs.

But I tried disabling the directory index with no luck; I still have problems upgrading the kernel:

tune2fs -O ^dir_index  /dev/md0
tune2fs -O ^dir_index  /dev/md1
tune2fs -O ^dir_index  /dev/md2
tune2fs -O ^dir_index  /dev/md3
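
Note: one way to confirm whether the feature flag really changed is to look at the "Filesystem features" line printed by `tune2fs -l`. Below is a minimal Python sketch that extracts that line from captured output. The sample text is made up for illustration and is not output from this system.

```python
def filesystem_features(tune2fs_output: str) -> set:
    """Return the feature names from the 'Filesystem features' line, if present."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


# Hypothetical excerpt of `tune2fs -l <device>` output (not from this report).
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super
Filesystem state:         clean
"""

features = filesystem_features(SAMPLE)
print("dir_index enabled:", "dir_index" in features)
```

To use it on a real system, feed it the text captured from `tune2fs -l` for each device.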


Comment 2 Chuck Ebbert 2007-07-02 22:35:38 UTC
The filesystem on disk is corrupted. Was fsck tried?



Comment 3 Javier de Miguel Rodríguez 2007-07-03 06:34:28 UTC
Yes, I ran fsck.ext3 and no errors were detected.

Comment 4 Chuck Ebbert 2007-07-03 22:46:36 UTC
Eric, any ideas on this one?


Comment 5 Javier de Miguel Rodríguez 2007-07-04 07:33:05 UTC
If you need any additional info or want me to run any tests, please contact me.

Another interesting point is that I was unable to upgrade from FC6 to FC7 using
the DVD. It failed with a similar error, "Assertion failure in dx_probe() at
fs/ext3/namei.c". If I try to run sysreport, I get the same error.

The system seems OK. I work on it without problems (Firefox, OpenOffice, Samba
file serving...) but I cannot upgrade the kernel. With pup I can upgrade the other
packages, but no luck with the kernel.

Greetings

Javier

Comment 6 Eric Sandeen 2007-07-06 16:28:56 UTC
It seems clear that there is filesystem corruption; would you be willing to use
e2image -r to create an image of your filesystem metadata and post it somewhere,
so I can take a look?  (feel free to mail me directly with the location, if you
don't want all of your filesystem metadata made public... please don't use the
-s option, though, in this case)

If you're using a buggy windows ext2 driver with this same filesystem under
windows, then I'm pretty inclined to blame it - though fsck should find & fix
the error, of course.  You might also try fsck from e2fsprogs-1.40, which was
just released (see http://e2fsprogs.sourceforge.net/ - I don't yet have it in
fedora devel as an RPM, sorry), to see if this is being caught now.

I assume you're only having trouble with the kernel upgrade because only it is
trying to manipulate the directory with the corruption.

Thanks,

-Eric

Comment 7 Javier de Miguel Rodríguez 2007-07-06 20:27:20 UTC
I have created e2images from / and /boot (no e2image for /home)

The commands I used to create the images are:

e2image -r /dev/md0 - | bzip2 -9 >  md0_file.bz2
e2image -r /dev/md1 - | bzip2 -9 >  md1_file.bz2

I have made those images available at the following URL:

http://talika.eii.us.es/~javier/fc7/

Anyway, I am currently compiling e2fsprogs 1.40 to try a newer fsck.

Greetings from Spain

Javier

Comment 8 Eric Sandeen 2007-07-06 20:43:13 UTC
Great, thank you for the images!  I'm pulling them down now.  Though, I'm
thinking it might have been good if the assert had said which directory we were
working on... I'll see what I can find.

-Eric



Comment 9 Eric Sandeen 2007-08-09 21:52:06 UTC
see also bug #236464 - looking into changing this ASSERT into a recoverable error.

Note, I wasn't able to read your md1 file:
[root@bear-05 test]# debugfs -c md1_file 
debugfs 1.39 (29-May-2006)
md1_file: Bad magic number in super-block while opening filesystem

and I wasn't able to find a filesystem in there anywhere...?
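
Note: the "Bad magic number" message means the ext2/ext3 magic value was not found where the superblock should be. As a rough illustration, the Python sketch below checks for the 16-bit magic 0xEF53 at byte 56 of the primary superblock, which starts 1024 bytes into the image. The test buffers are fabricated. This is not what debugfs does internally, only the same basic check.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1 KiB into the device/image
MAGIC_OFFSET = 56          # offset of s_magic within the superblock
EXT2_SUPER_MAGIC = 0xEF53  # shared by ext2 and ext3


def has_ext_magic(image: bytes) -> bool:
    """True if the little-endian magic 0xEF53 sits at the expected offset."""
    start = SUPERBLOCK_OFFSET + MAGIC_OFFSET
    if len(image) < start + 2:
        return False
    (magic,) = struct.unpack_from("<H", image, start)
    return magic == EXT2_SUPER_MAGIC


# Fabricated buffers: one with the magic in place, one all zeroes.
with_magic = bytearray(2048)
struct.pack_into("<H", with_magic, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT2_SUPER_MAGIC)
without_magic = bytes(2048)

print(has_ext_magic(bytes(with_magic)))
print(has_ext_magic(without_magic))
```

For a real image file, passing the first 2048 bytes read in binary mode is enough for this check.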

Did a later fsck fix this up for you?  There is a 1.40.2 rpm in F7
updates-testing now...

With a hand-corrupted image, e2fsprogs 1.39 finds this problem:

[root@bear-05 tmp]# e2fsck -f fsfile
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Problem in HTREE directory inode 14337: node (0) has invalid limit (125)
Clear HTree index<y>? yes

...


Comment 10 Eric Sandeen 2007-08-09 22:24:21 UTC
Hmm, I take that back; the md1 image only becomes unreadable after journal
recovery goes bad... I may still be able to make something of it.

Comment 11 Andreas Dilger 2007-08-10 07:12:06 UTC
Eric, can you dump the corrupt directory block that is causing this problem? 
The design of dir_index should have allowed a dir_index-unaware ext2/3
implementation to mount the filesystem without problems.  Deleting inodes wouldn't
cause any problems, and adding new inodes _should_ result in the root index
being overwritten (because to non-dir_index users it would appear to be an empty
block).  An "od -Ax -tx4" of the first block of the directory would be useful.

We definitely shouldn't go BUG() on disk data - it looks like a simple check and
returning ERR_BAD_DX_DIR is easily possible here.  There are a few other
assert()s on the on-disk data that could likely also go boom if there are just
the right corruptions. 
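
Note: as an illustration of "check and return an error" instead of asserting, here is a userspace Python sketch that decodes the root index header from the first block of a directory and reports a bad limit without raising. The layout (12-byte "." and ".." entries, info_length at byte 5 of dx_root_info, then a little-endian limit/count pair) is assumed from the ext3 htree format. The function name and the count check are hypothetical additions, and this is not the kernel patch.

```python
import struct

DIRENT_DOT_LEN = 12      # fake "." entry at the start of the block
DIRENT_DOTDOT_LEN = 12   # fake ".." entry header that follows it
DX_ENTRY_SIZE = 8        # 32-bit hash + 32-bit block number


def check_dx_root(block: bytes):
    """Return (ok, message) for a directory's first block; never raises on bad data."""
    info_off = DIRENT_DOT_LEN + DIRENT_DOTDOT_LEN   # dx_root_info starts at byte 24
    if len(block) < info_off + 8:
        return False, "block too short for dx_root_info"
    info_length = block[info_off + 5]               # follows reserved_zero, hash_version
    entries_off = info_off + info_length
    if len(block) < entries_off + 4:
        return False, "info_length points past the block"
    # The first entry slot is overlaid with a (limit, count) pair of le16 values.
    limit, count = struct.unpack_from("<HH", block, entries_off)
    expected = (len(block) - info_off - info_length) // DX_ENTRY_SIZE
    if limit != expected:
        return False, f"bad root limit {limit}, expected {expected}"
    if count > limit:                               # extra sanity check, my own addition
        return False, f"count {count} exceeds limit {limit}"
    return True, f"limit {limit}, count {count}"


# Hypothetical 1 KiB root block with only the fields this check reads filled in.
good = bytearray(1024)
good[29] = 8                                   # info_length
struct.pack_into("<HH", good, 32, 124, 1)      # limit, count
bad = bytearray(good)
struct.pack_into("<H", bad, 32, 125)           # wrong limit, as in a corrupted index

print(check_dx_root(bytes(good)))
print(check_dx_root(bytes(bad)))
```

A caller that gets a failure back can fall back to a linear directory scan or report the directory as corrupt, which is the kind of recoverable behaviour suggested above.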

Comment 12 Eric Sandeen 2007-08-10 15:12:13 UTC
Andreas, unfortunately there seems to be no corruption on the md0 image, and the
md1 image seems to be *badly* corrupt; log replay fails with blocks out of
range, and after that there's no valid superblock... so I've not been able to
find the exact area of corruption yet.  I guess all we need is a windows box
with that driver.. ;-)

Comment 13 Eric Sandeen 2007-08-10 16:16:37 UTC
After clearing the replay flag & running e2fsck I still find no htree corruption
in the md1 image.

Anyway, I'm duping this one to an earlier bug; I have two or three of these bugs
resulting from the buggy Windows driver.

*** This bug has been marked as a duplicate of 236464 ***

Comment 14 Javier de Miguel Rodríguez 2007-08-13 07:40:42 UTC
I have solved this issue.

I have created a "rescue" CD-ROM with FC7 and a .tar.gz with the latest
e2fsprogs. I booted from that CD and ran fsck.ext3 on the filesystems. No errors
popped up, but this message appeared at the beginning of the fsck:

"Adding dirhash hint to filesystem"

Rebooted and everything worked.

Greetings.

Javier

Comment 15 Eric Sandeen 2007-08-13 15:24:45 UTC
That's good to know.  Be warned, though, that if you keep using that windows
driver you will probably hit this again, at least until the linux kernel is
updated to mitigate the damage that it does.

-Eric

