Description of problem:
When I try to upgrade kernel (yum or rpm -Uvh) I get the following error:

Jul 1 21:40:15 amd64 kernel: Assertion failure in dx_probe() at fs/ext3/namei.c:384: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
Jul 1 21:40:15 amd64 kernel: ------------[ cut here ]------------
Jul 1 21:40:15 amd64 kernel: kernel BUG at fs/ext3/namei.c:384!
Jul 1 21:40:15 amd64 kernel: invalid opcode: 0000 [1] SMP
Jul 1 21:40:15 amd64 kernel: last sysfs file: /devices/pci0000:00/0000:00:18.3/name
Jul 1 21:40:15 amd64 kernel: CPU 0
Jul 1 21:40:15 amd64 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables cpufreq_ondemand dm_mirror dm_multipath dm_mod raid1 video sbs i2c_ec dock button battery asus_acpi backlight ac ipv6 parport_pc lp parport loop snd_bt87x bt878 tuner tvaudio bttv snd_intel8x0 snd_ac97_codec video_buf ide_cd ir_common compat_ioctl32 ac97_bus i2c_algo_bit snd_mpu401 cdrom snd_mpu401_uart snd_seq_dummy snd_seq_oss snd_rawmidi btcx_risc tveeprom snd_seq_midi_event snd_seq skge snd_seq_device videodev v4l2_common v4l1_compat snd_pcm_oss ohci1394 snd_mixer_oss snd_pcm ieee1394 i2c_nforce2 floppy shpchp ns558 snd_timer pcspkr gameport i2c_core forcedeth snd k8temp k8_edac hwmon soundcore edac_mc snd_page_alloc sata_nv libata sd_mod scsi_mod raid456 xor raid0 ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Jul 1 21:40:15 amd64 kernel: Pid: 3169, comm: rpmv Not tainted 2.6.20-1.2933.fc6 #1
Jul 1 21:40:15 amd64 kernel: RIP: 0010:[<ffffffff8803d13d>] [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Jul 1 21:40:15 amd64 kernel: RSP: 0018:ffff81005a9f3a58 EFLAGS: 00010282
Jul 1 21:40:15 amd64 kernel: RAX: 0000000000000081 RBX: ffff810052705000 RCX: ffffffff8057fc58
Jul 1 21:40:15 amd64 kernel: RDX: ffffffff8057fc58 RSI: 0000000000000000 RDI: ffffffff8057fc40
Jul 1 21:40:15 amd64 kernel: RBP: 0000000000000000 R08: ffffffff8057fc58 R09: 00000000ffffffff
Jul 1 21:40:15 amd64 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff810058718818
Jul 1 21:40:15 amd64 kernel: R13: 00000000d393763a R14: ffff81007b38d178 R15: ffff81005a9f3bb4
Jul 1 21:40:15 amd64 kernel: FS: 00002aaaaaf24dd0(0000) GS:ffffffff805d3000(0000) knlGS:00000000f7fc66c0
Jul 1 21:40:15 amd64 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 1 21:40:15 amd64 kernel: CR2: 000000000074d000 CR3: 000000005aedb000 CR4: 00000000000006e0
Jul 1 21:40:15 amd64 kernel: Process rpmv (pid: 3169, threadinfo ffff81005a9f2000, task ffff81005f170080)
Jul 1 21:40:15 amd64 kernel: Stack: ffff81005a9f3b58 ffff81005270bbd8 ffff81007b38d178 ffff81007b38d178
Jul 1 21:40:15 amd64 kernel: ffff81005270bbd8 ffff81007b38d178 ffff81007b38d178 ffffffff8803db37
Jul 1 21:40:15 amd64 kernel: ffff81005a9f3c08 ffff81005270bbd8 0000001c00000000 ffff81007e357400
Jul 1 21:40:15 amd64 kernel: Call Trace:
Jul 1 21:40:15 amd64 kernel: [<ffffffff8803db37>] :ext3:ext3_find_entry+0xd7/0x5a8
Jul 1 21:40:15 amd64 kernel: [<ffffffff88025da0>] :jbd:journal_cancel_revoke+0x8f/0xb6
Jul 1 21:40:15 amd64 kernel: [<ffffffff88021c3a>] :jbd:do_get_write_access+0x4d5/0x507
Jul 1 21:40:15 amd64 kernel: [<ffffffff880212a5>] :jbd:__journal_file_buffer+0x125/0x21c
Jul 1 21:40:15 amd64 kernel: [<ffffffff80260ab5>] _read_unlock_irq+0x9/0xc
Jul 1 21:40:15 amd64 kernel: [<ffffffff80212a77>] __do_page_cache_readahead+0xe5/0x1ee
Jul 1 21:40:15 amd64 kernel: [<ffffffff88044896>] :ext3:__ext3_journal_dirty_metadata+0x1e/0x46
Jul 1 21:40:15 amd64 kernel: [<ffffffff80260aa9>] _spin_unlock_irq+0x9/0xc
Jul 1 21:40:15 amd64 kernel: [<ffffffff8803f70b>] :ext3:ext3_lookup+0x31/0xf6
Jul 1 21:40:15 amd64 kernel: [<ffffffff80222709>] d_alloc+0x1a4/0x1af
Jul 1 21:40:15 amd64 kernel: [<ffffffff8020cb06>] do_lookup+0xc4/0x1ae
Jul 1 21:40:15 amd64 kernel: [<ffffffff8025de37>] copy_user_generic_string+0x17/0x40
Jul 1 21:40:15 amd64 kernel: [<ffffffff80209c72>] __link_path_walk+0x903/0xdb0
Jul 1 21:40:15 amd64 kernel: [<ffffffff8020e74f>] link_path_walk+0x55/0xd7
Jul 1 21:40:15 amd64 kernel: [<ffffffff80297f09>] autoremove_wake_function+0x0/0x2e
Jul 1 21:40:15 amd64 kernel: [<ffffffff8020cccb>] file_read_actor+0x0/0x166
Jul 1 21:40:15 amd64 kernel: [<ffffffff8020c8d4>] do_path_lookup+0x1b5/0x217
Jul 1 21:40:15 amd64 kernel: [<ffffffff80212380>] getname+0x152/0x1b8
Jul 1 21:40:15 amd64 kernel: [<ffffffff80223786>] __user_walk_fd+0x37/0x4c
Jul 1 21:40:15 amd64 kernel: [<ffffffff8023dbd3>] vfs_lstat_fd+0x18/0x47
Jul 1 21:40:15 amd64 kernel: [<ffffffff80297f09>] autoremove_wake_function+0x0/0x2e
Jul 1 21:40:15 amd64 kernel: [<ffffffff8020cccb>] file_read_actor+0x0/0x166
Jul 1 21:40:15 amd64 kernel: [<ffffffff8022a49a>] sys_newlstat+0x19/0x31
Jul 1 21:40:15 amd64 kernel: [<ffffffff80260aa9>] _spin_unlock_irq+0x9/0xc
Jul 1 21:40:15 amd64 kernel: [<ffffffff8021d634>] sigprocmask+0xba/0xc1
Jul 1 21:40:15 amd64 kernel: [<ffffffff8022f53b>] sys_rt_sigprocmask+0x50/0xcf
Jul 1 21:40:15 amd64 kernel: [<ffffffff8025a11e>] system_call+0x7e/0x83
Jul 1 21:40:15 amd64 kernel:
Jul 1 21:40:15 amd64 kernel:
Jul 1 21:40:15 amd64 kernel: Code: 0f 0b eb fe 48 8b 1c 24 66 8b 56 02 66 85 d2 74 08 66 39 16
Jul 1 21:40:15 amd64 kernel: RIP [<ffffffff8803d13d>] :ext3:dx_probe+0x141/0x278
Jul 1 21:40:15 amd64 kernel: RSP <ffff81005a9f3a58>

Version-Release number of selected component (if applicable):
Fedora Core 7, (upgraded from fc6->fc7 via yum)
Linux amd64.casa.net 2.6.20-1.2933.fc6 #1 SMP Mon Mar 19 11:00:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always. Try to upgrade kernel

Steps to Reproduce:
1. rpm -Uvh /var/cache/yum/updates/packages/kernel-2.6.21-1.3228.fc7.x86_64.rpm
2. I get "Assertion failure in dx_probe() at fs/ext3/namei.c" in /var/log/messages, so I can not upgrade the kernel
3.

Actual results:
No kernel upgrade

Expected results:
Kernel upgrade

Additional info:
If I try to run sysreport I get the same error of "Assertion failure in dx_probe() at fs/ext3/namei.c".
Maybe kernel bug? My partitions are as follows:

/dev/md1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md2 on /home type ext3 (rw)
/dev/md3 on /home/javier/externo type ext3 (rw,noexec,nosuid,nodev)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Jan 4 19:32:16 2006
     Raid Level : raid1
     Array Size : 152512 (148.96 MiB 156.17 MB)
  Used Dev Size : 152512 (148.96 MiB 156.17 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Sun Jul 1 23:38:28 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : eb667dca:bb92f200:cb05246c:83071bca
         Events : 0.216390

    Number   Major   Minor   RaidDevice   State
       0        3       1        0        active sync   /dev/hda1
       1       22       1        1        active sync   /dev/hdc1

mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Wed Jan 4 19:32:01 2006
     Raid Level : raid0
     Array Size : 25607168 (24.42 GiB 26.22 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Wed Jan 4 19:32:01 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
     Chunk Size : 256K
           UUID : 8b458a37:35595c35:dcff4be4:1b257187
         Events : 0.1

    Number   Major   Minor   RaidDevice   State
       0        3       3        0        active sync   /dev/hda3
       1       22       3        1        active sync   /dev/hdc3

mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.03
  Creation Time : Wed Jan 4 19:32:17 2006
     Raid Level : raid1
     Array Size : 40957632 (39.06 GiB 41.94 GB)
  Used Dev Size : 40957632 (39.06 GiB 41.94 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Sun Jul 1 21:50:41 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : 732141c0:906eba59:d77960cd:10d159fd
         Events : 0.46450

    Number   Major   Minor   RaidDevice   State
       0        3       2        0        active sync   /dev/hda2
       1       22       2        1        active sync   /dev/hdc2

fdisk -l /dev/hda
Disk /dev/hda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot   Start     End      Blocks   Id  System
/dev/hda1   *        1      19      152586   fd  Linux raid autodetect
/dev/hda2           20    5118    40957717+  fd  Linux raid autodetect
/dev/hda3         5119    6712    12803805   fd  Linux raid autodetect
/dev/hda4         6713   14593    63304132+   5  Extended
/dev/hda5         6713   14593    63304101   fd  Linux raid autodetect

fdisk -l /dev/hdb
Disk /dev/hdb: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot   Start     End      Blocks   Id  System
/dev/hdb1            1      65      522081   82  Linux swap / Solaris
/dev/hdb2   *       66    9728    77618047+   7  HPFS/NTFS

fdisk -l /dev/hdc
Disk /dev/hdc: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot   Start     End      Blocks   Id  System
/dev/hdc1   *        1      19      152586   fd  Linux raid autodetect
/dev/hdc2           20    5756    46082452+  fd  Linux raid autodetect
/dev/hdc3         5757    7350    12803805   fd  Linux raid autodetect
/dev/hdc4         7351   14593    58179397+   5  Extended
/dev/hdc5         7351   14593    58179366   fd  Linux raid autodetect

If you need additional info, please contact me
I also use Windows XP, and I access the Linux partitions from Windows (Ext2IFS driver). Googling, I found something similar reported for Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/109177

The problem is likely to be related to the use of an ext2 filesystem driver in Windows with an ext3 filesystem using the dir_index option. The ext2 filesystem drivers available for Windows seem to corrupt the directory indices when -O dir_index is enabled (which is a bug in those drivers). However, the ext3 driver in the kernel should detect the corruption and rebuild the directory index (which is the bug being reported here). This is just a hypothesis, but is supported by evidence from past usage with past kernels on other systems, and the disappearance of problems when -O dir_index was disabled using tune2fs.

But I tried disabling the directory index with no luck; I still have problems upgrading the kernel:

tune2fs -O ^dir_index /dev/md0
tune2fs -O ^dir_index /dev/md1
tune2fs -O ^dir_index /dev/md2
tune2fs -O ^dir_index /dev/md3
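One way to confirm that the tune2fs commands above actually cleared the feature is to look at the "Filesystem features:" line printed by `tune2fs -l <device>`. The following is only a minimal sketch of that check in Python: the helper name `parse_features` and the sample text are illustrative assumptions, not output captured from the reporter's system.

```python
def parse_features(tune2fs_output):
    """Return the set of feature names found in `tune2fs -l` style output."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    # No features line found: report an empty set rather than guessing.
    return set()


# Illustrative sample only (hypothetical, not taken from this bug report).
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal resize_inode dir_index filetype sparse_super
"""

if __name__ == "__main__":
    features = parse_features(SAMPLE)
    print("dir_index enabled:", "dir_index" in features)
```

In practice the text would come from running `tune2fs -l` on each md device; if dir_index still appears in the set, the feature was not cleared on that filesystem.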
The filesystem on disk is corrupted. Was fsck tried?
Yes, I ran fsck.ext3 and no errors were detected.
Eric, any ideas on this one?
If you need any additional info or want me to run any tests, please contact me.

Another interesting point is that I was unable to upgrade from FC6 to FC7 using the DVD. It failed with a similar error: "Assertion failure in dx_probe() at fs/ext3/namei.c". If I try to run sysreport, I get the same error.

The system seems OK. I work on it without problems (firefox, openoffice, samba file serving...), but I cannot upgrade the kernel. With pup I can upgrade the other packages, but no luck with the kernel.

Greetings
Javier
It seems clear that there is filesystem corruption; would you be willing to use e2image -r to create an image of your filesystem metadata and post it somewhere, so I can take a look? (feel free to mail me directly with the location, if you don't want all of your filesystem metadata made public... please don't use the -s option, though, in this case) If you're using a buggy windows ext2 driver with this same filesystem under windows, then I'm pretty inclined to blame it - though fsck should find & fix the error, of course. You might also try fsck from e2fsprogs-1.40, which was just released (see http://e2fsprogs.sourceforge.net/ - I don't yet have it in fedora devel as an RPM, sorry), to see if this is being caught now. I assume you're only having trouble with the kernel upgrade because only it is trying to manipulate the directory with the corruption. Thanks, -Eric
I have created e2images of / and /boot (no e2image for /home). The commands I used to create the images are:

e2image -r /dev/md0 - | bzip2 -9 > md0_file.bz2
e2image -r /dev/md1 - | bzip2 -9 > md1_file.bz2

I have made those images available at the following URL: http://talika.eii.us.es/~javier/fc7/

Anyway, I am currently compiling e2fsprogs 1.40 to try a newer fsck.

Greetings from Spain
Javier
Great, thank you for the images! I'm pulling them down now. Though, I'm thinking it might have been good if the assert had said which directory we were working on... I'll see what I can find. -Eric
see also bug #236464 - looking into changing this ASSERT into a recoverable error.

Note, I wasn't able to read your md1 file:

[root@bear-05 test]# debugfs -c md1_file
debugfs 1.39 (29-May-2006)
md1_file: Bad magic number in super-block while opening filesystem

and I wasn't able to find a filesystem in there anywhere...? Did a later fsck fix this up for you? There is a 1.40.2 rpm in F7 updates-testing now...

With a hand-corrupted image, e2fsprogs 1.39 finds this problem:

[root@bear-05 tmp]# e2fsck -f fsfile
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Problem in HTREE directory inode 14337: node (0) has invalid limit (125)
Clear HTree index<y>? yes
...
hmm I take that back; I only cannot read the md1 image after journal recovery goes bad... I still may be able to make something of it.
Eric, can you dump the corrupt directory block that is causing this problem? The design of dir_index should have allowed a dir_index-unaware ext2/3 driver to mount the filesystem w/o problems. Deleting inodes wouldn't cause any problems, and adding new inodes _should_ result in the root index being overwritten (because to non-dir_index users it would appear to be an empty block). An "od -Ax -tx4" of the first block of the directory would be useful.

We definitely shouldn't go BUG() on disk data - it looks like a simple check and returning ERR_BAD_DX_DIR is easily possible here. There are a few other assert()s on the on-disk data that could likely also go boom if there are just the right corruptions.
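To make the failing condition concrete, here is a small userspace model in Python of the check that the assertion performs, rewritten to report a recoverable error instead of crashing. It is a sketch under stated assumptions, not the kernel code: the offsets follow the dx_root layout as I understand it from fs/ext3/namei.c (two 12-byte fake dirents for "." and "..", then dx_root_info, then a 16-bit limit and count), and the names `BadDxDir`, `check_dx_root` and `build_dx_root_block` are hypothetical.

```python
import struct

DX_ENTRY_SIZE = 8  # struct dx_entry: le32 hash + le32 block


class BadDxDir(Exception):
    """Recoverable error, standing in for returning ERR_BAD_DX_DIR."""


def dir_rec_len(name_len):
    # Mirrors EXT3_DIR_REC_LEN: 8-byte dirent header plus name, rounded up to 4.
    return (name_len + 8 + 3) & ~3


def dx_root_limit(block_size, info_length):
    # Space left for index entries after "." and ".." and the dx_root_info.
    entry_space = block_size - dir_rec_len(1) - dir_rec_len(2) - info_length
    return entry_space // DX_ENTRY_SIZE


def check_dx_root(block):
    """Validate the stored limit of a directory's first block.

    Returns (limit, count) or raises BadDxDir; it never asserts on disk data.
    """
    info_offset = dir_rec_len(1) + dir_rec_len(2)  # dx_root_info at byte 24
    _zero, _hash_version, info_length, _levels, _flags = struct.unpack_from(
        "<IBBBB", block, info_offset)
    limit, count = struct.unpack_from("<HH", block, info_offset + info_length)
    expected = dx_root_limit(len(block), info_length)
    if limit != expected:
        raise BadDxDir("dx root limit %d, expected %d" % (limit, expected))
    return limit, count


def build_dx_root_block(block_size, limit, count=1, info_length=8):
    # Hypothetical helper: builds a mostly-zero block with only the fields
    # that check_dx_root reads, for experimenting without a real filesystem.
    block = bytearray(block_size)
    struct.pack_into("<IBBBB", block, 24, 0, 0, info_length, 0, 0)
    struct.pack_into("<HH", block, 24 + info_length, limit, count)
    return bytes(block)


if __name__ == "__main__":
    good = build_dx_root_block(4096, dx_root_limit(4096, 8))
    print(check_dx_root(good))
    try:
        check_dx_root(build_dx_root_block(1024, 125))
    except BadDxDir as err:
        print("recoverable:", err)
```

With these definitions a 4 KiB block with an 8-byte info structure has an expected limit of 508, and a 1 KiB block has 124, so a stored limit of 125 in a 1 KiB block is rejected. The caller can then fall back to a linear directory scan or return an I/O error, rather than taking the whole system down.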
Andreas, unfortunately there seems to be no corruption on the md0 image, and the md1 image seems to be *badly* corrupt; log replay fails with blocks out of range, and after that there's no valid superblock... so I've not been able to find the exact area of corruption yet. I guess all we need is a windows box with that driver.. ;-)
After clearing the replay flag & running e2fsck I still find no htree corruption in the md1 image. Anyway, I'm duping this one to an earlier bug; I have two or three of these bugs resulting from the buggy windows driver.

*** This bug has been marked as a duplicate of 236464 ***
I have solved this issue. I created a "rescue" CD-ROM with FC7 and a .tar.gz of the latest e2fsprogs, booted from that CD, and ran fsck.ext3 on the filesystems. No errors popped up, but this message appeared at the beginning of the fsck: "Adding dirhash hint to filesystem". I rebooted and everything worked.

Greetings.
Javier
That's good to know. Be warned, though, that if you keep using that windows driver you will probably hit this again, at least until the linux kernel is updated to mitigate the damage that it does. -Eric