Description of problem:
e2fsck segfaults + dumps core when trying to check a filesystem on an attached FC storage. Core attached.

[root@dl3131 root]# uname -a
Linux dl3131 2.4.9-e.12enterprise #1 SMP Tue Feb 11 01:29:18 EST 2003 i686 unknown
[root@dl3131 root]# modprobe qla2300_6x
[root@dl3131 root]# dmesg
[...]
scsi2 : QLogic QLA2300 PCI to Fibre Channel Host Adapter: bus 0 device 8 irq 18
        Firmware version: 3.01.18, Driver version 6.04.02
[...]
SCSI device sde: 83886080 512-byte hdwr sectors (42950 MB)
 sde: sde1
[...]
[root@dl3131 root]# fdisk -l /dev/sde

Disk /dev/sde: 64 heads, 32 sectors, 40960 cylinders
Units = cylinders of 2048 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sde1             1     40960  41943024   83  Linux
[root@dl3131 root]# ulimit -c unlimited
[root@dl3131 root]# e2fsck /dev/sde1
e2fsck 1.26 (3-Feb-2002)
Segmentation fault (core dumped)

Version-Release number of selected component (if applicable):
e2fsprogs-1.26-1.72

How reproducible:
Always

Steps to Reproduce:
run e2fsck

Actual Results: coredump

Expected Results: fsck

Additional info: will gladly provide more details. just ask.
Created attachment 95512 [details] e2fsck core dump
This problem can be easily reproduced by running e2fsck on the 1MB snippet from the beginning of the partition I'll attach right now.
Created attachment 95729 [details]
first one million bytes of the partition in question

This file contains the first one million bytes of the partition (BZ attachment limitation). Running e2fsck on the file reproduces the problem; when run on only the first 100kB, e2fsck gets a short read and aborts.

e2fsprogs-1.34-1 from Fedora Core doesn't crash and reports:

e2fsck 1.34 (25-Jul-2003)
Superblock has a bad ext3 journal (inode 8).
Clear<y>? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***
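For anyone who wants to repeat the reproduction on their own copy, here is a minimal sketch of how such a snippet could be cut from a partition. It assumes a POSIX shell and a `head` that supports `-c`. The function name `extract_head` and the image file name `sde1-head.img` are made up for illustration; the device path is the one from the report. It is not necessarily how the attachment was produced.

```shell
#!/bin/sh
# Hypothetical helper: copy the first BYTES bytes of SOURCE (a block
# device or a regular file) into DEST. BYTES defaults to one million.
# Usage: extract_head SOURCE DEST [BYTES]
extract_head() {
    src=$1
    dest=$2
    bytes=${3:-1000000}
    # head -c copies at most $bytes bytes; fewer if the source is shorter.
    head -c "$bytes" "$src" > "$dest"
}

# Example (left commented out because it needs read access to the device):
#   extract_head /dev/sde1 sde1-head.img
#   e2fsck -n sde1-head.img
```

`e2fsck -n` opens the filesystem read-only and answers "no" to every question, so the image is left untouched while checking whether the installed e2fsck version crashes on it.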
Steffen Mann was puzzled over kernel 2.4.9-e.12enterprise (as shown in the first report). Upgraded the machine to kernel-enterprise-2.4.9-e.25. Did not change the behaviour. Duh.
I would have been really surprised if upgrading the kernel had changed the erroneous behaviour of e2fsck. After all, the error can be reproduced by running e2fsck on a normal file, so the bug has to be in e2fsck if other programs don't show the error.
re: Surprise - Me too. But I'm not trying to make sense of it, just trying to keep RH support happy. If a more recent kernel makes them happy, so be it. As long as it is one we certified ;-) And this bug seems to have no friends. Really. Still "NEW" after all this time. Support ticket is #267706, filed with my account "541004 - Web". Last thing they told me is they are "escalating to development". Let's see how fast we come around full circle.
Newer e2fsprogs can fix this problem. I'll have to check whether we will update e2fsprogs to a newer version, but this is probably too big an upgrade step for an RHEL AS release. The real question here is what is causing the data corruption, i.e. which kernel or hardware problem you are running into. If you can come up with a reproducible report, please open that issue with Red Hat.

greetings,

Florian La Roche
I cannot say _why_ the fs became corrupted - all I know is that after an unclean shutdown, fsck refused to work on the fs. When I see something like this again, I'll certainly give more info - but I am not in a position to do testing with the goal being to determine possible sources of fs corruption. If you have more specific ideas about test cases, we could make a test machine available. Again - at this point my ideas of how and what to test are pretty global, too global to act upon them with justifiable effort.
No reproducible bug for the kernel, and the newest e2fsprogs rpm works fine for fixing the corrupted partition.

greetings,

Florian La Roche