Description of problem:

Hello,

Normally I track rawhide pretty closely, but for the last few weeks new kernels haven't booted on the system I use for that, so I stuck with the most recent one that worked, 2.6.29-0.74.rc3.git3.fc11.x86_64. However, now that there are 4 newer kernel images, none of which boots, it's getting a little precarious, and I investigated.

Note: this is the first that failed to boot: 2.6.29-0.99.rc4.git1.fc11.x86_64
The most recent, 2.6.29-0.131.rc5.git2.fc11.x86_64, also fails the same way, with a segfault from init/nash. The traceback I saw had (from memory, sorry)

  glibc's strlen
  ...
  libblkid's blkid_verify
  blkid_get_dev
  nash...

I found that I could boot into any of the recent kernels only if I'd either disconnect /dev/hdb physically, or if I had erased its partition table. I did save a copy.

The partitions on /dev/hdb were of type ext4 and ext3. Here's what parted said before I reformatted the xfs partition as ext4, just to be sure xfs wasn't implicated:

  Model: ATA SAMSUNG HD501LJ (scsi)
  Disk /dev/sdb: 976773168s
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos

  Number  Start       End         Size        Type      File system  Flags
   1      32s         390625279s  390625248s  primary   ext3         boot
   2      390625280s  488282111s  97656832s   primary   ext3
   3      488282112s  625000447s  136718336s  extended
   5      488282144s  585938943s  97656800s   logical   xfs
   6      585938976s  625000447s  39061472s   logical

If you need more detail, I'll be happy to help, but it may take me a week or so.

Version-Release number of selected component (if applicable):
see above

How reproducible:
always

Steps to Reproduce:
1. copy partition table back to /dev/hdb
2. reboot
3.

Actual results:
segfault in nash/libblkid

Expected results:
no segfault

Additional info:
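The report mentions saving a copy of the partition table before erasing it, but does not give the commands used. A minimal sketch of one way to save and restore the first 512-byte sector with dd follows; the function names save_mbr and restore_mbr are made up for illustration. On an msdos label, that sector holds only the primary and extended entries; logical partitions are described by further records inside the extended partition.

```shell
# Hypothetical helpers, not the reporter's actual commands.
# DISK may be a block device (needs root) or a plain image file.

# save_mbr DISK BACKUP: copy the first 512-byte sector of DISK to BACKUP.
save_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# restore_mbr BACKUP DISK: write BACKUP over the first sector of DISK.
# conv=notrunc keeps dd from truncating DISK when it is a regular file.
restore_mbr() {
    dd if="$1" of="$2" bs=512 count=1 conv=notrunc 2>/dev/null
}
```

For example, save_mbr /dev/sdb sdb-mbr.bin before erasing the table, then restore_mbr sdb-mbr.bin /dev/sdb followed by partprobe to put it back (both as root).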
Jim, can you attach the partition table here? Thanks, -Eric
Created attachment 332980: first 512 bytes

here you go...
Also it'd be interesting if you could try installing some old kernel w/ your present system; if that fails too it might be due to something horked in an e2fsprogs upgrade? Or, put the partition table back, boot the old working kernel+initrd, then run blkid with "-c /dev/null" so you don't read cached info, and see if you can reproduce. I'll assume this is e2fsprogs' fault for now and take the bug. :)
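The second suggestion could be sketched as a loop that probes each partition with the cache bypassed and reports any non-zero exit status. The function name probe_uncached and the BLKID override variable are hypothetical, added only so the loop can be exercised without the real tool; only blkid's "-c /dev/null" option comes from the comment above.

```shell
# probe_uncached DEVICE...: run blkid on each device without the cache.
# BLKID is a hypothetical override; it defaults to the real blkid.
probe_uncached() {
    for dev in "$@"; do
        if "${BLKID:-blkid}" -c /dev/null "$dev"; then
            echo "ok: $dev"
        else
            # In the else branch, $? is still blkid's exit status.
            echo "FAILED (exit $?): $dev"
        fi
    done
}
```

A possible invocation, assuming the partitions from the parted listing, is probe_uncached /dev/sdb1 /dev/sdb2 /dev/sdb5 /dev/sdb6, run as root. An exit status of 139 (128 + 11) would normally mean the process was killed by SIGSEGV.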
Hooked up serial cable. Full log attached below. Here's the stack trace:

  Activating logical volumes
    2 logical volume(s) in volume group "VolGroup00" now active
  init[1]: segfault at 0 ip 000000332867dd20 sp 00007fffec7f2a48 error 4 in libc-2]
  nash received SIGSEGV!  Backtrace (16):
  /bin/nash[0x40ef98]
  /lib64/libc.so.6[0x3328633340]
  /lib64/libc.so.6(strlen+0x30)[0x332867dd20]
  /lib64/libblkid.so.1[0x332ba06b2e]
  /lib64/libblkid.so.1[0x332ba06c0c]
  /lib64/libblkid.so.1[0x332ba06d94]
  /lib64/libblkid.so.1(blkid_verify+0x1cd)[0x332ba070dd]
  /lib64/libblkid.so.1(blkid_get_dev+0xab)[0x332ba0415b]
  /usr/lib64/libnash.so.6.0.77[0x332960cb97]
  /usr/lib64/libnash.so.6.0.77(nashFindFsByName+0x63)[0x332960cd3c]
  /usr/lib64/libnash.so.6.0.77(nashAGetPathBySpec+0xa1)[0x332960ce44]
  /bin/nash[0x40a2bf]
  /bin/nash[0x40ee72]
  /bin/nash[0x40f50c]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x332861e5ed]
  /bin/nash[0x4050b9]
Created attachment 332992: serial console log
Eric, responding also here to your comment #3: using an older kernel does work (see the actual report for the version numbers). With zeroed partition table, even newer kernels boot.

When booted into the latest kernel, I restored the partition table, ran partprobe to make the kernel reread it, then ran blkid to show everything. Worked fine:

  for i in $(blkid |perl -nle '/.* UUID="(.*?)".*/ and print $1'); do
    echo $i; blkid -l -t UUID=$i > /dev/null
  done
  5ed0e379-eea9-47cc-95ca-f5b86032f887
  f9d4b936-c764-4bfa-af27-d9ab7f949e0d
  6c27b605-0bd3-48fc-84f9-a9b536e4eae3
  f9d4b936-c764-4bfa-af27-d9ab7f949e0d
  toYpTt-s6ty-jpaV-K4Re-mluN-rdjl-IBqRq8
  5ed0e379-eea9-47cc-95ca-f5b86032f887
  47DD-19E3
  e73c8549-4182-4976-9ae9-954ca952623c
  f1b08685-6a49-4908-b9b0-919b886ba55c
  f15321e5-d3fc-4256-acc3-3a93c7c7ae1e
I've reduced it a little more. For background, here's my partition table. Note that hdb2 is a 50GB ext4 partition.

  Model: ATA SAMSUNG HD501LJ (scsi)
  Disk /dev/sdb: 976773168s
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos

  Number  Start       End         Size        Type      File system  Flags
   1      32s         390625279s  390625248s  primary   ext3         boot
   2      390625280s  488282111s  97656832s   primary   ext3
   3      488282112s  625000447s  136718336s  extended
   5      488282144s  585938943s  97656800s   logical   ext3
   6      585938976s  625000447s  39061472s   logical

Trying to determine which partition is provoking the failure, I'm removing #6 first:

  parted -s /dev/sdb rm 6

Boot-to-newest-kernel still fails. Boot back to usable kernel, then repeat for partition 5, then 3, then finally 2. It's only after removing partition #2 that the latest kernel managed to boot.

So now I have copied that 50GB partition to a regular file, made it sparse, tar'd and compressed it down to 32MB. You can get a copy from

  http://et.redhat.com/~meyering/sdb2.img.tar.xz

(and if your distro doesn't package xz yet, prod your friendly lzma maintainer. xz is the new name for lzma: http://tukaani.org/xz/)

Given all that, you should be able to untar and copy the result to the same sectors as listed above, with the already-attached boot sector, and then reproduce the problem.
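The imaging and restore steps described above might look roughly like the following. This is a sketch under assumptions, not the commands actually used: the function names are invented, pack_image relies on GNU cp, GNU tar and xz being available, and all offsets are in 512-byte sectors as in the parted listing.

```shell
# All functions here are hypothetical illustrations.

# partition_bytes SECTORS: size in bytes for a count of 512-byte sectors.
partition_bytes() {
    echo $(( $1 * 512 ))
}

# image_partition DISK START COUNT OUT: copy COUNT sectors from DISK,
# starting at sector START, into the regular file OUT.
image_partition() {
    dd if="$1" of="$4" bs=512 skip="$2" count="$3" 2>/dev/null
}

# restore_partition IMG DISK START: write IMG into DISK at sector START.
# conv=notrunc leaves the rest of DISK intact when it is a regular file.
restore_partition() {
    dd if="$1" of="$2" bs=512 seek="$3" conv=notrunc 2>/dev/null
}

# pack_image IMG: make IMG sparse, tar it with hole support, compress.
# Assumes GNU cp (--sparse), GNU tar (-S) and xz; produces IMG.tar.xz.
pack_image() {
    cp --sparse=always "$1" "$1.sparse" &&
    mv "$1.sparse" "$1" &&
    tar -cSf "$1.tar" "$1" &&
    xz -8 "$1.tar"
}
```

With the numbers from the listing, partition 2 would be imaged with image_partition /dev/sdb 390625280 97656832 sdb2.img and put back with restore_partition sdb2.img /dev/sdb 390625280 (both as root). partition_bytes 97656832 gives 50000297984, which matches the stated 50GB.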
FYI, I recompressed it with xz -8 (which took longer), and now it's just 12MiB. Contrast with bzip2 -9's size, which is about 3.5 times larger:

  $ du -h sdb*
  43M     sdb2.img.tar.bz2
  12M     sdb2.img.tar.xz

For your convenience, here's the bzip2-compressed file, too:

  http://et.redhat.com/~meyering/sdb2.img.tar.bz2
e2fsprogs-1.41.4-4.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/e2fsprogs-1.41.4-4.fc10
I've just confirmed that the latest kernel (which now has an initrd built from the fixed e2fsprogs-1.41.4-5.fc11.x86_64) solves the problem. Thanks, Eric.
e2fsprogs-1.41.4-4.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update e2fsprogs'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-2165
e2fsprogs-1.41.4-4.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.