Bug 486997
Summary: rawhide's init/nash segfaults in libblkid
Product: [Fedora] Fedora
Reporter: Jim Meyering <meyering>
Component: e2fsprogs
Assignee: Eric Sandeen <esandeen>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: esandeen, kernel-maint, kzak, oliver, quintela
Hardware: All
OS: Linux
Fixed In Version: 1.41.4-4.fc10
Doc Type: Bug Fix
Last Closed: 2009-03-18 19:06:05 UTC

Description
Jim Meyering
2009-02-23 16:31:21 UTC
Jim, can you attach the partition table here?

Thanks,
-Eric

Created attachment 332980
first 512 bytes

here you go...
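
For readers who want to inspect a 512-byte boot sector like the one attached above, here is a minimal sketch in Python of decoding the four primary entries of an msdos (MBR) partition table. It is an illustration only and not part of the original report: the names `parse_mbr` and `MbrEntry` are hypothetical, and the code assumes the standard MBR layout (table at byte offset 446, four 16-byte entries, 0x55AA signature at offset 510).

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
TABLE_OFFSET = 446        # the primary partition table starts here
ENTRY_SIZE = 16           # four entries of 16 bytes each
SIGNATURE = b"\x55\xaa"   # boot signature stored at bytes 510-511


class MbrEntry(NamedTuple):
    """One primary partition entry (hypothetical helper type)."""
    index: int       # 1-based slot number in the table
    bootable: bool   # True when the status byte is 0x80
    type_id: int     # partition type byte, e.g. 0x83 for Linux
    start_lba: int   # first sector of the partition
    sectors: int     # partition length in sectors


def parse_mbr(sector: bytes) -> List[MbrEntry]:
    """Return the non-empty primary entries found in an MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for i in range(4):
        begin = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector[begin:begin + ENTRY_SIZE]
        # status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped),
        # little-endian start LBA, little-endian sector count
        status, type_id, start_lba, sectors = struct.unpack("<B3xB3xII", raw)
        if type_id == 0:
            continue  # unused slot
        entries.append(MbrEntry(i + 1, status == 0x80, type_id, start_lba, sectors))
    return entries
```

Only primary and extended entries live in the first sector; logical partitions are described by extended boot records elsewhere on the disk, so a dump of the first 512 bytes alone will not list them.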
Also it'd be interesting if you could try installing some old kernel w/ your present system; if that fails too it might be due to something horked in an e2fsprogs upgrade?

Or, put the partition table back, boot the old working kernel+initrd, then run blkid with "-c /dev/null" so you don't read cached info, and see if you can reproduce.

I'll assume this is e2fsprogs' fault for now and take the bug. :)

Hooked up serial cable. Full log attached below. Here's the stack trace:

  Activating logical volumes
    2 logical volume(s) in volume group "VolGroup00" now active
  init[1]: segfault at 0 ip 000000332867dd20 sp 00007fffec7f2a48 error 4 in libc-2]
  nash received SIGSEGV!  Backtrace (16):
  /bin/nash[0x40ef98]
  /lib64/libc.so.6[0x3328633340]
  /lib64/libc.so.6(strlen+0x30)[0x332867dd20]
  /lib64/libblkid.so.1[0x332ba06b2e]
  /lib64/libblkid.so.1[0x332ba06c0c]
  /lib64/libblkid.so.1[0x332ba06d94]
  /lib64/libblkid.so.1(blkid_verify+0x1cd)[0x332ba070dd]
  /lib64/libblkid.so.1(blkid_get_dev+0xab)[0x332ba0415b]
  /usr/lib64/libnash.so.6.0.77[0x332960cb97]
  /usr/lib64/libnash.so.6.0.77(nashFindFsByName+0x63)[0x332960cd3c]
  /usr/lib64/libnash.so.6.0.77(nashAGetPathBySpec+0xa1)[0x332960ce44]
  /bin/nash[0x40a2bf]
  /bin/nash[0x40ee72]
  /bin/nash[0x40f50c]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x332861e5ed]
  /bin/nash[0x4050b9]

Created attachment 332992
serial console log
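
To make the call chain in the backtrace above easier to read, here is a small Python sketch that splits each frame into module, symbol, offset and address. It is an illustration, not a tool used in this report: `parse_backtrace`, `named_symbols` and `Frame` are hypothetical names, and the regular expression assumes the `module(symbol+0xoff)[0xaddr]` frame format seen in the trace.

```python
import re
from typing import List, NamedTuple, Optional

# module path, optional "(symbol+0xoffset)", then "[0xaddress]"
FRAME_RE = re.compile(r"(/\S+?)(?:\((\w+)\+0x([0-9a-f]+)\))?\[0x([0-9a-f]+)\]")


class Frame(NamedTuple):
    """One backtrace frame (hypothetical helper type)."""
    module: str
    symbol: Optional[str]   # None when the frame carries no symbol name
    offset: Optional[int]   # offset into the symbol, if known
    address: int            # absolute return address


def parse_backtrace(text: str) -> List[Frame]:
    """Extract all frames from a trace, innermost frame first."""
    frames = []
    for match in FRAME_RE.finditer(text):
        module, symbol, offset, address = match.groups()
        frames.append(Frame(
            module,
            symbol,
            int(offset, 16) if offset is not None else None,
            int(address, 16),
        ))
    return frames


def named_symbols(text: str) -> List[str]:
    """Return only the frames that resolved to a symbol name."""
    return [f.symbol for f in parse_backtrace(text) if f.symbol is not None]
```

Applied to the trace in this report, the named frames read, innermost first: strlen, blkid_verify, blkid_get_dev, nashFindFsByName, nashAGetPathBySpec, __libc_start_main. Together with "segfault at 0" in the kernel line, that is consistent with a NULL pointer reaching strlen from inside libblkid, although the unnamed libblkid frames are not resolved here.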
Eric, responding also here to your comment #3: using an older kernel does work (see the actual report for the version numbers). With a zeroed partition table, even newer kernels boot.

When booted into the latest kernel, I restored the partition table, ran partprobe to make the kernel reread it, then ran blkid to show everything. Worked fine:

  for i in $(blkid |perl -nle '/.* UUID="(.*?)".*/ and print $1'); do echo $i; blkid -l -t UUID=$i > /dev/null; done
  5ed0e379-eea9-47cc-95ca-f5b86032f887
  f9d4b936-c764-4bfa-af27-d9ab7f949e0d
  6c27b605-0bd3-48fc-84f9-a9b536e4eae3
  f9d4b936-c764-4bfa-af27-d9ab7f949e0d
  toYpTt-s6ty-jpaV-K4Re-mluN-rdjl-IBqRq8
  5ed0e379-eea9-47cc-95ca-f5b86032f887
  47DD-19E3
  e73c8549-4182-4976-9ae9-954ca952623c
  f1b08685-6a49-4908-b9b0-919b886ba55c
  f15321e5-d3fc-4256-acc3-3a93c7c7ae1e

I've reduced it a little more. For background, here's my partition table. Note that hdb2 is a 50GB ext4 partition.

  Model: ATA SAMSUNG HD501LJ (scsi)
  Disk /dev/sdb: 976773168s
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos

  Number  Start       End         Size        Type      File system  Flags
   1      32s         390625279s  390625248s  primary   ext3         boot
   2      390625280s  488282111s  97656832s   primary   ext3
   3      488282112s  625000447s  136718336s  extended
   5      488282144s  585938943s  97656800s   logical   ext3
   6      585938976s  625000447s  39061472s   logical

Trying to determine which partition is provoking the failure, I'm removing #6 first:

  parted -s /dev/sdb rm 6

Booting the newest kernel still fails. Boot back to the usable kernel, then repeat for partition 5, then 3, then finally 2. It's only after removing partition #2 that the latest kernel managed to boot.

So now I have copied that 50GB partition to a regular file, made it sparse, tar'd and compressed it down to 32MB. You can get a copy from http://et.redhat.com/~meyering/sdb2.img.tar.xz (and if your distro doesn't package xz yet, prod your friendly lzma maintainer. xz is the new name for lzma: http://tukaani.org/xz/)

Given all that, you should be able to untar and copy the result to the same sectors as listed above, with the already-attached boot sector, and then reproduce the problem.

FYI, I recompressed it with xz -8 (which took longer), and now it's just 12MiB. Contrast with bzip2 -9's size that's 3.5 times larger:

  $ du -h sdb*
  43M sdb2.img.tar.bz2
  12M sdb2.img.tar.xz

For your convenience, here's the bzip2-compressed file, too: http://et.redhat.com/~meyering/sdb2.img.tar.bz2

e2fsprogs-1.41.4-4.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/e2fsprogs-1.41.4-4.fc10

I've just confirmed that the latest kernel (which now has an initrd built from the fixed e2fsprogs-1.41.4-5.fc11.x86_64) solves the problem. Thanks, Eric.

e2fsprogs-1.41.4-4.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with
  su -c 'yum --enablerepo=updates-testing update e2fsprogs'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-2165

e2fsprogs-1.41.4-4.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.
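
For anyone trying to reproduce the failure from the partition image mentioned earlier in this report, the following Python sketch works through the sector arithmetic in the parted listing: it checks that each End equals Start + Size - 1, converts partition 2 to bytes, and builds the kind of dd command line that would place an image at the same sectors. It is an illustration under stated assumptions, not the reporter's procedure: the helper names, the image file name and the `/dev/sdX` device are hypothetical, and the command is only constructed as a string because running it against a real disk would overwrite data.

```python
from typing import NamedTuple

SECTOR_SIZE = 512  # logical sector size from the parted listing


class Partition(NamedTuple):
    number: int
    start: int   # first sector
    end: int     # last sector (inclusive)
    size: int    # length in sectors


# Values copied from the partition table in this report.
TABLE = [
    Partition(1, 32, 390625279, 390625248),
    Partition(2, 390625280, 488282111, 97656832),
    Partition(3, 488282112, 625000447, 136718336),
    Partition(5, 488282144, 585938943, 97656800),
    Partition(6, 585938976, 625000447, 39061472),
]


def is_consistent(p: Partition) -> bool:
    """End is inclusive, so size must equal end - start + 1."""
    return p.end - p.start + 1 == p.size


def byte_offset(p: Partition) -> int:
    """Byte position of the partition's first sector on the disk."""
    return p.start * SECTOR_SIZE


def byte_length(p: Partition) -> int:
    """Partition length in bytes."""
    return p.size * SECTOR_SIZE


def dd_restore_command(image: str, device: str, p: Partition) -> str:
    """Build (but do not run) a dd command that writes image at p's sectors.

    Assumption: image and device are placeholders chosen by the reader.
    """
    return (f"dd if={image} of={device} bs={SECTOR_SIZE} "
            f"seek={p.start} count={p.size} conv=notrunc")
```

With these numbers, partition 2 is 50,000,297,984 bytes, which matches the "50GB" description, and it begins 200,000,143,360 bytes into the disk.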