Red Hat Bugzilla – Bug 500946
readahead segfaults during bootup
Last modified: 2009-10-13 06:16:14 EDT
Description of problem:
I found the following in dmesg:
readahead: segfault at 0 ip 00adf053 sp bf84b17c error 4 in libc-2.10.1.so[a66000+16b000]
Version-Release number of selected component (if applicable):
How reproducible:
Once (cf. below)
Steps to Reproduce:
No idea (cf. below)
Actual results:
Segfault during bootup.
FWIW: Before the boot in which readahead segfaulted, the system had received a
"yum update" that included a glibc update.
which version of glibc?
Current rawhide: glibc-2.10.1-1.i686
The update I am referring to was likely from
glibc-2.9.90-22.i686 to glibc-2.10.1-1.i686
Is it reproducible? If so, can you install the debuginfo packages to gather more information about where it crashes?
(In reply to comment #3)
> Is it reproducible?
I haven't tried, but I can try to downgrade glibc.
(In reply to comment #4)
> (In reply to comment #3)
> > Is it reproducible?
> I haven't tried, but I can try to downgrade glibc.
I downgraded glibc to *-2.9.90-22, rebooted, upgraded to *-2.10.1-1, rebooted again, and checked dmesg. I did this on 2 different machines: the one that had originally shown the segfault and one that had not.
Neither attempt triggered the segfault.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.
More information and reason for this action is here:
I had this happen on a freshly installed and updated Fedora 11 box; i.e., with readahead-1.4.9-1.fc11.x86_64 and glibc-2.10.1-2.x86_64. Here are all of the entries in /var/log/messages related to readahead:
Jun 18 13:17:16 localhost kernel: audit(1245352636.562:50515): auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 op=remove rule key=(null) list=2 res=1
Jun 18 13:17:16 localhost kernel: audit(1245352636.562:50516): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 res=1
Jun 18 13:47:01 localhost kernel: readahead: segfault at 0 ip 00007f33a2b44411 sp 00007fffab3ca5a8 error 4 in libc-2.10.1.so[7f33a2ac5000+164000]
I installed the readahead-debuginfo package, and GDB claims that address 00007f33a2b44411 is not part of any function. To be clear, the sequence was as follows: I installed F-11, did a "yum upgrade", rebooted, and then the segfault occurred.
Seen a couple of these so far:
eule/var/log/messages:Jul 22 19:17:45 eule kernel: readahead: segfault at 0 ip 000000347d67f411 sp 00007fff1e9c0f98 error 4 in libc-2.10.1.so[347d600000+164000]
phlebas/var/log/messages-20090713:Jul 9 10:59:55 phlebas kernel: readahead: segfault at 0 ip 000000306207f411 sp 00007fff7bcbf988 error 4 in libc-2.10.1.so[3062000000+164000]
phlebas/var/log/messages-20090713:Jul 13 10:11:09 phlebas kernel: readahead: segfault at 0 ip 000000306207f411 sp 00007fff3cee2c98 error 4 in libc-2.10.1.so[3062000000+164000]
This is still happening on my Fedora 12/rawhide system at boot. Some messages:
Sep 13 10:38:22 fedora12 yum: Updated: 1:readahead-1.5.0-2.fc12.x86_64
Sep 13 22:18:09 fedora12 kernel: readahead: segfault at 0 ip 0000003ab2881212 sp 00007fff2e9b7da8 error 4 in libc-2.10.90.so[3ab2800000+176000]
Sep 15 19:39:47 fedora12 yum: Updated: 1:readahead-1.5.1-1.fc12.x86_64
Sep 15 21:45:26 fedora12 kernel: readahead: segfault at 0 ip 00007f04420e51f2 sp 00007fff053083b8 error 4 in libc-2.10.90.so[7f0442064000+176000]
I'm getting the readahead segfault under the same conditions reported by Jerry James: I did a fresh install of F11 x86_64, then did a yum update, then rebooted.
readahead: segfault at 0 ip 00000033b4c7f541 sp 00007fff934533a8 error 4 in libc-2.10.1.so[33b4c00000+164000]
And on another boot:
readahead: segfault at 0 ip 00000033b4c7f541 sp 00007fff8fd3cb18 error 4 in libc-2.10.1.so[33b4c00000+164000]
System has an Athlon 64 X2 4800+ CPU on an Asus K8V Deluxe motherboard with 2GB of RAM, and a 3ware 8506 hardware RAID controller.
If I manually invoke readahead under GDB with a breakpoint on main, so that I can examine the memory space, GDB tells me:
(gdb) list *0x33b4c7f541
0x33b4c7f541 is at ../sysdeps/x86_64/strlen.S:31
26 movq %rdi, %rcx
27 movq %rdi, %r8
28 andq $~15,%rdi
29 pxor %xmm1, %xmm1
30 orl $0xffffffff, %esi
31 movdqa (%rdi), %xmm0
32 subq %rdi, %rcx
33 leaq 16(%rdi), %rdi
34 pcmpeqb %xmm1, %xmm0
35 shl %cl, %esi
I don't know how to get any more useful information, since the segfault message doesn't include a backtrace or other register values.
This might be the same bug reported as Debian bug 547159: http://bugs.debian.org/547159
Hmm, there is only one call to strlen() in readahead.
can you attach the files in /var/lib/readahead/ ?
Created attachment 361601
Created attachment 361602
There could be a call to strlen() in something else that readahead calls. Is there any way to get a segfault during startup to produce a backtrace?
Hmm, maybe something like this?
# mv /sbin/readahead /sbin/readahead.orig
# cat > /sbin/readahead <<'EOF'
#!/bin/sh
ulimit -Sc unlimited
cd /tmp
/sbin/readahead.orig "$@"
EOF
# chmod 0755 /sbin/readahead
Tried your suggestion. With that script in place I don't get any mention of readahead in dmesg, not sure why. Maybe readahead is affected by the current directory being /tmp instead of wherever it normally is?
To make sure it was getting invoked, I added "echo 'starting readahead' >>/tmp/readahead.log" before the cd and a similar message after readahead.orig, and then I get errors from echo that /tmp is a read-only filesystem, so /tmp obviously isn't going to be a suitable place for coredumps at that time.
How does readahead get started, anyhow? I see the line to enable it in /etc/sysconfig/readahead, but I can't find a reference to it in the startup scripts. Is it started by something inside the initrd?
/dev should be writable...
readahead gets started by upstart with /etc/event.d/readahead.event
OK, changed the script to use /dev. Confirmed that my echos go into /dev/readahead.log, but dmesg has no error and there's no core file.
Moved the original readahead back in place, rebooted, and now I don't get the segfault! I have no idea what's changed that has fixed it. :-(
(In reply to comment #10)
> This might be the same bug reported as Debian bug 547159:
The bug is caused because blkid_devno_to_devname returns /dev/root for the root fs device, in spite of that symlink (created by udev, at least here on Debian) not existing; but I'm not quite sure it is the same bug.
I'm going to apply the following patch on the next upload for Debian:
@@ -114,6 +114,8 @@ get_file_device(dev_t dev, struct device
devices[i].st_dev = dev;
devices[i].name = blkid_devno_to_devname(dev);
+ if (!devices[i].name)
+ return NULL;
if (ext2fs_open(devices[i].name, 0, 0, 0, IO_MGR, &fs) || !fs)
Correcting my previous statement: blkid_devno_to_devname actually fails to find a device file and returns NULL. If that goes unhandled, the NULL is passed to ext2fs_open, which later segfaults. The issue was found in Debian by users running a kernel without an initrd and without static device files, which explains why blkid fails.
thx! pushed to upstream git