Bug 500946 - readahead segfaults during bootup
Summary: readahead segfaults during bootup
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: readahead
Version: 11
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Harald Hoyer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-05-15 04:22 UTC by Ralf Corsepius
Modified: 2009-10-13 10:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-13 10:16:14 UTC
Type: ---
Embargoed:


Attachments
/var/lib/readahead/custom.early (63.25 KB, text/plain)
2009-09-18 06:28 UTC, Eric Smith
/var/lib/readahead/early.sorted (71.55 KB, text/plain)
2009-09-18 06:28 UTC, Eric Smith

Description Ralf Corsepius 2009-05-15 04:22:42 UTC
Description of problem:

I found the following in dmesg:
...
readahead[122]: segfault at 0 ip 00adf053 sp bf84b17c error 4 in libc-2.10.1.so[a66000+16b000]
...

Version-Release number of selected component (if applicable):
readahead-1.4.9-1.fc11.i586

How reproducible:
Once (cf. below)

Steps to Reproduce:
No idea (cf. below)
  
Actual results:
segfault during bootup.

Expected results:
no segfault.


Additional info:
FWIW: Before the boot that produced the readahead segfault, the system had gone through a
"yum update", which included a glibc update.

Comment 1 Harald Hoyer 2009-05-15 06:29:39 UTC
which version of glibc?

Comment 2 Ralf Corsepius 2009-05-15 06:42:39 UTC
Current rawhide: glibc-2.10.1-1.i686

The update I am referring to, was likely from
glibc-2.9.90-22.i686 to glibc-2.10.1-1.i686

Comment 3 Harald Hoyer 2009-05-15 06:58:11 UTC
Is it reproducible? If so, can you install the debuginfo packages to gather more information about where it crashes?

Comment 4 Ralf Corsepius 2009-05-15 07:09:53 UTC
(In reply to comment #3)
> Is it reproducible?
I haven't tried, but I can try to downgrade glibc.

Comment 5 Ralf Corsepius 2009-05-15 09:36:20 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Is it reproducible?
Seemingly no.

> I haven't tried, but I can try to downgrade glibc.  
I downgraded glibc to *-2.9.90-22, rebooted, upgraded to *-2.10.1-1, rebooted and checked dmesg on 2 different machines (one of them the machine that had originally shown the segfault; the other never had), but without success.

This didn't trigger the segfault.

Comment 6 Bug Zapper 2009-06-09 15:51:05 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Jerry James 2009-06-19 14:06:43 UTC
I had this happen on a freshly installed and updated Fedora 11 box; i.e., with readahead-1.4.9-1.fc11.x86_64 and glibc-2.10.1-2.x86_64.  Here are all of the entries in /var/log/messages related to readahead:

Jun 18 13:17:16 localhost kernel: audit(1245352636.562:50515): auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 op=remove rule key=(null) list=2 res=1
Jun 18 13:17:16 localhost kernel: audit(1245352636.562:50516): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 res=1
...
Jun 18 13:47:01 localhost kernel: readahead[185]: segfault at 0 ip 00007f33a2b44411 sp 00007fffab3ca5a8 error 4 in libc-2.10.1.so[7f33a2ac5000+164000]

I installed the readahead-debuginfo package, and GDB claims that address 00007f33a2b44411 is not part of any function.  To be clear, the sequence was as follows: I installed F-11, did a "yum upgrade", rebooted, and then the segfault occurred.

Comment 8 Orion Poplawski 2009-07-23 15:15:14 UTC
Seen a couple of these so far:

eule/var/log/messages:Jul 22 19:17:45 eule kernel: readahead[95]: segfault at 0 ip 000000347d67f411 sp 00007fff1e9c0f98 error 4 in libc-2.10.1.so[347d600000+164000]
phlebas/var/log/messages-20090713:Jul  9 10:59:55 phlebas kernel: readahead[83]: segfault at 0 ip 000000306207f411 sp 00007fff7bcbf988 error 4 in libc-2.10.1.so[3062000000+164000]
phlebas/var/log/messages-20090713:Jul 13 10:11:09 phlebas kernel: readahead[85]: segfault at 0 ip 000000306207f411 sp 00007fff3cee2c98 error 4 in libc-2.10.1.so[3062000000+164000]
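
The kernel lines above can be compared across machines by subtracting the start of the libc mapping (the first number in the square brackets) from the faulting `ip`. The following is a minimal sketch of that arithmetic; the `struct fault` type and the function names are hypothetical helpers introduced only for illustration, and the numbers are copied from the log lines in comments 7 and 8.

```c
#include <assert.h>
#include <stdint.h>

/* One "segfault at ..." kernel log entry, reduced to the fields used here.
 * Hypothetical helper type, not part of readahead or the kernel. */
struct fault {
    uint64_t ip;        /* faulting instruction pointer ("ip ...") */
    uint64_t map_base;  /* start of the mapping, from "libc-...so[base+size]" */
    uint64_t map_size;  /* size of the mapping, from the same bracket */
};

/* Returns 1 if ip lies inside [map_base, map_base + map_size). */
int fault_in_mapping(const struct fault *f)
{
    return f->ip >= f->map_base && f->ip - f->map_base < f->map_size;
}

/* Offset of the faulting instruction relative to the start of the mapping. */
uint64_t fault_offset(const struct fault *f)
{
    assert(fault_in_mapping(f));
    return f->ip - f->map_base;
}
```

Applied to the x86_64 reports so far, `fault_offset` gives 0x7f411 for the entry in comment 7 (ip 00007f33a2b44411, mapping 7f33a2ac5000+164000) and for all three entries in comment 8, so these crashes land on the same instruction inside libc-2.10.1.so even though the library is loaded at different addresses. Because the address belongs to libc, resolving it to a function would presumably need the glibc debuginfo rather than only readahead-debuginfo.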

Comment 9 stan 2009-09-16 05:17:44 UTC
This is still happening on my Fedora 12/rawhide system at boot.  Some messages:

Sep 13 10:38:22 fedora12 yum: Updated: 1:readahead-1.5.0-2.fc12.x86_64
Sep 13 22:18:09 fedora12 kernel: readahead[127]: segfault at 0 ip 0000003ab2881212 sp 00007fff2e9b7da8 error 4 in libc-2.10.90.so[3ab2800000+176000]

Sep 15 19:39:47 fedora12 yum: Updated: 1:readahead-1.5.1-1.fc12.x86_64
Sep 15 21:45:26 fedora12 kernel: readahead[113]: segfault at 0 ip 00007f04420e51f2 sp 00007fff053083b8 error 4 in libc-2.10.90.so[7f0442064000+176000]

Comment 10 Eric Smith 2009-09-18 05:37:22 UTC
I'm getting the readahead segfault under the same conditions reported by Jerry James:  I did a fresh install of F11 x86_64, then did a yum update, then rebooted.

readahead[102]: segfault at 0 ip 00000033b4c7f541 sp 00007fff934533a8 error 4 in libc-2.10.1.so[33b4c00000+164000]

And on another boot:

readahead[99]: segfault at 0 ip 00000033b4c7f541 sp 00007fff8fd3cb18 error 4 in libc-2.10.1.so[33b4c00000+164000]

System has an Athlon 64 X2 4800+ CPU on an Asus K8V Deluxe motherboard with 2GB of RAM, and a 3ware 8506 hardware RAID controller.

kernel-2.6.30.5-43.fc11.x86_64
readahead-1.5.0-1.fc11.x86_64
glibc-2.10.1-5.x86_64

If I manually invoke readahead under GDB with a breakpoint on main, so that I can examine the memory space, GDB tells me:

(gdb) list *0x33b4c7f541
0x33b4c7f541 is at ../sysdeps/x86_64/strlen.S:31
26              movq    %rdi, %rcx
27              movq    %rdi, %r8
28              andq    $~15,%rdi
29              pxor    %xmm1, %xmm1
30              orl     $0xffffffff, %esi
31              movdqa  (%rdi), %xmm0
32              subq    %rdi, %rcx
33              leaq    16(%rdi), %rdi
34              pcmpeqb %xmm1, %xmm0
35              shl     %cl, %esi


I don't know how to get any more useful information, since the segfault message doesn't include a backtrace or other register values.
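
The segfault line does carry a little more than the instruction pointer: "segfault at 0" is the address that was accessed, and "error 4" is the x86 page-fault error code. The sketch below decodes the three low bits of that code; the struct and function names are hypothetical, while the bit meanings (bit 0: protection violation rather than a non-present page, bit 1: write access, bit 2: user mode) are the standard x86 ones.

```c
#include <stdbool.h>

/* Decoded low bits of an x86 page-fault error code.
 * Hypothetical helper type for illustration only. */
struct pf_error {
    bool protection;  /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;       /* bit 1: 1 = write access, 0 = read access */
    bool user;        /* bit 2: 1 = fault happened in user mode */
};

/* Split the "error N" value from the kernel log into its low three bits. */
struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e;
    e.protection = (code & 0x1UL) != 0;
    e.write      = (code & 0x2UL) != 0;
    e.user       = (code & 0x4UL) != 0;
    return e;
}
```

For "error 4" this yields a user-mode read of a page that is not mapped. Combined with the faulting address 0 and the `movdqa (%rdi), %xmm0` load at strlen.S:31 (after `%rdi` has been rounded down to a 16-byte boundary), that is consistent with strlen() being called on a NULL or near-NULL pointer.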


This might be the same bug reported as Debian bug 547159:  http://bugs.debian.org/547159

Comment 11 Harald Hoyer 2009-09-18 06:14:18 UTC
hmm, there is only one call to strlen() in readahead.

can you attach the files in /var/lib/readahead/ ?

Comment 12 Eric Smith 2009-09-18 06:28:02 UTC
Created attachment 361601
/var/lib/readahead/custom.early

Comment 13 Eric Smith 2009-09-18 06:28:33 UTC
Created attachment 361602
/var/lib/readahead/early.sorted

Comment 14 Eric Smith 2009-09-18 06:29:35 UTC
There could be a call to strlen() in something else that readahead calls.  Is there any way to get a segfault during startup to produce a backtrace?

Comment 15 Harald Hoyer 2009-09-18 06:46:46 UTC
hmmm.. maybe something like this?

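# mv /sbin/readahead /sbin/readahead.orig
# cat > /sbin/readahead <<'EOF'
#!/bin/sh

# run from a directory where a core file can be written
cd /tmp

# lift the soft limit on core file size
ulimit -Sc unlimited

# hand all arguments through to the real binary
/sbin/readahead.orig "$@"
EOF
# chmod 0755 /sbin/readahead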

Comment 16 Eric Smith 2009-09-18 08:03:28 UTC
Tried your suggestion.  With that script in place I don't get any mention of readahead in dmesg, not sure why.  Maybe readahead is affected by the current directory being /tmp instead of wherever it normally is?

To make sure it was getting invoked, I added "echo 'starting readahead' >>/tmp/readahead.log" before the cd and a similar message after readahead.orig, and then I get errors from echo that /tmp is a read-only filesystem, so /tmp obviously isn't going to be a suitable place for coredumps at that time.

How does readahead get started, anyhow?  I see the line to enable it in /etc/sysconfig/readahead, but I can't find a reference to it in the startup scripts.  Is it started by something inside the initrd?

Comment 17 Harald Hoyer 2009-09-18 09:26:40 UTC
ah, doh! 

cd /dev

/dev should be writable...

readahead gets started by upstart with /etc/event.d/readahead.event

Comment 18 Eric Smith 2009-09-18 10:14:12 UTC
OK, changed the script to use /dev.  Confirmed that my echos go into /dev/readahead.log, but dmesg has no error and there's no core file.

Moved the original readahead back in place, rebooted, and now I don't get the segfault!  I have no idea what's changed that has fixed it.  :-(

Comment 19 Raphael Geissert 2009-09-26 19:41:23 UTC
(In reply to comment #10)
> This might be the same bug reported as Debian bug 547159: 
> http://bugs.debian.org/547159  

The bug is caused because blkid_devno_to_devname returns /dev/root for the root fs device, in spite of that symlink (created by udev, at least here on Debian) not existing; but I'm not quite sure it is the same bug.

I'm going to apply the following patch on the next upload for Debian:

--- readahead.orig/src/readahead.c
+++ readahead/src/readahead.c
@@ -114,6 +114,8 @@ get_file_device(dev_t dev, struct device
        }
        devices[i].st_dev = dev;
        devices[i].name = blkid_devno_to_devname(dev);
+       if (!devices[i].name)
+               return NULL;
        if (ext2fs_open(devices[i].name, 0, 0, 0, IO_MGR, &fs) || !fs)
                return NULL;
        else {

Comment 20 Raphael Geissert 2009-10-06 14:48:20 UTC
Correcting my previous statement: blkid_devno_to_devname actually fails to find a device file and returns NULL, which, if unhandled, is passed to ext2fs_open and later causes the segfault. The issue was found in Debian by users running a kernel without an initrd and without static device files (which explains why blkid fails).
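
The failure mode and the guard from the patch in comment 19 can be reproduced in isolation. This is a minimal sketch under stated assumptions: `lookup_devname` and `open_by_name` are hypothetical stand-ins for blkid_devno_to_devname and for a consumer that reads the name with strlen(); neither is the real libblkid or libext2fs API, and "/dev/sda1" is an arbitrary example name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for blkid_devno_to_devname(): returns NULL when no
 * device file can be found, e.g. no initrd and no static device nodes. */
static const char *lookup_devname(int have_device_node)
{
    return have_device_node ? "/dev/sda1" : NULL;
}

/* Hypothetical stand-in for a consumer that reads the name. Calling
 * strlen() on NULL is undefined behavior and typically faults at address 0. */
static size_t open_by_name(const char *name)
{
    return strlen(name);
}

/* Guarded lookup, mirroring the "if (!devices[i].name) return NULL;" check:
 * returns -1 when no name is available instead of using the NULL pointer. */
long checked_open(int have_device_node)
{
    const char *name = lookup_devname(have_device_node);
    if (!name)
        return -1;  /* bail out before the name is ever dereferenced */
    return (long)open_by_name(name);
}
```

With the check in place, `checked_open(0)` reports failure to its caller, while `checked_open(1)` proceeds normally; without it, the NULL would reach `strlen()` exactly as in the crashes reported above.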

Comment 21 Harald Hoyer 2009-10-06 14:57:25 UTC
thx! pushed to upstream git

