Bug 203735 - glibc reporting corrupted double-linked list
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Duplicates: 203754 203984 204027
Reported: 2006-08-23 10:57 EDT by Chris Lumens
Modified: 2007-11-30 17:11 EST (History)
Fixed In Version: 2.4.90-25
Doc Type: Bug Fix
Last Closed: 2006-08-25 09:55:31 EDT


Attachments
glibc output (2.39 KB, text/plain)
2006-08-23 10:57 EDT, Chris Lumens
Simpler testcase that triggers the glibc error message (1.47 KB, text/x-csrc)
2006-08-24 15:28 EDT, Stephen Smalley
Testcase from rpmfc (1.10 KB, text/x-csrc)
2006-08-25 10:27 EDT, Paul Nasrat

Description Chris Lumens 2006-08-23 10:57:18 EDT
Running /usr/sbin/load_policy -q -b during today's rawhide install produces the
attached backtrace from glibc.
Comment 1 Chris Lumens 2006-08-23 10:57:19 EDT
Created attachment 134721
glibc output
Comment 2 Stephen Smalley 2006-08-23 11:20:52 EDT
Per that output, the actual error is happening in libsepol, so this bug should
be reassigned to libsepol, along with the corresponding version info.
Comment 3 Stephen Smalley 2006-08-23 11:38:16 EDT
Actually, sepol_genbools shouldn't be called at all anymore.
What do you have in your /etc/selinux/config?
It should have SETLOCALDEFS=0.
Possibly that should become the default in libselinux now.
Comment 4 Chris Lumens 2006-08-23 12:02:39 EDT
sh-3.1# cat /etc/selinux/config
SELINUX=permissive
SELINUXTYPE=targeted
Comment 5 Stephen Smalley 2006-08-23 12:23:00 EDT
Ok, and what is providing that file?  What package provides the initial contents
for /etc/selinux for bootstrapping prior to installation of the normal
selinux-policy* packages?

Without SETLOCALDEFS=0, it should probe for an /etc/selinux/targeted/booleans*
file, but you shouldn't have those.  Do you?

I'd like a copy of the entire /etc/selinux tree that you are using.  If you can
just provide the package from which it originates, that would suffice.

Would also be good to know the architecture of your machine.
Comment 6 Chris Lumens 2006-08-23 13:00:43 EDT
/etc/selinux/config is created by the tree composition process in anaconda
(scripts/upd-instroot in the anaconda source tree, if you want to look).  So we
can add anything we need to it; it just takes an anaconda rebuild and a new
tree to take effect.

The /etc/selinux tree is built from upd-instroot as well.  First we install
various selinux packages into a tree (libselinux, libsepol,
selinux-policy-targeted, libsemanage, policy, policycoreutils).  Then we create
the policy by running:  /usr/sbin/semodule -b /usr/share/selinux/targeted/base.pp
-n -s targeted.  Finally, we pick the individual files from the built
/etc/selinux that we want in the final installation environment.  There is no
booleans file.

I've changed the architecture of the bug report to reflect that this is being
seen on i386.  I have not tested any other architectures.
Comment 7 Stephen Smalley 2006-08-23 14:13:42 EDT
Ok, I moved aside my /etc/selinux/targeted directory, re-installed the latest
selinux-policy-targeted package, removed SETLOCALDEFS=0 from
/etc/selinux/config, ran load_policy -b, and reproduced the glibc backtrace.
It seems to be happening upon a malloc call from hashtab_create, called from
symtab_init, from policydb_init, from sepol_genbools.  I don't see why/how that
could cause corruption.  Running it under valgrind reports no errors at all.
Any chance this could be a false positive from glibc?

You can avoid the problem by adding SETLOCALDEFS=1 to the /etc/selinux/config
generated by upd-instroot so that sepol_genbools is not called.


Comment 8 Stephen Smalley 2006-08-23 14:23:40 EDT
Sorry, I meant SETLOCALDEFS=0.
Comment 9 Stephen Smalley 2006-08-24 09:23:36 EDT
(In reply to comment #7)
> I don't see why/how that could cause
> corruption.  Running it under valgrind reports no errors at all.  Any chance
> this could be a false positive from glibc?

I will not doubt glibc.  I will not doubt glibc.  I will not doubt glibc.
The corruption being reported is a corruption in glibc's own internal lists for
malloc, IIUC, and the actual corruption is happening earlier.  sepol_genbools is
just where it is being detected, upon a malloc that attempts to use the
previously freed memory and finds it in a corrupted state.  Still investigating
precisely where the corruption is happening, but I suspect sepol_genusers.
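
For readers unfamiliar with this class of error, the following is a minimal
sketch in C of the sort of consistency check that can flag a damaged
doubly-linked list when a node is removed.  The struct node type and the
list_* helpers are hypothetical names invented for illustration; this is not
glibc's actual malloc code, only an analogy for how a check at unlink time can
report damage that was done earlier by some unrelated write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical free-list node; real allocator chunks carry more fields. */
struct node {
    struct node *fd; /* forward link  */
    struct node *bk; /* backward link */
};

/* Make `head` an empty circular list (a sentinel pointing at itself). */
static void list_init(struct node *head)
{
    head->fd = head;
    head->bk = head;
}

/* Insert `n` directly after the sentinel `head`. */
static void list_push(struct node *head, struct node *n)
{
    n->fd = head->fd;
    n->bk = head;
    head->fd->bk = n;
    head->fd = n;
}

/*
 * Unlink `n`, but first verify that both neighbours still point back at it.
 * Returns false, without modifying anything, when the links are inconsistent.
 * That inconsistent state is what a "corrupted double-linked list" style
 * message describes.
 */
static bool list_unlink_checked(struct node *n)
{
    assert(n != NULL);
    if (n->fd->bk != n || n->bk->fd != n)
        return false;
    n->fd->bk = n->bk;
    n->bk->fd = n->fd;
    return true;
}
```

Because the check only runs when a node is unlinked, the caller that trips it
need not be the code that damaged the links, which matches the observation
above that sepol_genbools is only where the problem is detected.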
Comment 10 Chris Lumens 2006-08-24 13:17:31 EDT
How are you debugging this?  We have a similar problem elsewhere and are having
difficulty getting a handle on it.
Comment 11 Stephen Smalley 2006-08-24 13:34:51 EDT
I'm just trying to debug using gdb and trying some changes in the code to
explore what precisely triggers the glibc backtrace, but I can't say that I
really understand yet what is going on.  The glibc developers might have some
helpful hints on how to debug these kinds of errors.  I'm still puzzled that
valgrind doesn't report any errors.
Comment 12 Stephen Smalley 2006-08-24 15:28:06 EDT
Created attachment 134849
Simpler testcase that triggers the glibc error message

Simpler testcase that triggers the glibc error message for me.
Build with:
 gcc -g -o testcase ./testcase.c /usr/lib/libselinux.a /usr/lib/libsepol.a
Run with:
 ./testcase
It just opens and mmaps the policy file, then runs a
policydb_init(); policydb_read(); policydb_destroy() sequence in a loop.  It
seems to hit the glibc error on the third iteration every time for me.
valgrind still reports nothing.
Comment 13 Chris Lumens 2006-08-24 15:37:22 EDT
We've been seeing a variety of similar bugs over the last couple days, so CC'ing
Jakub on this problem as well.
Comment 14 Klaus Weidner 2006-08-24 18:22:08 EDT
Workaround specifically for /sbin/init dying on boot:

mv /sbin/init /sbin/init.bin
echo '#!/bin/sh
MALLOC_CHECK_=0 exec /sbin/init.bin "$@"
' > /sbin/init
chmod +x /sbin/init
chcon --reference=/sbin/init.bin /sbin/init
Comment 15 Ulrich Drepper 2006-08-25 01:32:59 EDT
Just in case I have to bail out before I have more info, this is most probably a
glibc issue.  The problem appears immediately after we bail out of sorting
chunks because the limit is reached.  This is new code.  I'll look at it and
Jakub will know where to look, too.
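
As a rough illustration of what a capped sorting pass looks like, here is a
minimal sketch in C.  Everything in it is an assumption made for the example:
struct chunk, MAX_ITERS (set to 4 here), and the chunk_* and drain_unsorted
helpers are hypothetical and are not taken from glibc.  The point it shows is
that a loop which stops early must leave both lists well formed at the exit,
otherwise a later list operation will find inconsistent links.

```c
#include <stddef.h>

#define MAX_ITERS 4 /* hypothetical cap, kept tiny for illustration */

/* Hypothetical chunk header: list links plus a size to sort by. */
struct chunk {
    struct chunk *fd;
    struct chunk *bk;
    size_t size;
};

/* Make `head` an empty circular list. */
static void chunk_list_init(struct chunk *head)
{
    head->fd = head;
    head->bk = head;
    head->size = 0;
}

/* Link `c` in directly before `pos` (before the head means "at the tail"). */
static void chunk_insert_before(struct chunk *pos, struct chunk *c)
{
    c->fd = pos;
    c->bk = pos->bk;
    pos->bk->fd = c;
    pos->bk = c;
}

/* Remove `c` from whatever list it is on. */
static void chunk_detach(struct chunk *c)
{
    c->bk->fd = c->fd;
    c->fd->bk = c->bk;
}

/*
 * Move chunks from the tail of `unsorted` into `sorted` (ascending size),
 * giving up after MAX_ITERS moves.  Each chunk is fully detached before it
 * is re-inserted, so both lists are well formed at the early exit.
 * Returns the number of chunks moved.
 */
static size_t drain_unsorted(struct chunk *unsorted, struct chunk *sorted)
{
    size_t iters = 0;

    while (unsorted->bk != unsorted) {
        struct chunk *victim = unsorted->bk;
        struct chunk *pos = sorted->fd;

        chunk_detach(victim);
        while (pos != sorted && pos->size < victim->size)
            pos = pos->fd;
        chunk_insert_before(pos, victim);

        if (++iters >= MAX_ITERS)
            break; /* cap reached: the rest stays on `unsorted` */
    }
    return iters;
}

/*
 * Count the chunks on a list, returning -1 if any forward link lacks a
 * matching back link.  Assumes the forward links still form a cycle.
 */
static long chunk_list_count(const struct chunk *head)
{
    long n = 0;
    const struct chunk *c = head;

    do {
        if (c->fd->bk != c)
            return -1;
        c = c->fd;
        if (c != head)
            n++;
    } while (c != head);
    return n;
}
```

With six chunks queued, the first call to drain_unsorted moves four and
leaves two behind; chunk_list_count can then be used on both lists to confirm
that the early exit did not leave dangling links.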
Comment 16 Jakub Jelinek 2006-08-25 09:54:22 EDT
*** Bug 204027 has been marked as a duplicate of this bug. ***
Comment 17 Jakub Jelinek 2006-08-25 09:55:31 EDT
glibc-2.4.90-25 has the sort iteration cap changes temporarily backed out.
Comment 18 Paul Nasrat 2006-08-25 09:56:33 EDT
I'm seeing a similar issue with rpmfc on a package with a large number of files.

It doesn't occur with 10625 files but does with more than 10656.  I'm trying
to narrow it down further.  Would a reproducer with the number be useful, or do
you already know where you need to look?
Comment 19 Paul Nasrat 2006-08-25 10:27:31 EDT
Created attachment 134918
Testcase from rpmfc

10652 seems to be the number that triggered it here.

find /usr/lib | head -n 10652 | tee usrfiles
gcc -o rpmdeps rpmdeps.c -lrpmbuild -lrpm -lrpmio
./rpmdeps < usrfiles

Hope that's helpful for making a more generic testcase.
Comment 20 Paul Nasrat 2006-08-25 10:33:52 EDT
*** Bug 203754 has been marked as a duplicate of this bug. ***
Comment 21 Bill Nottingham 2006-08-28 14:48:05 EDT
*** Bug 203984 has been marked as a duplicate of this bug. ***
