203735 – glibc reporting corrupted double-linked list

Summary: glibc reporting corrupted double-linked list

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	rawhide
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	203754 203984 204027
Depends On:
Blocks:

Reported:	2006-08-23 14:57 UTC by Chris Lumens
Modified:	2007-11-30 22:11 UTC
CC List:	13 users
Fixed In Version:	2.4.90-25
Clone Of:
Environment:
Last Closed:	2006-08-25 13:55:31 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
glibc output (2.39 KB, text/plain) 2006-08-23 14:57 UTC, Chris Lumens
Simpler testcase that triggers the glibc error message (1.47 KB, text/x-csrc) 2006-08-24 19:28 UTC, Stephen Smalley
Testcase from rpmfc (1.10 KB, text/x-csrc) 2006-08-25 14:27 UTC, Paul Nasrat

Description Chris Lumens 2006-08-23 14:57:18 UTC

Running /usr/sbin/load_policy -q -b during today's rawhide install produces the
attached backtrace from glibc.

Comment 1 Chris Lumens 2006-08-23 14:57:19 UTC

Created attachment 134721
glibc output

Comment 2 Stephen Smalley 2006-08-23 15:20:52 UTC

Per that output, the actual error is happening in libsepol, so it should be
assigned to it, along with corresponding version info.

Comment 3 Stephen Smalley 2006-08-23 15:38:16 UTC

Actually, sepol_genbools shouldn't be called at all anymore.
What do you have in your /etc/selinux/config?
Should have SETLOCALDEFS=0.
Possibly that should become the default in libselinux now.

Comment 4 Chris Lumens 2006-08-23 16:02:39 UTC

sh-3.1# cat /etc/selinux/config
SELINUX=permissive
SELINUXTYPE=targeted

Comment 5 Stephen Smalley 2006-08-23 16:23:00 UTC

Ok, and what is providing that file?  What package provides the initial contents
for /etc/selinux for bootstrapping prior to installation of the normal
selinux-policy* packages?

Without SETLOCALDEFS=0, it should probe for an /etc/selinux/targeted/booleans*
file, but you shouldn't have those.  Do you?

I'd like a copy of the entire /etc/selinux tree that you are using.  If you can
just provide the package from which it originates, that would suffice.

Would also be good to know the architecture of your machine.

Comment 6 Chris Lumens 2006-08-23 17:00:43 UTC

/etc/selinux/config is created by the tree composition process in anaconda
(scripts/upd-instroot in the anaconda source tree, if you want to look).  So we
can add anything into that we need to - just takes an anaconda rebuild and a new
tree to take effect.

The /etc/selinux tree is built from upd-instroot as well.  First we install
various selinux packages into a tree (libselinux, libsepol,
selinux-policy-targeted, libsemanage, policy, policycoreutils).  Then we create
the policy by running:  /usr/sbin/semodule -b /usr/share/selinux/targeted/base.pp
-n -s targeted.  Finally, we pick individual files from the built /etc/selinux
that we want to be in the final installation environment.  There is no booleans
file.

I've changed the architecture of the bug report to reflect that this is being
seen on i386.  I have not tested any other architectures.

Comment 7 Stephen Smalley 2006-08-23 18:13:42 UTC

Ok, moved aside my /etc/selinux/targeted directory, re-installed latest
selinux-policy-targeted package, removed SETLOCALDEFS=0 from
/etc/selinux/config, and ran load_policy -b, and reproduced the glibc backtrace.
 Seems to be happening upon a malloc call from hashtab_create from symtab_init
from policydb_init from sepol_genbools.  I don't see why/how that could cause
corruption.  Running it under valgrind reports no errors at all.  Any chance
this could be a false positive from glibc?

You can avoid the problem by adding SETLOCALDEFS=1 to the /etc/selinux/config
generated by upd-instroot so that sepol_genbools is not called.

Comment 8 Stephen Smalley 2006-08-23 18:23:40 UTC

Sorry, I meant SETLOCALDEFS=0.

Comment 9 Stephen Smalley 2006-08-24 13:23:36 UTC

(In reply to comment #7)
> I don't see why/how that could cause
> corruption.  Running it under valgrind reports no errors at all.  Any chance
> this could be a false positive from glibc?

I will not doubt glibc.  I will not doubt glibc.  I will not doubt glibc.
The corruption being reported is a corruption in glibc's own internal lists for
malloc, IIUC, and the actual corruption is happening earlier.  sepol_genbools is
just where it is being detected, upon a malloc that attempts to use the
previously freed memory and finds it in a corrupted state.  Still investigating
precisely where the corruption is happening, but I suspect sepol_genusers.
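
The kind of consistency check that produces this message can be sketched roughly as follows. This is a simplified model and not glibc's source: `struct chunk`, `links_ok`, and `unlink_chunk` are hypothetical names chosen for illustration. The assumption is only that free chunks sit on doubly-linked lists and that the allocator verifies a chunk's neighbours point back at it before taking it off a list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a free chunk on one of the allocator's lists. */
struct chunk {
    struct chunk *fd;  /* next chunk in the list */
    struct chunk *bk;  /* previous chunk in the list */
};

/* Returns 1 if both neighbours of p point back at p, 0 otherwise. */
static int links_ok(const struct chunk *p)
{
    return p->fd->bk == p && p->bk->fd == p;
}

/* Removes p from its list only if the links are consistent.
   Returns 0 on success, -1 if corruption is detected (a real allocator
   would report an error such as "corrupted double-linked list" here). */
static int unlink_chunk(struct chunk *p)
{
    assert(p != NULL);
    if (!links_ok(p))
        return -1;
    p->fd->bk = p->bk;
    p->bk->fd = p->fd;
    return 0;
}
```

In this model a stray write to a chunk's `fd` or `bk` field goes unnoticed until the next list operation touches that chunk, which is why the report can surface in a caller that did nothing wrong.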

Comment 10 Chris Lumens 2006-08-24 17:17:31 UTC

How are you debugging this?  We have a similar problem elsewhere and are having
difficulties getting a handle on figuring it out.

Comment 11 Stephen Smalley 2006-08-24 17:34:51 UTC

I'm just trying to debug using gdb and trying some changes in the code to
explore what precisely triggers the glibc backtrace, but I can't say that I
really understand yet what is going on.  The glibc developers might have some
helpful hints on how to debug these kinds of errors.  I'm still puzzled that
valgrind doesn't report any errors.

Comment 12 Stephen Smalley 2006-08-24 19:28:06 UTC

Created attachment 134849
Simpler testcase that triggers the glibc error message

Simpler testcase that triggers the glibc error message for me.
Build with:
 gcc -g -o testcase ./testcase.c /usr/lib/libselinux.a /usr/lib/libsepol.a
Run with:
 ./testcase
It just opens and mmaps the policy file and then runs a
policydb_init();policydb_read();policydb_destroy() sequence in a loop.  It seems
to hit the glibc error on the third iteration every time for me.
valgrind still reports nothing.

Comment 13 Chris Lumens 2006-08-24 19:37:22 UTC

We've been seeing a variety of similar bugs over the last couple days, so CC'ing
Jakub on this problem as well.

Comment 14 Klaus Weidner 2006-08-24 22:22:08 UTC

Workaround specifically for /sbin/init dying on boot:

mv /sbin/init /sbin/init.bin
echo '#!/bin/sh
MALLOC_CHECK_=0 exec /sbin/init.bin "$@"
' > /sbin/init
chmod +x /sbin/init
chcon --reference=/sbin/init.bin /sbin/init
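
For an ordinary program (not init), the same idea can be sketched in C. This is a minimal sketch under stated assumptions: `run_with_malloc_check` is a hypothetical helper name, and it forks so the caller can see the exit status, whereas the shell wrapper above uses exec without forking so that init keeps its PID. In glibc of this period, MALLOC_CHECK_=0 is documented to make the allocator silently ignore detected heap corruption; newer releases may behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs path with argv in a child process whose environment has
   MALLOC_CHECK_ set to level ("0", "1", "2", ...).
   Returns the child's exit status, or -1 on failure. */
static int run_with_malloc_check(const char *path, char *const argv[],
                                 const char *level)
{
    assert(path != NULL && argv != NULL && level != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set the variable, then replace ourselves with the target. */
        if (setenv("MALLOC_CHECK_", level, 1) != 0)
            _exit(127);
        execv(path, argv);
        _exit(127);  /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only hides the symptom in the child; it does nothing about the underlying list corruption.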

Comment 15 Ulrich Drepper 2006-08-25 05:32:59 UTC

Just in case I have to bail out before I have more info, this is most probably a
glibc issue.  The problem appears immediately after we bail out of sorting chunks
because the limit is reached.  This is new code.  I'll look at it and Jakub will
know where to look, too.

Comment 16 Jakub Jelinek 2006-08-25 13:54:22 UTC

*** Bug 204027 has been marked as a duplicate of this bug. ***

Comment 17 Jakub Jelinek 2006-08-25 13:55:31 UTC

glibc-2.4.90-25 has the sort iteration cap changes temporarily backed out.

Comment 18 Paul Nasrat 2006-08-25 13:56:33 UTC

I'm seeing a similar issue with rpmfc on a package with a large number of files.

It doesn't occur with 10625 files but does with more than 10656.  I'm trying
to narrow it down further - would a reproducer with the exact number be useful,
or do you already know where to look?
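
Narrowing a threshold like this can be done by bisection. The following is a sketch only: `smallest_failing` and `fails_at_or_above` are hypothetical names, the toy predicate stands in for actually running the reproducer on the first n file names, and the search assumes the failure is monotonic in n (once it fails at some count, it fails at every larger count), which may not hold for a heap bug.

```c
#include <assert.h>
#include <stddef.h>

/* Predicate: returns nonzero if the failure reproduces with n input files. */
typedef int (*fails_fn)(size_t n, void *ctx);

/* Given lo where the failure does not occur and hi where it does,
   returns the smallest n in (lo, hi] for which fails(n) is nonzero,
   assuming the failure is monotonic in n. */
static size_t smallest_failing(size_t lo, size_t hi, fails_fn fails, void *ctx)
{
    assert(lo < hi && fails != NULL);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (fails(mid, ctx))
            hi = mid;   /* failure reproduces: threshold is at or below mid */
        else
            lo = mid;   /* no failure: threshold is above mid */
    }
    return hi;
}

/* Toy predicate for demonstration: "fails" once n reaches *ctx.
   A real predicate would run the reproducer on the first n file names. */
static int fails_at_or_above(size_t n, void *ctx)
{
    return n >= *(const size_t *)ctx;
}
```

With the bounds from this comment, the call would start from lo = 10625 and hi = 10657 and needs about five runs of the reproducer.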

Comment 19 Paul Nasrat 2006-08-25 14:27:31 UTC

Created attachment 134918
Testcase from rpmfc

10652 seems to be the number that triggered it here.

find /usr/lib | head -n 10652 | tee usrfiles
gcc -o rpmdeps rpmdeps.c -lrpmbuild -lrpm -lrpmio
./rpmdeps < usrfiles

Hope that's helpful for making a more generic testcase.

Comment 20 Paul Nasrat 2006-08-25 14:33:52 UTC

*** Bug 203754 has been marked as a duplicate of this bug. ***

Comment 21 Bill Nottingham 2006-08-28 18:48:05 UTC

*** Bug 203984 has been marked as a duplicate of this bug. ***
