Running /usr/sbin/load_policy -q -b during today's rawhide install produces the
attached backtrace from glibc.
Created attachment 134721
Per that output, the actual error is happening in libsepol, so it should be
assigned to it, along with corresponding version info.
Actually, sepol_genbools shouldn't be called at all anymore.
What do you have in your /etc/selinux/config?
Should have SETLOCALDEFS=0.
Possibly that should become the default in libselinux now.
sh-3.1# cat /etc/selinux/config
Ok, and what is providing that file? What package provides the initial contents
for /etc/selinux for bootstrapping prior to installation of the normal
Without SETLOCALDEFS=0, it should probe for an /etc/selinux/targeted/booleans*
file, but you shouldn't have those. Do you?
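Both questions can be answered on the affected tree with something like the sketch below. The helper name check_selinux_config and its root-directory argument are made up for illustration; only the file paths and the SETLOCALDEFS key come from this report. It reports the last SETLOCALDEFS line it finds and does not check which line libselinux actually honors.

```shell
#!/bin/sh
# check_selinux_config [ROOT]
# Hypothetical helper: reports the SETLOCALDEFS setting and any booleans*
# files under ROOT/etc/selinux/targeted. ROOT defaults to the live system.
check_selinux_config() {
    root="${1:-}"
    config="$root/etc/selinux/config"

    if [ ! -f "$config" ]; then
        echo "config: missing"
    else
        # Value from the last uncommented SETLOCALDEFS= line, if any.
        value=$(sed -n 's/^[[:space:]]*SETLOCALDEFS=//p' "$config" | tail -n 1)
        if [ -n "$value" ]; then
            echo "SETLOCALDEFS=$value"
        else
            echo "SETLOCALDEFS: not set"
        fi
    fi

    # List any booleans* files; the glob stays literal when nothing matches,
    # so each candidate is checked for existence.
    found=0
    for f in "$root"/etc/selinux/targeted/booleans*; do
        if [ -e "$f" ]; then
            echo "booleans file: ${f#"$root"}"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "booleans files: none"
    fi
}
```

Run it as check_selinux_config with no argument for the running system, or pass the root of the install tree (for example the directory upd-instroot populates) to inspect that instead.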
I'd like a copy of the entire /etc/selinux tree that you are using. If you can
just provide the package from which it originates, that would suffice.
Would also be good to know the architecture of your machine.
/etc/selinux/config is created by the tree composition process in anaconda
(scripts/upd-instroot in the anaconda source tree, if you want to look). So we
can add anything into that we need to - just takes an anaconda rebuild and a new
tree to take effect.
The /etc/selinux tree is built from upd-instroot as well. First we install
various selinux packages into a tree (libselinux, libsepol,
selinux-policy-targeted, libsemanage, policy, policycoreutils). Then we create
the policy by running: /usr/sbin/semodule -b /usr/share/selinux/targeted/base.pp
-n -s targeted. Finally, we pick individual files from the built /etc/selinux
that we want to be in the final installation environment. There is no booleans
I've changed the architecture of the bug report to reflect that this is being
seen on i386. I have not tested any other architectures.
Ok, moved aside my /etc/selinux/targeted directory, re-installed latest
selinux-policy-targeted package, removed SETLOCALDEFS=0 from
/etc/selinux/config, and ran load_policy -b, and reproduced the glibc backtrace.
Seems to be happening upon a malloc call from hashtab_create from symtab_init
from policydb_init from sepol_genbools. I don't see why/how that could cause
corruption. Running it under valgrind reports no errors at all. Any chance
this could be a false positive from glibc?
You can avoid the problem by adding SETLOCALDEFS=1 to the /etc/selinux/config
generated by upd-instroot so that sepol_genbools is not called.
Sorry, I meant SETLOCALDEFS=0.
(In reply to comment #7)
> I don't see why/how that could cause
> corruption. Running it under valgrind reports no errors at all. Any chance
> this could be a false positive from glibc?
I will not doubt glibc. I will not doubt glibc. I will not doubt glibc.
The corruption being reported is a corruption in glibc's own internal lists for
malloc, IIUC, and the actual corruption is happening earlier. sepol_genbools is
just where it is being detected, upon a malloc that attempts to use the
previously freed memory and finds it in a corrupted state. Still investigating
precisely where the corruption is happening, but I suspect sepol_genusers.
How are you debugging this? We have a similar problem elsewhere and are having
difficulties getting a handle on figuring it out.
I'm just trying to debug using gdb and trying some changes in the code to
explore what precisely triggers the glibc backtrace, but I can't say that I
really understand yet what is going on. The glibc developers might have some
helpful hints on how to debug these kinds of errors. I'm still puzzled that
valgrind doesn't report any errors.
Created attachment 134849
Simpler testcase that triggers the glibc error message
Simpler testcase that triggers the glibc error message for me.
gcc -g -o testcase ./testcase.c /usr/lib/libselinux.a /usr/lib/libsepol.a
Just opens+mmap's the policy file and then runs a
policydb_init();policydb_read();policydb_destroy() sequence in a loop, seems to
hit the glibc error on the third iteration every time for me.
valgrind still reports nothing.
We've been seeing a variety of similar bugs over the last couple days, so CC'ing
Jakub on this problem as well.
Workaround specifically for /sbin/init dying on boot:
mv /sbin/init /sbin/init.bin
echo '#!/bin/sh
MALLOC_CHECK_=0 exec /sbin/init.bin "$@"
' > /sbin/init
chmod +x /sbin/init
chcon --reference=/sbin/init.bin /sbin/init
Just in case I have to bail out before I have more info, this is most probably a
glibc issue. The problem appears immediately after we bail out of sorting chunks
because the limit is reached. This is new code. I'll look at it and Jakub will
know where to look, too.
*** Bug 204027 has been marked as a duplicate of this bug. ***
glibc-2.4.90-25 has the sort iteration cap changes temporarily backed out.
I'm seeing a similar issue with rpmfc on a package with a large number of files.
It doesn't occur with 10625 files but does with more than 10656. I'm trying
to narrow it down further. Would a reproducer with the exact number be useful,
or do you already know what you need to look at?
Created attachment 134918
Testcase from rpmfc
10652 seems to be the number that triggered it here.
find /usr/lib | head -n 10652 | tee usrfiles
gcc -o rpmdeps -lrpm -lrpmio -lrpmbuild rpmdeps.c
./rpmdeps < usrfiles
Hope that's helpful to make a more generic testcase
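For anyone narrowing the count further, a bisection over the number of input lines can be sketched as below. The helper name find_threshold is made up for illustration. It assumes the command passes on small inputs, fails on the full list, and that failure is monotonic in the line count; since the crash depends on heap layout, that last assumption may not hold exactly near the boundary.

```shell
#!/bin/sh
# find_threshold LIST CMD [ARGS...]
# Hypothetical helper: prints the smallest N for which CMD exits non-zero
# when fed the first N lines of LIST on stdin.
# Assumes CMD passes on empty input and fails on the whole of LIST.
find_threshold() {
    list="$1"
    shift
    lo=0                                  # largest count assumed to pass
    hi=$(( $(wc -l < "$list") ))          # smallest count assumed to fail
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        # The pipeline's status is that of CMD, the last command in it.
        if head -n "$mid" "$list" | "$@" >/dev/null 2>&1; then
            lo=$mid                       # passes: threshold is above mid
        else
            hi=$mid                       # fails: threshold is at or below mid
        fi
    done
    echo "$hi"
}
```

With the testcase above this would be run as: find /usr/lib > allfiles; find_threshold allfiles ./rpmdeps. An abort from glibc's malloc check counts as a failure because the process then exits with a non-zero status.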
*** Bug 203754 has been marked as a duplicate of this bug. ***
*** Bug 203984 has been marked as a duplicate of this bug. ***