Running /usr/sbin/load_policy -q -b during today's rawhide install produces the attached backtrace from glibc.
Created attachment 134721: glibc output
Per that output, the actual error is happening in libsepol, so the bug should be reassigned to libsepol, along with the corresponding version info.
Actually, sepol_genbools shouldn't be called at all anymore. What do you have in your /etc/selinux/config? It should have SETLOCALDEFS=0. Possibly that should become the default in libselinux now.
sh-3.1# cat /etc/selinux/config
SELINUX=permissive
SELINUXTYPE=targeted
OK, and what is providing that file? What package provides the initial contents of /etc/selinux for bootstrapping before the normal selinux-policy* packages are installed? Without SETLOCALDEFS=0, it should probe for an /etc/selinux/targeted/booleans* file, but you shouldn't have those. Do you? I'd like a copy of the entire /etc/selinux tree that you are using. If you can just name the package it originates from, that would suffice. It would also be good to know the architecture of your machine.
/etc/selinux/config is created by the tree composition process in anaconda (scripts/upd-instroot in the anaconda source tree, if you want to look), so we can add anything we need to it; that just takes an anaconda rebuild and a new tree to take effect.

The /etc/selinux tree is built by upd-instroot as well. First we install various SELinux packages into a tree (libselinux, libsepol, selinux-policy-targeted, libsemanage, policy, policycoreutils). Then we create the policy by running:
  /usr/sbin/semodule -b /usr/share/selinux/targeted/base.pp -n -s targeted
Finally, we pick the individual files from the built /etc/selinux that we want in the final installation environment. There is no booleans file.

I've changed the architecture of the bug report to reflect that this is being seen on i386. I have not tested any other architectures.
OK, I moved aside my /etc/selinux/targeted directory, reinstalled the latest selinux-policy-targeted package, removed SETLOCALDEFS=0 from /etc/selinux/config, ran load_policy -b, and reproduced the glibc backtrace. It seems to happen on a malloc call in hashtab_create, called from symtab_init, called from policydb_init, called from sepol_genbools. I don't see why or how that could cause corruption. Running it under valgrind reports no errors at all. Any chance this could be a false positive from glibc? You can avoid the problem by adding SETLOCALDEFS=1 to the /etc/selinux/config generated by upd-instroot so that sepol_genbools is not called.
Sorry, I meant SETLOCALDEFS=0.
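As described in this thread, sepol_genbools ends up being called when SETLOCALDEFS is absent from /etc/selinux/config (as in the two-line config posted earlier) and is skipped when SETLOCALDEFS=0 is present. A minimal C sketch of that decision, assuming an absent key behaves like SETLOCALDEFS=1; wants_local_defs() is a hypothetical helper for illustration, not libselinux's actual config parser:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libselinux's parser): scan the KEY=VALUE lines of
 * /etc/selinux/config-style text and decide whether local definitions,
 * i.e. the sepol_genbools path, would be processed. */
static bool wants_local_defs(const char *config_text) {
    static const char key[] = "SETLOCALDEFS=";
    const size_t key_len = sizeof key - 1;
    /* Assumption from this thread: an absent SETLOCALDEFS behaves like 1. */
    bool result = true;
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        /* Only lines that start with the key count; the last one wins. */
        if (len > key_len && strncmp(line, key, key_len) == 0)
            result = strtol(line + key_len, NULL, 10) != 0;
        line = end ? end + 1 : NULL;
    }
    return result;
}
```

Fed the config from this bug (only SELINUX and SELINUXTYPE), the sketch returns true, i.e. the sepol_genbools path; once upd-instroot appends SETLOCALDEFS=0, it returns false.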
(In reply to comment #7)
> I don't see why/how that could cause
> corruption. Running it under valgrind reports no errors at all. Any chance
> this could be a false positive from glibc?

I will not doubt glibc. I will not doubt glibc. I will not doubt glibc.

The corruption being reported is in glibc's own internal malloc lists, IIUC; the actual corruption happens earlier. sepol_genbools is just where it is detected, on a malloc that tries to reuse previously freed memory and finds it in a corrupted state. I'm still investigating exactly where the corruption happens, but I suspect sepol_genusers.
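To illustrate why the error surfaces in an unrelated malloc, here is a toy model in C. It is a hypothetical fixed-size allocator (toy_alloc, toy_free and the other toy_* names are made up), not glibc's malloc; it only shares the idea that allocator metadata lives inside freed blocks and is validated on the allocation path, so a bad write is reported one step later, by whoever allocates next.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy fixed-size allocator, for illustration only; NOT glibc's malloc. */
#define TOY_BLOCKS 4
#define TOY_BLOCK_SIZE 32
#define TOY_FREE_MAGIC 0xF7EEF7EEu

/* Metadata stored in the payload of every free block. */
struct toy_free_hdr {
    uint32_t magic;
    struct toy_free_hdr *next;
};

/* Backing storage; each block is large enough to hold the header. */
static union toy_block {
    struct toy_free_hdr hdr;
    unsigned char bytes[TOY_BLOCK_SIZE];
} toy_pool[TOY_BLOCKS];

static struct toy_free_hdr *toy_free_list;
static int toy_corruption_detected;

/* Put every block on the free list. */
static void toy_init(void) {
    toy_free_list = NULL;
    toy_corruption_detected = 0;
    for (int i = TOY_BLOCKS - 1; i >= 0; i--) {
        toy_pool[i].hdr.magic = TOY_FREE_MAGIC;
        toy_pool[i].hdr.next = toy_free_list;
        toy_free_list = &toy_pool[i].hdr;
    }
}

/* Pop a block; the free-list integrity check happens only here. */
static void *toy_alloc(void) {
    struct toy_free_hdr *blk = toy_free_list;
    if (blk == NULL)
        return NULL;
    if (blk->magic != TOY_FREE_MAGIC) {
        toy_corruption_detected = 1;
        fprintf(stderr, "toy_alloc: corrupted free list\n");
        return NULL;
    }
    toy_free_list = blk->next;
    return blk;
}

/* Push a block back; its payload holds allocator metadata again. */
static void toy_free(void *p) {
    struct toy_free_hdr *blk = p;
    blk->magic = TOY_FREE_MAGIC;
    blk->next = toy_free_list;
    toy_free_list = blk;
}

/* Scenario: a stale pointer is written after toy_free(). Nothing is
 * reported at the bad write; the next toy_alloc() is what notices.
 * Returns 1 if the corruption was detected. */
static int toy_demo_use_after_free(void) {
    toy_init();
    unsigned char *a = toy_alloc();
    toy_free(a);
    a[0] = 0x00;       /* the bug: clobbers the magic of a free block */
    (void)toy_alloc(); /* detection happens here, one step later */
    return toy_corruption_detected;
}
```

In the demo, the function that "fails" is the second toy_alloc(), even though the bug is the earlier write through a stale pointer; the stack trace at detection time points at the victim, not the culprit.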
How are you debugging this? We have a similar problem elsewhere and are having difficulties getting a handle on figuring it out.
I'm just debugging with gdb and experimenting with code changes to find out exactly what triggers the glibc backtrace, but I can't say that I really understand yet what is going on. The glibc developers might have some helpful hints on how to debug these kinds of errors. I'm still puzzled that valgrind doesn't report any errors.
Created attachment 134849: Simpler testcase that triggers the glibc error message

This simpler testcase triggers the glibc error message for me.

Build with:
  gcc -g -o testcase ./testcase.c /usr/lib/libselinux.a /usr/lib/libsepol.a
Run with:
  ./testcase

It just opens and mmaps the policy file and then runs a policydb_init(); policydb_read(); policydb_destroy() sequence in a loop. It seems to hit the glibc error on the third iteration every time for me. valgrind still reports nothing.
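The first half of that testcase, mapping the policy file into memory, can be sketched with plain POSIX calls. map_whole_file() is a hypothetical helper name; the policy path and the policydb_init()/policydb_read()/policydb_destroy() loop are left out because their exact signatures are libsepol internals not shown in this bug. In the testcase, the returned buffer is what each loop iteration would read the policy from.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and return its address,
 * or NULL on failure. On success *len_out holds the mapping length, which
 * the caller later passes to munmap(). */
static void *map_whole_file(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    /* mmap() rejects a zero length, so treat an empty file as an error. */
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

Because the mapping is read-only and reused unchanged across iterations, any difference between the first, second and third pass has to come from heap state, which is consistent with the failure appearing only on the third iteration.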
We've been seeing a variety of similar bugs over the last couple of days, so I'm CC'ing Jakub on this problem as well.
Workaround specifically for /sbin/init dying on boot:

mv /sbin/init /sbin/init.bin
echo '#!/bin/sh
MALLOC_CHECK_=0 exec /sbin/init.bin "$@"
' > /sbin/init
chmod +x /sbin/init
chcon --reference=/sbin/init.bin /sbin/init
Just in case I have to bail out before I have more info: this is most probably a glibc issue. The problem appears immediately after we bail out of sorting chunks because the iteration limit is reached. This is new code. I'll look at it, and Jakub will know where to look, too.
*** Bug 204027 has been marked as a duplicate of this bug. ***
glibc-2.4.90-25 has the sort iteration cap changes temporarily backed out.
I'm seeing a similar issue with rpmfc on a package with a large number of files. It doesn't occur with 10625 files but does with more than 10656. I'm trying to narrow it down further. Would a reproducer with the exact number be useful, or do you already know what you need to look at?
Created attachment 134918: Testcase from rpmfc

10652 seems to be the number of files that triggered it here:
  find /usr/lib | head -n 10652 | tee usrfiles
  gcc -o rpmdeps rpmdeps.c -lrpmbuild -lrpm -lrpmio
  ./rpmdeps < usrfiles

Hopefully that's helpful for making a more generic testcase.
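Narrowing the file count from a known-good value (10625) to a known-bad one is a bisection. A minimal C sketch, assuming a hypothetical predicate that runs the reproducer on the first n lines (for example via the head -n pipeline above) and reports whether it failed; example_fails() is only an illustrative stand-in whose threshold mirrors the 10652 reported here. The sketch also assumes failure is monotonic in n, which a heap-corruption bug is not guaranteed to be.

```c
#include <stdbool.h>

/* Hypothetical predicate: true if running the reproducer on the first n
 * input lines triggers the failure. */
typedef bool (*fails_fn)(long n);

/* Bisect between a known-good count and a known-bad count.
 * Preconditions: good < bad, fails(good) is false, fails(bad) is true.
 * Assumes every count above the threshold fails (monotonic failure).
 * Returns the smallest failing count in (good, bad]. */
static long bisect_first_failure(long good, long bad, fails_fn fails) {
    while (bad - good > 1) {
        long mid = good + (bad - good) / 2;
        if (fails(mid))
            bad = mid;  /* threshold is at or below mid */
        else
            good = mid; /* threshold is above mid */
    }
    return bad;
}

/* Illustrative stand-in for the real check. A real predicate would run
 * rpmdeps on head -n N of the file list and inspect its exit status. */
static bool example_fails(long n) {
    return n >= 10652;
}
```

With good = 10625 and bad = 10657, this needs five runs of the reproducer instead of up to 32 linear ones.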
*** Bug 203754 has been marked as a duplicate of this bug. ***
*** Bug 203984 has been marked as a duplicate of this bug. ***