Red Hat Bugzilla – Bug 250233
genhomedircon hangs system for hours
Last modified: 2007-11-30 17:12:12 EST
Description of problem:
Every update of selinux-policy-targeted triggers a run of genhomedircon. On a
terminal server with quite some users and the homedirs mounted over NFS,
genhomedircon takes at least 3 hours (we killed it after that since this is a
When strace'ing the genhomedircon process, it outputs this (some hundred times
per second):
futex(0x9a93f48, FUTEX_WAKE, 1) = 0
genhomedircon should skip NFS-mounted homedirs since they do not have SELinux
attributes anyway. An alternative would be to skip running genhomedircon if
SELinux is disabled.
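The two checks proposed above could look roughly like the following minimal sketch. It is not taken from genhomedircon: the function names fs_type_for, is_on_nfs and selinuxfs_registered are hypothetical, the mount table is assumed to be in /proc/mounts format, and the absence of "selinuxfs" from /proc/filesystems is used only as a heuristic for "SELinux is disabled".

```python
import os

# Filesystem types treated as NFS (assumption for this sketch).
NFS_TYPES = {"nfs", "nfs4"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the deepest mount containing path.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    Escaped characters in mount point names are not decoded.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = os.path.normpath(fields[1]), fields[2]
        prefix = mount if mount.endswith("/") else mount + "/"
        if path == mount or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later mounts shadow earlier ones.
            if len(mount) >= len(best_mount):
                best_mount, best_type = mount, fstype
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lives on a mount whose type is in NFS_TYPES."""
    return fs_type_for(path, mounts_text) in NFS_TYPES


def selinuxfs_registered(filesystems_text):
    """True if 'selinuxfs' appears in /proc/filesystems-style text."""
    for line in filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "selinuxfs":
            return True
    return False
```

In practice the two text arguments would come from reading /proc/mounts and /proc/filesystems; they are passed in as strings here so the logic can be tried without touching the running system.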
Version-Release number of selected component (if applicable):
Fedora 7 - latest updates installed
Update selinux-policy-targeted using yum.
It seems to hang after:
Updating : selinux-policy-targeted ####################### [ 7/15]
Now, genhomedircon uses 100% CPU.
If you kill it now:
libsemanage.semanage_exec_prog: Child process /usr/sbin/genhomedircon did not
libsemanage.semanage_install_sandbox: genhomedircon returned error code -1.
100% over 3 hours but no visible progress
"genhomedircon skipped, SELinux is disabled." or
"Warning: Skipping NFS homedirs."
Adding sys.exit(0) to the front of genhomedircon would be an easy workaround for now.
Do you know what genhomedircon is doing when it goes wacky? Does it try to
read the entire password database? Are you running NIS or LDAP? Do you have a
huge number of user accounts?
import sys, os, pwd, string, getopt, re
Of course the next time policycoreutils gets updated you will lose this change.
You can also remove selinux-policy-targeted from the system
yum remove selinux-policy-targeted
Okay, I added sys.exit( 0 ) after the import statements in the script and this
at least fixes the problem of the script consuming CPU time.
We're using both Samba Winbind (pam_winbind) and NIS with an indeed huge number
of user accounts (possibly more than 100k), so it could be possible that
genhomedircon tries to enumerate all those users (only a dozen of the Winbind
users have a home directory on our server and the NIS users have their home
directories on the NFS storage which should not be touched by genhomedircon).
As I already posted, everything that genhomedircon does (when watching with
strace) is the following message that repeats some hundred times per second:
futex(0x9a93f48, FUTEX_WAKE, 1) = 0
Does this script give the same behaviour?
No, this one finishes after around one second. I made some tests and found out
that if I pass --nopasswd to genhomedircon, it finishes very fast because it
does not need to look up the homedirs but uses a statically set homedir path.
Anyway, the problematic time-consuming part is the loop starting in line 317 and,
in this loop, the line checking whether the homedir exists. This takes around 1-2
seconds for every user whose homedir is on our NFS share because your
implementation of checkExists() is too slow.
Why do you run ls on the directory and then read all of the output to check
whether it contains the directory we are looking for, when none of that
regexp magic is needed just to check the existence of a given homedir?
My suggestion would be to check whether the argument given to checkExists() is
a regexp and, if it is not, just probe whether you are able to open the directory
directly and abort if you are not (which means either directory not found or NFS
share with root_squash, as in our case).
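The suggestion above can be sketched as follows. This is only an illustration under assumptions, not the genhomedircon code: check_exists and looks_like_regex are hypothetical names, the set of characters treated as regexp syntax is a guess, a pattern is assumed to appear only in the last path component, and the slow path is a stand-in for whatever the original listing-based check does. The fast path opens the directory directly, so a missing directory or a permission failure (for example root_squash on NFS) both count as "not there".

```python
import os
import re

# Characters taken as a sign that the argument is a regexp rather than a
# literal path (assumption; "." is left out because paths often contain it).
_REGEX_CHARS = set("[]()*+?{}|^$\\")


def looks_like_regex(path):
    """True if path contains any character from _REGEX_CHARS."""
    return any(ch in _REGEX_CHARS for ch in path)


def can_open_dir(path):
    """Try to open path as a directory; False on any OS error."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    try:
        fd = os.open(path, flags)
    except OSError:
        return False
    os.close(fd)
    return True


def check_exists(path):
    """Return True if a directory matching path exists and can be opened."""
    if not looks_like_regex(path):
        # Fast path: one open() on the directory itself, no listing
        # of the parent directory.
        return can_open_dir(path)
    # Slow path: list the parent and match each entry against the pattern.
    parent, pattern = os.path.split(path)
    try:
        compiled = re.compile(pattern)
        entries = os.listdir(parent)
    except (re.error, OSError):
        return False
    return any(
        compiled.fullmatch(entry) and can_open_dir(os.path.join(parent, entry))
        for entry in entries
    )
```

With a literal path such as /home/alice, this costs a single open() call per user instead of reading the whole listing of the parent directory, which is where the per-user delay over NFS was observed.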
PS: It fails to enumerate the users from Winbind and just gets the users from
the NIS with the NFS homedirs (which are "only" 4000) because I have disabled
enumerating the user list in Samba:
winbind enum users = no
winbind enum groups = no
We had to do this because lookups in Winbind do not seem to be performed
asynchronously, so those enumerations made Winbind hang for some minutes and grew
its cache to several hundred megabytes.
Created attachment 160457
Try this version of genhomedircon. Should be much faster.
Fine, now genhomedircon finishes in around ten seconds.