Bug 250233 - genhomedircon hangs system for hours
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: policycoreutils
Version: 7
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Daniel Walsh
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-07-31 08:53 UTC by Christian Mandery
Modified: 2007-11-30 22:12 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-08-06 08:43:56 UTC
Type: ---
Embargoed:


Attachments
Try this version of genhomedircon. Should be much faster. (11.33 KB, application/octet-stream)
2007-08-01 20:04 UTC, Daniel Walsh

Description Christian Mandery 2007-07-31 08:53:32 UTC
Description of problem:
Every update of selinux-policy-targeted triggers a run of genhomedircon. On a
terminal server with quite a few users and the homedirs mounted over NFS,
genhomedircon runs for at least 3 hours (we killed it after that, since this is
a production system).

Running strace on the genhomedircon process shows this line several hundred
times per second:
futex(0x9a93f48, FUTEX_WAKE, 1)         = 0

genhomedircon should skip NFS-mounted homedirs since they do not have SELinux
attributes anyway. An alternative would be to skip running genhomedircon if
SELinux is disabled.
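
To illustrate the first suggestion, here is a minimal sketch of how a script could tell whether a homedir lives on an NFS mount. It is not taken from genhomedircon: the names filesystem_type and is_on_nfs are hypothetical, it assumes a mount table in the /proc/mounts layout (device, mount point, type, ...), and it is written for a current Python 3 rather than the Python shipped with Fedora 7.

```python
import os

# Filesystem types treated as NFS in this sketch (an assumption).
NFS_TYPES = {"nfs", "nfs4"}


def filesystem_type(path, mounts_text):
    """Return the type of the most specific mount point containing path.

    mounts_text is the content of a mount table in /proc/mounts layout.
    Escaped characters in mount points (e.g. \\040 for a space) are not
    handled here.
    """
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fs_type = os.path.normpath(fields[1]), fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # The longest matching mount point wins.
            if len(mount_point) > best_len:
                best_len, best_type = len(mount_point), fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path sits on a mount whose type is listed in NFS_TYPES."""
    return filesystem_type(path, mounts_text) in NFS_TYPES
```

On a live system the second argument would be the text read from /proc/mounts, and homedirs for which is_on_nfs() returns True could be skipped with a warning.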

Version-Release number of selected component (if applicable):
Fedora 7 - latest updates installed

selinux-policy-targeted-2.6.4-29.fc7
selinux-policy-2.6.4-29.fc7
libselinux-2.0.14-4.fc7
libselinux-python-2.0.14-4.fc7

How reproducible:
Update selinux-policy-targeted using yum.
It seems to hang after:
  Updating  : selinux-policy-targeted      ####################### [ 7/15]

Now, genhomedircon uses 100% CPU.

If you kill it now:
libsemanage.semanage_exec_prog: Child process /usr/sbin/genhomedircon did not
exit cleanly.
libsemanage.semanage_install_sandbox: genhomedircon returned error code -1.
semodule:  Failed!
  
Actual results:
100% CPU usage for over 3 hours, but no visible progress

Expected results:
"genhomedircon skipped, SELinux is disabled." or
"Warning: Skipping NFS homedirs."

Comment 1 Daniel Walsh 2007-07-31 14:41:52 UTC
Adding sys.exit(0) to the front of genhomedircon would be an easy workaround for now.

Do you know what genhomedircon is doing when it goes wacky?  It does try to
read the entire password database.  Are you running NIS or LDAP?  Do you have a
huge number of user accounts?



Comment 2 Daniel Walsh 2007-07-31 14:46:32 UTC
vi /usr/sbin/genhomedircon
#! /usr/bin/python
...
import sys, os, pwd, string, getopt, re
sys.exit(0)

Of course, the next time policycoreutils gets updated you will lose this change.
You can also remove selinux-policy-targeted from the system:

yum remove selinux-policy-targeted


Comment 3 Christian Mandery 2007-07-31 14:50:10 UTC
Okay, I added sys.exit(0) after the import statements in the script, and this
at least fixes the problem of the script consuming CPU time.

We're using both Samba Winbind (pam_winbind) and NIS with an indeed huge number
of user accounts (possibly more than 100k), so genhomedircon may be trying to
enumerate all those users. Only a dozen of the Winbind users have a home
directory on our server, and the NIS users have their home directories on the
NFS storage, which genhomedircon should not touch.

As I already posted, all that genhomedircon does (as seen with strace) is
repeat the following call several hundred times per second:
futex(0x9a93f48, FUTEX_WAKE, 1)         = 0

Comment 4 Daniel Walsh 2007-07-31 16:44:08 UTC
Does this script give the same behaviour?

#!/usr/bin/python
import pwd
pwd.getpwall()
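
A slightly extended variant of the same test, as a sketch only: it reports how many accounts were returned and how long the enumeration took, which makes it easier to compare against the runtime of genhomedircon. The function name time_passwd_enumeration is hypothetical, and the snippet assumes a current Python 3.

```python
import pwd
import time


def time_passwd_enumeration():
    """Enumerate the password database; return (entry count, seconds)."""
    start = time.monotonic()
    entries = pwd.getpwall()  # same call as in the two-line test script
    elapsed = time.monotonic() - start
    return len(entries), elapsed


count, seconds = time_passwd_enumeration()
print("%d accounts enumerated in %.2f s" % (count, seconds))
```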


Comment 5 Christian Mandery 2007-08-01 10:26:36 UTC
No, this one finishes after around one second. I ran some tests and found out
that if I pass --nopasswd to genhomedircon, it finishes very fast because it
does not need to look up the homedirs but uses a statically set homedir path.

Anyway, the problematic, time-consuming part is the loop starting in line 317,
and within this loop the line that checks whether the homedir exists. This takes
around 1-2 seconds for every user whose homedir is on our NFS share, because
your implementation of checkExists() is too slow.

Why do you run ls on the directory and then read all of the output to check
whether it contains the directory we are looking for, when you do not need all
that regexp magic just to check the existence of a given homedir?

My suggestion would be to check whether the argument given to checkExists() is
not a regexp and, in that case, just probe whether you can open the directory
directly and abort if you cannot (which means either that the directory was not
found or that it is on an NFS share with root_squash, as in our case).
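
A rough sketch of that suggestion, not the actual genhomedircon code: looks_like_regex and home_exists are hypothetical names, the set of characters treated as regexp syntax is an assumption (the dot is deliberately left out because it is common in user names), and the snippet targets a current Python 3.

```python
import os
import re

# Characters that mark the argument as a pattern rather than a plain path.
# "." is left out on purpose, since user names such as "john.doe" are common.
_REGEX_CHARS = re.compile(r"[\[\]\\^$|?*+(){}]")


def looks_like_regex(path):
    """True if path contains characters that suggest regexp syntax."""
    return _REGEX_CHARS.search(path) is not None


def home_exists(path):
    """Probe a plain directory path by opening it directly.

    Returns False if the directory is missing, is not a directory, or
    cannot be opened (for example on an NFS share with root_squash).
    """
    try:
        with os.scandir(path):
            return True
    except OSError:
        return False
```

A caller would use home_exists() only when looks_like_regex() returns False and keep the existing, slower matching for real patterns. Opening the directory costs one lookup per user instead of listing the whole parent directory each time.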

PS: genhomedircon fails to enumerate the Winbind users and only gets the NIS
users with the NFS homedirs (which are "only" 4000), because I have disabled
user list enumeration in Samba:
winbind enum users = no
winbind enum groups = no

We had to do this because lookups in Winbind do not seem to be performed
asynchronously, so those enumerations made Winbind hang for some minutes and
grew its cache to several hundred megabytes.

Comment 6 Daniel Walsh 2007-08-01 20:04:49 UTC
Created attachment 160457
Try this version of genhomedircon.  Should be much faster.

Comment 7 Christian Mandery 2007-08-06 08:43:56 UTC
Fine, now genhomedircon finishes in around ten seconds.

Thank you!

