Bug 485974

Summary: /var/run/gdm/auth-for* directories not removed.... slows boot!
Product: [Fedora] Fedora Reporter: Tom London <selinux>
Component: gdmAssignee: jmccann
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: bugs, cschalle, dkovalsk, jmccann, ma, mclasen, rstrode, wwoods
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-03-18 22:35:39 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description Tom London 2009-02-17 13:56:03 EST
Description of problem:
My rawhide system has 324 directories in /var/run/gdm:

[root@tlondon ~]# ls gdm | wc
    324     324    6480
[root@tlondon ~]# 

They all look like "auth-for-tbl-XXXXX" or "auth-for-gdm-XXXXX".  

No one is clearing these out.

/etc/rc.d/rc.sysinit has the following snippet:

# Clean up /var.
rm -rf /var/lock/cvs/* /var/run/screen/*
find /var/lock /var/run ! -type d -exec rm -f {} \;
rm -f /var/lib/rpm/__db* &> /dev/null
rm -f /var/gdm/.gdmfifo &> /dev/null

Bootcharts indicate that the "find" is taking 4-5 seconds during the boot.  [It takes much less time after everything is cached.]

Manually removing all 324 directories (when in single-user) and rebooting shaves 4-5 seconds off the boot time.

Who/when should these entries be removed?


Version-Release number of selected component (if applicable):
gdm-2.25.2-3.fc11.x86_64

How reproducible:
Every time.....

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 Tom London 2009-02-17 14:20:57 EST
Perhaps something like:

--- rc.sysinit	2009-01-28 08:32:51.000000000 -0800
+++ rc.sysinit.new	2009-02-17 11:08:41.000000000 -0800
@@ -748,6 +748,7 @@
 
 # Clean up /var.
 rm -rf /var/lock/cvs/* /var/run/screen/*
+rm -rf /var/run/gdm/auth-for* &> /dev/null
 find /var/lock /var/run ! -type d -exec rm -f {} \;
 rm -f /var/lib/rpm/__db* &> /dev/null
 rm -f /var/gdm/.gdmfifo &> /dev/null


This may be a big hammer, but .....
Comment 2 Will Woods 2009-02-18 01:04:24 EST
If we do change this bit of rc.sysinit, I'd like to see something more like this; It'll be slightly more efficient (less fork/execs) and prevent argument overflow from path expansion as a nice bonus.

--- rc.sysinit.orig	2009-02-18 00:32:18.360296480 -0500
+++ rc.sysinit	2009-02-18 00:58:07.964297197 -0500
@@ -747,10 +747,9 @@
 [ -f /var/run/utmpx -o -f /var/log/wtmpx ] && _NEED_XFILES=1
 
 # Clean up /var.
-rm -rf /var/lock/cvs/* /var/run/screen/*
-find /var/lock /var/run ! -type d -exec rm -f {} \;
-rm -f /var/lib/rpm/__db* &> /dev/null
-rm -f /var/gdm/.gdmfifo &> /dev/null
+find /var/lock/cvs /var/run/screen /var/run/gdm -mindepth 1 -delete &> /dev/null
+find /var/lock /var/run ! -type d -delete
+rm -f /var/lib/rpm/__db* /var/gdm/.gdmfifo &> /dev/null
 
 # Reset pam_console permissions
 [ -x /sbin/pam_console_apply ] && /sbin/pam_console_apply -r

But first, we need to know why gdm is leaving that stuff behind in the first place.
Comment 3 Tom London 2009-02-18 12:44:45 EST
(In reply to comment #2)
> If we do change this bit of rc.sysinit, I'd like to see something more like
> this; It'll be slightly more efficient (less fork/execs) and prevent argument
> overflow from path expansion as a nice bonus.
> 
> --- rc.sysinit.orig 2009-02-18 00:32:18.360296480 -0500
> +++ rc.sysinit 2009-02-18 00:58:07.964297197 -0500
> @@ -747,10 +747,9 @@
>  [ -f /var/run/utmpx -o -f /var/log/wtmpx ] && _NEED_XFILES=1
> 
>  # Clean up /var.
> -rm -rf /var/lock/cvs/* /var/run/screen/*
> -find /var/lock /var/run ! -type d -exec rm -f {} \;
> -rm -f /var/lib/rpm/__db* &> /dev/null
> -rm -f /var/gdm/.gdmfifo &> /dev/null
> +find /var/lock/cvs /var/run/screen /var/run/gdm -mindepth 1 -delete &>
> /dev/null
> +find /var/lock /var/run ! -type d -delete
> +rm -f /var/lib/rpm/__db* /var/gdm/.gdmfifo &> /dev/null
> 
>  # Reset pam_console permissions
>  [ -x /sbin/pam_console_apply ] && /sbin/pam_console_apply -r
> 
> But first, we need to know why gdm is leaving that stuff behind in the first
> place.

I've patched my /etc/rc.d/rc.sysinit as above.  It keeps /var/run/gdm clean (i.e., down to 2 active directories).

Instead of "working hard" to clean up expired session states, would it make sense instead to store this kind of "stuff" in a tmpfs, e.g., /var/ephemeral?

This state would get cleaned up for free, and the above tests/forks/etc. could be bypassed in most cases.  Could wrap the above in something like "find /var/ephemeral -fstype tmpfs ....".
Comment 4 Ray Strode [halfline] 2009-02-18 13:18:27 EST
yea, gdm should learn to clean up after itself...  Must have gotten broken when we moved to storing cookies in subdirs.
Comment 5 Matthias Clasen 2009-03-12 00:57:06 EDT
The code to clean up is in gdm_display_access_file_close which is called from gdm_display_remove_user_authorization. But I can't find any callers for that function.
Comment 6 Ray Strode [halfline] 2009-03-17 13:43:15 EDT
I looked at this a bit and I'm a little puzzled.

It's true gdm_display_remove_user_authorization isn't getting called, but we call display_access_file_close in other places.

display_access_file_close does:

g_unlink (file->priv->path);

and then

auth_dir = g_path_get_dirname (file->priv->path);
g_rmdir (auth_dir);

So ending up with empty directories doesn't make sense to me.  We should either have directories full of stale auth cookies, or no directories at all.

In fact, I can't even get it to happen here.  I always have exactly two directoreis in /var/run/gdm no matter if I hit ctrl-alt-backspace, log out, or initiate a reboot.  There are certainly machines near me where it has happened recently though, so I'm confused.
Comment 7 Ray Strode [halfline] 2009-03-17 13:44:12 EDT
Do you see a message of the form:

"GdmDisplayAccessFile: Unable to remove X11 authority directory '%s': %s"

in your logs somewhere?
Comment 8 Tom London 2009-03-17 13:53:29 EDT
(In reply to comment #7)
> Do you see a message of the form:
> 
> "GdmDisplayAccessFile: Unable to remove X11 authority directory '%s': %s"
> 
> in your logs somewhere?  

[root@tlondon ~]# grep GdmDisplayAccessFile /var/log/messages*
[root@tlondon ~]# 

[root@tlondon ~]# grep GdmDisplayAccessFile /var/log/gdm/*
[root@tlondon ~]# 

Not sure where else to look.....

Here is the current state of /var/run/gdm on my system:

[root@tlondon ~]# ls -l /var/run/gdm
total 80
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 11:26 auth-for-gdm-1oG5Eq
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 11:47 auth-for-gdm-2vIhaT
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 11:26 auth-for-gdm-5bztKQ
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-11 15:12 auth-for-gdm-Asoh8P
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-11 11:24 auth-for-gdm-IJsitE
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 07:07 auth-for-gdm-KRkUp7
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-17 08:44 auth-for-gdm-mYfO4G
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-17 06:39 auth-for-gdm-rFza9n
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-11 06:13 auth-for-gdm-RmaXGw
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 07:07 auth-for-gdm-TPaCtB
drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 11:26 auth-for-gdm-Vi7waD
drwxrwxrwt. 2 hobbes hobbes 4096 2009-03-12 11:26 auth-for-hobbes-8ftwlY
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-12 11:47 auth-for-tbl-0vVxD5
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-11 06:13 auth-for-tbl-2iglVU
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-17 06:39 auth-for-tbl-2nAJha
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-12 11:26 auth-for-tbl-C9g8Ne
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-12 07:07 auth-for-tbl-d9jh2Z
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-17 08:46 auth-for-tbl-QxzKJs
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-11 11:24 auth-for-tbl-RXWB8C
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-11 15:12 auth-for-tbl-tztHek
[root@tlondon ~]# 

and

[root@tlondon ~]# ls -l /var/run/gdm/*/*
-rw-------. 1 gdm gdm 65 2009-03-17 08:44 /var/run/gdm/auth-for-gdm-mYfO4G/database
-rw-------. 1 tbl tbl 65 2009-03-17 08:46 /var/run/gdm/auth-for-tbl-QxzKJs/database
[root@tlondon ~]# 

So I have lots of empty directories......
Comment 9 Tom London 2009-03-17 13:59:41 EDT
In case this has something to do with "shutting down", here are the last lines from my ~/.xsession-errors from the last session that left empty directories:

drwxrwxrwt. 2 gdm    gdm    4096 2009-03-17 06:39 auth-for-gdm-rFza9n
drwxrwxrwt. 2 tbl    tbl    4096 2009-03-17 06:39 auth-for-tbl-2nAJha


(audacity:4421): GVFS-RemoteVolumeMonitor-WARNING **: Owner :1.26 of volume monitor org.gtk.Private.GduVolumeMonitor disconnected from the bus; removing drives/volumes/mounts

(audacity:4421): GVFS-RemoteVolumeMonitor-WARNING **: Owner :1.30 of volume monitor org.gtk.Private.GPhoto2VolumeMonitor disconnected from the bus; removing drives/volumes/mounts
gnome-session[3058]: WARNING: Detected that screensaver has left the bus
** Message: Got disconnected from the session message bus; retrying to reconnect every 10 seconds
gnome-session: Fatal IO error 104 (Connection reset by peer) on X server :0.0.
gnome-session: Fatal IO error 104 (Connection reset by peer) on X server :0.0.
gnome-settings-daemon: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
gnome-panel: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.

(gnome-panel:3140): GLib-GObject-CRITICAL **: g_object_run_dispose: assertion `G_IS_OBJECT (object)' failed

(gnome-panel:3140): GLib-GObject-CRITICAL **: g_object_run_dispose: assertion `G_IS_OBJECT (object)' failed
gnome-settings-daemon: Fatal IO error 13 (Permission denied) on X server :0.0.
gtk-window-decorator: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
XIO:  fatal IO error 104 (Connection reset by peer) on X server ":0.0"
      after 481476 requests (481445 known processed) with 0 events remaining.
gtk-window-decorator: Fatal IO error 104 (Connection reset by peer) on X server :0.0.
gnome-screensaver: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
gnome-terminal: Fatal IO error 13 (Permission denied) on X server :0.0.
pidgin: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
pidgin: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.

(gnome-panel:3140): GLib-GObject-CRITICAL **: g_object_run_dispose: assertion `G_IS_OBJECT (object)' failed

(gnome-panel:3140): GLib-GObject-CRITICAL **: g_object_run_dispose: assertion `G_IS_OBJECT (object)' failed

(gnome-panel:3140): GLib-GObject-CRITICAL **: g_object_run_dispose: assertion `G_IS_OBJECT (object)' failed

(gnome-panel:3140): GLib-GObject-CRITICAL **: g_object_run_dispose: assertion `G_IS_OBJECT (object)' failed
Could not parse arguments: Cannot open display:
Comment 10 Ray Strode [halfline] 2009-03-17 14:38:00 EDT
Might be a race with shutdown where teh slave gets killed before it can clean up the directory.

I'm building http://koji.fedoraproject.org/koji/buildinfo?buildID=94530 now. Mind giving it a go?
Comment 11 Tom London 2009-03-17 16:31:10 EDT
OK.  Installed.

Here is the diff after installing this, logging out, logging in, logging out, and logging in again.

[tbl@tlondon ~]$ diff -u g.2 g.3
--- g.2	2009-03-17 13:24:50.000000000 -0700
+++ g.3	2009-03-17 13:26:18.000000000 -0700
@@ -9,13 +9,13 @@
 drwxrwxrwt. 2 gdm    gdm    4096 2009-03-11 06:13 auth-for-gdm-RmaXGw
 drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 07:07 auth-for-gdm-TPaCtB
 drwxrwxrwt. 2 gdm    gdm    4096 2009-03-12 11:26 auth-for-gdm-Vi7waD
-drwxrwxrwt. 2 gdm    gdm    4096 2009-03-17 13:23 auth-for-gdm-ZWXPtJ
+drwxrwxrwt. 2 gdm    gdm    4096 2009-03-17 13:25 auth-for-gdm-XtvvWT
 drwxrwxrwt. 2 hobbes hobbes 4096 2009-03-12 11:26 auth-for-hobbes-8ftwlY
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-12 11:47 auth-for-tbl-0vVxD5
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-11 06:13 auth-for-tbl-2iglVU
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-17 06:39 auth-for-tbl-2nAJha
+drwxrwxrwt. 2 tbl    tbl    4096 2009-03-17 13:25 auth-for-tbl-bwFNJZ
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-12 11:26 auth-for-tbl-C9g8Ne
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-12 07:07 auth-for-tbl-d9jh2Z
-drwxrwxrwt. 2 tbl    tbl    4096 2009-03-17 13:24 auth-for-tbl-f6Xawj
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-11 11:24 auth-for-tbl-RXWB8C
 drwxrwxrwt. 2 tbl    tbl    4096 2009-03-11 15:12 auth-for-tbl-tztHek
[tbl@tlondon ~]$

I'll test this with a shutdown....
Comment 12 Tom London 2009-03-17 16:41:50 EDT
Wow..... after a restart:

[tbl@tlondon ~]$ ls -l /var/run/gdm
total 8
drwxrwxrwt. 2 gdm gdm 4096 2009-03-17 13:33 auth-for-gdm-RMb5yQ
drwxrwxrwt. 2 tbl tbl 4096 2009-03-17 13:35 auth-for-tbl-Xcmj3P
[tbl@tlondon ~]$ 

What exactly cleared out /var/run/gdm?  Don't see any change in rc.sysinit....?
Comment 13 Tom London 2009-03-17 16:51:34 EDT
After a second restart:

[tbl@tlondon ~]$ ls /var/run/gdm
auth-for-gdm-7gkodJ  auth-for-tbl-qaZAI4
[tbl@tlondon ~]$ 

Looks good. Thanks.
Comment 14 Ray Strode [halfline] 2009-03-17 23:29:11 EDT
Since gdm owns the directory and creates the subdirectories, i'm having it clean up the empty ones.
Comment 15 Tom London 2009-03-18 10:03:56 EDT
I've now tested this through several logoff/login and restart/login cycles with no problems.

I also "hard power cycled", attempting to simulate a hard crash (with gdm not shutting down properly), and the results are the same on the next login:

[tbl@tlondon ~]$ ls /var/run/gdm
auth-for-gdm-1wLk4J  auth-for-tbl-Gpjqaa
[tbl@tlondon ~]$ 

Fix looks solid.....  Thanks!
Comment 16 Matthias Clasen 2009-03-18 22:35:39 EDT
I guess we can close this then...