I am starting to get errors like the following with gdm-binary:

  Feb 5 11:19:38 hostname gdm-binary[16120]: CRITICAL: could not add display to access file: Too many open files

with the result that new users can't connect and just get a black screen. If I examine lsof there are many entries like

  ...
  gdm-binar 16120 root 46u REG 253,0 54 400308 /var/run/gdm/auth-for-gdm-Tpyyg8/database
  ...
  gdm-binar 16120 root 1021u REG 253,0 55 417121 /var/run/gdm/auth-for-username-BON1Sx/database
  gdm-binar 16120 root 1022u REG 253,0 55 417126 /var/run/gdm/auth-for-gdm-5lepCn/database

but we don't have anything like 1000 users connected. Hence it looks like gdm is opening these xauth files but (at least in some cases) not closing them, and eventually exhausting its file descriptor allocation. Almost all connections to the host involved are remote XDMCP connections.
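For anyone triaging a similar case, the lsof entries above can be summarised per owner, to see whether the leaked files belong to greeter (gdm) sessions or to user sessions. This is only a minimal sketch: the function name auth_db_by_owner is made up for illustration, and the path pattern is assumed from the entries quoted in this report.

```shell
#!/bin/sh
# Summarise lsof-style text on stdin: for every open
# /var/run/gdm/auth-for-<owner>-<suffix>/database entry, print
# "<owner> <count>". The path layout is an assumption taken from the
# lsof lines in this report.
auth_db_by_owner() {
    # Keep only the owner part, dropping the random suffix after the
    # last hyphen, then count identical owners.
    sed -n 's|.*/var/run/gdm/auth-for-\(.*\)-[^-/]*/database.*|\1|p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

It could be fed with something like `lsof -n -p 16120 | auth_db_by_owner`, where 16120 is the gdm-binary PID from the log line above.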
We also have problems with connecting users to GDM when the number of open files exceeds 1000:

  # cat /var/log/messages |grep database
  ...
  Feb 24 18:25:10 hostname gdm-binary[3759]: CRITICAL: could not create display access file: Unable to open '/var/run/gdm/auth-for-gdm-mz2yRV/database': Too many open files
  # lsof -n |grep -c database
  ...
  1016

Increasing the maximum number of open files didn't help:

  # ulimit -a
  ...
  open files (-n) 32768

A temporary workaround, which lasts for a few days, is:

  # killall gdm-binary

Version-Release number of selected component (if applicable): 2.6.27.41-170.2.117.fc10.x86_64
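One possible reason why raising the limit had no visible effect, offered as an assumption rather than a diagnosis: `ulimit` run in a shell reports that shell's own limits, while an already running gdm-binary keeps the limits it was started with. On Linux the limit that actually applies to a process can be read from /proc/<pid>/limits. The helper name below is hypothetical.

```shell
#!/bin/sh
# Print the soft "Max open files" limit of the running process whose
# PID is given as $1. /proc/<pid>/limits is Linux-specific; its line
# has the form "Max open files  <soft>  <hard>  files".
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}
```

For example, `soft_nofile_limit 3759` would show the limit in force for the gdm-binary process from the log above. If that still printed a value near 1024, it would be consistent with the failures starting at roughly 1000 open files.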
Created attachment 399058 [details] Possible patch I think the problem is that the xdmcp session was exiting without cleaning up properly. Something along the lines of the attached patch is needed. I have tested it (briefly) and the file descriptors are now closed at the end of the session.
Created attachment 399120 [details] gdm-2.24.1 patch

(In reply to comment #2)

We've adapted your patch for GDM version 2.24.1-4 on our LTSP server. At first glance the database files are now being closed correctly. We will check the number of open files periodically, and I'll report the results later. Thank you.
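A periodic check of this kind could look roughly like the sketch below, which counts the descriptors of one process that point at a given path pattern by reading /proc/<pid>/fd directly. The function name is hypothetical, and inspecting another user's process this way normally requires root.

```shell
#!/bin/sh
# Count open descriptors of process $1 whose target path matches the
# shell pattern $2 (for example '/var/run/gdm/auth-for-*/database').
count_fds_matching() {
    n=0
    for fd in "/proc/$1/fd"/*; do
        # Descriptors can vanish between listing and readlink; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            $2) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}
```

Run from cron or by hand after users disconnect, e.g. `count_fds_matching <pid> '/var/run/gdm/auth-for-*/database'`, the number should drop back down with a fixed gdm and keep growing with a leaking one.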
Related upstream bug: https://bugzilla.gnome.org/show_bug.cgi?id=606724
(In reply to comment #4)
> Related upstream bug:
>
> https://bugzilla.gnome.org/show_bug.cgi?id=606724

I do not think so. Bug #606724 is specifically about "Maximum number of open XDMCP sessions from host"; that is not a pure leak of file descriptors that are never closed, but a problem at the session management level.
(In reply to comment #5)
> (In reply to comment #4)
> > Related upstream bug:
> >
> > https://bugzilla.gnome.org/show_bug.cgi?id=606724
>
> I do not think so, bug #606724 definitely answer "Maximum number of open XDMCP
> sessions from host" - that is not related to leak pure leak of file descriptors
> not been closed, but session management level.

I thought that at first, but if you compare the patch for this bug with the patch for the other, they are both trying to do the same sort of thing in different ways. So although the problems seem different, I think they have the same underlying cause: a finished xdmcp session not being cleaned up properly.
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > Related upstream bug:
> > >
> > > https://bugzilla.gnome.org/show_bug.cgi?id=606724
[...]
> I thought that at first, but if you compare the patch for this bug with the
> patch for the other, then they are both trying to do the same sort of thing in
> a different way. So along the problems seem different, I think they have the
> same underlying cause of a finished xdmcp session not being cleaned up
> properly.

As you are the author of the original patch, which works fine for us in an intensive LTSP environment (we have approximately 20 to 40 XDMCP terminal users per day), we can test the patch from https://bugzilla.gnome.org/show_bug.cgi?id=606724 to see whether it behaves the same way, i.e. fixes the descriptor leak when users disconnect.

BTW, this seems to be a regression from the previous version of GDM, which we used before upgrading to FC10.
(In reply to comment #7)
[...]
> As you are author of original patch that works fine for us in intensive LTSP
> environment (we have approx 20...40 xdmcp terminal users per day) we can test
> patch from https://bugzilla.gnome.org/show_bug.cgi?id=606724 to see if it will
> have same behavior i.e. fix descriptors leak on users disconnect...

I tested the patch, but the situation is worse: it is not even possible to log in. /var/log/messages contains the following lines:

  Mar 13 13:28:45 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680284739
  Mar 13 13:28:45 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680284740
  Mar 13 13:28:45 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680284741
  Mar 13 13:28:45 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680284742
  [...]
  Mar 13 13:28:46 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680285750
  Mar 13 13:28:46 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680285751
  Mar 13 13:28:46 elbrus gdm-binary[29052]: WARNING: GdmXdmcpDisplayFactory: Failed to look up session id 1680285752
  Mar 13 13:28:46 elbrus gdm-binary[29052]: CRITICAL: could not add display to access file: Too many open files
  Mar 13 13:28:46 elbrus gdm-binary[29052]: WARNING: Unable to set up access control for display 1
  Mar 13 13:28:46 elbrus gdm-binary[29052]: WARNING: GdmDisplay: display lasted 0,000427 seconds
  Mar 13 13:28:48 elbrus gdm-binary[29052]: CRITICAL: could not add display to access file: Too many open files
  Mar 13 13:28:48 elbrus gdm-binary[29052]: WARNING: Unable to set up access control for display 1
  Mar 13 13:28:48 elbrus gdm-binary[29052]: WARNING: GdmDisplay: display lasted 0,000593 seconds
  Mar 13 13:28:49 elbrus gdm-simple-greeter[29136]: WARNING: Could not ask power manager if user can suspend: The name org.free
  Mar 13 13:28:49 elbrus gdm-simple-greeter[29136]: WARNING: Could not ask power manager if user can suspend: The name org.free
  Mar 13 13:28:49 elbrus gdm-simple-greeter[29136]: WARNING: Unable to run ck-history: Failed to execute child process "ck
  Mar 13 13:28:52 elbrus gdm-binary[29052]: CRITICAL: could not add display to access file: Too many open files
  Mar 13 13:28:52 elbrus gdm-binary[29052]: WARNING: Unable to set up access control for display 1
  Mar 13 13:28:52 elbrus gdm-binary[29052]: WARNING: GdmDisplay: display lasted 0,000371 seconds
  Mar 13 13:29:00 elbrus gdm-binary[29052]: CRITICAL: could not add display to access file: Too many open files
  Mar 13 13:29:00 elbrus gdm-binary[29052]: WARNING: Unable to set up access control for display 1
  Mar 13 13:29:00 elbrus gdm-binary[29052]: WARNING: GdmDisplay: display lasted 0,000397 seconds
  Mar 13 13:29:04 elbrus init: prefdm main process ended, respawning

Maybe the proposed patch from https://bugzilla.gnome.org/show_bug.cgi?id=606724 misbehaves in combination with gdm-2.24.1-fix-xdmcp.patch, which is present in the RPM package?
Of the two patches I like mine better, as I think it is a cleaner way to do it, and I am not convinced the other patch does a full clean-up. Where I am less sure is whether this is the best place to set up this finish hook: I have dipped into the code rather than studying it fully, and the hook might fit better somewhere else, e.g. in gdm-xdmcp-display-factory.c.
(In reply to comment #9) > Of the two patches I like mine better as I think it is a cleaner way to do it, Your patch really works in our configuration, everything seems to be OK now.
Created attachment 403776 [details] An alternate patch Here is another possible patch. This moves the code to a lower level, and is more minimal, though a bit of a hack. It does still fix the file leak though.
The patch in Comment 2 is now upstream in 2.30.1 onwards http://git.gnome.org/browse/gdm/commit/?id=0c34aa7949bc24a2a8b3217cefb3c978b892591b Can we have it back-ported to Fedora 12 please?
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This works in Fedora 14 (and I would guess F13 as well from the gdm version).