From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02

Description of problem:
I have 5 Linux machines: 3 of them run Red Hat 9 and 2 of them run a version of Red Hat Enterprise Linux. Logging in at the graphical login with any account works fine on the Red Hat Enterprise Linux machines. However, on all 3 of the Red Hat 9 machines, attempting to log in with the graphical login causes a permanent hang in the graphical environment with certain users.

More Details:
Logging in at the graphical prompt with a local user works fine. I could log in as root and as a user found in the /etc/passwd file. However, logging in with any user that is obtained via NIS and whose home directory is mounted via NFS (using 'yp' and 'autofs' in tandem) causes an indefinite hang. The login box disappears and the initial blue screen comes up. Normally you would then see a box appear that shows a bunch of icons as each subsystem loads, and then the desktop would appear. However, in this case it just hangs and the blue screen remains forever.

This problem is critical. No one served via NIS can log in with the graphical login.

Things that work fine:
Logging in with an NIS user via telnet or ssh works fine. Logging into a terminal on the local node works fine (i.e. typing ctrl-alt-f1 then logging in).

Version-Release number of selected component (if applicable):
All components came with Red Hat 9

How reproducible:
Always

Steps to Reproduce:
1. Configure your system to use NIS authentication and make sure autofs is turned on. Our NIS and NFS server is a Tru64 machine, but I'm not sure if the server makes a difference.
2. Attempt to log in at the graphical prompt with any NIS user.

Actual Results:
The blue screen will come up and it will hang forever. To kill the hanging process, grep for Xclient in the process list and kill all Xclient processes.

Expected Results:
Should continue into the X Window environment.

Additional info:
I selected the defaults with everything (default window manager, etc.). I didn't do anything strange with the machine. I didn't install any additional software. This behavior is seen on all 3 of my Red Hat 9 machines. Here is the version: Red Hat Linux release 9 (Shrike)

Workaround:
I didn't find any workaround. I didn't try using another window manager or anything, as that's pretty drastic.

Note: NFS and NIS are working fine. I can easily see my home directory and do 'ypcat', etc.
Must be related to tru64 or autofs; I have a NIS account and NFS homedir myself with no problems, and if those generically failed I'd expect to have heard about this bug more often and before now.
Is there some kind of tracing or logging I can turn on that would help to see what's going on? For me it's very easy to reproduce this. Perhaps it is something peculiar to my environment, but since it's so easy for me to reproduce and since I'm just using a stock Redhat 9 system, I'm surprised more people didn't run into this. Of course who knows how many people use Tru64 as an NIS server with Linux clients, but there has to be somebody out there . . .

Here's a log entry of /var/log/messages of the time I was trying to log in to the graphical login. As you notice I tried to log into the graphical log in and when that failed logged into the ctrl-alt-f1 terminal at different times:

Aug 14 10:59:12 mymachine gdm(pam_unix)[2050]: session opened for user myuser by (uid=0)
Aug 14 11:00:01 mymachine login(pam_unix)[10920]: session opened for user myuser by (uid=0)
Aug 14 11:00:01 mymachine -- myuser[10920]: LOGIN ON tty1 BY myuser
Aug 14 11:00:24 mymachine login(pam_unix)[10920]: session closed for user myuser
Aug 14 11:02:35 mymachine login(pam_unix)[3236]: session opened for user myuser by (uid=0)
Aug 14 11:02:35 mymachine -- myuser[3236]: LOGIN ON tty1 BY myuser
Aug 14 11:02:59 mymachine automount[21456]: attempting to mount entry /home/man
Aug 14 11:02:59 mymachine automount[3309]: lookup(yp): lookup for man failed: No such key in map
Aug 14 11:02:59 mymachine automount[21456]: attempting to mount entry /home/man1
Aug 14 11:02:59 mymachine automount[3310]: lookup(yp): lookup for man1 failed: No such key in map
Aug 14 11:02:59 mymachine automount[21456]: attempting to mount entry /home/man8
Aug 14 11:02:59 mymachine automount[3311]: lookup(yp): lookup for man8 failed: No such key in map
Aug 14 11:04:42 mymachine su(pam_unix)[3407]: session opened for user root by myuser(uid=xxx)
Aug 14 11:04:58 mymachine su(pam_unix)[3407]: session closed for user root
Aug 14 11:05:12 mymachine su(pam_unix)[3478]: session opened for user root by myuser(uid=xxx)
Aug 14 11:05:53 mymachine gdm(pam_unix)[2050]: session closed for user myuser
Aug 14 11:06:36 mymachine gdm(pam_unix)[2050]: session opened for user myuser by (uid=0)
Aug 14 11:07:09 mymachine gdm(pam_unix)[2050]: session closed for user myuser
Aug 14 11:07:21 mymachine login(pam_unix)[2002]: session opened for user myuser by LOGIN(uid=0)
Aug 14 11:07:21 mymachine -- myuser[2002]: LOGIN ON tty2 BY myuser
Aug 14 11:08:28 mymachine shutdown: shutting down for system reboot
Aug 14 11:08:29 mymachine init: Switching to runlevel: 6
Aug 14 11:08:29 mymachine login(pam_unix)[2002]: session closed for user myuser
Aug 14 11:08:29 mymachine login(pam_unix)[3236]: session closed for user myuser
Aug 14 11:08:29 mymachine su(pam_unix)[3478]: session closed for user root
Honestly, I'm not much of a NIS expert; I'm more of a GUI guy. I'm cc'ing some people, or maybe tech support could help track it down.
OK. Let me know if they want to turn on some kind of tracing or extra debugging. The log seems to indicate that once I log into the graphical interface, automount tries to mount /home/man, /home/man1, and /home/man8 for some reason. Don't know if that's the reason why it's hanging or not . . .
Trying to mount man, man1 & man8 on /home looks very suspicious. It leads me to believe that this is autofs related.
Reassigning to autofs, since this seems to be what's causing the problem.
If it's any help, this problem only seems to affect gdm (GNOME Display Manager) logins. You can get NIS/NFS logins to work using the kdm login manager. Add or change:

DISPLAYMANAGER="KDE"

in /etc/sysconfig/desktop and restart X.

Also, gdm uses PAM, but perhaps kdm doesn't?
Thanks for the tip. Further analysis using the ps -axf command reveals that one of the processes does indeed go into the uninterruptible sleep state (note that the last process is in the 'D' state):

----
26665 ? S 0:00 [gdm-binary]
26708 ? S 0:00 \_ [gdm-binary]
26709 ? S 0:00 \_ /usr/X11R6/bin/X :0 -auth /var/gdm/:0.Xauth vt8
26718 ? S 0:00 \_ -/bin/csh -c /usr/bin/ssh-agent /etc/X11/xinit/Xclients
26773 ? S 0:00 \_ /usr/bin/gnome-session
26774 ? S 0:00 \_ /usr/bin/ssh-agent /etc/X11/xinit/Xclients
26777 ? D 0:00 \_ /usr/libexec/gconf-sanity-check-2
----

lsof -n -p of the uninterruptible sleep process (26777) reveals:

----
gconf-san 26777 user cwd DIR 0,10 10240 698946 /home/user (remote:/usr/users/d4/user)
gconf-san 26777 user rtd DIR 3,67 4096 2 /
gconf-san 26777 user txt REG 3,67 10292 375531 /usr/libexec/gconf-sanity-check-2
gconf-san 26777 user mem REG 3,67 103044 179549 /lib/ld-2.3.2.so
gconf-san 26777 user mem REG 3,67 2426700 359327 /usr/lib/libgtk-x11-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 449396 359321 /usr/lib/libgdk-x11-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 97224 359285 /usr/lib/libatk-1.0.so.0.200.0
gconf-san 26777 user mem REG 3,67 74568 359323 /usr/lib/libgdk_pixbuf-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 132100 359319 /usr/lib/libpangoxft-1.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 49156 359317 /usr/lib/libpangox-1.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 205388 359313 /usr/lib/libpango-1.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 954660 359231 /usr/lib/libxml2.so.2.5.4
gconf-san 26777 user mem REG 3,67 52616 359138 /usr/lib/libz.so.1.1.4
gconf-san 26777 user mem REG 3,67 26896 359121 /usr/lib/libpopt.so.0.0.0
gconf-san 26777 user mem REG 3,67 219304 359329 /usr/lib/libgconf-2.so.4.1.0
gconf-san 26777 user mem REG 3,67 265368 359305 /usr/lib/libORBit-2.so.0.0.0
gconf-san 26777 user mem REG 3,67 211948 2252186 /lib/tls/libm-2.3.2.so
gconf-san 26777 user mem REG 3,67 10944 359106 /usr/lib/libgmodule-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 15084 179560 /lib/libdl-2.3.2.so
gconf-san 26777 user mem REG 3,67 28456 359303 /usr/lib/liblinc.so.1.0.0
gconf-san 26777 user mem REG 3,67 213996 359108 /usr/lib/libgobject-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 14948 359110 /usr/lib/libgthread-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 434960 359104 /usr/lib/libglib-2.0.so.0.200.1
gconf-san 26777 user mem REG 3,67 79744 2252188 /lib/tls/libpthread-0.29.so
gconf-san 26777 user mem REG 3,67 9920 522459 /usr/X11R6/lib/libXrandr.so.2.0
gconf-san 26777 user mem REG 3,67 28016 522449 /usr/X11R6/lib/libXi.so.6.0
gconf-san 26777 user mem REG 3,67 53520 522441 /usr/X11R6/lib/libXext.so.6.4
gconf-san 26777 user mem REG 3,67 70408 522447 /usr/X11R6/lib/libXft.so.2.1
gconf-san 26777 user mem REG 3,67 27132 522461 /usr/X11R6/lib/libXrender.so.1.2
gconf-san 26777 user mem REG 3,67 146088 359222 /usr/lib/libfontconfig.so.1.0
gconf-san 26777 user mem REG 3,67 908016 522431 /usr/X11R6/lib/libX11.so.6.2
gconf-san 26777 user mem REG 3,67 327176 359218 /usr/lib/libfreetype.so.6.3.2
gconf-san 26777 user mem REG 3,67 130104 359094 /usr/lib/libexpat.so.0.4.0
gconf-san 26777 user mem REG 3,67 52472 179570 /lib/libnss_files-2.3.2.so
gconf-san 26777 user mem REG 3,67 43456 179574 /lib/libnss_nis-2.3.2.so
gconf-san 26777 user mem REG 3,67 91604 179564 /lib/libnsl-2.3.2.so
gconf-san 26777 user mem REG 3,67 1531064 2252184 /lib/tls/libc-2.3.2.so
gconf-san 26777 user 0r CHR 1,3 66819 /dev/null
gconf-san 26777 user 1w REG 0,10 0 699012 /home/user/.xsession-errors (remote:/usr/users/d4/user)
gconf-san 26777 user 2w REG 0,10 0 699012 /home/user/.xsession-errors (remote:/usr/users/d4/user)
gconf-san 26777 user 3w REG 0,10 0 699110 /home/user/.gconf-test-locking-file (remote:/usr/users/d4/user)
----

So it looks like the login hangs because that process goes into the uninterruptible sleep state while it holds /home/user/.gconf-test-locking-file open on the NFS-mounted home directory.
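
For anyone who wants to spot such processes without reading through ps output by hand, here is a rough Python sketch that lists processes in the 'D' state by reading /proc/<pid>/stat. It assumes a Linux /proc filesystem; the function names are made up for illustration and are not part of any tool mentioned in this report.

```python
import os


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat record into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' instead of on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, command = head.partition("(")
    return int(pid_text), command, tail.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, command) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were looking, or the record is odd.
            continue
        if state == "D":
            found.append((pid, command))
    return found


if __name__ == "__main__":
    for pid, command in uninterruptible_processes():
        print(pid, command)
```

A process only shows up here while it is actually blocked, so this would need to be run from a text console while the graphical login is hanging.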
I'm experiencing the exact same symptoms on RHEL-3. We've got accounts to login over NIS/NFS. We are automounting home directories. When a user logs in, the password is accepted, as well as their SSH passphrase (the GUI prompt appears). Then everything just hangs. The "Red Hat" screen doesn't load. I've noticed that gconfd-2 goes into a coma ('D' listing in ps). Presumably this is what's getting hung? Any progress on this one? Thanks, Norman
Sorry for the extra e-mail... I've also seen odd automount requests for man, man1, and man8. Did you figure anything out on this one? Perhaps related? Thanks again, Norman
Here's another update -- I took the automounter (autofs) out of the picture, and mounted the home directory manually. The problem persisted. Presumably, the automounter isn't the problem. I took NFS out of the picture, by copying the home directory to the local machine. Now the login works fine. Could gconfd-2, which I know very little about, be getting hung up trying to create a lock file across NFS? Thanks for any ideas, Norman
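
One way to test the lock-file theory without going through a GUI login is to try taking a lock on a file in the NFS home directory from a text console. Below is a minimal Python sketch, assuming the GNOME sanity check relies on fcntl-style (POSIX) locks, which I have not verified. The probe runs in a child process so that the parent can still report a timeout even if the child gets stuck in the 'D' state. The function name and the probe file name are hypothetical.

```python
import subprocess
import sys

# Code run in the child: take and release an exclusive POSIX lock on argv[1].
LOCK_SNIPPET = """
import fcntl, os, sys
fd = os.open(sys.argv[1], os.O_RDWR | os.O_CREAT, 0o600)
fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until the lock is granted
fcntl.lockf(fd, fcntl.LOCK_UN)
os.close(fd)
"""


def lock_works(path, timeout_seconds=10):
    """Return True if a lock on path is granted within timeout_seconds."""
    child = subprocess.Popen([sys.executable, "-c", LOCK_SNIPPET, path])
    try:
        return child.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        # A child blocked in uninterruptible sleep may ignore this kill;
        # the parent returns either way instead of hanging with it.
        child.kill()
        return False


if __name__ == "__main__":
    import os
    probe = os.path.expanduser("~/.nfs-lock-probe")  # hypothetical file name
    print("locking works" if lock_works(probe) else "lock request timed out")
```

If this reports a timeout on the NFS home directory but succeeds on a local directory such as /tmp, that would point at the lock manager rather than at autofs or gdm.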
Our solution - It turns out that the client was unable to communicate with the server's nlockmgr, due to a firewall issue. I had nlockmgr configured to use a specific port (4001), but this setting appears to have not worked during the last reboot. For whatever reason, everything works fine except the GUI login. When I adjusted the server's firewall to allow connections to the lock manager, the GUI login responded. Norman
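
To check which ports the server's lock manager is actually registered on, the output of "rpcinfo -p <server>" can be scanned for nlockmgr entries. Here is a small Python sketch of that, plus a plain TCP reachability test. The sample output in the comment is illustrative only (port 4001 is taken from the comment above), and the function names are invented for this example.

```python
import socket


def nlockmgr_ports(rpcinfo_output):
    """Return sorted (protocol, port) pairs registered for nlockmgr."""
    ports = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Expected columns: program, version, protocol, port, service name.
        if len(fields) >= 5 and fields[4] == "nlockmgr" and fields[3].isdigit():
            ports.add((fields[2], int(fields[3])))
    return sorted(ports)


def tcp_port_reachable(host, port, timeout_seconds=5):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


# Illustrative input, shaped like "rpcinfo -p" output:
#    program vers proto   port
#     100000    2   tcp    111  portmapper
#     100021    1   udp   4001  nlockmgr
#     100021    1   tcp   4001  nlockmgr
```

If nlockmgr is registered on a different port than the one the firewall allows, or the TCP check fails from the client, lock requests would be expected to stall while ordinary reads and writes keep working.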
Unfortunately, there is no firewall between my machine and the NFS server, so the issue is not a firewall in my case.
Here's some information about the mounting to /home/man8, etc. I found in the /etc/man.config file... # If people ask for "man foo" and have "/dir/bin/foo" in their PATH # and the docs are found in "/dir/man", then no mapping is required. So if you've got /home/bin in your path, by chance, it will look for /home/man every time you run "man". Autofs will try to mount a directory, causing a slight pause. There is an option (NOAUTOPATH) to turn this behavior off. Norman
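
To see which directories this rule would make "man" look at on a given machine, here is a simplified Python sketch. It models only the rule quoted from /etc/man.config above (a PATH entry ending in "bin" implies a sibling "man" directory); the real search that man performs may cover more cases. The function name is hypothetical.

```python
import os


def implied_man_dirs(path_value):
    """Return the man directories implied by PATH entries ending in 'bin'."""
    dirs = []
    for entry in path_value.split(os.pathsep):
        parent, leaf = os.path.split(entry.rstrip("/"))
        if leaf == "bin" and parent:
            candidate = os.path.join(parent, "man")
            if candidate not in dirs:
                dirs.append(candidate)
    return dirs


if __name__ == "__main__":
    for directory in implied_man_dirs(os.environ.get("PATH", "")):
        print(directory)
```

Any result directly under an automounted directory, such as /home/man, would trigger an automount lookup like the ones in the log.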
I've also experienced this problem with RHEL-3.0. I think Norman is on the right track with gconfd-2 having difficulties with the lock files across the NFS mount. There are some details here: http://www.gnome.org/projects/gconf/

I added GCONF_LOCAL_LOCKS=1 to /etc/profile.d/gnome-ssh-askpass.sh and this fixed the problem for me. Unfortunately, starting the nfslock service as suggested did not seem to help.

Now I have a new problem. Everything appears to work with GNOME except that gnome-panel does not start completely. It just shows up as a grey box at the bottom of the screen. I'm able to launch everything else from the "Start Here" icon, though.

Garret
This problem seems closely related to Bugzilla Bug 110421. I've been checking this on RHEL-3.0 as well as on FC1, with each acting as either NFS server or client, and even with two RHEL-3.0 or two FC1 machines as the server/client pair. Tru64 Unix acting as a client also has problems with locking.

I've observed the same problem with the X login: gnome-panel does not finish starting during login. There are also problems with locking for email. Logging in to GNOME causes the client system to log something like:

... kernel: lockd: server x.x.x.x not responding, timed out.
... gconfd (user-pid): Failed to get lock for daemon, exiting. Failed to create or open ... .

Discussion of firewall problems is futile here. I tested it with *no* firewalling on either host and just a switch in between. Again: all directory operations that do not require locking will definitely work. A Linux NFS server will log an fh_verify kernel message for some, but not all, lock requests (typically for email).

Aron
I ran into this problem in RHEL 3.0 when I set the local user account name to be the same as the user account name for NFS. As a result, the system couldn't tell the two accounts apart, which produced the following error message: 'Could not open or create the file "/home/scott/.gconf-test-locking-file" ... The error was "Permission denied" (errno=13).' The /var/log/messages output appears identical to what is shown in comment #2 above.

I found two ways to fix this problem. Delete the local user account and make a dummy local user account (e.g. localuser), since Red Hat requests a local user account in addition to the root account during initial setup. Or edit the /etc/passwd file. For example, I copied the setting from the master NFS passwd file on a Sun station and commented out the one that Linux generated for the local account that was causing the conflict. That solved my problem (I also made a link to where tcsh is pointing and updated /etc/shells, since the shells are often in different path locations on Sun and Red Hat workstations):

#scott:x:500:500:Scott Anderson:/home/scott:/sbin/tcsh
scott:fkyCwsxxWqP/M:2388:10:Scott Anderson (CRND):/home/scott:/usr/bin/tcsh

You also need to turn off the Linux firewall (see iptables and ip6tables status in Start > System Settings > Server Settings > Services), turn off advanced power management (/sbin/chkconfig apmd off) since it causes problems for NIS clients, verify /etc/nsswitch.conf, verify /etc/fstab, verify /etc/auto.master, and create mount point directories corresponding to the fstab settings.

In my case, I saw the same behavior on both GNOME and KDE. I hope this helps.

Scott
On comment #17:

0.) Thanks for commenting. I had the impression that this was like communicating with a black hole: send info in and get nothing out.
1.) We don't have the same UIDs twice. /etc/passwd and /etc/shadow contain the basic configuration only, and we never created an account during the installation process.
2.) We have the .gconf-test-locking-file problem, but this is reported by lockd, which isn't able to lock the file on a RHEL 3.0 or FC1 NFS server when running a GNOME login on a RHEL 3.0 client (see comment #16). FC1 won't report anything.
3.) We have tested it with and without any firewalling and /etc/hosts.(allow,deny).
4.) I don't understand why apmd should be responsible for a lockd problem. I killed apmd and repeated the test without success. Maybe you can comment on this.
5.) Our nsswitch is fine in our case, and the mount points are read+write NFSv3.

Our time for this problem is running very low by now, and we will only be able to wait a few more days for a workaround or solution. If none appears, we will have to remove RHEL and FC1 as NFS servers for user home directories.

I want to repeat: this bug has the same root cause as Bugzilla Bug 110421.

Aron
Hello! I have the same problem as described above, so I think it is not a Red Hat specific problem but a general Linux problem. I am using Debian GNU/Linux "Sarge" with a self-compiled 2.6.6 kernel and KDE 3.2 on server and clients. The users and groups are distributed by LDAPS. The /home directory on the client is mounted via NFSv3.

When I try to log in with kdm, the authentication works. But then the loading process stops and nothing happens anymore. I get the error message "lockd: server XXX not responding, still trying". If I copy the user's home directory to the client machine and try to log in, everything works fine. Furthermore, any other logins with ssh or shell login work fine, and I can perform all file operations (reading, writing) from the command line in the NFS exported directories.

Please tell me if you find any solution to this problem.

Benjamin Eikel
benjamin at eikel dot org
Hello again! I solved the problem for me. It was a problem with the locking daemon on the server. I had used the old kernel command-line parameters lockd.udpport=# and lockd.tcpport=# instead of the new lockd.nlm_udpport=# and lockd.nlm_tcpport=# to set the ports used by rpc.lockd. Because of this, the clients were not able to communicate with the server through the firewall. Now everything is working correctly.

This means that the problem has something to do with file locking. Perhaps this helps somebody.

Benjamin Eikel
benjamin at eikel dot org
Hello again, good point. With kernel 2.6 and NFS v4 in mind, I installed FC2 on a test box as an NFS server. The graphical login problem seems to be solved by this. Some other locking troubles persist (see Bugzilla Bug 110421). Thanks for your hint.

Aron
Red Hat apologizes that these issues have not been resolved yet. We do want to make sure that no important bugs slip through the cracks. Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc. They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/) for security updates only. If this is a security issue, please reassign to the 'Fedora Legacy' product in bugzilla. Please note that Legacy security update support for these products will stop on December 31st, 2006. If this is not a security issue, please check if this issue is still present in a current Fedora Core release. If so, please change the product and version to match, and check the box indicating that the requested information has been provided. If you are currently still running Red Hat Linux 7.3 or 9, please note that Fedora Legacy security update support for these products will stop on December 31st, 2006. You are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/. Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a security issue, please change the product as necessary. We thank you for your help, and apologize again that we haven't handled these issues to this point.
Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc. If you are currently still running Red Hat Linux 7.3 or 9, you are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/. Closing as CANTFIX.