Bug 102402 - Redhat 9 hangs for graphical login with NIS/NFS user
Summary: Redhat 9 hangs for graphical login with NIS/NFS user
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: autofs
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Nalin Dahyabhai
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-08-14 19:09 UTC by Need Real Name
Modified: 2007-04-18 16:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-02 18:50:37 UTC
Embargoed:



Description Need Real Name 2003-08-14 19:09:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2)
Gecko/20030208 Netscape/7.02

Description of problem:
I have 5 Linux machines, 3 of them run Redhat 9 and 2 of them run a version of
Redhat Enterprise Linux. Logging in at the graphical login with any account
works fine on the Redhat Enterprise Linux machines. However, on all 3 of the
Redhat 9 machines, attempting to log in at the graphical login causes a
permanent hang in the graphical environment with certain users.

More Details: Logging in at the graphical prompt with a local user works fine. I
could log in as root and as a user found in the /etc/passwd file. However,
logging in with any user that comes from NIS and whose home directory is served
via NFS causes an indefinite hang.

That is, any user that is obtained via NIS and whose directory is mounted via
NFS (using 'yp' and 'autofs' in tandem) causes an indefinite hang. The login box
disappears and the initial blue screen comes up. Normally you would then see a
box appear that shows a bunch of icons as each subsystem loads, and then the
desktop would appear. However, in this case it just hangs and the blue screen
remains forever.

This problem is critical. No one served via NIS can log in with the graphical login.

Things that work fine: Logging in with an NIS user via telnet or ssh works fine.
Logging into a terminal on the local node works fine (i.e. typing ctrl-alt-f1
then logging in).



Version-Release number of selected component (if applicable):
All components came with Redhat 9

How reproducible:
Always

Steps to Reproduce:
1. Configure your system to use NIS authentication and make sure autofs is
turned on. Our NIS and NFS server is a Tru64 machine, but I'm not sure if the
server makes a difference.
2. Attempt to log in at the graphical prompt with any NIS user

Actual Results:  The blue screen will come up and it will hang forever. To kill
the hanging process, grep for Xclient in the process list and kill all Xclient
processes.

Expected Results:  Should continue into the X Window environment.

Additional info:

I selected the defaults with everything. Default window manager, etc . . . I
didn't do anything strange with the machine. I didn't install any additional
software.

This behavior is seen on all 3 of my Redhat 9 machines.

Here is the version:
Red Hat Linux release 9 (Shrike)

Workaround: I didn't find any workaround. Didn't try using another window
manager or anything as that's pretty drastic.

Note: NFS and NIS are working fine. I can easily see my home directory and do
'ypcat' etc . . .

Comment 1 Havoc Pennington 2003-08-14 22:53:45 UTC
Must be related to Tru64 or autofs; I have a NIS account and NFS homedir myself
with no problems, and if those generically failed I'd expect to have heard
about this bug more often and before now.


Comment 2 Need Real Name 2003-08-15 02:09:53 UTC
Is there some kind of tracing or logging I can turn on that would help to see 
what's going on? For me it's very easy to reproduce this. Perhaps it is 
something peculiar to my environment, but since it's so easy for me to 
reproduce and since I'm just using a stock Redhat 9 system, I'm surprised more 
people didn't run into this.

Of course who knows how many people use Tru64 as an NIS server with Linux 
clients, but there has to be somebody out there . . . 

Here's an excerpt from /var/log/messages covering the time I was trying to log
in at the graphical login. As you can see, I tried the graphical login and,
when that failed, logged in on the ctrl-alt-f1 terminal at different times:

Aug 14 10:59:12 mymachine gdm(pam_unix)[2050]: session opened for user myuser by (uid=0)
Aug 14 11:00:01 mymachine login(pam_unix)[10920]: session opened for user myuser by (uid=0)
Aug 14 11:00:01 mymachine  -- myuser[10920]: LOGIN ON tty1 BY myuser
Aug 14 11:00:24 mymachine login(pam_unix)[10920]: session closed for user myuser
Aug 14 11:02:35 mymachine login(pam_unix)[3236]: session opened for user myuser by (uid=0)
Aug 14 11:02:35 mymachine  -- myuser[3236]: LOGIN ON tty1 BY myuser
Aug 14 11:02:59 mymachine automount[21456]: attempting to mount entry /home/man
Aug 14 11:02:59 mymachine automount[3309]: lookup(yp): lookup for man failed: No such key in map
Aug 14 11:02:59 mymachine automount[21456]: attempting to mount entry /home/man1
Aug 14 11:02:59 mymachine automount[3310]: lookup(yp): lookup for man1 failed: No such key in map
Aug 14 11:02:59 mymachine automount[21456]: attempting to mount entry /home/man8
Aug 14 11:02:59 mymachine automount[3311]: lookup(yp): lookup for man8 failed: No such key in map
Aug 14 11:04:42 mymachine su(pam_unix)[3407]: session opened for user root by myuser(uid=xxx)
Aug 14 11:04:58 mymachine su(pam_unix)[3407]: session closed for user root
Aug 14 11:05:12 mymachine su(pam_unix)[3478]: session opened for user root by myuser(uid=xxx)
Aug 14 11:05:53 mymachine gdm(pam_unix)[2050]: session closed for user myuser
Aug 14 11:06:36 mymachine gdm(pam_unix)[2050]: session opened for user myuser by (uid=0)
Aug 14 11:07:09 mymachine gdm(pam_unix)[2050]: session closed for user myuser
Aug 14 11:07:21 mymachine login(pam_unix)[2002]: session opened for user myuser by LOGIN(uid=0)
Aug 14 11:07:21 mymachine  -- myuser[2002]: LOGIN ON tty2 BY myuser
Aug 14 11:08:28 mymachine shutdown: shutting down for system reboot
Aug 14 11:08:29 mymachine init: Switching to runlevel: 6
Aug 14 11:08:29 mymachine login(pam_unix)[2002]: session closed for user myuser
Aug 14 11:08:29 mymachine login(pam_unix)[3236]: session closed for user myuser
Aug 14 11:08:29 mymachine su(pam_unix)[3478]: session closed for user root

Comment 3 Havoc Pennington 2003-08-15 16:30:55 UTC
Honestly I'm not much of a NIS expert, I'm more of a GUI guy.
cc'ing some people or maybe tech support could help track it down.

Comment 4 Need Real Name 2003-08-17 09:38:37 UTC
OK. Let me know if they want to turn on some kind of tracing or extra 
debugging. The log seems to indicate that once I log into the graphical 
interface, automount tries to mount /home/man, /home/man1, and /home/man8 for 
some reason. Don't know if that's the reason why it's hanging or not . . .

Comment 5 Alexander Larsson 2003-08-18 07:30:13 UTC
Trying to mount man, man1 & man8 on /home looks very suspicious. It leads me to
believe that this is autofs related.

Comment 6 Alexander Larsson 2003-10-06 14:10:06 UTC
Reassigning to autofs, since this seems to be what's causing the problem.

Comment 7 Ian 2003-11-13 16:11:58 UTC
If it's any help, this problem only seems to affect gdm (Gnome Display
Manager) logins. You can get NIS/NFS logins to work using
the kdm login manager.

Add or change:

DISPLAYMANAGER="KDE"

in /etc/sysconfig/desktop and restart X.

Also, gdm uses PAM, but kdm doesn't?


Comment 8 Need Real Name 2003-11-13 16:55:11 UTC
Thanks for the tip.

Further analysis using the ps -axf command reveals that one of the
processes does go into the uninterruptible sleep state (note that the
last process is in the 'D' state):

----
26665 ?        S      0:00 [gdm-binary]
26708 ?        S      0:00  \_ [gdm-binary]
26709 ?        S      0:00      \_ /usr/X11R6/bin/X :0 -auth /var/gdm/:0.Xauth vt8
26718 ?        S      0:00      \_ -/bin/csh -c /usr/bin/ssh-agent /etc/X11/xinit/Xclients
26773 ?        S      0:00          \_ /usr/bin/gnome-session
26774 ?        S      0:00              \_ /usr/bin/ssh-agent /etc/X11/xinit/Xclients
26777 ?        D      0:00              \_ /usr/libexec/gconf-sanity-check-2
----

lsof -n -p of the uninterruptible sleep process (26777) reveals:
----
gconf-san 26777 user  cwd    DIR   0,10   10240  698946 /home/user (remote:/usr/users/d4/user)
gconf-san 26777 user  rtd    DIR   3,67    4096       2 /
gconf-san 26777 user  txt    REG   3,67   10292  375531 /usr/libexec/gconf-sanity-check-2
gconf-san 26777 user  mem    REG   3,67  103044  179549 /lib/ld-2.3.2.so
gconf-san 26777 user  mem    REG   3,67 2426700  359327 /usr/lib/libgtk-x11-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67  449396  359321 /usr/lib/libgdk-x11-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67   97224  359285 /usr/lib/libatk-1.0.so.0.200.0
gconf-san 26777 user  mem    REG   3,67   74568  359323 /usr/lib/libgdk_pixbuf-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67  132100  359319 /usr/lib/libpangoxft-1.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67   49156  359317 /usr/lib/libpangox-1.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67  205388  359313 /usr/lib/libpango-1.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67  954660  359231 /usr/lib/libxml2.so.2.5.4
gconf-san 26777 user  mem    REG   3,67   52616  359138 /usr/lib/libz.so.1.1.4
gconf-san 26777 user  mem    REG   3,67   26896  359121 /usr/lib/libpopt.so.0.0.0
gconf-san 26777 user  mem    REG   3,67  219304  359329 /usr/lib/libgconf-2.so.4.1.0
gconf-san 26777 user  mem    REG   3,67  265368  359305 /usr/lib/libORBit-2.so.0.0.0
gconf-san 26777 user  mem    REG   3,67  211948 2252186 /lib/tls/libm-2.3.2.so
gconf-san 26777 user  mem    REG   3,67   10944  359106 /usr/lib/libgmodule-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67   15084  179560 /lib/libdl-2.3.2.so
gconf-san 26777 user  mem    REG   3,67   28456  359303 /usr/lib/liblinc.so.1.0.0
gconf-san 26777 user  mem    REG   3,67  213996  359108 /usr/lib/libgobject-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67   14948  359110 /usr/lib/libgthread-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67  434960  359104 /usr/lib/libglib-2.0.so.0.200.1
gconf-san 26777 user  mem    REG   3,67   79744 2252188 /lib/tls/libpthread-0.29.so
gconf-san 26777 user  mem    REG   3,67    9920  522459 /usr/X11R6/lib/libXrandr.so.2.0
gconf-san 26777 user  mem    REG   3,67   28016  522449 /usr/X11R6/lib/libXi.so.6.0
gconf-san 26777 user  mem    REG   3,67   53520  522441 /usr/X11R6/lib/libXext.so.6.4
gconf-san 26777 user  mem    REG   3,67   70408  522447 /usr/X11R6/lib/libXft.so.2.1
gconf-san 26777 user  mem    REG   3,67   27132  522461 /usr/X11R6/lib/libXrender.so.1.2
gconf-san 26777 user  mem    REG   3,67  146088  359222 /usr/lib/libfontconfig.so.1.0
gconf-san 26777 user  mem    REG   3,67  908016  522431 /usr/X11R6/lib/libX11.so.6.2
gconf-san 26777 user  mem    REG   3,67  327176  359218 /usr/lib/libfreetype.so.6.3.2
gconf-san 26777 user  mem    REG   3,67  130104  359094 /usr/lib/libexpat.so.0.4.0
gconf-san 26777 user  mem    REG   3,67   52472  179570 /lib/libnss_files-2.3.2.so
gconf-san 26777 user  mem    REG   3,67   43456  179574 /lib/libnss_nis-2.3.2.so
gconf-san 26777 user  mem    REG   3,67   91604  179564 /lib/libnsl-2.3.2.so
gconf-san 26777 user  mem    REG   3,67 1531064 2252184 /lib/tls/libc-2.3.2.so
gconf-san 26777 user    0r   CHR    1,3           66819 /dev/null
gconf-san 26777 user    1w   REG   0,10       0  699012 /home/user/.xsession-errors (remote:/usr/users/d4/user)
gconf-san 26777 user    2w   REG   0,10       0  699012 /home/user/.xsession-errors (remote:/usr/users/d4/user)
gconf-san 26777 user    3w   REG   0,10       0  699110 /home/user/.gconf-test-locking-file (remote:/usr/users/d4/user)
----

So it looks like the login hangs because that process goes into the
uninterruptible sleep state.
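
For anyone who wants to repeat this check without reading the whole process
tree, here is a minimal Python sketch that picks the 'D' state entries out of
ps output. It assumes the column layout shown above (PID, TTY, STAT, TIME,
COMMAND); the function name is made up for illustration and is not part of
any tool.

```python
def find_uninterruptible(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes `ps -axf` style columns: PID TTY STAT TIME COMMAND.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip headers, separators, and wrapped fragments
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            # Drop the tree-drawing prefix that `ps f` puts before children.
            hits.append((int(pid), command.lstrip("\\_ ").strip()))
    return hits


sample = """\
26773 ?        S      0:00          \\_ /usr/bin/gnome-session
26777 ?        D      0:00              \\_ /usr/libexec/gconf-sanity-check-2
"""
print(find_uninterruptible(sample))
# [(26777, '/usr/libexec/gconf-sanity-check-2')]
```

On a live system the input could come from running ps -axf and passing its
output to the function.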

Comment 9 Norman Elton 2004-01-20 23:23:20 UTC
I'm experiencing the exact same symptoms on RHEL-3. We've got accounts
to login over NIS/NFS. We are automounting home directories. When a
user logs in, the password is accepted, as well as their SSH
passphrase (the GUI prompt appears). Then everything just hangs. The
"Red Hat" screen doesn't load.

I've noticed that gconfd-2 goes into a coma ('D' listing in ps).
Presumably this is what's getting hung?

Any progress on this one?

Thanks,

Norman

Comment 10 Norman Elton 2004-01-20 23:25:05 UTC
Sorry for the extra e-mail... I've also seen odd automount requests
for man, man1, and man8. Did you figure anything out on this one?
Perhaps related?

Thanks again,

Norman

Comment 11 Norman Elton 2004-01-21 16:09:36 UTC
Here's another update -- I took the automounter (autofs) out of the
picture, and mounted the home directory manually. The problem
persisted. Presumably, the automounter isn't the problem.

I took NFS out of the picture, by copying the home directory to the
local machine. Now the login works fine.

Could gconfd-2, which I know very little about, be getting hung up
trying to create a lock file across NFS?
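
One way to test that guess outside of a GNOME login is to request a lock on a
scratch file in the home directory and see whether the request ever returns.
The following is only a sketch: it assumes the hang comes from an fcntl-style
(POSIX) lock request, and the probe file name and function name are made up.
The lock is taken in a child process so that the caller can give up after a
timeout instead of hanging itself.

```python
import os
import subprocess
import sys
import tempfile

# Code run in the child: open a file and take a blocking POSIX lock on it.
PROBE = """
import fcntl, sys
with open(sys.argv[1], "w") as fh:
    fcntl.lockf(fh, fcntl.LOCK_EX)
    fcntl.lockf(fh, fcntl.LOCK_UN)
"""


def lock_works(directory, timeout=10.0):
    """Return True if a lock on a file in `directory` is granted within
    `timeout` seconds, False if the attempt fails or stalls."""
    path = os.path.join(directory, ".lock-probe-%d" % os.getpid())
    child = subprocess.Popen(
        [sys.executable, "-c", PROBE, path], stderr=subprocess.DEVNULL
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        child.kill()  # may not take effect at once if the child is in 'D' state
        return False
    finally:
        try:
            os.remove(path)
        except OSError:
            pass


# Point this at the NFS-mounted home directory when testing for real.
print(lock_works(tempfile.gettempdir()))
```

If this returns True for a local directory but False for the NFS home
directory, that would point at the lock manager rather than at gconfd-2
itself.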

Thanks for any ideas,

Norman

Comment 12 Norman Elton 2004-01-23 20:14:14 UTC
Our solution - It turns out that the client was unable to communicate
with the server's nlockmgr, due to a firewall issue. I had nlockmgr
configured to use a specific port (4001), but this setting appears to
have not worked during the last reboot.

For whatever reason, everything works fine except the GUI login. When
I adjusted the server's firewall to allow connections to the lock
manager, the GUI login responded.
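
A quick way to check for this situation is to see which ports the server
registers for nlockmgr and whether the client can reach them. This is a rough
Python sketch, assuming the usual five-column `rpcinfo -p` output; the sample
text below is illustrative (only port 4001 comes from my setup), and the
function names are made up.

```python
import socket


def nlockmgr_ports(rpcinfo_output):
    """Collect (proto, port) pairs registered for nlockmgr in `rpcinfo -p` output."""
    ports = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Expected columns: program, version, proto, port, service name.
        if len(fields) == 5 and fields[4] == "nlockmgr":
            ports.add((fields[2], int(fields[3])))
    return sorted(ports)


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


sample = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
"""
print(nlockmgr_ports(sample))  # [('tcp', 4001), ('udp', 4001)]
```

With the real server name in place of a sample, the TCP entries can then be
passed to tcp_reachable() from the client; a False result with no firewall
change in between would suggest the port is blocked or lockd is not listening
there.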

Norman

Comment 13 Need Real Name 2004-01-23 21:22:24 UTC
Unfortunately, there is no firewall between my machine and the NFS 
server, so the issue is not a firewall in my case.

Comment 14 Norman Elton 2004-02-09 15:45:30 UTC
Here's some information about the mounting to /home/man8, etc. 

I found in the /etc/man.config file...

# If people ask for "man foo" and have "/dir/bin/foo" in their PATH
# and the docs are found in "/dir/man", then no mapping is required.

So if you've got /home/bin in your path, by chance, it will look for /home/man every time 
you run "man". Autofs will try to mount a directory, causing a slight pause.

There is an option (NOAUTOPATH) to turn this behavior off.
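
To see which directories this rule would make man look at for a given PATH,
here is a small Python sketch. It only models the "/dir/bin" to "/dir/man"
rule quoted above, which is a simplification of whatever man really does, and
the function name is made up.

```python
import posixpath


def autopath_candidates(path_value):
    """For each PATH entry ending in /bin, return the sibling man directory."""
    candidates = []
    for entry in path_value.split(":"):
        entry = entry.rstrip("/") or entry
        if posixpath.basename(entry) == "bin":
            candidates.append(posixpath.join(posixpath.dirname(entry), "man"))
    return candidates


print(autopath_candidates("/usr/bin:/home/bin:/usr/local/sbin"))
# ['/usr/man', '/home/man']
```

Any result that falls under an automounted directory such as /home is a
lookup that autofs will try to satisfy.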

Norman

Comment 15 Garret Pick 2004-04-19 21:41:13 UTC
I've also experienced this problem with RHEL-3.0. I think Norman is
on the right track: gconfd-2 has difficulties with the lock
files across the NFS mount. There are some details here:

http://www.gnome.org/projects/gconf/

I added GCONF_LOCAL_LOCKS=1 to /etc/profile.d/gnome-ssh-askpass.sh 
and this fixed the problem for me.  Unfortunately, starting the 
nfslock service as suggested did not seem to help.

Now, I have a new problem.  Everything appears to work with gnome 
except the gnome-panel does not start completely.  It just shows up 
as a grey box at the bottom of the screen.  I'm able to launch 
everything else from the "start here" icon though.

Garret

Comment 16 aron vrtala 2004-05-12 13:48:09 UTC
This problem seems closely related to Bugzilla Bug 110421. I've
been checking this out on RHEL-3.0 as well as on FC1, with each acting
as either NFS server or client, and even with two RHEL-3.0 or two FC1
machines as server and client. Tru64 Unix acting as a client also has
problems with locking. I've observed the same problem with the
X login: gnome-panel does not complete during login. There are also
problems with locking for email.

Login at gnome causes client system to log something like:
... kernel: lockd: server x.x.x.x not responding, timed out.
... gconfd (user-pid): Failed to get lock for daemon, exiting. Failed
to create or open ... .
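
To find out how often this happens and against which server, the log can be
scanned for these lockd messages. A minimal Python sketch, assuming the
message wording shown above; the function name is made up.

```python
import re

LOCKD_RE = re.compile(r"lockd: server (\S+) not responding")


def lockd_timeouts(log_lines):
    """Count 'lockd: server X not responding' messages per server."""
    counts = {}
    for line in log_lines:
        match = LOCKD_RE.search(line)
        if match:
            server = match.group(1)
            counts[server] = counts.get(server, 0) + 1
    return counts


sample = [
    "... kernel: lockd: server x.x.x.x not responding, timed out.",
    "... gconfd (user-pid): Failed to get lock for daemon, exiting.",
]
print(lockd_timeouts(sample))  # {'x.x.x.x': 1}
```

On a client the input would typically be the lines of /var/log/messages.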

Discussion of firewall problems is futile here. I tested it with *no*
firewalling on either host and just a switch in between.

Again: all directory operations that do not require locking will
definitely work. A Linux NFS server will log an fh_verify kernel message
for some, but not all, lock requests (typically for email).

Aron

Comment 17 Scott Anderson 2004-05-25 03:27:37 UTC
I ran into this problem in RHEL 3.0 when I set the local user account
name to be the same as the user account name for NFS. As a result, the
system couldn't tell the two accounts apart and produced the following
error message: 'Could not open or create the file
"/home/scott/.gconf-test-locking-file," ... The error was "Permission
denied" (errno=13).'. The /var/log/messages output appears identical to
what is shown in comment #2 above. I found two ways to fix this problem.
One is to delete the local user account and make a dummy local user
account (e.g. localuser), since Red Hat requests a local user account in
addition to the root account during initial setup. The other is to edit
the /etc/passwd file. For example, I copied the entry from the master
NFS passwd file on a Sun station and commented out the one that Linux
generated for the local account that was causing the conflict. That
solved my problem (I also made a link to where tcsh is pointing and
updated /etc/shells, since the shells are often in different path
locations on Sun and Red Hat workstations):
                                                                     
          
#scott:x:500:500:Scott Anderson:/home/scott:/sbin/tcsh
scott:fkyCwsxxWqP/M:2388:10:Scott Anderson (CRND):/home/scott:/usr/bin/tcsh
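
A conflict like this can be spotted before anyone tries to log in. The
following Python sketch is only an illustration: it assumes the local file is
in standard passwd format and that the NIS entries come from something like
`ypcat passwd`. The function names and the shortened sample entries are made
up.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid, home, shell) from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue  # malformed lines or NIS compat (+/-) entries
        name, _pw, uid, gid, _gecos, home, shell = fields
        users[name] = (int(uid), int(gid), home, shell)
    return users


def conflicts(local_text, nis_text):
    """Names defined both locally and in NIS with a different UID or GID."""
    local, nis = parse_passwd(local_text), parse_passwd(nis_text)
    return sorted(
        name for name in local.keys() & nis.keys()
        if local[name][:2] != nis[name][:2]
    )


local = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "scott:x:500:500:Scott Anderson:/home/scott:/sbin/tcsh\n"
)
nis = "scott:x:2388:10:Scott Anderson (CRND):/home/scott:/usr/bin/tcsh\n"
print(conflicts(local, nis))  # ['scott']
```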
                                                                     
          
You also need to turn off the linux firewall (see iptables and
ip6tables status in Start > System Settings > Server Setting >
Services), turn off advanced power management (/sbin/chkconfig apmd
off) since it causes problems for NIS clients, verify
/etc/nsswitch.conf, verify /etc/fstab, verify /etc/auto.master, and
create mount point directories corresponding to fstab settings.

In my case, I saw the same behavior on both Gnome and KDE.
                                                                     
          
I hope this helps.
                                                                     
          
Scott


Comment 18 aron vrtala 2004-05-26 11:05:10 UTC
On comment #17:

0.) Thanks for commenting. I got the impression that this is like
communicating with a black hole: send info in and get nothing out.
1.) We don't have the same UID twice. /etc/passwd and /etc/shadow
contain the basic configuration only, and we never created an account
during the installation process.
2.) We have the .gconf-test-locking-file problem, but this is
reported by lockd, which isn't able to lock the file on a RHEL 3.0 or
FC1 NFS server when running the gnome login on a RHEL 3.0 client (see
comment #16). FC1 won't report anything.
3.) We have tested it with and without any firewalling and
/etc/hosts.(allow,deny).
4.) I don't understand why apmd should be responsible for lockd
trouble. I killed apmd and repeated the test without success. Maybe you
can comment on this.
5.) Our nsswitch is OK and the mount points are read+write NFSv3.

Our time for this problem is running very low by now, and we will only
be able to wait a few more days for a workaround or solution. If none
appears, we will have to remove RHEL and FC1 as NFS servers for
user home directories.

I want to repeat: this bug has the same root cause as Bugzilla Bug 110421.

Aron

Comment 19 Benjamin Eikel 2004-05-28 15:41:59 UTC
Hello!

I have the same problem as described above. So I think it is not a Red Hat
specific problem but a general Linux problem.
I am using Debian GNU/Linux "Sarge" with a self compiled kernel 2.6.6
and KDE 3.2 on server and clients. The users and groups are
distributed by LDAPS. The /home directory on the client is mounted via
NFSv3.

When I try to log in with kdm the authentication works. But then the
loading process stops and nothing happens anymore. I get the error
message "lockd: server XXX not responding, still trying".

If I copy the users home directory to the client machine and try to
log in everything works fine. Furthermore any other logins with ssh or
 shell login work fine. And I can perform all file operations
(reading, writing) from command line in the NFS exported directories.

Please tell me if you find any solution to this problem.

Benjamin Eikel
benjamin at eikel dot org

Comment 20 Benjamin Eikel 2004-05-28 16:47:33 UTC
Hello again!

I solved the problem for me.

It was a problem with the locking daemon on the server. I used the old
kernel command line parameters lockd.udpport=# and lockd.tcpport=#
instead of the new lockd.nlm_udpport=# and lockd.nlm_tcpport=# to set
the ports used by rpc.lockd. Because of this the clients were not able
to communicate with the server through the firewall. Now everything is
working correctly. This means that the problem has something to do
with file locking.
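
To verify on a running server that the port settings were actually picked up,
the values can be read back from the kernel. This Python sketch assumes the
settings are exposed as nlm_tcpport and nlm_udpport under /proc/sys/fs/nfs
(the case on 2.6 kernels I know of once lockd is loaded) and that a value of
0 means no fixed port was set; the function name is made up.

```python
import os


def nlm_ports(sysctl_dir="/proc/sys/fs/nfs"):
    """Read the lockd port settings from the given sysctl directory."""
    ports = {}
    for name in ("nlm_tcpport", "nlm_udpport"):
        try:
            with open(os.path.join(sysctl_dir, name)) as fh:
                ports[name] = int(fh.read().strip())
        except (OSError, ValueError):
            ports[name] = None  # entry missing, e.g. lockd not loaded
    return ports


print(nlm_ports())
```

If the values come back as 0 or None although fixed ports were configured,
the parameters were probably not applied, which is what happened to me with
the old parameter names.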

Perhaps this helps somebody.

Benjamin Eikel
benjamin at eikel dot org

Comment 21 aron vrtala 2004-06-04 13:51:31 UTC
Hello again,

Good point. With kernel 2.6 and NFS v4 in mind, I installed FC2 on a
test box as an NFS server.

The graphical login problem seems to be solved by this.
Some other locking troubles persist (see Bugzilla Bug 110421).

Thanx for your hint.

Aron

Comment 22 Bill Nottingham 2006-08-05 04:08:32 UTC
Red Hat apologizes that these issues have not been resolved yet. We do want to
make sure that no important bugs slip through the cracks.

Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc.
They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/)
for security updates only. If this is a security issue, please reassign to the
'Fedora Legacy' product in bugzilla. Please note that Legacy security update
support for these products will stop on December 31st, 2006.

If this is not a security issue, please check if this issue is still present
in a current Fedora Core release. If so, please change the product and version
to match, and check the box indicating that the requested information has been
provided.

If you are currently still running Red Hat Linux 7.3 or 9, please note that
Fedora Legacy security update support for these products will stop on December
31st, 2006. You are strongly advised to upgrade to a current Fedora Core release
or Red Hat Enterprise Linux or comparable. Some information on which option may
be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be
closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a
security issue, please change the product as necessary. We thank you for your
help, and apologize again that we haven't handled these issues to this point.


Comment 24 Bill Nottingham 2007-01-02 18:50:37 UTC
Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc.
If you are currently still running Red Hat Linux 7.3 or 9, you are strongly
advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux
or comparable. Some information on which option may be right for you is
available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Closing as CANTFIX.

