Bug 448244
Summary: | yp_all error on kernel 2.6.18-92.el5 | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Darren <d-gitelman> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | medium | Priority: | low |
Version: | 5.2 | CC: | staubach, steved |
Target Milestone: | rc | Hardware: | i386 |
OS: | Linux | Doc Type: | Bug Fix |
Last Closed: | 2008-11-25 12:32:22 UTC | | |
Description
Darren
2008-05-24 22:01:28 UTC
Some questions...

When you say "set up NIS" do you mean set this machine up as a server or client? (I presume client, but I'd like to be sure)
What maps are you serving?
What does /etc/nsswitch.conf look like on this host?
By "automounts" here, you mean to set up NFS mounts in /etc/fstab, correct?
When you say "hangs at NFS quota", do you mean that it hangs after printing out this message?

    Starting NFS quotas:

Are you able to do "chkconfig nfs off" and then boot to the newer kernel? If so, does running "service nfs start" hang when starting rquotad?

Machine is NIS client.
Serving maps: passwd, shadow, group, hosts, rpc, services, netid, protocols, mail

nsswitch.conf

    [root@chinook etc]# more /etc/nsswitch.conf
    #
    # /etc/nsswitch.conf
    #
    # An example Name Service Switch config file. This file should be
    # sorted with the most-used services at the beginning.
    #
    # The entry '[NOTFOUND=return]' means that the search for an
    # entry should stop if the search in the previous entry turned
    # up nothing. Note that if the search failed due to some other reason
    # (like no NIS server responding) then the search continues with the
    # next entry.
    #
    # Legal entries are:
    #
    #   nis or yp           Use NIS (NIS version 2), also called YP
    #   dns                 Use DNS (Domain Name Service)
    #   files               Use the local files
    #   db                  Use the local database (.db) files
    #   compat              Use NIS on compat mode
    #   hesiod              Use Hesiod for user lookups
    #   ldap                Use LDAP (only if nss_ldap is installed)
    #   nisplus or nis+     Use NIS+ (NIS version 3), unsupported
    #   [NOTFOUND=return]   Stop searching if not found so far
    #
    # To use db, put the "db" in front of "files" for entries you want to be
    # looked up first in the databases
    #
    # Example:
    #passwd:    db files ldap nis
    #shadow:    db files ldap nis
    #group:     db files ldap nis

    passwd:     files nis
    shadow:     files
    group:      files nis

    #hosts:     db files ldap nis dns
    hosts:      files nis dns

    # Example - obey only what ldap tells us...
    #services:   ldap [NOTFOUND=return] files
    #networks:   ldap [NOTFOUND=return] files
    #protocols:  ldap [NOTFOUND=return] files
    #rpc:        ldap [NOTFOUND=return] files
    #ethers:     ldap [NOTFOUND=return] files

    bootparams: files nis
    ethers:     files nis
    netmasks:   files nis
    networks:   files nis
    protocols:  files nis
    rpc:        files nis
    services:   files nis
    netgroup:   files nis
    publickey:  files nis
    automount:  files
    aliases:    files nis
    #############################################

Automounts refers to using the automounter (autofs) (exports and auto.*)
Yes that's where it hangs.
Yes I can do chkconfig nfs off and it will boot.
Running service nfs start produces the same error : yp_all: clnt_call: RPC: Timed out

Please additionally note the following

    [root@chinook init.d]# service nfs start
    Starting NFS services: [ OK ]
    Starting NFS quotas: yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    [ OK ]
    Starting NFS daemon: [ OK ]
    Starting NFS mountd: yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    yp_all: clnt_call: RPC: Timed out
    [ OK ]

This occurs when the system is booting or if I stop and then start the service.
No messages relevant to this error appear in /var/log/messages
No matter which kernel starts yp is appropriately bound to the NIS server
ypwhich returns the name of the server and ypcat passwd returns the passwords.
RPC appears to be running correctly

    [root@chinook ~]# rpcinfo -p localhost
       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100007    2   udp    966  ypbind
        100007    1   udp    966  ypbind
        100007    2   tcp    969  ypbind
        100007    1   tcp    969  ypbind
        100024    1   udp    644  status
        100024    1   tcp    652  status
        100021    1   udp  32784  nlockmgr
        100021    3   udp  32784  nlockmgr
        100021    4   udp  32784  nlockmgr
        100021    1   tcp  43375  nlockmgr
        100021    3   tcp  43375  nlockmgr
        100021    4   tcp  43375  nlockmgr
        100011    1   udp    766  rquotad
        100011    2   udp    766  rquotad
        100011    1   tcp    790  rquotad
        100011    2   tcp    790  rquotad
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    4   udp   2049  nfs
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100003    4   tcp   2049  nfs
        100005    1   udp    804  mountd

    [root@chinook ~]# rpcinfo -u localhost ypbind
    program 100007 version 1 ready and waiting
    program 100007 version 2 ready and waiting

ypdomainname, nisdomainname, hostname all return appropriate values, which are not different between kernels.

Searching google returned a suggestion to change the file limits > ulimit -n 256, but this does not resolve the error.

I am running the following versions of programs (of course these do not change when switching kernels)

    ypbind-1.19-8.el5
    yp-tools-2.9-0.1
    nfs-utils-1.0.9-33.el5
    nfs-utils-lib-1.0.8-7.2.z2
    nfs4-acl-tools-0.3.1-1.el5.1

One last note: the error also occurs for kernel: 2.6.18-92.1.6.el5

I gave a shot at reproducing this, and haven't been able to. The error messages you're seeing mean that the yp client is trying to query the yp server but it isn't getting responses. I have to wonder whether this symptom is just indicative of some sort of generic network connectivity problem between the yp client and server.

Once you boot this machine to the new kernel, are you able to do:

    # ypcat hosts

Can you also ping the yp server?

Please see comment #3: "No matter which kernel starts yp is appropriately bound to the NIS server ypwhich returns the name of the server and ypcat passwd returns the passwords."
So the answer is yes. This is what is so puzzling. If I boot to kernel 2.6.18-92 or 2.6.18-92.1.6 I get the error. If I boot to 2.6.18-53.1.21 I do not get the error. If I disable the NFS service there is no error. Both kernels show normal yp behavior and return the proper results for ypcat hosts, ypcat passwd, ypwhich, etc. Of course the user's home directory doesn't mount with NFS turned off, but the system recognizes the user's login.

Ok, then I'm stumped. At this point you're going to need to do some troubleshooting to narrow down the cause. I recommend opening an RH support case and working with the folks there to narrow down the reason for this. You should be able to refer to this BZ. If it turns out that this problem is due to a bug of some sort then we can transition this BZ to address it.

That's too bad. I tried submitting this to RH, but they won't open a support case since I only have academic support, which just covers RHN proxy. So I guess this dies here. Are there any suggested forums where I could post this, or do you have suggestions for troubleshooting whose results I could then submit to a forum?

Darren

Ahh, that is too bad. You'll need to do some legwork to track this down yourself then. You might also check the CentOS forums.

It looks like the problem is confined to mountd and rquotad. Since mountd and rquotad both work with the export table, I suspect that the problem is related to something in /etc/exports. You may want to try narrowing down your exports table to see if you can determine if there's one or two that cause the problem. Since it's related to YP, you could also try running with or without nscd and see if it makes a difference.

You could also try running mountd or rquotad by hand and seeing if you can replicate the problem with one of them. The goal here is to determine what's changed at the system call level (since most likely, something has).
If you strace mountd on both kernels and compare them, that might also give you a hint. Maybe something like this:

    # strace -f -o /tmp/mountd.strace -tt -T rpc.mountd -F

If it turns out to be a bug and you have a description of the problem from which I can work, or (even better) a way for me to reproduce this here, I'll be happy to look further.

No word in quite some time. Closing case. Please reopen if you're able to provide more info...
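
For anyone picking this up later, the strace comparison suggested above could be sketched roughly as follows. This is only an illustration, not something that was run for this bug. The helper names `normalize_strace` and `compare_strace` and the log file names are hypothetical. It assumes each log was written with the `strace -f -o FILE -tt -T ...` options shown earlier, so every line starts with a PID and a timestamp.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nfs-utils or strace).
# Assumed input line shape, as produced by "strace -f -o FILE -tt -T":
#   PID HH:MM:SS.usec syscall(args...) = ret <duration>

# Reduce one strace log to a bare sequence of syscall names, dropping PIDs,
# timestamps, arguments and durations, which differ between any two runs.
# Signal, exit and "<... resumed>" lines have no "name(" in field 3 and
# are skipped.
normalize_strace() {
    awk '$3 ~ /^[a-z_0-9]+\(/ { sub(/\(.*/, "", $3); print $3 }' "$1"
}

# Print a unified diff of the syscall sequences of two logs.
# Returns 0 when the sequences are identical, non-zero otherwise.
compare_strace() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_strace "$1" > "$a"
    normalize_strace "$2" > "$b"
    diff -u "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# Example use (file names are placeholders): capture one log per kernel,
# copy both to the same place, then run
#   compare_strace /tmp/mountd.strace.old-kernel /tmp/mountd.strace.new-kernel
```

Comparing only syscall names is deliberately coarse. It should show where the two runs first diverge, and the full logs (with the `-T` durations) can then be read around that point to see which call is stalling.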