From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
I did a fresh install of RHES4 on my server, which formerly had RHES3 on it. My FC3 client mounts the home dirs via NFS from the server. When trying to open an mbox mailbox with mutt I now see the message

Waiting for fcntl-lock...

counting up from 1 to 4, and then the mailbox file is opened. With my old RHES3 system the mbox files were opened immediately. I report this bug against mutt, but I think it is in fact caused by either the kernel or the NFS server.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Try to open an mbox mailbox from an NFS-mounted filesystem.

Additional info:
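To check whether the delay comes from fcntl locking itself rather than from mutt, one can time a non-blocking lock attempt on a file in the NFS-mounted home directory. The sketch below is a hypothetical probe, not mutt's code: the function name `time_fcntl_lock`, the retry count, and the one-second pause are assumptions chosen to resemble the counting behaviour described in the report.

```python
import errno
import fcntl
import time


def time_fcntl_lock(path, max_attempts=5, delay=1.0):
    """Try to take an exclusive fcntl lock on `path` without blocking.

    Returns (failed_attempts, elapsed_seconds) once the lock is acquired.
    Raises TimeoutError if every attempt reports the lock as busy.
    Any other error (for example ENOLCK) is re-raised unchanged.
    """
    start = time.monotonic()
    # "r+b" because an exclusive lock needs a descriptor open for writing.
    with open(path, "r+b") as f:
        for attempt in range(max_attempts):
            try:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as e:
                # EAGAIN/EACCES mean "lock is held elsewhere": wait and retry.
                if e.errno not in (errno.EAGAIN, errno.EACCES):
                    raise
                time.sleep(delay)
                continue
            fcntl.lockf(f, fcntl.LOCK_UN)
            return attempt, time.monotonic() - start
    raise TimeoutError("could not lock %s after %d attempts" % (path, max_attempts))
```

Running `time_fcntl_lock()` against the same mailbox file on a local filesystem and on the NFS mount gives a rough comparison. A result with zero failed attempts locally but several over NFS would point away from mutt and toward the lock services.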
Is the lock daemon running?
Indeed the nfslock daemon seems to be the cause of the problem. "service nfslock start" gives [Ok], but rpc.statd seems to die instantly. /var/lock/subsys/nfslock and /var/run/rpc.statd.pid both exist. I already tried removing both files and starting nfslock again. Both files get created again, but rpc.statd still does not run. I also could not find any errors in /var/log/messages.
Reassigning thusly.
Created attachment 112276 [details] strace of manual rpc.statd start
Created attachment 112363 [details] output of strace rpc.statd -F -d
hmm... it appears rpc.statd is seg faulting....

--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

Would it be possible to allow rpc.statd to drop a core (i.e. by setting ulimit -c unlimited) and then go into gdb and do a backtrace to see where this is happening?
Weird, after "ulimit -c unlimited" and "rpc.statd -F -d" I'm still getting no core file at all. The only messages I see are:

[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:36 rpc.statd[32117]: Version 1.0.6 Starting
03/28/2005 16:10:36 rpc.statd[32117]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:36 rpc.statd[32117]: New state: 37
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:37 rpc.statd[32118]: Version 1.0.6 Starting
03/28/2005 16:10:37 rpc.statd[32118]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:37 rpc.statd[32118]: New state: 39
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:38 rpc.statd[32119]: Version 1.0.6 Starting
03/28/2005 16:10:38 rpc.statd[32119]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:38 rpc.statd[32119]: New state: 41
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:39 rpc.statd[32120]: Version 1.0.6 Starting
03/28/2005 16:10:39 rpc.statd[32120]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:39 rpc.statd[32120]: New state: 43
Segmentation fault

Notice that "New state:" always increases by +2.
What happens when you start it up in gdb, meaning:

gdb rpc.statd
gdb> run -F -d
Here are the results:

[root@picard ~]# gdb rpc.statd
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run -F -d
Starting program: /sbin/rpc.statd -F -d
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
03/28/2005 16:19:32 rpc.statd[32315]: Version 1.0.6 Starting
03/28/2005 16:19:32 rpc.statd[32315]: Flags: No-Daemon Log-STDERR
03/28/2005 16:19:32 rpc.statd[32315]: New state: 45
(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x00d8f195 in main () from /sbin/rpc.statd
(gdb)
Try the rpc.statd.debug in http://people.redhat.com/steved/bz151828/
Now with rpc.statd.debug:

[root@picard ~]# gdb rpc.statd.debug
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run -F -d
Starting program: /root/rpc.statd.debug -F -d
03/28/2005 17:24:41 rpc.statd.debug[2398]: Version 1.0.6 Starting
03/28/2005 17:24:41 rpc.statd.debug[2398]: Flags: No-Daemon Log-STDERR
03/28/2005 17:24:41 rpc.statd.debug[2398]: New state: 49

Program received signal SIGSEGV, Segmentation fault.
0x0068f195 in process_entry (sockfd=7, lp=0x93509b8) at rmtcall.c:95
95 rmtcall.c: No such file or directory.
in rmtcall.c
(gdb)
Please try http://people.redhat.com/steved/bz151828/rpc.statd.1 and see if a message similar to "reset_my_name: ifap->ifa_addr == NULL" is logged in /var/log/messages.
Ok, at least it does not seg fault anymore. Even starting it with "service nfslock start" works now, but the mailbox locking still does not work.

(gdb) run -F -d
Starting program: /root/rpc.statd.1 -F -d
03/28/2005 18:12:36 rpc.statd.1[4503]: Version 1.0.6 Starting
03/28/2005 18:12:36 rpc.statd.1[4503]: Flags: No-Daemon Log-STDERR
03/28/2005 18:12:36 rpc.statd.1[4503]: New state: 51
03/28/2005 18:12:36 rpc.statd.1[4503]: reset_my_name: ifap->ifa_addr == NULL
03/28/2005 18:12:36 rpc.statd.1[4503]: Waiting for reply... (timeo 5)
03/28/2005 18:12:36 rpc.statd.1[4503]: reset_my_name: ifap->ifa_addr == NULL
03/28/2005 18:12:36 rpc.statd.1[4503]: Waiting for reply... (timeo 5)
03/28/2005 18:12:37 rpc.statd.1[4503]: Notification of 192.168.25.2 succeeded.
03/28/2005 18:12:37 rpc.statd.1[4503]: Unlinked /var/lib/nfs/statd/sm.bak/192.168.25.2
03/28/2005 18:12:37 rpc.statd.1[4503]: Waiting for client connections.
I forgot to mention that I see the "reset_my_name: ifap->ifa_addr == NULL" message only when running "rpc.statd -F -d" directly or in gdb, but not when I start the service the normal way with "service nfslock start". But I guess these messages come from the -d switch anyway.
hmm... it appears there is an issue with one of your network interfaces. Could you post the output of 'ifconfig -a' and also run rpc.statd.2 from http://people.redhat.com/steved/bz151828, which should log the interface it's having a problem with?
Created attachment 112391 [details] output of ifconfig -a
I don't see any log messages about a problematic network interface. Running in gdb gives:

(gdb) run -F -d
Starting program: /root/rpc.statd.2 -F -d
03/28/2005 19:51:44 rpc.statd.2[8374]: Version 1.0.6 Starting
03/28/2005 19:51:44 rpc.statd.2[8374]: Flags: No-Daemon Log-STDERR
03/28/2005 19:51:44 rpc.statd.2[8374]: New state: 57
03/28/2005 19:51:44 rpc.statd.2[8374]: Waiting for client connections.
Created attachment 112394 [details] Patch to stop rpc.statd from seg faulting

The absence of messages does make sense, since on the previous run rpc.statd was able to notify 192.168.25.2 (i.e. "Notification of 192.168.25.2") that it had rebooted (or in this case restarted). So the attached patch should stop rpc.statd from seg faulting.

Now, the reason the lock is still not working could be due to https://bugzilla.redhat.com/beta/show_bug.cgi?id=150151
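The "reset_my_name: ifap->ifa_addr == NULL" message logged earlier suggests the crash involved an interface entry with no address. The patch itself is only available as the attachment and is not reproduced in this thread, so the following is just an illustrative sketch of that kind of guard. It walks the getifaddrs(3) list from Python via ctypes and treats a NULL ifa_addr as "no address" instead of dereferencing it. The function name `list_interface_families` and the class names are hypothetical, and the structure layouts assume Linux.

```python
import ctypes
import os


class Sockaddr(ctypes.Structure):
    # Linux layout of struct sockaddr (assumption: Linux-only sketch).
    _fields_ = [("sa_family", ctypes.c_ushort),
                ("sa_data", ctypes.c_char * 14)]


class Ifaddrs(ctypes.Structure):
    pass


# Declared after the class so the struct can point to itself.
Ifaddrs._fields_ = [
    ("ifa_next", ctypes.POINTER(Ifaddrs)),
    ("ifa_name", ctypes.c_char_p),
    ("ifa_flags", ctypes.c_uint),
    ("ifa_addr", ctypes.POINTER(Sockaddr)),
    ("ifa_netmask", ctypes.POINTER(Sockaddr)),
    ("ifa_ifu", ctypes.POINTER(Sockaddr)),  # broadcast/destination union
    ("ifa_data", ctypes.c_void_p),
]


def list_interface_families():
    """Return [(interface_name, address_family_or_None), ...]."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.getifaddrs.argtypes = [ctypes.POINTER(ctypes.POINTER(Ifaddrs))]
    libc.getifaddrs.restype = ctypes.c_int
    libc.freeifaddrs.argtypes = [ctypes.POINTER(Ifaddrs)]
    libc.freeifaddrs.restype = None

    head = ctypes.POINTER(Ifaddrs)()
    if libc.getifaddrs(ctypes.byref(head)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    results = []
    try:
        node = head
        while node:  # a NULL ctypes pointer is falsy
            entry = node.contents
            name = entry.ifa_name.decode() if entry.ifa_name else ""
            if entry.ifa_addr:
                results.append((name, entry.ifa_addr.contents.sa_family))
            else:
                # getifaddrs(3) allows ifa_addr to be NULL: skip the
                # dereference rather than crash.
                results.append((name, None))
            node = entry.ifa_next
    finally:
        libc.freeifaddrs(head)
    return results
```

Any entry reported with `None` as its family is an interface that code assuming a non-NULL ifa_addr would trip over, which may help when comparing against the 'ifconfig -a' output.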
So what's the conclusion so far? rpc.statd does not seg fault anymore, but locking still does not work. How can we go about fixing this problem? Do you need any ethereal packet traces of the client-server communication?
Another thing I forgot to ask: I guess your attached patch is already in rpc.statd.2? Do you intend to create an updated rpm?
After rebooting my RHES4 server and using rpc.statd.2, locking now suddenly works. Thanks, Steve, for getting this problem fixed for me.
No, thank you for helping to find this problem.... it was definitely a team effort... imho...

This is fixed in the nfs-utils-1.0.6-55 rpm, which may be in the RHEL4 U1 release but can be found at http://people.redhat.com/steved/bz151828
Thanks for providing the updated rpm. This is also working fine for me. Shall we close this bug now or wait until U1 is out?
*** Bug 149201 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-727.html