Bug 151828
| Field | Value |
|---|---|
| Summary | Mailbox locking problem |
| Product | Red Hat Enterprise Linux 4 |
| Component | nfs-utils |
| Version | 4.0 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Bernd Bartmann <bernd.bartmann> |
| Assignee | Steve Dickson <steved> |
| QA Contact | Ben Levenson <benl> |
| CC | davej, paul+rhbugz |
| Fixed In Version | RHBA-2005-727 |
| Doc Type | Bug Fix |
| Last Closed | 2005-10-05 17:44:11 UTC |
| Bug Blocks | 156322 |

Description
Bernd Bartmann
2005-03-22 18:02:19 UTC
Is the lock daemon running?

Indeed, the nfslock daemon seems to be the cause of the problem. "service nfslock start" gives [Ok], but rpc.statd seems to die instantly. /var/lock/subsys/nfslock and /var/run/rpc.statd.pid both exist. I already tried removing both files and starting nfslock again. Both files get created again, but rpc.statd still does not run. I also could not find any errors in /var/log/messages.

Reassigning thusly.

Created attachment 112276 [details]
strace of manual rpc.statd start

Created attachment 112363 [details]
output of strace rpc.statd -F -d
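
The symptom reported above (the init script prints [Ok] and the PID file exists, yet the daemon is gone) can be confirmed by checking whether the PID recorded in the file still refers to a live process. The following is a minimal sketch of that check, not part of nfs-utils or the nfslock init script; the helper names `pid_is_alive` and `pidfile_status` are hypothetical, and the only path assumed is the /var/run/rpc.statd.pid mentioned in the report.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs existence/permission checks
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_status(pidfile: str) -> str:
    """Classify a PID file as 'missing', 'invalid', 'stale', or 'running'."""
    path = Path(pidfile)
    if not path.exists():
        return "missing"
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        return "invalid"
    if pid <= 0:
        # Non-positive values would address process groups, not one process.
        return "invalid"
    return "running" if pid_is_alive(pid) else "stale"
```

Calling `pidfile_status("/var/run/rpc.statd.pid")` right after "service nfslock start" would be expected to return "stale" in the situation described here, since the file is created before the daemon crashes.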
hmm... it appears rpc.statd is seg faulting:

--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

Would it be possible to allow rpc.statd to drop a core (i.e. by setting ulimit -c unlimited) and then go into gdb and do a backtrace to see where this is happening?

Weird, after "ulimit -c unlimited" and "rpc.statd -F -d" I'm still getting no core file at all. The only messages I see are:

[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:36 rpc.statd[32117]: Version 1.0.6 Starting
03/28/2005 16:10:36 rpc.statd[32117]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:36 rpc.statd[32117]: New state: 37
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:37 rpc.statd[32118]: Version 1.0.6 Starting
03/28/2005 16:10:37 rpc.statd[32118]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:37 rpc.statd[32118]: New state: 39
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:38 rpc.statd[32119]: Version 1.0.6 Starting
03/28/2005 16:10:38 rpc.statd[32119]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:38 rpc.statd[32119]: New state: 41
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:39 rpc.statd[32120]: Version 1.0.6 Starting
03/28/2005 16:10:39 rpc.statd[32120]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:39 rpc.statd[32120]: New state: 43
Segmentation fault

Notice that "New state:" always increases by 2.

What happens when you start it up in gdb, meaning:

gdb rpc.statd
gdb> run -F -d

Here are the results:

[root@picard ~]# gdb rpc.statd
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run -F -d
Starting program: /sbin/rpc.statd -F -d
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
03/28/2005 16:19:32 rpc.statd[32315]: Version 1.0.6 Starting
03/28/2005 16:19:32 rpc.statd[32315]: Flags: No-Daemon Log-STDERR
03/28/2005 16:19:32 rpc.statd[32315]: New state: 45
(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x00d8f195 in main () from /sbin/rpc.statd
(gdb)

Try the rpc.statd.debug in http://people.redhat.com/steved/bz151828/

Now with rpc.statd.debug:

[root@picard ~]# gdb rpc.statd.debug
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run -F -d
Starting program: /root/rpc.statd.debug -F -d
03/28/2005 17:24:41 rpc.statd.debug[2398]: Version 1.0.6 Starting
03/28/2005 17:24:41 rpc.statd.debug[2398]: Flags: No-Daemon Log-STDERR
03/28/2005 17:24:41 rpc.statd.debug[2398]: New state: 49
Program received signal SIGSEGV, Segmentation fault.
0x0068f195 in process_entry (sockfd=7, lp=0x93509b8) at rmtcall.c:95
95 rmtcall.c: No such file or directory.
in rmtcall.c
(gdb)

Please try http://people.redhat.com/steved/bz151828/rpc.statd.1 and see if a message similar to "reset_my_name: ifap->ifa_addr == NULL" is logged in /var/log/messages.

Ok, at least it does not seg fault anymore. Even starting it with "service nfslock start" works now, but the mailbox locking still does not work.

(gdb) run -F -d
Starting program: /root/rpc.statd.1 -F -d
03/28/2005 18:12:36 rpc.statd.1[4503]: Version 1.0.6 Starting
03/28/2005 18:12:36 rpc.statd.1[4503]: Flags: No-Daemon Log-STDERR
03/28/2005 18:12:36 rpc.statd.1[4503]: New state: 51
03/28/2005 18:12:36 rpc.statd.1[4503]: reset_my_name: ifap->ifa_addr == NULL
03/28/2005 18:12:36 rpc.statd.1[4503]: Waiting for reply... (timeo 5)
03/28/2005 18:12:36 rpc.statd.1[4503]: reset_my_name: ifap->ifa_addr == NULL
03/28/2005 18:12:36 rpc.statd.1[4503]: Waiting for reply... (timeo 5)
03/28/2005 18:12:37 rpc.statd.1[4503]: Notification of 192.168.25.2 succeeded.
03/28/2005 18:12:37 rpc.statd.1[4503]: Unlinked /var/lib/nfs/statd/sm.bak/192.168.25.2
03/28/2005 18:12:37 rpc.statd.1[4503]: Waiting for client connections.

I forgot to mention that I see the "reset_my_name: ifap->ifa_addr == NULL" message only when run as "rpc.statd -F -d" or in gdb, but not when I start the service the normal way with "service nfslock start". But I guess these messages come from the -d switch anyway.

hmm... it appears there is an issue with one of your network interfaces. Could you post the output of 'ifconfig -a' and also run rpc.statd.2 from http://people.redhat.com/steved/bz151828, which should log the interface it's having a problem with?

Created attachment 112391 [details]
output of ifconfig -a
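
The crash at rmtcall.c:95 and the diagnostic "reset_my_name: ifap->ifa_addr == NULL" are consistent with code that walks the interface list and dereferences an entry carrying no address; getifaddrs(3) documents that ifa_addr may be NULL for an interface. The actual fix is the patch attached to this bug and is not reproduced here. What follows is only a small Python model of that kind of guard, assuming the interface list can be represented as simple records; `IfEntry` and `usable_addresses` are hypothetical names, and the addresses in the usage note are documentation examples, not this machine's configuration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class IfEntry:
    """Hypothetical stand-in for one `struct ifaddrs` list entry."""
    name: str
    addr: Optional[str]  # None models ifa_addr == NULL


def usable_addresses(entries: Iterable[IfEntry]) -> List[str]:
    """Collect interface addresses, skipping entries that have none."""
    found: List[str] = []
    for entry in entries:
        if entry.addr is None:
            # The C equivalent of using entry.addr here would dereference NULL.
            continue
        found.append(entry.addr)
    return found
```

For instance, `usable_addresses([IfEntry("lo", "127.0.0.1"), IfEntry("sit0", None), IfEntry("eth0", "192.0.2.10")])` returns the two real addresses and silently skips the address-less entry instead of failing on it.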
I don't see any log messages about a problematic network interface. Running in gdb gives:

(gdb) run -F -d
Starting program: /root/rpc.statd.2 -F -d
03/28/2005 19:51:44 rpc.statd.2[8374]: Version 1.0.6 Starting
03/28/2005 19:51:44 rpc.statd.2[8374]: Flags: No-Daemon Log-STDERR
03/28/2005 19:51:44 rpc.statd.2[8374]: New state: 57
03/28/2005 19:51:44 rpc.statd.2[8374]: Waiting for client connections.

Created attachment 112394 [details]
Patch to stop rpc.statd from seg faulting

The absence of messages does make sense, since on the previous run rpc.statd was able to notify 192.168.25.2 (i.e. "Notification of 192.168.25.2") that it had rebooted (or in this case restarted). So the attached patch should stop rpc.statd from seg faulting.

Now, the reason the lock is still not working could be due to https://bugzilla.redhat.com/beta/show_bug.cgi?id=150151

So what's the conclusion so far? rpc.statd does not seg fault anymore, but locking still does not work. How can we go about fixing this problem? Do you need any ethereal packet traces of the client-server communication?

Another thing I forgot to ask: I guess your attached patch is already in rpc.statd.2? Do you intend to create an updated rpm?

After rebooting my RHES4 server and using rpc.statd.2, locking now suddenly works. Thanks, Steve, for getting this problem fixed for me.

No, thank you for allowing me to find this problem... it was definitely a team effort... imho...

This is fixed in the nfs-utils-1.0.6-55 rpm, which may be in the RHEL4 U1 release but can be found at http://people.redhat.com/steved/bz151828

Thanks for providing the updated rpm. This is also working fine for me. Shall we close this bug now or wait until U1 is out?

*** Bug 149201 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-727.html
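
The observation earlier in this thread that "New state:" grows by 2 on each start can be checked mechanically from captured rpc.statd output. In the NSM protocol the state number is conventionally odd while a host is up and even while it is down, so a step of 2 per restart is expected behaviour rather than a symptom. The sketch below only parses log text of the form shown in this report; the function names are hypothetical and nothing here is part of nfs-utils.

```python
import re
from typing import List

# Matches the "New state: N" lines printed by rpc.statd -F -d in this report.
STATE_RE = re.compile(r"New state: (\d+)")


def state_numbers(log_text: str) -> List[int]:
    """Extract every state number, in order of appearance."""
    return [int(value) for value in STATE_RE.findall(log_text)]


def state_steps(states: List[int]) -> List[int]:
    """Return the difference between each pair of consecutive states."""
    return [later - earlier for earlier, later in zip(states, states[1:])]
```

Feeding in the four back-to-back runs quoted above (states 37, 39, 41, 43) gives steps of 2 each time; larger gaps between later transcripts in the thread would simply indicate additional starts that were not pasted into the report.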