Bug 151828 - Mailbox locking problem
Summary: Mailbox locking problem
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nfs-utils
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks: 156322
Reported: 2005-03-22 18:02 UTC by Bernd Bartmann
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHBA-2005-727
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-05 17:44:11 UTC
Target Upstream Version:
Embargoed:


Attachments
strace of manual rpc.statd start (2.70 KB, text/plain)
2005-03-23 19:16 UTC, Bernd Bartmann
output of strace rpc.statd -F -d (23.42 KB, text/plain)
2005-03-26 20:45 UTC, Bernd Bartmann
output of ifconfig -a (3.05 KB, text/plain)
2005-03-28 17:54 UTC, Bernd Bartmann
Patch to stop rpc.statd from seg faulting (1021 bytes, patch)
2005-03-28 19:06 UTC, Steve Dickson


Links
System: Red Hat Product Errata
ID: RHBA-2005:727
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: nfs-utils bug fix update
Last Updated: 2005-10-05 04:00:00 UTC

Description Bernd Bartmann 2005-03-22 18:02:19 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
I did a fresh install of RHES4 on my server, which formerly ran RHES3.
My FC3 client mounts the home directories via NFS from the server.
When I now try to open an mbox mailbox with mutt, I see the message:

Waiting for fcntl-lock...

counting up from 1 to 4, and only then is the mailbox file opened. With my
old RHES3 system the mbox files opened immediately.

I am reporting this bug against mutt, but I think it is in fact caused by either the kernel or the NFS server.
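
For context, a client that prints a counter such as "Waiting for fcntl-lock... 1" is typically retrying a non-blocking fcntl() lock until it succeeds or gives up. The following is a minimal sketch in C of that general pattern, not mutt's actual code; the function name lock_with_retries and the one-second delay between attempts are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not taken from mutt): try to take an exclusive
 * fcntl() lock on the whole file, retrying up to max_tries times with a
 * one-second pause between attempts.
 * Returns the number of attempts used (>= 1) on success, -1 on failure. */
int lock_with_retries(int fd, int max_tries)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "to end of file" */

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return attempt;
        /* Only retry when the lock is held by someone else. */
        if (errno != EACCES && errno != EAGAIN)
            return -1;
        sleep(1);
    }
    return -1;
}
```

On a local file with no competing lock this returns 1 on the first attempt. Any other error (for example ENOLCK) is returned immediately as -1 rather than retried.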


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Try to open an mbox mailbox on an NFS-mounted filesystem

Additional info:

Comment 1 Bill Nottingham 2005-03-23 04:08:07 UTC
Is the lock daemon running?

Comment 2 Bernd Bartmann 2005-03-23 15:19:01 UTC
Indeed, the nfslock daemon seems to be the cause of the problem.
"service nfslock start" reports [Ok], but rpc.statd seems to die instantly.
/var/lock/subsys/nfslock and /var/run/rpc.statd.pid both exist. I already tried
removing both files and starting nfslock again. Both files are created again, but
rpc.statd still does not run. I also could not find any errors in /var/log/messages.

Comment 3 Bill Nottingham 2005-03-23 17:18:54 UTC
Reassigning thusly.

Comment 4 Bernd Bartmann 2005-03-23 19:16:58 UTC
Created attachment 112276 [details]
strace of manual rpc.statd start

Comment 5 Bernd Bartmann 2005-03-26 20:45:17 UTC
Created attachment 112363 [details]
output of strace rpc.statd -F -d

Comment 6 Steve Dickson 2005-03-28 12:51:44 UTC
hmm... it appears rpc.statd is seg faulting....

--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

Would it be possible to allow rpc.statd to
drop a core (i.e. by setting ulimit -c unlimited)
and then go into gdb and do a backtrace to see
where this is happening?




Comment 7 Bernd Bartmann 2005-03-28 14:11:35 UTC
Weird, after "ulimit -c unlimited" and "rpc.statd -F -d" I'm still getting no
core file at all. The only messages I see are:

[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:36 rpc.statd[32117]: Version 1.0.6 Starting
03/28/2005 16:10:36 rpc.statd[32117]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:36 rpc.statd[32117]: New state: 37
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:37 rpc.statd[32118]: Version 1.0.6 Starting
03/28/2005 16:10:37 rpc.statd[32118]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:37 rpc.statd[32118]: New state: 39
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:38 rpc.statd[32119]: Version 1.0.6 Starting
03/28/2005 16:10:38 rpc.statd[32119]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:38 rpc.statd[32119]: New state: 41
Segmentation fault
[root@picard ~]# rpc.statd -F -d
03/28/2005 16:10:39 rpc.statd[32120]: Version 1.0.6 Starting
03/28/2005 16:10:39 rpc.statd[32120]: Flags: No-Daemon Log-STDERR
03/28/2005 16:10:39 rpc.statd[32120]: New state: 43
Segmentation fault

Notice that "New state:" increases by 2 on every run.

Comment 8 Steve Dickson 2005-03-28 14:18:09 UTC
What happens when you start it up in gdb, meaning:

gdb rpc.statd
(gdb) run -F -d




Comment 9 Bernd Bartmann 2005-03-28 14:20:38 UTC
Here are the results:

[root@picard ~]# gdb rpc.statd
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols
found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run -F -d
Starting program: /sbin/rpc.statd -F -d
(no debugging symbols found)...(no debugging symbols found)...(no debugging
symbols found)...(no debugging symbols found)...(no debugging symbols
found)...03/28/2005 16:19:32 rpc.statd[32315]: Version 1.0.6 Starting
03/28/2005 16:19:32 rpc.statd[32315]: Flags: No-Daemon Log-STDERR
03/28/2005 16:19:32 rpc.statd[32315]: New state: 45
(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x00d8f195 in main () from /sbin/rpc.statd
(gdb)


Comment 10 Steve Dickson 2005-03-28 15:15:56 UTC
Try the rpc.statd.debug in http://people.redhat.com/steved/bz151828/

Comment 11 Bernd Bartmann 2005-03-28 15:25:40 UTC
Now with rpc.statd.debug:

[root@picard ~]# gdb rpc.statd.debug
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) run -F -d
Starting program: /root/rpc.statd.debug -F -d
03/28/2005 17:24:41 rpc.statd.debug[2398]: Version 1.0.6 Starting
03/28/2005 17:24:41 rpc.statd.debug[2398]: Flags: No-Daemon Log-STDERR
03/28/2005 17:24:41 rpc.statd.debug[2398]: New state: 49

Program received signal SIGSEGV, Segmentation fault.
0x0068f195 in process_entry (sockfd=7, lp=0x93509b8) at rmtcall.c:95
95      rmtcall.c: No such file or directory.
        in rmtcall.c
(gdb)


Comment 12 Steve Dickson 2005-03-28 15:47:05 UTC
Please try http://people.redhat.com/steved/bz151828/rpc.statd.1
and see if a message similar to "reset_my_name: ifap->ifa_addr == NULL"
is logged in /var/log/messages.



Comment 13 Bernd Bartmann 2005-03-28 16:17:28 UTC
Ok, at least it does not seg fault anymore. Even starting it with "service
nfslock start" works now, but the mailbox locking still does not work.

(gdb) run -F -d
Starting program: /root/rpc.statd.1 -F -d
03/28/2005 18:12:36 rpc.statd.1[4503]: Version 1.0.6 Starting
03/28/2005 18:12:36 rpc.statd.1[4503]: Flags: No-Daemon Log-STDERR
03/28/2005 18:12:36 rpc.statd.1[4503]: New state: 51
03/28/2005 18:12:36 rpc.statd.1[4503]: reset_my_name: ifap->ifa_addr == NULL
03/28/2005 18:12:36 rpc.statd.1[4503]: Waiting for reply... (timeo 5)
03/28/2005 18:12:36 rpc.statd.1[4503]: reset_my_name: ifap->ifa_addr == NULL
03/28/2005 18:12:36 rpc.statd.1[4503]: Waiting for reply... (timeo 5)
03/28/2005 18:12:37 rpc.statd.1[4503]: Notification of 192.168.25.2 succeeded.
03/28/2005 18:12:37 rpc.statd.1[4503]: Unlinked /var/lib/nfs/statd/sm.bak/192.168.25.2
03/28/2005 18:12:37 rpc.statd.1[4503]: Waiting for client connections.


Comment 14 Bernd Bartmann 2005-03-28 16:31:54 UTC
I forgot to mention that I see the "reset_my_name: ifap->ifa_addr == NULL" only
when run as "rpc.statd -F -d" or in gdb, but not when I start the service the
normal way with "service nfslock start". But I guess these messages come from
the -d switch anyway.

Comment 15 Steve Dickson 2005-03-28 17:31:11 UTC
hmm... it appears there is an issue with one of your network interfaces.
Could you post the output of 'ifconfig -a' and also run rpc.statd.2
from http://people.redhat.com/steved/bz151828, which should log the
interface it's having a problem with?

Comment 16 Bernd Bartmann 2005-03-28 17:54:46 UTC
Created attachment 112391 [details]
output of ifconfig -a

Comment 17 Bernd Bartmann 2005-03-28 17:56:34 UTC
I don't see any log messages about a problematic network interface. Running in
gdb gives:

(gdb) run -F -d
Starting program: /root/rpc.statd.2 -F -d
03/28/2005 19:51:44 rpc.statd.2[8374]: Version 1.0.6 Starting
03/28/2005 19:51:44 rpc.statd.2[8374]: Flags: No-Daemon Log-STDERR
03/28/2005 19:51:44 rpc.statd.2[8374]: New state: 57
03/28/2005 19:51:44 rpc.statd.2[8374]: Waiting for client connections.


Comment 18 Steve Dickson 2005-03-28 19:06:25 UTC
Created attachment 112394 [details]
Patch to stop rpc.statd from seg faulting 

No messages does make sense, since on the previous run rpc.statd
was able to notify 192.168.25.2 (i.e. "Notification of 192.168.25.2")
that it had rebooted (or in this case restarted). So the attached patch
should stop rpc.statd from seg faulting.
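
The patch itself is attachment 112394 and is not reproduced in this report. As an illustration only: the log line "reset_my_name: ifap->ifa_addr == NULL" suggests an interface-list entry with no address, and getifaddrs() documents that ifa_addr may be a null pointer. The following is a minimal sketch in C of the kind of guard that avoids dereferencing it, assuming the list comes from getifaddrs(). The function name count_ipv4_interfaces is hypothetical and not taken from nfs-utils.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <ifaddrs.h>

/* Hypothetical helper (not from nfs-utils): count the entries in a
 * getifaddrs()-style list that carry an IPv4 address. Entries whose
 * ifa_addr is NULL are skipped instead of being dereferenced. */
int count_ipv4_interfaces(const struct ifaddrs *list)
{
    int count = 0;

    for (const struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue;   /* interface has no address assigned */
        if (ifa->ifa_addr->sa_family == AF_INET)
            count++;
    }
    return count;
}
```

A caller would obtain the list with getifaddrs(&list), pass it to the function, and release it afterwards with freeifaddrs(list).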

Now the reason the lock is still not working could be due to
https://bugzilla.redhat.com/beta/show_bug.cgi?id=150151

Comment 19 Bernd Bartmann 2005-03-28 19:30:58 UTC
So what's the conclusion so far? rpc.statd does not seg fault anymore, but
locking still does not work. How can we go about fixing this problem? Do you need any
ethereal packet traces of the client-server communication?

Comment 20 Bernd Bartmann 2005-03-28 19:33:04 UTC
Another thing I forgot to ask: I guess your attached patch is already in
rpc.statd.2? Do you intend to create an updated rpm?

Comment 21 Bernd Bartmann 2005-03-28 20:08:31 UTC
After rebooting my RHES4 server and using rpc.statd.2, locking now suddenly works.
Thanks, Steve, for getting this problem fixed for me.

Comment 22 Steve Dickson 2005-03-29 11:45:59 UTC
No, thank you for allowing me to find this problem...
it was definitely a team effort... imho...

This is fixed in the nfs-utils-1.0.6-55 rpm, which
may be in the RHEL4 U1 release but can be found at
http://people.redhat.com/steved/bz151828

Comment 23 Bernd Bartmann 2005-03-29 15:22:15 UTC
Thanks for providing the updated rpm. This is also working fine for me. Shall we
close this bug now or wait until U1 is out?

Comment 24 Steve Dickson 2005-04-01 12:55:24 UTC
*** Bug 149201 has been marked as a duplicate of this bug. ***

Comment 29 Red Hat Bugzilla 2005-10-05 17:30:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-727.html

