Description of problem:
After booting, NFS starts properly with an rpc.mountd daemon running, but as soon as a remote RHEL4 system attempts to NFS mount an exported filesystem, rpc.mountd exits with a segfault and we are unable to mount any of the filesystems.

Version-Release number of selected component (if applicable):
RHEL4 Update 4

How reproducible:
Consistent

Steps to Reproduce:
1. Boot System A running RHEL4 Update 4
2. Mount systema:/directory from a RHEL4 Update 2 system
3. rpc.mountd segfaults on System A

Actual results:
No mount (RPC error)

Expected results:
Should have NFS mounted systema:/directory

Additional info:
From /var/log/messages:
kernel: rpc.mountd[5515]: segfault at 0000000000000000 rip 0000002a958fd560 rsp 0000007fbfffc648 error 4
One additional tidbit of information. The exports file on this RHEL4 Update 4 system contains netgroups. If we remove the netgroups and just add individual node names, rpc.mountd does NOT crash. As a test we added a netgroup that contained only two node names, and it still segfaults. So it appears the problem lies in rpc.mountd's inability to process netgroups.

Versions of NFS packages loaded:
nfs-utils-1.0.6-70.EL4
nfs-utils-lib-1.0.6-3
running kernel 2.6.9-42.0.2.ELsmp
We've found a workaround, so we can probably drop the severity level of this down to High. In /etc/nsswitch.conf, the default entry for netgroup is:

netgroup: files nisplus nis

If we change this to:

netgroup: nis

then rpc.mountd will not segfault when we NFS mount a filesystem.

I also thought I'd mention that I did, one time only (not reproducible), get a segfault with exportfs during a reboot:

exportfs[3923]: segfault at 0000000000000000 rip 0000003f0fc70560 rsp 0000007fbfffd888 error 4

which might be worth a look as well. Thanks!
Is selinux enabled?
No. selinux is disabled.
Would it be possible to install the nfs-utils-debuginfo patch and then start up rpc.mountd through gdb (i.e. gdb rpc.mountd) then type r at the gdb prompt... hopefully this will show where the segfault is happening...
Steve: Can you enlighten me on how to go about installing the nfs-utils-debuginfo patch and I'll give it a shot? Thanks!
Steve: After some Google searching I found and installed nfs-utils-debuginfo, but it didn't give the info as hoped. Here's the output:

gdb /usr/sbin/rpc.mountd
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) r
Starting program: /usr/sbin/rpc.mountd
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
Detaching after fork from child process 6184.

Program exited normally.
(gdb)

Then when I did an NFS mount, while the mount failed, there was no debug info in the shell window where gdb rpc.mountd was running. The only thing that showed up was the segfault in the messages file:

Sep 26 08:56:10 xyz kernel: rpc.mountd[4315]: segfault at 0000000000000000 rip 0000002a958fd560 rsp 0000007fbfffca98 error 4

Anything else I should be doing in gdb to get traceback info? Other than installing the nfs-utils-debuginfo RPM, is there anything else I need to get the debugger enabled?
Well, congratulations! It appears you've uncovered another bug :-\ I'm sure if you do an 'rpm -qli nfs-utils-debuginfo' it will show *no* files... Please grab the RPMs from http://people.redhat.com/steved/bz209121/ to see if the problem goes away, and if not, install the debuginfo from that directory and start up rpc.mountd using gdb...
Sorry, the new version failed to solve the problem. What we have loaded:

nfs-utils-debuginfo-1.0.6-72.EL4
nfs-utils-lib-1.0.6-3
nfs-utils-1.0.6-72.EL4

Rebooted, then killed rpc.mountd, and manually started it thusly:

gdb /usr/sbin/rpc.mountd
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) r
Starting program: /usr/sbin/rpc.mountd
Detaching after fork from child process 6023.

Program exited normally.
(gdb)

From another shell window, I logged into a remote system and tried (and failed) to mount a filesystem on the local machine. The following error appeared in messages:

Oct 4 00:25:22 xyz kernel: rpc.mountd[6023]: segfault at 0000000000000000 rip 0000002a958fd560 rsp 0000007fbfffc318 error 4

Put the netgroup configuration back to nis as the only entry in nsswitch.conf and restarted rpc.mountd, and NFS mounts are working again. Am I starting the debug version of rpc.mountd incorrectly? I'm puzzled why I'm not getting any useful traceback information for you.
Try using the -f flag when you start rpc.mountd from the debugger.
Also try getting a core dump by setting core file size to unlimited (i.e. ulimit -c unlimited). Once you get the core file, using gdb, you should be able to get a backtrace...
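For reference, `ulimit -c unlimited` changes the RLIMIT_CORE resource limit of the shell, which child processes inherit. A rough C equivalent is sketched below, assuming a POSIX system. The function name `raise_core_limit` is made up for illustration, and it only raises the soft limit as far as the existing hard limit, so it matches "unlimited" only when the hard limit is already unlimited.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the current hard limit.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 * Hypothetical helper, not part of nfs-utils. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Because the limit is inherited, it has to be raised in the process (or shell) that starts rpc.mountd, before the daemon is launched.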
Created attachment 138041: Core file for segfault of rpc.mountd
I'm still unable to get a stack trace while running gdb /usr/sbin/rpc.mountd -f, but I did get a core dump, which I've attached.
Unfortunately, for some strange reason, I can't read that core. So please try:

gdb /usr/sbin/rpc.mountd core.10895
gdb> bt

to see if you can get a backtrace...
Sorry, I deleted the core file so I've generated a new one (which I can send if interested). Here is the bt info from this latest core file.

Core was generated by `/usr/sbin/rpc.mountd'.
Program terminated with signal 11, Segmentation fault.
Loaded symbols for /usr/sbin/rpc.mountd
Reading symbols from /usr/lib64/libwrap.so.0...done.
Loaded symbols for /usr/lib64/libwrap.so.0
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_nisplus.so.2...done.
Loaded symbols for /lib64/libnss_nisplus.so.2
Reading symbols from /lib64/libnss_nis.so.2...done.
Loaded symbols for /lib64/libnss_nis.so.2
#0 0x0000002a958fd560 in strlen () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000002a958fd560 in strlen () from /lib64/tls/libc.so.6
#1 0x0000002a958fd2a6 in strdup () from /lib64/tls/libc.so.6
#2 0x0000002a9578111c in nis_list () from /lib64/libnsl.so.1
#3 0x0000002a95bd7ddb in _nss_nisplus_setnetgrent () from /lib64/libnss_nisplus.so.2
#4 0x0000002a9596d67c in innetgr () from /lib64/tls/libc.so.6
#5 0x000000552aab1bc2 in client_check (clp=Variable "clp" is not available.
) at client.c:368
#6 0x000000552aab1d69 in client_compose (addr=Variable "addr" is not available.
) at client.c:255
#7 0x000000552aaaf523 in auth_authenticate (what=0x552aab7c81 "mount", caller=0x552abc45b4, path=0x7fbfffdd50 "/home") at auth.c:83
#8 0x000000552aaae4a7 in get_rootfh (rqstp=Variable "rqstp" is not available.
) at mountd.c:302
#9 0x000000552aaae7f8 in mount_mnt_3_svc (rqstp=0x7fbffff030, path=0x7fbfffef18, res=0x7fbfffef20) at mountd.c:267
#10 0x000000552aab5b8f in rpc_dispatch (rqstp=0x7fbffff030, transp=0x552abc45a0, dtable=Variable "dtable" is not available.
) at rpcdispatch.c:53
#11 0x000000552aaaf331 in mount_dispatch (rqstp=0x7fbffff030, transp=0x552abc45a0) at mount_dispatch.c:81
#12 0x0000002a95976a36 in svc_getreq_common_internal () from /lib64/tls/libc.so.6
#13 0x0000002a959766cd in svc_getreqset_internal () from /lib64/tls/libc.so.6
#14 0x000000552aab0f1b in my_svc_run () at svc_run.c:86
#15 0x000000552aaaf06a in main (argc=Variable "argc" is not available.
) at mountd.c:636
Can you please also install glibc-debuginfo-2.3.4-2.25.x86_64.rpm and get the backtrace once again, so that we can see the exact arguments and source locations in the backtrace? Thanks.
After installing glibc-debuginfo and rebooting, here's the latest backtrace.

gdb /usr/sbin/rpc.mountd core.6245
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Core was generated by `/usr/sbin/rpc.mountd --foreground'.
Program terminated with signal 11, Segmentation fault.
Loaded symbols for /usr/sbin/rpc.mountd
Reading symbols from /usr/lib64/libwrap.so.0...done.
Loaded symbols for /usr/lib64/libwrap.so.0
Reading symbols from /lib64/libnsl.so.1...Reading symbols from /usr/lib/debug/lib64/libnsl-2.3.4.so.debug...done.
done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/tls/libc.so.6...Reading symbols from /usr/lib/debug/lib64/tls/libc-2.3.4.so.debug...done.
done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.3.4.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.3.4.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_nisplus.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_nisplus-2.3.4.so.debug...done.
done.
Loaded symbols for /lib64/libnss_nisplus.so.2
Reading symbols from /lib64/libnss_nis.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_nis-2.3.4.so.debug...done.
done.
Loaded symbols for /lib64/libnss_nis.so.2
#0 0x0000002a958fd560 in strlen () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000002a958fd560 in strlen () from /lib64/tls/libc.so.6
#1 0x0000002a958fd2a6 in strdup () from /lib64/tls/libc.so.6
#2 0x0000002a9578111c in *__GI_nis_list (name=Variable "name" is not available.
) at nis_table.c:250
#3 0x0000002a95bd7ddb in _nss_nisplus_setnetgrent (group=0x552abc8ff9 "SGI", netgrp=0x7fbfffcb50) at nss_nisplus/nisplus-netgrp.c:162
#4 0x0000002a9596d67c in *__GI_innetgr (netgroup=0x552abc8ff9 "SGI", host=0x552abccba0 "go.ca.boeing.com", user=0x0, domain=0x0) at getnetgrent_r.c:354
#5 0x000000552aab1bc2 in client_check (clp=Variable "clp" is not available.
) at client.c:368
#6 0x000000552aab1d69 in client_compose (addr=Variable "addr" is not available.
) at client.c:255
#7 0x000000552aaaf523 in auth_authenticate (what=0x552aab7c81 "mount", caller=0x552abc45b4, path=0x7fbfffdd10 "/home") at auth.c:83
#8 0x000000552aaae4a7 in get_rootfh (rqstp=Variable "rqstp" is not available.
) at mountd.c:302
#9 0x000000552aaae7f8 in mount_mnt_3_svc (rqstp=0x7fbfffeff0, path=0x7fbfffeed8, res=0x7fbfffeee0) at mountd.c:267
#10 0x000000552aab5b8f in rpc_dispatch (rqstp=0x7fbfffeff0, transp=0x552abc45a0, dtable=Variable "dtable" is not available.
) at rpcdispatch.c:53
#11 0x000000552aaaf331 in mount_dispatch (rqstp=0x7fbfffeff0, transp=0x552abc45a0) at mount_dispatch.c:81
#12 0x0000002a95976a36 in svc_getreq_common (fd=Variable "fd" is not available.
) at svc.c:465
#13 0x0000002a959766cd in svc_getreqset (readfds=Variable "readfds" is not available.
) at svc.c:376
#14 0x000000552aab0f1b in my_svc_run () at svc_run.c:86
#15 0x000000552aaaf06a in main (argc=Variable "argc" is not available.
) at mountd.c:636
Thanks, I managed to reproduce this with a simple:

#include <netdb.h>

int
main (void)
{
  innetgr ("baz", "foo.bar.com", 0, 0);
  return 0;
}

with

netgroup: nisplus

in /etc/nsswitch.conf and no NIS+ configured at all. Not sure yet whether this is nis_list's fault (it should check whether nis_getnames (ibreq->ibr_name)[0] == NULL) or nis_getnames' fault.
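The backtrace shows strlen being reached through strdup with a NULL pointer, and the comment above suggests checking whether the first returned name is NULL. The sketch below only illustrates that kind of guard on a NULL-terminated name list. It is not the glibc code or the eventual fix, and `dup_first_name` is a hypothetical helper invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate the first entry of a NULL-terminated list of names.
 * Returns a malloc'ed copy, or NULL if the list is empty or allocation
 * fails. Without the names[0] check, an empty list would pass NULL to
 * strlen(), which is the crash pattern seen in the backtrace.
 * Hypothetical helper for illustration only. */
char *dup_first_name(char *const *names)
{
    assert(names != NULL);      /* the list pointer itself must be valid */

    if (names[0] == NULL)
        return NULL;            /* empty list: nothing to duplicate */

    size_t len = strlen(names[0]) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, names[0], len);
    return copy;
}
```

A caller would treat a NULL return as "no usable name" and report a lookup failure instead of continuing.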
If you don't have NIS+ configured, the best workaround would be to remove nisplus from the netgroup entry in /etc/nsswitch.conf.
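To check whether a machine is exposed to this, one could scan /etc/nsswitch.conf for a netgroup entry that lists nisplus. The following is a minimal sketch of the per-line check, assuming the usual "database: service ..." layout with '#' starting a comment. The function name `netgroup_line_uses_nisplus` is hypothetical, and the sketch does not read the file itself.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `line` is an nsswitch.conf "netgroup:" entry that lists the
 * nisplus service, 0 otherwise. Text from '#' onward is ignored.
 * Hypothetical helper for illustration only. */
int netgroup_line_uses_nisplus(const char *line)
{
    static const char db[] = "netgroup";
    static const char svc[] = "nisplus";
    const char *blanks = " \t\r\n";

    assert(line != NULL);

    line += strspn(line, blanks);               /* skip leading blanks */
    if (strncmp(line, db, sizeof db - 1) != 0)
        return 0;                               /* some other database */
    line += sizeof db - 1;
    line += strspn(line, blanks);
    if (*line != ':')
        return 0;                               /* e.g. "netgroups:" */
    line++;

    for (;;) {
        line += strspn(line, blanks);           /* move to the next token */
        if (*line == '\0' || *line == '#')
            return 0;                           /* end of entry or comment */
        size_t len = strcspn(line, " \t\r\n#");
        if (len == sizeof svc - 1 && strncmp(line, svc, len) == 0)
            return 1;
        line += len;
    }
}
```

With the default entry quoted earlier in this report, `netgroup: files nisplus nis`, the function returns 1; with the workaround entry `netgroup: nis` it returns 0.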
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
*** Bug 208718 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0210.html