Created attachment 358735 [details]
Info from gdb for four cores

Description of problem:
I experience random crashes of autofs that create core files in /.
I am using the following auto.master:

/misc /etc/auto.misc
/net -hosts
+auto.master

The NIS auto.master map itself contains:
#ypcat auto.master
auto_apps -ro,intr,bg,nosuid
auto_data -rw,intr,bg,nosuid,nobrowse
auto_home -rw,intr,bg,nosuid,nobrowse
auto_ftp -rw,intr,bg,nosuid,nobrowse
auto_saf -ro,intr,bg,nosuid
auto_direct -ro,intr,bg,nosuid,nobrowse

and auto_home contains around 2500 lines of share definitions.
I constantly use my home directory and occasionally a directory under /net.

Version-Release number of selected component (if applicable):
autofs-5.0.4-36

How reproducible:
It seems random. I cannot link the crashes with any event. Stranger still, most of the crashes occur at night, when I am not using the remote directories.

Additional info:
I gathered backtrace data for four cores using gdb. Please find it enclosed as an attachment.
(In reply to comment #0)

Mmmm .... I can't see what's causing this.
Could you get a debug log covering the time of one of these crashes please.

Ian
Hi,

here are messages from /var/log/messages just before some of the crashes:

messages-20090816:Aug 11 18:25:45 ga014699 kernel: automount[17006] general protection ip:7fe838e1f411 sp:7fe81ddd39f8 error:0 in libc-2.10.1.so (deleted)[7fe838da0000+164000]
messages-20090816:Aug 12 19:16:52 ga014699 automount[1787]: update_negative_cache: key "kde-devel" not found in map.
messages-20090816:Aug 12 19:17:17 ga014699 kernel: automount[29456] general protection ip:7f55bbc62541 sp:7f55a88d2a28 error:0 in libc-2.10.1.so[7f55bbbe3000+164000]
messages-20090816:Aug 14 23:29:57 ga014699 kernel: automount[32531]: segfault at 60 ip 00007fb0c98da541 sp 00007fb0aa8ee8a8 error 4 in libc-2.10.1.so[7fb0c985b000+164000]
messages-20090823:Aug 18 03:37:20 ga014699 kernel: automount[5388] general protection ip:7f499cffb541 sp:7f4981bcf8a8 error:0 in libc-2.10.1.so[7f499cf7c000+164000]
messages-20090823:Aug 20 08:19:03 ga014699 kernel: automount[22477]: segfault at 307262726970 ip 00007f5bb319e541 sp 00007f5ba81d98a8 error 4 in libc-2.10.1.so[7f5bb311f000+164000]
messages-20090830:Aug 26 04:01:51 ga014699 kernel: automount[24865] general protection ip:7f37cca24541 sp:7f37c83f58a8 error:0 in libc-2.10.1.so[7f37cc9a5000+164000]
messages-20090830:Aug 27 04:27:24 ga014699 kernel: automount[25240] general protection ip:7f6b0fcc7541 sp:7f6b0c5b28a8 error:0 in libc-2.10.1.so[7f6b0fc48000+164000]

At least all of them seem to be at the same point.

There were some messages like:

messages-20090823:Aug 17 11:35:33 ga014699 automount[5680]: update_negative_cache: key ".directory" not found in map.
messages-20090823:Aug 17 12:05:36 ga014699 automount[5680]: update_negative_cache: key "buildmeister" not found in map.

in the logs, but not before the crashes.
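One way to check the "same point" impression is to subtract the libc mapping base (the first number in the square brackets) from the faulting instruction pointer in each kernel line. The following is a minimal sketch, not a tool used by anyone in this report; the names FAULT_LINES, FAULT_RE and fault_offset are made up for illustration, and the lines are copied from the messages above with the date and host prefix trimmed.

```python
import re

# Kernel fault lines from the log excerpt above (date/host prefix trimmed).
FAULT_LINES = [
    "automount[17006] general protection ip:7fe838e1f411 sp:7fe81ddd39f8 error:0 in libc-2.10.1.so (deleted)[7fe838da0000+164000]",
    "automount[29456] general protection ip:7f55bbc62541 sp:7f55a88d2a28 error:0 in libc-2.10.1.so[7f55bbbe3000+164000]",
    "automount[32531]: segfault at 60 ip 00007fb0c98da541 sp 00007fb0aa8ee8a8 error 4 in libc-2.10.1.so[7fb0c985b000+164000]",
    "automount[5388] general protection ip:7f499cffb541 sp:7f4981bcf8a8 error:0 in libc-2.10.1.so[7f499cf7c000+164000]",
    "automount[22477]: segfault at 307262726970 ip 00007f5bb319e541 sp 00007f5ba81d98a8 error 4 in libc-2.10.1.so[7f5bb311f000+164000]",
    "automount[24865] general protection ip:7f37cca24541 sp:7f37c83f58a8 error:0 in libc-2.10.1.so[7f37cc9a5000+164000]",
    "automount[25240] general protection ip:7f6b0fcc7541 sp:7f6b0c5b28a8 error:0 in libc-2.10.1.so[7f6b0fc48000+164000]",
]

# Handles both kernel formats: "ip:<hex>" (general protection) and "ip <hex>" (segfault).
FAULT_RE = re.compile(
    r"ip[: ]([0-9a-f]+) .* in (\S+?)(?: \(deleted\))?\[([0-9a-f]+)\+[0-9a-f]+\]"
)


def fault_offset(line):
    """Return (library, ip - mapping base) for a kernel fault line, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group(1), 16)
    base = int(match.group(3), 16)
    return match.group(2), ip - base


if __name__ == "__main__":
    for line in FAULT_LINES:
        library, offset = fault_offset(line)
        print(f"{library} +{offset:#x}")
```

For these lines the offset is 0x7f541 in six cases and 0x7f411 in the first one, which is the entry whose libc mapping is marked "(deleted)" and so may belong to a different libc build. An offset alone does not name the function; that needs the matching debuginfo.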
How about the debug log? See http://people.redhat.com/jmoyer for info on how to get one. Ian
(In reply to comment #3)
> How about the debug log?
> See http://people.redhat.com/jmoyer for info on how to get one.

And could you also get me a gdb backtrace of all of the threads from one of these cores.
Use "thr a a bt".

Ian
Hi,

I include a debug log and a backtrace of all the threads for the last core generated (it was running two weeks without crashing but after an upgrade of F11 it just crashed from the beginning)

Debug log:

Sep 14 11:47:17 host automount[1715]: Starting automounter version 5.0.4-38, master map auto.master
Sep 14 11:47:17 host automount[1715]: using kernel protocol version 5.01
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_master: reading master files auto.master
Sep 14 11:47:17 host automount[1715]: parse_init: parse(sun): init gathered global options: (null)
Sep 14 11:47:17 host automount[1715]: lookup_read_master: lookup(file): read entry /misc
Sep 14 11:47:17 host automount[1715]: lookup_read_master: lookup(file): read entry /net
Sep 14 11:47:17 host automount[1715]: lookup_read_master: lookup(file): read entry +auto.master
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_master: reading master files auto.master
Sep 14 11:47:17 host automount[1715]: parse_init: parse(sun): init gathered global options: (null)
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_master: reading master nis auto.master
Sep 14 11:47:17 host automount[1715]: parse_init: parse(sun): init gathered global options: (null)
Sep 14 11:47:17 host automount[1715]: master_do_mount: mounting /misc
Sep 14 11:47:17 host automount[1715]: automount_path_to_fifo: fifo name /var/run/autofs.fifo-misc
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_map: reading map file /etc/auto.misc
Sep 14 11:47:17 host automount[1715]: parse_init: parse(sun): init gathered global options: (null)
Sep 14 11:47:17 host automount[1715]: mounted indirect on /misc with timeout 300, freq 75 seconds
Sep 14 11:47:17 host automount[1715]: st_ready: st_ready(): state = 0 path /misc
Sep 14 11:47:17 host automount[1715]: master_do_mount: mounting /net
Sep 14 11:47:17 host automount[1715]: automount_path_to_fifo: fifo name /var/run/autofs.fifo-net
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_map: reading map hosts (null)
Sep 14 11:47:17 host automount[1715]: parse_init: parse(sun): init gathered global options: (null)
Sep 14 11:47:17 host automount[1715]: mounted indirect on /net with timeout 300, freq 75 seconds
Sep 14 11:47:17 host automount[1715]: st_ready: st_ready(): state = 0 path /net
Sep 14 11:47:17 host automount[1715]: master_do_mount: mounting /apps
Sep 14 11:47:17 host automount[1715]: automount_path_to_fifo: fifo name /var/run/autofs.fifo-apps
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_map: reading map files auto_apps
Sep 14 11:47:17 host automount[1715]: file map /etc/auto_apps not found
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_map: reading map nis auto_apps
Sep 14 11:47:17 host automount[1715]: parse_init: parse(sun): init gathered global options: ro,intr,bg,nosuid
Sep 14 11:47:17 host automount[1715]: mounted indirect on /apps with timeout 300, freq 75 seconds
Sep 14 11:47:17 host automount[1715]: st_ready: st_ready(): state = 0 path /apps
Sep 14 11:47:17 host automount[1715]: master_do_mount: mounting /data
Sep 14 11:47:17 host automount[1715]: automount_path_to_fifo: fifo name /var/run/autofs.fifo-data
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_map: reading map files auto_data
Sep 14 11:47:17 host automount[1715]: file map /etc/auto_data not found
Sep 14 11:47:17 host automount[1715]: lookup_nss_read_map: reading map nis auto_data

Backtrace:
----------
(gdb) thr a a bt

Thread 7 (Thread 1724):
#0 0x00007f02d4e650d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00007f02d636c738 in get_pkt (pkt=<value optimized out>, ap=<value optimized out>) at automount.c:889
#2 handle_packet (pkt=<value optimized out>, ap=<value optimized out>) at automount.c:1026
#3 0x00007f02d636e022 in handle_mounts (arg=0x7fff58aa2b70) at automount.c:1555
#4 0x00007f02d5f2c86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#5 0x00007f02d4e6e3bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()
Current language: auto; currently asm

Thread 6 (Thread 1720):
#0 0x00007f02d4e650d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00007f02d636c738 in get_pkt (pkt=<value optimized out>, ap=<value optimized out>) at automount.c:889
#2 handle_packet (pkt=<value optimized out>, ap=<value optimized out>) at automount.c:1026
#3 0x00007f02d636e022 in handle_mounts (arg=0x7fff58aa2b70) at automount.c:1555
#4 0x00007f02d5f2c86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#5 0x00007f02d4e6e3bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()
Current language: auto; currently minimal

Thread 5 (Thread 1717):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:220
#1 0x00007f02d6378a22 in st_queue_handler (arg=<value optimized out>) at state.c:1117
#2 0x00007f02d5f2c86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#3 0x00007f02d4e6e3bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 4 (Thread 1716):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:220
#1 0x00007f02d6380c4a in alarm_handler (arg=<value optimized out>) at alarm.c:203
#2 0x00007f02d5f2c86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#3 0x00007f02d4e6e3bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()
Current language: auto; currently asm

Thread 3 (Thread 1723):
#0 0x00007f02d4e650d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00007f02d636c738 in get_pkt (pkt=<value optimized out>, ap=<value optimized out>) at automount.c:889
#2 handle_packet (pkt=<value optimized out>, ap=<value optimized out>) at automount.c:1026
#3 0x00007f02d636e022 in handle_mounts (arg=0x7fff58aa2b70) at automount.c:1555
#4 0x00007f02d5f2c86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#5 0x00007f02d4e6e3bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 2 (Thread 1715):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:261
#1 0x00007f02d63838c3 in master_do_mount (entry=<value optimized out>) at master.c:1012
#2 master_mount_mounts (entry=<value optimized out>) at master.c:1153
#3 0x00007f02d6383af0 in master_read_master (master=0x7f02d8487060, age=1252921637, readall=0) at master.c:815
#4 0x00007f02d636d842 in main (argc=<value optimized out>, argv=<value optimized out>) at automount.c:2126
Current language: auto; currently minimal

Thread 1 (Thread 1725):
#0 strlen () at ../sysdeps/x86_64/strlen.S:31
#1 0x00007f02d5d1a9f7 in xdr_string (xdrs=0x7f02bc002aa8, cpp=0xffffffff, maxsize=4294967295) at xdr.c:673
#2 0x00007f02d5d0dafa in clnt_vc_call (cl=<value optimized out>, proc=<value optimized out>, xdr_args=<value optimized out>, args_ptr=<value optimized out>, xdr_results=<value optimized out>, results_ptr=<value optimized out>, timeout={tv_sec = 25, tv_usec = 0}) at clnt_vc.c:367
#3 0x00007f02d4b7ae70 in yp_bind_ypbindprog (domain=0x617461 <Address 0x617461 out of bounds>, ysd=0x7f02bc000a80) at ypclnt.c:143
#4 0x00007f02d4b7b89f in do_ypcall (domain=0x617461 <Address 0x617461 out of bounds>, prog=<value optimized out>, xargs=<value optimized out>, req=<value optimized out>, xres=0x7f02d4b7a4a0 <*__GI_xdr_ypresp_val>, resp=0x7f02d1edfb50 "") at ypclnt.c:363
#5 0x00007f02d4b7c405 in do_ypcall_tr (resp=<value optimized out>, xres=<value optimized out>, req=<value optimized out>, xargs=<value optimized out>, prog=<value optimized out>, domain=<value optimized out>) at ypclnt.c:384
#6 yp_match (resp=<value optimized out>, xres=<value optimized out>, req=<value optimized out>, xargs=<value optimized out>, prog=<value optimized out>, domain=<value optimized out>) at ypclnt.c:466
#7 0x00007f02d1ee798e in get_map_order (domain=0x7f02d4d8d380 "eso", map=<value optimized out>) at lookup_yp.c:77
#8 0x00007f02d1ee8900 in lookup_init (mapfmt=0x0, argc=5, argv=<value optimized out>, context=0x7f02bc000a10) at lookup_yp.c:140
#9 0x00007f02d637573a in open_lookup (name=0x7f02bc0008e0 "nis", err_prefix=0x7f02d638b9d9 "", mapfmt=<value optimized out>, argc=<value optimized out>, argv=0x7f02bc000900) at module.c:117
#10 0x00007f02d637708b in do_read_map (ap=0x7f02d84af710, map=0x7f02bc000b70, age=1252921637) at lookup.c:269
#11 0x00007f02d63774e9 in read_source_instance (age=<value optimized out>, type=<value optimized out>, map=<value optimized out>, ap=<value optimized out>) at lookup.c:359
#12 read_map_source (age=<value optimized out>, type=<value optimized out>, map=<value optimized out>, ap=<value optimized out>) at lookup.c:378
#13 lookup_nss_read_map (age=<value optimized out>, type=<value optimized out>, map=<value optimized out>, ap=<value optimized out>) at lookup.c:518
#14 0x00007f02d6370071 in mount_autofs_indirect (ap=0x7f02d84af710, root=0x7f02bc0008c0 "/data") at indirect.c:207
#15 0x00007f02d636df9c in mount_autofs (root=<value optimized out>, ap=<value optimized out>) at automount.c:1012
#16 handle_mounts (root=<value optimized out>, ap=<value optimized out>) at automount.c:1530
#17 0x00007f02d5f2c86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#18 0x00007f02d4e6e3bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()
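A small aside on reading the bad pointer in thread 1: when gdb reports an out-of-bounds address, it can help to look at the value's bytes as text, since a pointer made of printable characters often means string data was written over it. The sketch below is only a heuristic and proves nothing on its own; the helper name pointer_as_ascii is made up for illustration. It decodes the domain= value from frames #3 and #4, and the faulting address 307262726970 that appears in one of the kernel segfault lines posted earlier.

```python
def pointer_as_ascii(value, width=8):
    """Return the leading printable-ASCII run of a pointer's little-endian bytes."""
    raw = value.to_bytes(width, "little")  # x86_64 stores the low byte first
    chars = []
    for byte in raw:
        if 0x20 <= byte < 0x7F:
            chars.append(chr(byte))
        else:
            break  # stop at the first non-printable byte
    return "".join(chars)


if __name__ == "__main__":
    print(pointer_as_ascii(0x617461))        # prints: ata
    print(pointer_as_ascii(0x307262726970))  # prints: pirbr0
    print(repr(pointer_as_ascii(0x7F02BC000A80)))  # a plausible heap pointer, prints: ''
```

Both suspicious values decode to short text while an ordinary-looking pointer from the same trace does not, which would be consistent with a structure being interpreted with the wrong layout. Note that a faulting address is not necessarily the raw pointer value, so the second result in particular should be read loosely.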
I've looked back at this a few times now and we've had another report of it. While going through our autofs bugs I noticed another bug that, although apparently unrelated, has made me wonder if there is a problem with our build system. The thing that concerns me is stack entry #2 of thread 1 in the trace above. When I first looked at the trace I noticed that the function signature was different to what it should be. Initially I thought it was just a gdb inaccuracy but in another bug which reported unexplained strange behaviour, building autofs locally from its source rpm resolved the issue. So I'm wondering if we have some sort of library function signature mismatch between an update that is present on systems and our build system environment. The code in this area of autofs hasn't changed much at all for a long time so that makes it all that much more suspicious. Could you try building and installing the autofs source rpm locally and see if that makes a difference please. Ian
Created attachment 364753 [details]
glibc patch demonstrating the cause of this problem

This patch to glibc makes the problem here go away. It is not a solution to the reported crash issue, at least not a complete one. It does, however, provide an example for the following problem description.
The problem we are seeing in this bug appears to be caused
(In reply to comment #8)
> The problem we are seeing in this bug appears to be caused

Oops, incomplete entry.

Anyway, the problem here appears to be caused by conflicting RPC calls within the NIS code of glibc when an application is linked against libtirpc. I was easily able to reproduce the problem by stopping the NIS server process and performing a few automount lookups, and I haven't been able to reproduce it again with the above patch. Note that the problem doesn't seem to happen when the server is actually down (or unplugged), but I may be mistaken about that; I just didn't see it in testing.

The patch shows the only call I could find that is problematic. It specifically uses the glibc RPC infrastructure, whereas all other calls are directed at the dynamically linked libtirpc. Although the RPC CLIENT structure is quite similar in both, the private data field (cl_private) points to a fairly different structure. I believe this is the main cause of the occasional crash.

But I don't think this patch alone is sufficient to fully resolve the issue. For example, I cannot see any handling of the new (well, reasonably new) close-on-exec functionality in libtirpc, which means problems already solved by this feature will still be present in applications using libtirpc.

Can I have everyone's thoughts on how to deal with this please?

Ian
We have to extend libtirpc for Linux to match the libc implementation. The alternative is to provide a separate libnsl implementation and perhaps others. It should be trivial to provide at least the little __libc_clntudp_bufcreate function. In fact, clntudp_bufcreate is a little wrapper around __libc_clntudp_bufcreate. This change is easy enough to maintain as a patch going forward. And the change is good security-wise. We want all possible callers to use O_CLOEXEC. Steve, the additional parameter is ORed into the parameter to socket calls so that we can OR in SOCK_CLOEXEC. That's all.
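To make the "ORed into the parameter to socket calls" remark concrete, here is a minimal sketch of what the extra flag does at the system-call level. It is not the glibc or libtirpc code. It assumes Linux with a kernel that understands SOCK_CLOEXEC, and it calls socket(2) through ctypes because Python's own socket objects already set close-on-exec by default, which would hide the difference. The helper names raw_udp_socket and has_cloexec are made up for illustration.

```python
import ctypes
import fcntl
import os
import socket

# Handle to the C library already loaded in this process (Linux assumption).
_libc = ctypes.CDLL(None, use_errno=True)


def raw_udp_socket(flags=0):
    """Call socket(2) directly, ORing extra flags into the type argument."""
    fd = _libc.socket(int(socket.AF_INET), int(socket.SOCK_DGRAM) | int(flags), 0)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def has_cloexec(fd):
    """True if the descriptor will be closed automatically across exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    plain = raw_udp_socket()
    flagged = raw_udp_socket(socket.SOCK_CLOEXEC)
    print("plain socket close-on-exec:  ", has_cloexec(plain))
    print("flagged socket close-on-exec:", has_cloexec(flagged))
    os.close(plain)
    os.close(flagged)
```

Setting the flag in the socket() call itself, rather than with a later fcntl(), means there is no window in which another thread can fork and exec while the descriptor is still inheritable.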
We are seeing this sort of thing in Fedora 12, using automount with nis maps, eg

kernel: automount[23845] general protection ip:7f675b3b5cd2 sp:7f67588bd7d8 error:0 in libc-2.11.so (deleted)[7f675b337000+16f000]

Is this the same problem as was originally reported in this bug? If so, can someone give me an update on resolving this issue please. It's a bit of a killer in our environment.

Thanks
Created attachment 378268 [details]
Patch to add compat functions to libtirpc

Try this untested patch. It compiles and the symbol is created but that's the extent of testing I've done.

I don't know whether upstream is open to this type of change. It's only useful on systems with a socket() call that can handle SOCK_CLOEXEC. I think I added the appropriate backward compatibility support to handle old kernels. And certainly the existing libtirpc functions are unchanged. Only one additional external interface (__libc_clntudp_bufcreate) is added. The remaining changes are necessary to implement it. The changes are straightforward.
ok thanks. I'll give this a go tomorrow and see what happens.
Ulrich: Very many thanks. This patch seems to work well. Assuming it holds up ok over the next few days, is there a chance that this can make it into fedora-updates? Thanks again.
This patch works for me too.
Hi, I would like to try the patch. Has anybody created a patched version of the libtirpc rpm? If not, could you please briefly explain how to create it? Thank you very much in advance, Enrique
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This is apparently still an issue. Steve, can you please just take the patch I provided? We really need it for libtirpc to be a replacement.
*** Bug 554744 has been marked as a duplicate of this bug. ***
I confirm that the patch from comment #12 seems to solve the problem I described in bug #554744.

We applied the patch from comment #12 to libtirpc-0.2.0-4 and created a new package. With this patched libtirpc-0.2.0-4, autofs didn't crash any more.

In the meantime an update to libtirpc-0.2.1-1.fc12.x86_64 has appeared and has overwritten our manually patched libtirpc-0.2.0-4 version. Now, after a reboot, we have crashes of autofs again:

May 1 16:09:50 localhost kernel: automount[22807]: segfault at 307262726970 ip 00007f4b031bd2a2 sp 00007f4b00353808 error 4 in libc-2.11.1.so[7f4b0313e000+16f000]
May 1 18:52:45 localhost kernel: automount[24708] general protection ip:7fe1fe2ed2a2 sp:7fe1fde4a5d8 error:0 in libc-2.11.1.so[7fe1fe26e000+16f000]

It seems that the patch was not applied upstream?
libtirpc-0.2.1-3.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/libtirpc-0.2.1-3.fc13
libtirpc-0.2.1-3.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update libtirpc'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/libtirpc-0.2.1-3.fc13
Also seeing this on Fedora 12.
libtirpc-0.2.1-3.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.
> libtirpc-0.2.1-3.fc13 has been pushed to the Fedora 13 stable repository.

Any chance of pushing this to Fedora 12 also? The src.rpm from F13 builds on F12 and is running ok, but having it available through the regular update channels would be nice.

Thanks
I am now using Fedora 13 and I haven't experienced any problem so far. Thank you
Would also appreciate this for F12.
(In reply to comment #27)
> Would also appreciate this for F12.

The F13 src rpm builds and runs on F12. Been running it for 7 weeks now without problems.
Since this bug is closed, I've opened a new bug report against Fedora 12: Bug 621387
@ Ian: Thanks.