Description of problem:

automount dumps core on a daily basis:

$ file /core.690
/core.690: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'automount'

$ file /core.*
/core.1815: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'automount'
/core.2671: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'automount'

After
$ service autofs restart
everything is fine

$ cat /etc/auto.master| grep -v '^#'
/misc /etc/auto.misc
/net /etc/auto.net
/home yp:auto.home.nis -tcp
+auto.master

auto.master from yp has 10 entries, some direct and some indirect maps.

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc1.6
kernel 2.6.17-1.2519.4.21.el5

Seen on i686 and x86_64.
(In reply to comment #0)
> Description of problem:
>
> automount dumps core on a daily basis:
>
> $ file /core.690
> /core.690: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style,
> from 'automount'
>
> $ file /core.*
> /core.1815: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style,
> from 'automount'
> /core.2671: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style,
> from 'automount'
>
>
> After
> $ service autofs restart
> everything is fine
>
> $ cat /etc/auto.master| grep -v '^#'
> /misc /etc/auto.misc
> /net /etc/auto.net
> /home yp:auto.home.nis -tcp
> +auto.master
>
> auto.master from yp has 10 entries, some direct and some indirect maps.

Can you include your NIS master map and output from syslog?

>
> Version-Release number of selected component (if applicable):
> autofs-5.0.1-0.rc1.6
> kernel 2.6.17-1.2519.4.21.el5
>
> Seen on i686 and x86_64.

A debug log would be useful for me to try and locate where this is happening. Add "--debug" to OPTIONS in /etc/sysconfig/autofs and ensure that syslog is sending daemon.* to a log file.

Also I would appreciate it if you could try the latest available version in Rawhide.

Ian
> Also I would appreciate it if you could try the latest available > version in Rawhide. Seems like this version $ rpm -q autofs autofs-5.0.1-0.rc2.1 is better, no more core dumps from automount, however now I see core dumps from umount.nfs: $ ls -l /core.* -rw------- 1 root root 110592 Sep 30 07:42 /core.12821 -rw------- 1 root root 110592 Oct 3 02:58 /core.5092 -rw------- 1 root root 36139008 Sep 24 20:32 /core.690 -rw------- 1 root root 110592 Oct 1 16:58 /core.7371 $ file /core.* /core.12821: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'mount.nfs' /core.5092: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'umount.nfs' /core.690: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'automount' /core.7371: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'umount.nfs
(In reply to comment #2) > Seems like this version > > $ rpm -q autofs > autofs-5.0.1-0.rc2.1 We have autofs-5.0.1-0.rc2.4 in beta2. > > is better, no more core dumps from automount, however now I see core dumps from > umount.nfs: > > $ ls -l /core.* > -rw------- 1 root root 110592 Sep 30 07:42 /core.12821 > -rw------- 1 root root 110592 Oct 3 02:58 /core.5092 > -rw------- 1 root root 36139008 Sep 24 20:32 /core.690 > -rw------- 1 root root 110592 Oct 1 16:58 /core.7371 > > $ file /core.* > /core.12821: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, > from 'mount.nfs' > /core.5092: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, > from 'umount.nfs' > /core.690: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, > from 'automount' > /core.7371: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, > from 'umount.nfs There have been a couple of fixes with the latest revision of nfs-utils. What version are you using? It would be good to know if the latest is still a problem. Ian
(In reply to comment #3)
> >
> > is better, no more core dumps from automount, however now I see core dumps from
> > umount.nfs:
> >

I just checked a recent RHEL5 test install I have and I don't see any cores. I've been running tests that mount/umount several hundred mounts over the last couple of days.

util-linux-2.13-0.42.el5
nfs-utils-1.0.9-8.fc6
nfs-utils-lib-1.0.8-7.2

The later nfs-utils version was needed to resolve a problem with incorrect return status from mount.

Ian
Still observed in RHEL5 Beta2 :( This is actually impacting our testing since we use NFS heavily in one of our offices to get at files in our home directories and various data repositories.
In my case, I can test like this... (rhel5 beta2, ia64 in this case)
- Reboot machine :)
- ls ~erikj (which mounts my home directory over NFS)
- Wait about 30 minutes
- Then find I can't get to ~erikj any more
- Then find a core in /.
I'm going to attach some files about how we have automount set up here. Also... Not that this is a huge help, but here is a paste from gdb: [root@minime1 /]# gdb --core=core.1889 GNU gdb Red Hat Linux (6.5-12.el5rh) Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "ia64-redhat-linux-gnu". (no debugging symbols found) Using host libthread_db library "/lib/libthread_db.so.1". Core was generated by `automount'. Program terminated with signal 11, Segmentation fault. #0 0x20000008000dec60 in ?? ()
Created attachment 141502 [details] /etc/auto.master file This mainly just shows we get auto_master_linux from NIS.
Created attachment 141504 [details] auto_master_linux NIS map from ypcat -k This shows ypcat -k output from the map referenced in /etc/auto.master.
Created attachment 141505 [details] ypcat -k of auto_home This shows the auto_home NIS map. I choose this one because I referred to my home directory earlier.
Created attachment 141506 [details] /etc/sysconfig/autofs
Created attachment 141508 [details] Example core file (compressed Note that this file will expand to be like 500 megabytes or so. This is an ia64 core dump file.
(In reply to comment #12) > Created an attachment (id=141508) [edit] > Example core file (compressed > > Note that this file will expand to be like 500 megabytes or so. > > This is an ia64 core dump file. > Doesn't sound good. I see there's no backtrace and it looks like you don't have the autofs-debuginfo package installed. Getting some backtrace info would be good. And a debug log would also be good. I'm going to attach some instructions for what I'd like since downloading the core is likely to take a while. Ian
Created attachment 141510 [details]
Debuginfo collection instructions

I'll need to install the autofs version you are using to do anything with the core, so if you have time it would be helpful if you could post the backtrace information.

What was that version again? And kernel version?

Ian
(In reply to comment #9) > Created an attachment (id=141504) [edit] > auto_master_linux NIS map from ypcat -k > > This shows ypcat -k output from the map referenced in /etc/auto.master. Why do you like to use "soft" mounts?
(In reply to comment #14)
> Created an attachment (id=141510) [edit]
> Debuginfo collection instructions
>
> I'll need to install the autofs version you are using to
> do anything with the core so if you have time it would be
> helpful if you could post the backtrace information.

But I don't have an ia64 system. Afraid I'll need that debug info.
> Why do you like to use "soft" mounts?

I'm not the IT department, I just use what's in those maps :) I'm a victim :-)
> But I don't have ia64 system You can reserve one of the SGI (or non-SGI ones) out of the Westford lab if needed. I'll try to collect what you asked for here. -Erik
Created attachment 141523 [details]
nsswitch.conf file per request

Some other requested info so far...

[root@minime1 sysconfig]# rpm -q autofs -q kernel
autofs-5.0.1-0.rc2.15
kernel-2.6.18-1.2747.el5
[root@minime1 sysconfig]# uname -a
Linux minime1 2.6.18-1.2747.el5 #1 SMP Thu Nov 9 18:56:16 EST 2006 ia64 ia64 ia64 GNU/Linux
Core was generated by `automount'. Program terminated with signal 11, Segmentation fault. #0 0x20000008000dec60 in pthread_barrier_init () from /lib/libpthread.so.0 (gdb) bt #0 0x20000008000dec60 in pthread_barrier_init () from /lib/libpthread.so.0 #1 0x200000080002c520 in st_queue_handler (arg=0x2000000800411800) at state.c:944 #2 0x20000008000d3190 in pthread_create@@GLIBC_2.2 () from /lib/libpthread.so.0 #3 0xc000000000000610 in ?? () #4 0x200000080002c520 in st_queue_handler (arg=0x200000000017f240) at state.c:944 Previous frame inner to this frame (corrupt stack?) (gdb) thr a a bt Thread 22 (process 1889): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008000e5810 in open64 () from /lib/libpthread.so.0 Previous frame inner to this frame (corrupt stack?) Thread 21 (process 1890): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008000dce80 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 Previous frame inner to this frame (corrupt stack?) Thread 20 (process 1894): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 19 (process 1897): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 18 (process 1898): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 17 (process 1899): ---Type <return> to continue, or q <return> to quit--- #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) 
Thread 16 (process 1900): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 15 (process 1901): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 14 (process 1904): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 13 (process 1905): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 12 (process 1906): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) ---Type <return> to continue, or q <return> to quit--- Thread 11 (process 1907): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 10 (process 1908): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 9 (process 1909): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 8 (process 1910): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 7 (process 1911): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) 
Thread 6 (process 1912): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) ---Type <return> to continue, or q <return> to quit--- Thread 5 (process 1913): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 4 (process 1914): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 3 (process 1915): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 2 (process 1916): #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000008002e9730 in fts_build () from /lib/libc.so.6.1 Previous frame inner to this frame (corrupt stack?) Thread 1 (process 1891): #0 0x20000008000dec60 in pthread_barrier_init () from /lib/libpthread.so.0 #1 0x200000080002c520 in st_queue_handler (arg=0x2000000800411800) at state.c:944 #2 0x20000008000d3190 in pthread_create@@GLIBC_2.2 () from /lib/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #3 0xc000000000000610 in ?? () #4 0x200000080002c520 in st_queue_handler (arg=0x200000000017f240) at state.c:944 Previous frame inner to this frame (corrupt stack?)
Hello. I'm hoping the last add is what you needed.

I went searching for debuginfo packages for RHEL5 Beta2 on the public server. I only see selected debuginfo packages as being available and autofs isn't one of them. I think I recall, on the internal RH network, that I can get to these - but I would have to get permission to export the package from RH to SGI since we're seeing the problem on the SGI side. Feel free to attach the appropriate debuginfo package and I'll install it.

It seems very repeatable over here, so I can help test a fixed package too.
Created attachment 141530 [details]
automount dumped core, and then I sent this file...

It just dumped core again, but I had the logging going this time. Here is the log.
(In reply to comment #21)
> Hello. I'm hoping the last add is what you needed.

Afraid not. With that stack corruption (real or apparent) the trace can't be relied upon. I'll need to work out how to manually decode the stack call trace for ia64.

>
> I went searching for debuginfo packages for RHEL5 Beta2 on the public
> server. I only see selected debuginfo packages as being available and
> autofs isn't one of them.

Is it acceptable for you to build from a source rpm? Either I could provide a patch and you could add it and build it, or I could provide an rpm with a patch included. This would be purely for possibility elimination testing and would only need to be run long enough to establish if there is any change. Although I doubt very much that the stack of the task dispatcher has overflowed, I'd like to eliminate this easily checked, obvious possibility first.

>
> It seems very repeatable over here, so I can help test a fixed package too.

That is the puzzle; maybe it is simply a stack overflow. The task dispatcher uses a small stack (64k). I'd like to check what happens if I increase that to 256k. The dispatcher doesn't actually do much and the threads it launches have a much bigger stack (which probably should be smaller).

Ian
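For readers following along, here is a minimal sketch of the kind of stack-size experiment described above, assuming plain POSIX threads. It is not the autofs code: `start_thread_with_stack`, `dispatcher_stub`, and `DISPATCHER_STACK_SIZE` are hypothetical names, and 256k is simply the figure mentioned in the comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical value: the 256k proposed in the comment (up from 64k). */
#define DISPATCHER_STACK_SIZE (256 * 1024)

/*
 * Start fn(arg) in a new thread that uses an explicit stack size.
 * Returns 0 on success or a pthread error number on failure.
 */
static int start_thread_with_stack(pthread_t *thid, size_t stack_size,
                                   void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int ret;

    assert(thid != NULL && fn != NULL);

    ret = pthread_attr_init(&attr);
    if (ret)
        return ret;

    /* Fails (typically with EINVAL) if the size is below the system minimum. */
    ret = pthread_attr_setstacksize(&attr, stack_size);
    if (!ret)
        ret = pthread_create(thid, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return ret;
}

/* Hypothetical stand-in for the dispatcher body: hands its argument back. */
static void *dispatcher_stub(void *arg)
{
    return arg;
}
```

A caller would pass `DISPATCHER_STACK_SIZE` for the dispatcher thread and compare behaviour against the smaller value; if the crash rate does not change, a dispatcher stack overflow becomes an unlikely explanation.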
I'm familiar with how SRPMS work and could add a patch and build if that would be easier for you. Or you could provide a binary RPM; whatever is best. Just be sure we're starting from the base SRPM :) I'm using RHEL5 Beta2 now, autofs-5.0.1-0.rc2.15. -Erik
(In reply to comment #24)
> I'm familiar with how SRPMS work and could add a patch and build if that
> would be easier for you. Or you could provide a binary RPM; whatever is best.

That's great.

>
> Just be sure we're starting from the base SRPM :) I'm using RHEL5 Beta2 now,
> autofs-5.0.1-0.rc2.15. -Erik

I did some testing over the weekend and located a couple of memory access violations. I don't think that this will resolve the problem, but one was a use-after-free type error which could be triggering a problem on your hardware.

Ian
Created attachment 141618 [details]
Fix illegal memory access in lookup_yp.c

This patch applies cleanly against autofs-5.0.1-0.rc2.15. Please try it if you can get time.

In the meantime I will continue with organizing access to ia64 hardware to try and duplicate the problem.

Ian
Hello. I expanded the autofs SRPM and put your patch in place in the spec file. I confirmed the new patch was indeed applied by the build. I installed the base RPM and the debuginfo RPM. And now we wait :)
Created attachment 141661 [details] backtraces from another core dump It went boom a while ago. Here are backtraces on all threads from the core file. Not sure if it's very useful but here it is. I'll attach the daemon log again too.
Created attachment 141662 [details] debug log
(In reply to comment #28)
> Created an attachment (id=141661) [edit]
> backtraces from another core dump
>
> It went boom a while ago.

Can you note the time of the core file and verify whether this happens before the attempted mount or as a result of it?

So far I've not been able to duplicate this on ia64.

Ian
The test machine I was using got erased; I'm going to build the RPM again with your patch (need to build it anyway to get debuginfo installed)... I'll then get the timing for the core dump and such. More in a bit.
Created attachment 141830 [details] gdb output (that probably isn't useful) Ok, back how things were... rhel5 beta2 The autofs rpm has the patch from comment 26. [root@minime1 /]# rpm -q autofs autofs-debuginfo kernel autofs-5.0.1-0.rc2.15erikj autofs-debuginfo-5.0.1-0.rc2.15erikj kernel-2.6.18-1.2747.el5 Turned debug mode on, enabled collection of the daemon.debug syslog stuffs... I restarted autofs. A while later, boom. [root@minime1 /]# ls -l /core* -rw------- 1 root root 507248640 Nov 21 13:31 /core.1904 I'll attach the daemon log in a moment. Attached here is the gdb output again - from the latest event.
Created attachment 141833 [details] daemon.debug log output If I understand the question right... Here is the time info for the core file - 13:31:22 [root@minime1 ia64]# stat /core.1904 File: `/core.1904' Size: 507248640 Blocks: 4824 IO Block: 16384 regular file Device: 815h/2069d Inode: 156052 Links: 1 Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2006-11-21 15:20:34.000000000 -0600 Modify: 2006-11-21 13:31:22.000000000 -0600 Change: 2006-11-21 13:31:22.000000000 -0600 But I'm not sure the granularity was enough to figure out exactly what it was paired with in the daemon debug file: Nov 21 13:31:21 minime1 automount[1904]: expire_cleanup: got thid 2305843009746104896 path /data/lwork stat 2 Nov 21 13:31:21 minime1 automount[1904]: expire_cleanup: sigchld: exp 2305843009746104896 finished, switching from 2 to 1 Nov 21 13:31:21 minime1 automount[1904]: st_ready: st_ready(): state = 2 path /data/lwork Nov 21 13:31:22 minime1 automount[1904]: st_expire: state 1 path /data/eagan Nov 21 13:31:22 minime1 automount[1904]: expire_proc: exp_proc = 2305843009746104896 path /data/eagan Nov 21 13:31:22 minime1 automount[1904]: expire_cleanup: got thid 2305843009746104896 path /data/eagan stat 0 Nov 21 13:31:22 minime1 automount[1904]: expire_cleanup: sigchld: exp 2305843009746104896 finished, switching from 2 to 1 Nov 21 13:31:22 minime1 automount[1904]: st_ready: st_ready(): state = 2 path /data/eagan Nov 21 13:31:22 minime1 automount[1904]: expire_proc_indirect: 1 remaining in /home Nov 21 13:31:22 minime1 automount[1904]: mount still busy /home Nov 21 13:31:22 minime1 automount[1904]: expire_cleanup: got thid 2305843009720398400 path /home stat 2 Nov 21 13:31:22 minime1 automount[1904]: expire_cleanup: sigchld: exp 2305843009720398400 finished, switching from 2 to 1 Nov 21 13:31:22 minime1 automount[1904]: st_ready: st_ready(): state = 2 path /home Nov 21 13:31:34 minime1 dhclient: DHCPREQUEST on eth0 to 128.162.243.246 port 67
Created attachment 141845 [details]
strace output

I booted with selinux=0 to disable selinux. That's because selinux prevented me from attaching to the automount process.

Shortly after automount started, I started strace like this:

strace -o /tmp/automount-strace-out -f -p 2531

After it dumped core, I made this attachment of automount-strace-out in case it's helpful somehow. Note that I had to kill -9 the strace process that was attached to what had become a defunct process.

I'm not sure if it's helpful :(
I *think* this is unrelated. I filed it in an SGI bug report for us to look into and possibly file a rhel5 bug. I panicked the system trying to use gdb against the automount process (and threads).

Here is the text of the SGI bug report. Hopefully unrelated, as I said, but I wanted to put it here in the interest of full disclosure.

This was observed with RHEL5 Beta2.

What was happening? I was trying to debug the automount core dump problem. So I used gdb to attach to the automount process. It went along fine for some time. Then the system went boom. I wonder to myself if the system went boom at the same point the automount task would have seg faulted if I didn't have gdb attached.

[root@minime1 ~]# kernel BUG at kernel/exit.c:76!
automount[2926]: bugcheck! 0 [1]
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 vfat fat dm_mirror dm_mod button parport_pc lp parport mca_recovery ide_cd cdrom tg3 sg mptsas scsi_transport_sas mptscsih mptbase sata_vsc libata qla1280 sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd

Pid: 2926, CPU 1, comm: automount
psr : 00001010085a2010 ifs : 800000000000038a ip : [<a00000010007df60>] Not tainted
ip is at release_task+0x140/0x7e0
unat: 0000000000000000 pfs : 000000000000038a rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 6516606a9965a565
ldrs: 0000000000000000 ccv : 0000000000000010 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000010007df60 b6 : a0000001004d7840 b7 : a0000001003c88e0
f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf
f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1 : a000000100bef1f0 r2 : a000000100a06750 r3 : a000000100938860
r8 : 0000000000000023 r9 : 0000000000000026 r10 : a000000100a06780
r11 : a000000100a06780 r12 : e00000301c2efe00 r13 : e00000301c2e8000
r14 : a000000100a06750 r15 : 0000000000000000 r16 : a000000100938868
r17 : e000003071ae7e18 r18 : 0000000000000000 r19 : e0000030031171e3
r20 : 0000000000000000 r21 : a0000001009ef820 r22 : e000003003120000
r23 : a000000100845200 r24 : a0000001009ef820 r25 : a000000100a06758
r26 : a000000100a06758 r27 : 0000000000000000 r28 : 0000000000000026
r29 : 80000001fdc00000 r30 : 0000000000000000 r31 : 0000000000000000

Call Trace:
[<a000000100014140>] show_stack+0x40/0xa0 sp=e00000301c2ef990 bsp=e00000301c2e9358
[<a000000100014a40>] show_regs+0x840/0x880 sp=e00000301c2efb60 bsp=e00000301c2e9300
[<a000000100037c60>] die+0x1c0/0x2c0 sp=e00000301c2efb60 bsp=e00000301c2e92b8
[<a000000100037db0>] die_if_kernel+0x50/0x80 sp=e00000301c2efb80 bsp=e00000301c2e9288
[<a0000001006147f0>] ia64_bad_break+0x270/0x4a0 sp=e00000301c2efb80 bsp=e00000301c2e9260
[<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e00000301c2efc30 bsp=e00000301c2e9260
[<a00000010007df60>] release_task+0x140/0x7e0 sp=e00000301c2efe00 bsp=e00000301c2e9210
[<a0000001000ef020>] check_noreap+0xa0/0x160 sp=e00000301c2efe00 bsp=e00000301c2e91d0
[<a0000001000f15d0>] utrace_report_death+0x650/0x680 sp=e00000301c2efe00 bsp=e00000301c2e9178
[<a000000100081b50>] do_exit+0x1330/0x14a0 sp=e00000301c2efe10 bsp=e00000301c2e9120
[<a000000100081e80>] sys_exit+0x20/0x40 sp=e00000301c2efe30 bsp=e00000301c2e90c8
[<a00000010000c490>] __ia64_trace_syscall+0xd0/0x110 sp=e00000301c2efe30 bsp=e00000301c2e90c8
[<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e00000301c2f0000 bsp=e00000301c2e90c8
<0>Kernel panic - not syncing: Fatal exception
It just happened again (kernel panic trying to use gdb to debug the running automount process). Similar kernel backtrace.
(In reply to comment #33)
> Created an attachment (id=141830) [edit]
> gdb output (that probably isn't useful)

No, this is much better. gdb has been able to decode the stack trace so I'm able to make sense of it and have some confidence in it.

It's consistent with the previous cores in that it says that autofs crashed while checking for the existence of a thread when trying to identify completed tasks (they are typically expires). I checked this part of the code following your first core but it looked ok.

I have seen this type of problem before with the dispatcher, but I'm puzzled as to why it appears to be happening when a seemingly unrelated action, a mount request, comes in (although we can't yet be sure that it is coincident with the mount request).

Ian
Created attachment 141877 [details]
Patch to remove need to call pthread_kill when checking for task done.

This assumes there is a problem with detached thread id reuse, possibly related to the order of execution imposed by scheduling, and that the pthread library is unable to handle a call to pthread_kill during thread setup.

Again, as I can't reproduce the problem, this is just a guess as to what might be happening, so please give it a try (also please continue using the first patch).

Ian
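The patch itself is only attached, not shown in this thread, so the following is just an illustration of the general idea rather than the actual change. Probing a thread with pthread_kill(thid, 0) is only defined while the thread id is still valid; once a detached thread has exited, its id may be reused, and POSIX leaves the use of a stale id undefined. A common alternative, sketched here under that assumption, is to have the worker record its own completion in a flag that the dispatcher reads. All names (`struct task`, `task_set_done`, `task_is_done`, `task_worker`) are hypothetical and are not the autofs structures or functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record for illustration; not an autofs structure. */
struct task {
    pthread_mutex_t lock;
    bool done;    /* set by the worker itself just before it returns */
    int status;   /* hypothetical result value reported by the worker */
};

static void task_init(struct task *t)
{
    int ret = pthread_mutex_init(&t->lock, NULL);

    assert(ret == 0);
    (void) ret;
    t->done = false;
    t->status = 0;
}

/* Called by the worker thread: record completion under the lock. */
static void task_set_done(struct task *t, int status)
{
    pthread_mutex_lock(&t->lock);
    t->status = status;
    t->done = true;
    pthread_mutex_unlock(&t->lock);
}

/*
 * Called by the dispatcher: reads only the flag and never touches the
 * thread id, so it stays valid even after the worker has exited.
 */
static bool task_is_done(struct task *t)
{
    bool done;

    pthread_mutex_lock(&t->lock);
    done = t->done;
    pthread_mutex_unlock(&t->lock);
    return done;
}

/* Hypothetical worker body standing in for an expire or mount task. */
static void *task_worker(void *arg)
{
    struct task *t = arg;

    task_set_done(t, 1);
    return NULL;
}
```

The trade-off is that every task must reliably set its flag on all exit paths; a task that dies without doing so would never be reaped from the list.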
Sorry to report that on ia64 I'm having trouble building with that patch applied.

Here is some proof that the patch applied (21 and 22 are from you):

Patch #20 (autofs-5.0.1-rc2-numeric-ldap-host-name.patch):
+ patch -p1 -s
+ echo 'Patch #21 (autofs-fixup-from-rh):'
Patch #21 (autofs-fixup-from-rh):
+ patch -p1 -s
+ echo 'Patch #22 (no-pthread-kill-from-rh):'
Patch #22 (no-pthread-kill-from-rh):
+ patch -p1 -s
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.87964

Would it be helpful if I reserved one of the SGI ia64 systems in Westford and tried to get RHEL5 Beta2 installed on it? I can't promise I can duplicate the problem there, but I can try that too. That might let you fly less blind :)

Here are the lines leading up to the build failure:

gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -D_REENTRANT -D_REENTRANT -rdynamic -fPIE -D_GNU_SOURCE -I../include -DAUTOFS_LIB_DIR=\"/usr/lib/autofs\" -DAUTOFS_MAP_DIR=\"/etc\" -DAUTOFS_CONF_DIR=\"/etc/sysconfig\" -DVERSION_STRING=\"5.0.1-0.rc2.15erikj2\" -c state.c
state.c: In function 'st_set_done':
state.c:57: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:83: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:97: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:201: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:217: warning: empty declaration
state.c:229: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:253: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:317: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:340: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:347: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:429: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:451: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:511: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:541: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:571: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:596: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:621: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:640: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:739: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:796: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:841: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:856: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:877: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:997: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
state.c:1018: error: old-style parameter declarations in prototyped function definition
state.c:1018: error: expected '{' at end of input
make[1]: *** [state.o] Error 1
make[1]: Leaving directory `/usr/src/redhat/BUILD/autofs-5.0.1/daemon'
make: *** [daemon] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.87964 (%build)

RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.87964 (%build)
altix3.lab.boston.redhat.com is available. I have caught it failing once already in my simple test case set up there. /etc/auto.master has a map reference to /etc/auto.test. /etc/auto.test just mounts a couple nfs directories under /test/t1, /test/t2, and /test/t3. /usr/src/redhat has autofs in it with your first patch but not your second applied. I have just restarted this version of autofs with the patch in place. There are a couple example cores in /, but some of them were with the original rhel5 beta2 automount process. The new version of autofs that has your patch applied was installed (along with debuginfo) Wed Nov 22 11:01:56 EST 2006 so any core in / older than that should associate with this install. With my small non-NIS test case on altix3, it doesn't seem to trigger as often but still does. I wish we could find an exact trigger. Please feel free to log in to this machine as root to test and debug. I hope this helps.
(In reply to comment #40)
> Sorry to report that on ia64 I'm having trouble building with that patch
> applied.
>

Arrgh .. I was sure I fixed that in the patch I posted. It's a missing ";"

 static void st_set_thid(struct autofs_point *, pthread_t);
+static void st_set_done(struct autofs_point *ap)

as you can see.
(In reply to comment #41) > altix3.lab.boston.redhat.com is available. I have caught it failing once > already in my simple test case set up there. Excellent. Good work. > > /etc/auto.master has a map reference to /etc/auto.test. > /etc/auto.test just mounts a couple nfs directories under > /test/t1, /test/t2, and /test/t3. > > /usr/src/redhat has autofs in it with your first patch but not your second > applied. I have just restarted this version of autofs with the patch > in place. > > There are a couple example cores in /, but some of them were with the original > rhel5 beta2 automount process. > > The new version of autofs that has your patch applied was installed (along > with debuginfo) Wed Nov 22 11:01:56 EST 2006 so any core in / older > than that should associate with this install. > > With my small non-NIS test case on altix3, it doesn't seem to trigger > as often but still does. I wish we could find an exact trigger. > > Please feel free to log in to this machine as root to test and debug. Certainly. > > I hope this helps. Yep. This helps a lot. Given that we know it fails I'm going to add the second patch and do a quick test with it. We can always take it out to get more info later. Ian
Hi Ian. The version of the RPMs you created on altix3 have run on an SGI machine for quite some time now and I'm fairly confident the new rpm fixes the issue. This being near a holiday, I'm not sure what exposure we'll get from the people doing the MPI regression testing but I've asked them to try the patched RPMs too. Erik
Created attachment 141964 [details] Patch to remove need to call pthread_kill when checking for task done (with correction).
(In reply to comment #44) > Hi Ian. The version of the RPMs you created on altix3 have run on an SGI > machine for quite some time now and I'm fairly confident the new rpm fixes > the issue. That sounds promising. > > This being near a holiday, I'm not sure what exposure we'll get from the people > doing the MPI regression testing but I've asked them to try the patched RPMs > too. It needs some more work anyway. I'll do some verification (I'd hate to have a slowly growing task list) while we wait. Ian
Got a new core file on RHEL 5 Beta 2 x86_64, autofs-5.0.1-0.rc2.15.

Some gdb output (info threads) (I did not find any debuginfo package):

#0 0x000055555556caec in add_source () from /usr/sbin/automount
(gdb) list threads
No symbol table is loaded. Use the "file" command.
(gdb) info threads
 18 process 7134 0x00002aaaaacd3b18 in do_sigwait () from /lib64/libpthread.so.0
 17 process 7135 0x00002aaaaacd0607 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 16 process 7136 0x00002aaaaacd0607 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 15 process 7139 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 14 process 7142 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 13 process 7143 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 12 process 7144 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 11 process 7145 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 10 process 7146 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 9 process 7147 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 8 process 7148 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 7 process 7149 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 6 process 7150 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 5 process 7151 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 4 process 7395 0x00002aaaaacd0416 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 3 process 7398 0x00002aaaab3b1487 in mkdir () from /lib64/libc.so.6
 2 process 7405 0x00002aaaab3cb3f8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
* 1 process 7407 0x000055555556caec in add_source () from /usr/sbin/automount

Some other bugs too:

o /net doesn't work with -hosts, however /net with /etc/auto.net works.

o direct mounts are here and so I really miss -null (/opt is a direct mount in the nis maps and I want that to be a local file system on Linux clients...)

o man pages really need a refresher e.g. autofs(8): this is not correct:

/etc/init.d/autofs status
will display the current configuration and a list of currently running
automount daemons.
Created attachment 141990 [details] gzipped core from comment #47
Created attachment 141992 [details] automount log from crash in comment #47

Lots of
failed to get group info from getgrgid_r
then:
lookup_nss_read_map: can't to read name service switch config.
segfault at 0000000000000008 rip 000055555556caec rsp 000000004202a940 error 4

BTW:
$ cat /etc/nsswitch.conf | grep -v '#' | sed '/^$/d'
passwd: files nis
shadow: files nis
group: files nis
hosts: files nis dns
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files nis
rpc: files
services: files nis
netgroup: files nis
publickey: nisplus
automount: files nis
aliases: files nisplus
(In reply to comment #47)
> Got a new core file on RHEL 5 Beta 2 x86_64, autofs-5.0.1-0.rc2.15.
>
> Some gdb output (info threads) (I did not find any debuginfo package):
>
> #0 0x000055555556caec in add_source () from /usr/sbin/automount

That's new, maybe.
How about a backtrace?

> (gdb) list threads
> No symbol table is loaded. Use the "file" command.
> (gdb) info threads

snip ...

> 2 process 7405 0x00002aaaab3cb3f8 in __lll_mutex_lock_wait ()
> from /lib64/libc.so.6
> * 1 process 7407 0x000055555556caec in add_source () from /usr/sbin/automount

And no line number.

It's probably the nsswitch parser locking bug.
I'll prepare a patch.

In fact it's probably better for me to update the RHEL5 package with the patch developed previously in this bug and include the nsswitch and macro table locking fixes. I was planning on doing that this week anyway.

I'll sort that out tomorrow and post to the bug.

>
> Some other bugs too:
>
> o /net doesn't work with -hosts, however /net with /etc/auto.net works.

No information.
We'll have to work on this later.

>
> o direct mounts are here and so I really miss -null (/opt is a direct mount
> in the nis maps and I want that to be a local file system on Linux clients...)

Yep. I'm aware that the implementation in 0.rc2.15 is incorrect. I've fixed this but it's not quite finished yet; however, it does make autofs null entries work as they do in Solaris.

Once I've done the updates above I can prepare a temporary patch for you to use until the package is updated. Hopefully I'll be able to complete that tomorrow as well.

>
> o man pages really need a refresher e.g. autofs(8): this is not correct:
>
> /etc/init.d/autofs status
> will display the current configuration and a list of currently running
> automount daemons.

Ooops!

Ian
(In reply to comment #49) > Created an attachment (id=141992) [edit] > automount log from crash in comment #47 > > Lots of > failed to get group info from getgrgid_r These messages are a bit puzzling. We'll try with the nsswitch parser locking patch first and see how that goes. How long had autofs been running before the trouble? Ian
(In reply to comment #49) > Created an attachment (id=141992) [edit] > automount log from crash in comment #47 > > Lots of > failed to get group info from getgrgid_r Oh .. another thing. Tell me you're not running a 32 bit package on a 64 bit arch! That won't work at this stage. Ian
> It's probably the nsswitch parser locking bug. > I'll prepare a patch. > In fact it's probably better for me to update the RHEL5 package > with the patch developed previously in this bug and include the > nsswitch and macro table locking fixes. I was planning on doing > that this week anyway. Sounds great, I have no problems testing patches. >> Lots of >> failed to get group info from getgrgid_r >These messages are a bit puzzling. >We'll try with the nsswitch parser locking patch first and >see how that goes. >How long had autofs been running before the trouble? Not long, 24 hours or something. However I believe the getgrgid_r warnings are present right after startup. >Oh .. another thing. >Tell me you're not running a 32 bit package on a 64 bit arch! >That won't work at this stage. Don't think so, as I have not done anything fancy, will check later.
Created attachment 142002 [details] Patch to fix nsswitch parser locking
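As background for this attachment, the sketch below shows the general pattern of serialising a parser that keeps its state in globals, which is the kind of fault the add_source crash and the thread waiting in __lll_mutex_lock_wait were attributed to earlier in this bug. It is an assumption-laden illustration, not the attached patch: parse_unlocked(), parse_sources(), parse_mutex and the tiny whitespace tokenizer are all hypothetical stand-ins for whatever the real nsswitch parser does.

```c
#include <pthread.h>
#include <string.h>

#define MAX_SOURCES 8

/* Shared parser state, standing in for a generated parser's globals. */
static const char *parsed[MAX_SOURCES];
static int nparsed;

/* One lock for the whole parse, held until the result is copied out. */
static pthread_mutex_t parse_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Not reentrant: splits 'line' in place and fills the shared globals. */
static void parse_unlocked(char *line)
{
    nparsed = 0;
    while (nparsed < MAX_SOURCES) {
        line += strspn(line, " \t");    /* skip leading blanks */
        if (*line == '\0')
            break;
        parsed[nparsed++] = line;       /* token starts here */
        line += strcspn(line, " \t");   /* advance to end of token */
        if (*line != '\0')
            *line++ = '\0';             /* terminate it and move on */
    }
}

/*
 * Thread-safe wrapper: parse under the lock and copy the tokens to the
 * caller's array before unlocking, so no other thread can overwrite the
 * globals while they are still being read. Returns the number of tokens.
 */
static int parse_sources(char *line, const char **out, int max)
{
    int i, n;

    pthread_mutex_lock(&parse_mutex);
    parse_unlocked(line);
    n = nparsed < max ? nparsed : max;
    for (i = 0; i < n; i++)
        out[i] = parsed[i];
    pthread_mutex_unlock(&parse_mutex);
    return n;
}
```

Called with a writable copy of the sources from an "automount: files nis" line, parse_sources() would hand back the two source names. The point of the design is that the lock covers both the parse and the copy-out; releasing it between the two would leave the same race in place.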
Created attachment 142003 [details] Patch to fix macro table locking
Here are the two patches for nsswitch parser and macro table locking I mentioned. I recommend using the previous patches for the illegal memory access and the one to avoid the use of pthread_kill as well. Please give them a try.
Created attachment 142005 [details] Interim patch to fix null map handling semantics I haven't re-tested this patch. I had to make a few changes due to dependencies on other updates not in 0.rc2.15. Hopefully it will be OK. I'll check it tomorrow. Ian
> Interim patch to fix null map handling semantics Thanks, will test tomorrow. BTW: srpms with all 5 patches added available here: http://www.pvv.ntnu.no/~terjeros/rpms/autofs/
Some more testing with the 5 patches added:

good:
o -null works. Thanks

bad:
o -hosts doesn't work (not important here)
o more core dumps:

$ gdb -c /core.6997 /usr/sbin/automount

[snip]

Reading symbols from /lib64/libnss_nis.so.2...done.
Loaded symbols for /lib64/libnss_nis.so.2
Core was generated by `automount'.
Program terminated with signal 11, Segmentation fault.
#0 tree_free_mnt_tree (tree=0x5555557acec0) at mounts.c:495
495 p = p->next;
(gdb) info threads
 14 process 6997 0x00002aaaaacd3b18 in do_sigwait () from /lib64/libpthread.so.0
 13 process 6998 0x00002aaaaacd0607 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 12 process 6999 0x00002aaaaacd0416 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 11 process 7002 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 10 process 7005 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 9 process 7006 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 8 process 7007 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 7 process 7008 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 6 process 7009 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 5 process 7010 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 4 process 7011 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 3 process 7012 0x00002aaaab3b6b96 in poll () from /lib64/libc.so.6
 2 process 7013 0x00002aaaaacd2909 in __lll_mutex_unlock_wake () from /lib64/libpthread.so.0
* 1 process 7395 tree_free_mnt_tree (tree=0x5555557acec0) at mounts.c:495
(gdb) thread 1
[Switching to thread 1 (process 7395)]#0 tree_free_mnt_tree (tree=0x5555557acec0) at mounts.c:495
495 p = p->next;
(gdb) bt
#0 tree_free_mnt_tree (tree=0x5555557acec0) at mounts.c:495
#1 0x0000555555561e95 in expire_proc_direct (arg=<value optimized out>) at /usr/include/pthread.h:579
#2 0x00002aaaaaccc305 in start_thread () from /lib64/libpthread.so.0
#3 0x00002aaaab3bf66d in clone () from /lib64/libc.so.6
#4 0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (process 7013)]#0 0x00002aaaaacd2909 in __lll_mutex_unlock_wake () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00002aaaaacd2909 in __lll_mutex_unlock_wake () from /lib64/libpthread.so.0
#1 0x00002aaaaaccf9d9 in _L_mutex_unlock_59 () from /lib64/libpthread.so.0
#2 0x00002aaaaaccf69b in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
#3 0x0000555555569624 in st_add_task (ap=0x5555557a9370, state=ST_EXPIRE) at state.c:758
#4 0x000055555555d8e6 in handle_mounts (arg=0x5555557a9370) at automount.c:626
#5 0x00002aaaaaccc305 in start_thread () from /lib64/libpthread.so.0
#6 0x00002aaaab3bf66d in clone () from /lib64/libc.so.6
#7 0x0000000000000000 in ?? ()
More good things: o the error msgs do_mount_indirect: failed to get group info from getgrgid_r is not here any more.
Created attachment 142056 [details] automount log from crash in comment #59 Logs from automount with debug option leading to crash.
(In reply to comment #59) > Some more testing with the 5 patches added: > > good: > o -null works. Thanks Cool. Hopefully I'll be able to sort out the error check I need and add it to the package soon. > > bad: > o -hosts doesn't work (not important here) Yep. We'll get to it. > o more core dumps: > > $ gdb -c /core.6997 /usr/sbin/automount > > [snip] > > Reading symbols from /lib64/libnss_nis.so.2...done. > Loaded symbols for /lib64/libnss_nis.so.2 > Core was generated by `automount'. > Program terminated with signal 11, Segmentation fault. > #0 tree_free_mnt_tree (tree=0x5555557acec0) at mounts.c:495 > 495 p = p->next; Ha .. this has to be an error from altering the null map patch. I had to make some changes in this area. I'll check it out but I'll probably recommend using an updated package. Still haven't managed to do that, sorry. Ian
Created attachment 142080 [details] Interim patch to fix null map handling semantics - fix This patch seems to fix the error I made with the interim patch above. Hopefully this will allow you to continue your testing and give me time to complete the null map patch and consolidate the current changes into the RHEL package and perform proper testing. Ian
(In reply to comment #62) > > bad: > > o -hosts doesn't work (not important here) > > Yep. We'll get to it. This is a bit of a second priority but can you give some more information on this please? A debug log would be good, including startup of autofs and an attempt to access a server. Also the output of "showmount -e <server>" for a server you are having trouble with, and its /etc/exports. Ian
Created attachment 142171 [details] logs as requested in comment #64 About automount log: first startup, then /net (with -hosts map) (failure) and last /net.program (with /etc/auto.net as map ) to the same host (works).
Hi. I'm now running with 6 patches to the base rhel5 beta2 version of autofs. Hopefully that's the right number :) I started with the SRPM with the rp22 version from altix3.lab.boston.redhat.com. That included autofs-fixup-from-rh and autofs-5.0.1-rc2-use-task-done.patch. From this bug, I added: autofs-nsswitch-parser-locking (comment 54) autofs-macro-table-locking-patch (comment 55) autofs-null-map-handling-try1-patch (comment 57) autofs-null-map-handling-try2-patch (comment 63) The problem I hit was already fixed but I'm now running with all these to be sure things keep chugging along over here. So far so good.
(In reply to comment #66) > Hi. I'm now running with 6 patches to the base rhel5 beta2 version of > autofs. > > Hopefully that's the right number :) Looks ok. > > I started with the SRPM with the rp22 version from altix3.lab.boston.redhat.com. > That included autofs-fixup-from-rh and autofs-5.0.1-rc2-use-task-done.patch. > > From this bug, I added: > autofs-nsswitch-parser-locking (comment 54) > autofs-macro-table-locking-patch (comment 55) These two would be the next segv you would see, good. > autofs-null-map-handling-try1-patch (comment 57) > autofs-null-map-handling-try2-patch (comment 63) And these of course if you need the "-null" option. The thing that remains with this patch is an error that I'm having a little trouble working out. Other than that it should provide the correct "-null" functionality. I have applied all the above patches, except the null map, to RHEL 5 cvs (revision 24). If we see further problems we'll deal with them as they arise. Thanks Ian
(In reply to comment #67) > The thing that remains with this patch is an error > that I'm having a little trouble working out. Other > than that it should provide the correct "-null" > functionality. That's error check, not error, sorry. Ian
(In reply to comment #65) > Created an attachment (id=142171) [edit] > logs as requested in comment #64 > > About automount log: > first startup, > then /net (with -hosts map) (failure) > and last /net.program (with /etc/auto.net as map ) to the same host (works). I think this may be a problem that I know about with the matching of a simple host name against FQDN in the export list. It may also be (as well) that autofs doesn't understand some of the Sun style export access control syntax. I was alerted to the problem recently and haven't yet started work on fixing it. When I start this work it may be best for us to open another bz specifically for it. We'll see. So I have to recommend using the old script, as you are doing, in the mean time. Ian
So far so good on our systems. I haven't been testing the null stuff, just making sure the changes are working together ok. I'll note that on 3 machines we upgraded autofs on, the /etc/auto.master was renamed to .rpmsave and _no_ auto.master file resulted - making it so autofs would fail to restart after upgrading the autofs rpm. Probably a separate bug. We just copied valid auto.master's back in to place and were back in business.
(In reply to comment #70) > So far so good on our systems. I haven't been testing the null stuff, just > making sure the changes are working together ok. > > I'll note that on 3 machines we upgraded autofs on, the /etc/auto.master > was renamed to .rpmsave and _no_ auto.master file resulted - making it > so autofs would fail to restart after upgrading the autofs rpm. > Wow .. that's no good. I've never seen that happen before and I update my autofs package a lot on several different installs. > Probably a separate bug. We just copied valid auto.master's back in to place > and were back in business. That would be best for tracking purposes. Ian
Yeah, it didn't always happen. I wonder to myself if it's the missingok in the spec file. At SGI, we have a management step in opening red hat bugs so I proposed it on the sgi side. I should have a RH bug for you tomorrow. Thanks!
(In reply to comment #71) > > I'll note that on 3 machines we upgraded autofs on, the /etc/auto.master > > was renamed to .rpmsave and _no_ auto.master file resulted - making it > > so autofs would fail to restart after upgrading the autofs rpm. > > > > Wow .. that's no good. > I've never seen that happen before and I update my autofs > package a lot on several different installs. I have also seen this, I believe it was on some FC5 systems in the autofs-4.1.4-32 or -33 update. It's of course very nasty.
(In reply to comment #69) > When I start this work it may be best for us to open > another bz specifically for it. We'll see. Ok. > So I have to recommend using the old script, as you are > doing, in the mean time. Yeah, running with patch from comment #63 and fix from bz #208244 and things seem very happy, (I even have DEFAULT_BROWSE_MODE="yes" now).
QE ack for RHEL5. Quite a mess, but we'll do some focused automount testing.
(In reply to comment #75) > QE ack for RHEL5. Quite a mess, but we'll do some focused automount testing. Of course, it's a serious rewrite going from multi process to multi thread, and implementing lots of new features at the same time. However I am a happy user now. Please test yp, ldap, files + different nfs servers: linux, solaris (9+10), hp-ux, aix, freebsd, udp, tcp, rsize+wsize (large values with tcp seem to be flaky/slow) and -hosts and -null options, this is not trivial stuff. Thanks to Ian and Jeffrey, things are getting much better!
(In reply to comment #76) > (In reply to comment #75) > > QE ack for RHEL5. Quite a mess, but we'll do some focused automount testing. > > Of course, it's a serious rewrite going from multi process to > multi thread, and implementing lots of new features at the same time. Sure is, but when I look at the new features list it seems quite short considering the amount of change. Nevertheless our compatibility is much better now. > > However I am a happy user now. That's great to hear. > > Please test yp, ldap, files + different nfs servers: linux, solaris (9+10), > hp-ux, aix, freebsd, udp, tcp, rsize+wsize (large values with tcp > seem to be flaky/slow) and -hosts and -null options, this is not trivial stuff. Our focus has been on compatibility with Solaris, the assumption being that if we achieve that then everything should work with other servers since the Solaris implementation is the expected standard behavior. The -null map semantics should now be correct in all cases as of autofs-5.0.1-0.rc2.27 in RHEL5. The -hosts export list access validation still needs more work for some of the Solaris specific options. Certainly the sources yp and files should be fine. They've had lots of exercise. LDAP should be fine as well but at this time it needs a change to the configuration to tell autofs to use the rfc2307bis schema for Solaris servers. Ian
FYI - SGI has had good results with autofs in rhel5 rc snapshot2. thanks.
So far the updates that have been applied to autofs as a result of this investigation appear to have resolved the issues (including the corrections for "-null" map semantics). The issue raised in comment #72 has been raised as a separate bug in bz 217575. So I'm setting this bug to modified. Ian
A package has been built which should help the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.