449041 – autofs spontaneously stops

Summary: autofs spontaneously stops

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	autofs
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Ian Kent
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-05-30 00:01 UTC by David
Modified:	2009-06-10 02:15 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-06-10 02:15:17 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
tail of /var/log/messages after sysctl -w kernel/sysrq=1; echo t > /proc/sysrq-trigger (235.45 KB, application/octet-stream) 2008-06-05 18:26 UTC, David	no flags
contents of /var/log/debug (607.60 KB, text/plain) 2008-06-05 18:30 UTC, David	no flags
the requested core dump? (246.61 KB, application/x-gzip) 2008-06-05 20:13 UTC, David	no flags
getpwuid_r issue I saw recently (2.06 KB, text/plain) 2008-06-06 13:47 UTC, Ian Kent	no flags
another core dump backtrace (5.34 KB, text/plain) 2008-06-12 16:59 UTC, David	no flags

Description David 2008-05-30 00:01:28 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008043010 Fedora/3.0-0.60.beta5.fc9 Firefox/3.0b5

Description of problem:
autofs service has stopped three times in about 3 days (twice today).


Version-Release number of selected component (if applicable):
autofs-5.0.3-15.i386

How reproducible:
Didn't try


Steps to Reproduce:
1.
2.
3.

Actual Results:


Expected Results:


Additional info:
auto.master looks like this:
/misc   /etc/auto.misc
/net    -hosts
/home                   auto_home       -rw,nosuid,intr
/projects               auto_projects   -rw,intr


auto_home nfs mounts /home directories on a netapp and a f8 nfs server.
auto_projects mounts /projects directories on a netapp.

Comment 1 Ian Kent 2008-05-30 02:43:23 UTC

We will need a debug log which includes the problem.
Please see http://people.redhat.com/jmoyer for directions.

Comment 2 David Jansen 2008-05-30 11:58:29 UTC

I have experienced similar problems, although in my case, autofs usually was
still there, just not mounting anything. In a related issue, rpc.mountd appeared to
have crashed, but just restarting the nfs service didn't bring autofs back.
I will turn on debugging information in autofs and see what shows up. So far,
without debugging information, the logs have nothing indicating a problem,
autofs just stops mounting anything.

Comment 3 David 2008-05-30 15:02:52 UTC

uname -r:
2.6.25.3-18.fc9.i686

nsswitch.conf:
passwd:     files ldap
shadow:     files ldap
group:      files ldap
hosts:      files dns
bootparams: nisplus [NOTFOUND=return] files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   files ldap
publickey:  nisplus
automount:  files ldap
aliases:    files nisplus

There was no syslog.conf file... I created one.  I also added the daemon debug
line to rsyslog.conf.  You need to update the directions if syslog.conf is now
named rsyslog.conf.

/etc/sysconfig/autofs didn't have a "DEFAULT_LOGGING" line, so I added one.
I also uncommented the "LOGGING" line and made it "debug".

Comment 4 David 2008-06-05 18:24:06 UTC

#/etc/init.d/autofs status
automount is stopped
# ps auxwww | grep automount
root     19590  0.0  0.0   4120   692 pts/4    S+   11:20   0:00 grep automount

Comment 5 David 2008-06-05 18:26:35 UTC

Created attachment 308470
tail of /var/log/messages after sysctl -w kernel/sysrq=1; echo t > /proc/sysrq-trigger

Comment 6 David 2008-06-05 18:30:38 UTC

Created attachment 308471
contents of /var/log/debug

Comment 7 David 2008-06-05 19:26:12 UTC

Do you want this info from multiple events, or is the one enough?
It just died again, but I'll assume the first logs were enough for now.

Comment 8 Jeff Moyer 2008-06-05 19:33:14 UTC

Did the automount daemon die?  It looks as though processing just stopped, given
the debug log you posted.  Can you check to see if there are any core files on
your system?  Most likely they would be in the / directory.

Your sysrq-t shows three processes waiting for autofs, but that's not abnormal.
 If there are no core files, you'll have to set ulimit -c unlimited in the
autofs script and try again.  I think that's our best bet for the time being.
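For reference, `ulimit -c unlimited` in the init script only raises the core-file size limit that the automount daemon inherits from its shell. A minimal C sketch of the same idea done from inside a process (not autofs code; `enable_core_dumps` is a hypothetical helper) raises the soft `RLIMIT_CORE` limit as far as the hard limit allows, which is the most an unprivileged process can do:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit. When the hard limit is RLIM_INFINITY this matches
 * `ulimit -c unlimited`. Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* The soft limit may be raised up to, but never beyond, the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling `enable_core_dumps()` early in `main()` would have the same effect as the `ulimit` line, provided the hard limit is already unlimited; where the core file actually lands is still governed by the daemon's working directory and `/proc/sys/kernel/core_pattern`.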

Thanks!

Comment 9 Jeff Moyer 2008-06-05 19:51:58 UTC

(In reply to comment #3)
> There was no syslog.conf file... I created one.  I also added the daemon debug
> line to rsyslog.conf.  You need to update the directions if syslog.conf is now
> named rsyslog.conf.

I've updated the documentation, thanks.  You only need to modify the
rsyslog.conf, since nothing will read the syslog.conf file.

> /etc/sysconfig/autofs didn't have a "DEFAULT_LOGGING" line, so I added one.
> I also uncommented the "LOGGING" line and made it "debug".

You only need to specify one or the other.  Older versions of v5 used the
DEFAULT_LOGGING name; newer versions support either LOGGING or DEFAULT_LOGGING. 
It was changed as some users thought that naming configurable options "DEFAULT_"
was confusing.  I've also added this to the documentation.  Thanks for the feedback.

Comment 10 David 2008-06-05 20:13:46 UTC

Created attachment 308481
the requested core dump?

Comment 11 Jeff Moyer 2008-06-05 21:12:19 UTC

Great!  If you could, would you mind installing the i386 debuginfo package found
here:
  http://koji.fedoraproject.org/koji/buildinfo?buildID=50254

And then do:

gdb /usr/sbin/automount core.1845
 
and at the gdb prompt:

gdb> thr a a bt

and give us the output?

It's tough to cobble together a system that looks exactly like yours in order to
get the debug data.  Sorry!  I'll keep trying if you're unable to do this.

Thanks!

Comment 12 David 2008-06-05 21:42:12 UTC

will "debuginfo-install autofs" work instead of getting it from that site?
I'd like to stick to one mechanism for getting debug info if possible.
Otherwise I'll grab it from the koji site.

Comment 13 David 2008-06-05 22:33:28 UTC

I tried debuginfo-install autofs ... here's the output from gdb.  Let
me know if there still isn't enough debug info.

Thread 8 (process 1846):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x00138ec2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread-2.8.so
#2  0xb7fdab44 in alarm_handler (arg=0x0) at alarm.c:204
#3  0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#4  0x0023927e in clone () from /lib/libc-2.8.so

Thread 7 (process 1847):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x00138ec2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread-2.8.so
#2  0xb7fd4893 in st_queue_handler (arg=0x0) at state.c:965
#3  0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#4  0x0023927e in clone () from /lib/libc-2.8.so

Thread 6 (process 1850):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x0022eac7 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#2  0xb7fc6878 in handle_packet (ap=0xb82c6b68) at automount.c:909
#3  0xb7fc720b in handle_mounts (arg=0xb82c6b68) at automount.c:1542
#4  0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#5  0x0023927e in clone () from /lib/libc-2.8.so

Thread 5 (process 1853):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x0022eac7 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#2  0xb7fc6878 in handle_packet (ap=0xb82c7120) at automount.c:909
#3  0xb7fc720b in handle_mounts (arg=0xb82c7120) at automount.c:1542
#4  0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#5  0x0023927e in clone () from /lib/libc-2.8.so

Thread 4 (process 1854):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x0022eac7 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#2  0xb7fc6878 in handle_packet (ap=0xb82c7770) at automount.c:909
#3  0xb7fc720b in handle_mounts (arg=0xb82c7770) at automount.c:1542
#4  0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#5  0x0023927e in clone () from /lib/libc-2.8.so

Thread 3 (process 1855):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x0013b94c in __lll_unlock_wake () from /lib/libpthread-2.8.so
#2  0x00138c6d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread-2.8.so
#3  0xb7fc81b1 in handle_packet_missing_indirect (ap=0xb82c7da0, 
    pkt=0xb7c8911c) at indirect.c:918
#4  0xb7fc6c23 in handle_packet (ap=0xb82c7da0) at automount.c:1086
#5  0xb7fc720b in handle_mounts (arg=0xb82c7da0) at automount.c:1542
#6  0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#7  0x0023927e in clone () from /lib/libc-2.8.so

Thread 2 (process 1845):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x0013ce30 in __sigwait (set=<value optimized out>, 
    sig=<value optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:63
#2  0xb7fc5d84 in main (argc=0, argv=0xbfdf1bc8) at automount.c:1366

Thread 1 (process 19501):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x00185660 in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x00187028 in abort () at abort.c:88
#3  0x0017e57e in __assert_fail (assertion=<value optimized out>, 
    file=<value optimized out>, line=<value optimized out>, 
    function=<value optimized out>) at assert.c:78
#4  0x0051528a in ?? () from /usr/lib/libnss_ldap.so.2
#5  0x004fb729 in ?? () from /usr/lib/libnss_ldap.so.2
#6  0x004fbb1f in ?? () from /usr/lib/libnss_ldap.so.2
#7  0x004fbd02 in ?? () from /usr/lib/libnss_ldap.so.2
#8  0x004ef166 in ?? () from /usr/lib/libnss_ldap.so.2
#9  0x004ef2f8 in ?? () from /usr/lib/libnss_ldap.so.2
#10 0x004e055b in ?? () from /usr/lib/libnss_ldap.so.2
#11 0x004df798 in ?? () from /usr/lib/libnss_ldap.so.2
#12 0x004e02be in ?? () from /usr/lib/libnss_ldap.so.2
#13 0x004e0a87 in ?? () from /usr/lib/libnss_ldap.so.2
#14 0x004e1140 in _nss_ldap_getpwuid_r () from /usr/lib/libnss_ldap.so.2
#15 0x001f49c2 in __getpwuid_r (uid=<value optimized out>, 
    resbuf=<value optimized out>, buffer=<value optimized out>, 
    buflen=<value optimized out>, result=<value optimized out>)
    at ../nss/getXXbyYY_r.c:253
#16 0xb7fc996d in do_mount_indirect (arg=0xb82eea20) at indirect.c:746
#17 0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#18 0x0023927e in clone () from /lib/libc-2.8.so

Comment 14 David 2008-06-05 22:37:58 UTC

Did a "debuginfo-install nss_ldap" and got a little more debug info for thr1:

Thread 1 (process 19501):
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x00185660 in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x00187028 in abort () at abort.c:88
#3  0x0017e57e in __assert_fail (assertion=<value optimized out>, 
    file=<value optimized out>, line=<value optimized out>, 
    function=<value optimized out>) at assert.c:78
#4  0x0051528a in ber_flush2 () from /usr/lib/libnss_ldap-259.so
#5  0x004fb729 in ldap_int_flush_request () from /usr/lib/libnss_ldap-259.so
#6  0x004fbb1f in ldap_send_server_request () from /usr/lib/libnss_ldap-259.so
#7  0x004fbd02 in ldap_send_initial_request () from /usr/lib/libnss_ldap-259.so
#8  0x004ef166 in ldap_search () from /usr/lib/libnss_ldap-259.so
#9  0x004ef2f8 in ldap_search_st () from /usr/lib/libnss_ldap-259.so
#10 0x004e055b in do_search_s (
    base=0x5289d3 "ou=People,dc=headquarters,dc=integrinautics,dc=com", 
    scope=1, filter=0xb7b8684c "(&(objectClass=User)(msSFU30UidNumber=1003))", 
    attrs=0x5296e0, sizelimit=1, res=0xb7b87094) at ldap-nss.c:2739
#11 0x004df798 in do_with_reconnect (
    base=0x5289d3 "ou=People,dc=headquarters,dc=integrinautics,dc=com", 
    scope=1, filter=0xb7b8684c "(&(objectClass=User)(msSFU30UidNumber=1003))", 
    attrs=0x5296e0, sizelimit=1, private=0xb7b87094, 
    search_func=0x4e04d0 <do_search_s>) at ldap-nss.c:2630
#12 0x004e02be in _nss_ldap_search_s (args=0xb7b870e0, 
    filterprot=0x52fea0 "(&(objectClass=User)(msSFU30UidNumber=%d))", 
    sel=LM_PASSWD, user_attrs=0x0, sizelimit=1, res=0xb7b87094)
    at ldap-nss.c:3154
#13 0x004e0a87 in _nss_ldap_getbyname (args=0xb7b870e0, result=0xb7b871f4, 
    buffer=0xb82f5258 "ldap", buflen=1024, errnop=0xb7b88b58, 
    filterprot=0x52fea0 "(&(objectClass=User)(msSFU30UidNumber=%d))", 
    sel=LM_PASSWD, parser=0x4e0cd0 <_nss_ldap_parse_pw>) at ldap-nss.c:3501
#14 0x004e1140 in _nss_ldap_getpwuid_r (uid=1003, result=0xb7b871f4, 
    buffer=0xb82f5258 "ldap", buflen=1024, errnop=0xb7b88b58) at ldap-pwd.c:263
#15 0x001f49c2 in __getpwuid_r (uid=<value optimized out>, 
    resbuf=<value optimized out>, buffer=<value optimized out>, 
    buflen=<value optimized out>, result=<value optimized out>)
    at ../nss/getXXbyYY_r.c:253
#16 0xb7fc996d in do_mount_indirect (arg=0xb82eea20) at indirect.c:746
#17 0x0013532f in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#18 0x0023927e in clone () from /lib/libc-2.8.so

Comment 15 Ian Kent 2008-06-06 13:10:10 UTC

(In reply to comment #13)
> 
> Thread 1 (process 19501):
> #0  0x0012e416 in __kernel_vsyscall ()
> #1  0x00185660 in raise (sig=<value optimized out>)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #2  0x00187028 in abort () at abort.c:88
> #3  0x0017e57e in __assert_fail (assertion=<value optimized out>, 
>     file=<value optimized out>, line=<value optimized out>, 
>     function=<value optimized out>) at assert.c:78
> #4  0x0051528a in ?? () from /usr/lib/libnss_ldap.so.2
> #5  0x004fb729 in ?? () from /usr/lib/libnss_ldap.so.2
> #6  0x004fbb1f in ?? () from /usr/lib/libnss_ldap.so.2
> #7  0x004fbd02 in ?? () from /usr/lib/libnss_ldap.so.2
> #8  0x004ef166 in ?? () from /usr/lib/libnss_ldap.so.2
> #9  0x004ef2f8 in ?? () from /usr/lib/libnss_ldap.so.2
> #10 0x004e055b in ?? () from /usr/lib/libnss_ldap.so.2
> #11 0x004df798 in ?? () from /usr/lib/libnss_ldap.so.2
> #12 0x004e02be in ?? () from /usr/lib/libnss_ldap.so.2
> #13 0x004e0a87 in ?? () from /usr/lib/libnss_ldap.so.2
> #14 0x004e1140 in _nss_ldap_getpwuid_r () from /usr/lib/libnss_ldap.so.2
> #15 0x001f49c2 in __getpwuid_r (uid=<value optimized out>, 
>     resbuf=<value optimized out>, buffer=<value optimized out>, 
>     buflen=<value optimized out>, result=<value optimized out>)
>     at ../nss/getXXbyYY_r.c:253
> #16 0xb7fc996d in do_mount_indirect (arg=0xb82eea20) at indirect.c:746
> #17 0x0013532f in start_thread (arg=<value optimized out>)
>     at pthread_create.c:297
> #18 0x0023927e in clone () from /lib/libc-2.8.so
> 

Oh man, this looks similar to something else I've seen just
recently. On a Rawhide install I think. But my system just
hung so I thought it was a locking issue.

I made some changes to work around it that might help but
I can't find it just now. I'll have a look around. But,
that can only be a temporary fix. I think this is a glibc
function so we need to seek the advice of those folks.
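Ian's actual workaround is not attached to this bug. Purely as a hypothetical illustration of the kind of temporary change he describes, a multithreaded daemon can funnel every passwd lookup through one process-wide mutex so that only one thread at a time is inside the NSS backend (nss_ldap here). `pw_lookup_lock` and `lookup_user_name` are invented names for this sketch:

```c
#include <errno.h>
#include <pthread.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical process-wide lock: serializes all passwd lookups so that
 * at most one thread is inside the NSS backend at any moment. */
static pthread_mutex_t pw_lookup_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: look up uid and copy the login name into name.
 * Returns 0 on success, ENOENT if no such user exists, or another
 * errno value on failure. */
static int lookup_user_name(uid_t uid, char *name, size_t namelen)
{
    struct passwd pw, *result = NULL;
    long sz = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buflen = (sz > 0) ? (size_t)sz : 1024;
    char *buf = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        pthread_mutex_lock(&pw_lookup_lock);
        rc = getpwuid_r(uid, &pw, buf, buflen, &result);
        pthread_mutex_unlock(&pw_lookup_lock);

        if (rc != ERANGE)
            break;
        buflen *= 2;            /* buffer too small: grow it and retry */
    }

    if (rc == 0 && result == NULL) {
        rc = ENOENT;            /* lookup worked, but there is no entry */
    } else if (rc == 0) {
        /* pw.pw_name points into buf, so copy it out before freeing buf. */
        if (strlen(pw.pw_name) >= namelen)
            rc = ERANGE;
        else
            strcpy(name, pw.pw_name);
    }
    free(buf);
    return rc;
}
```

This only helps if the fault comes from concurrent calls into the backend; a bug inside a single lookup would still abort the process.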

Ian

Comment 16 Ian Kent 2008-06-06 13:47:21 UTC

Created attachment 308529
getpwuid_r issue I saw recently

FWIW, this is the gdb trace of the issue I mentioned.
I wasn't using LDAP so it may well be something quite
different but, since there have been few changes to 
autofs in this area for some time I felt it may well be
a glibc issue.

Comment 17 Jeff Moyer 2008-06-06 15:31:37 UTC

OK, it looks like we are doing an ldap lookup, even though you don't have any
maps in ldap.  Ian, is the default nss action supposed to be
[notfound=continue]?  That sounds not quite right to me, but it looks like what
is happening.  Even though we found the map (auto_projects) on the file system,
when we don't find the key we want it continues on to the next source (ldap in
this case).

Now, in the specific instance that caused the problem, it *looks* like the file
map wasn't even consulted.  I'm not sure if that's the case though, as maybe the
logs didn't make it out to syslog or to disk before the daemon died.

David, if it is indeed the case that you don't use ldap (sure looks that way),
then you can modify /etc/nsswitch.conf to use only files, like so:

automount: files

That should fix your problem for the time being.  I'll try to reproduce this
problem locally.
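In glibc's nsswitch.conf syntax the default action for a NOTFOUND result is `continue` (SUCCESS defaults to `return`), and `[NOTFOUND=return]` overrides it. Assuming autofs applies the same default to its `automount:` line, the simplified model below (hypothetical sources and counter, not glibc or autofs code) shows why a key missing from a files map still produces an LDAP query under `automount: files ldap`:

```c
#include <stddef.h>
#include <string.h>

enum nss_status_model { ST_SUCCESS, ST_NOTFOUND, ST_UNAVAIL };
enum nss_action { ACT_CONTINUE, ACT_RETURN };

/* One source on a hypothetical nsswitch line, e.g. "files" or "ldap". */
struct source {
    const char *name;
    enum nss_status_model (*lookup)(const char *key);
    enum nss_action on_notfound;    /* ACT_CONTINUE unless [NOTFOUND=return] */
};

static int ldap_consulted;          /* how often the LDAP source was queried */

/* Hypothetical "files" map that contains only the key "alice". */
static enum nss_status_model files_lookup(const char *key)
{
    return strcmp(key, "alice") == 0 ? ST_SUCCESS : ST_NOTFOUND;
}

/* Hypothetical LDAP source with no usable server behind it. */
static enum nss_status_model ldap_lookup(const char *key)
{
    (void)key;
    ldap_consulted++;
    return ST_UNAVAIL;
}

/* Walk the sources in order: SUCCESS always stops the walk, NOTFOUND
 * stops it only when the source's action is ACT_RETURN, and anything
 * else falls through to the next source. */
static enum nss_status_model lookup_key(const struct source *srcs, size_t n,
                                        const char *key)
{
    enum nss_status_model st = ST_UNAVAIL;

    for (size_t i = 0; i < n; i++) {
        st = srcs[i].lookup(key);
        if (st == ST_SUCCESS)
            return st;
        if (st == ST_NOTFOUND && srcs[i].on_notfound == ACT_RETURN)
            return st;
    }
    return st;
}
```

A table of `{files, ACT_CONTINUE}, {ldap, ACT_CONTINUE}` models `automount: files ldap`: looking up a key absent from the files map falls through to `ldap_lookup()`. Setting the files entry to `ACT_RETURN` models `automount: files [NOTFOUND=return] ldap`, and the walk stops after the files map.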

Moving on to the core dump:

So, the stack trace here doesn't really implicate glibc directly.  It's not
clear what the problem is, though, as gcc helpfully optimized out which assert
triggered the problem.  *sigh*  Anyway, the top of the stack trace is all in the
openldap code.  There are four asserts which could have caused the problem:

int
ber_flush2( Sockbuf *sb, BerElement *ber, int freeit )
{
        ber_len_t       towrite;
        ber_slen_t      rc;

        assert( sb != NULL );
        assert( ber != NULL );

        assert( SOCKBUF_VALID( sb ) );
        assert( LBER_VALID( ber ) );

This is called from:

int
ldap_int_flush_request(
        LDAP *ld,
        LDAPRequest *lr )
{
        LDAPConn *lc = lr->lr_conn;

        if ( ber_flush2( lc->lconn_sb, lr->lr_ber, LBER_FLUSH_FREE_NEVER ) != 0 ) {

This is in turn called from ldap_send_server_request:

        /* If we still have an incomplete write, try to finish it before
         * dealing with the new request. If we don't finish here, return
         * LDAP_BUSY and let the caller retry later. We only allow a single
         * request to be in WRITING state.
         */
        rc = 0;
        if ( ld->ld_requests &&
                ld->ld_requests->lr_status == LDAP_REQST_WRITING &&
                ldap_int_flush_request( ld, ld->ld_requests ) < 0 )
        {
                rc = -1;
        }
        if ( rc ) return rc;


ldap_send_initial_request:

#ifdef LDAP_R_COMPILE
        ldap_pvt_thread_mutex_lock( &ld->ld_req_mutex );
#endif
        rc = ldap_send_server_request( ld, ber, msgid, NULL,
                NULL, NULL, NULL );

Notice, here, how ldap_send_server_request is protected by a mutex.  So, Ian,
I'm not sure wrapping the call with another mutex will fix the problem.  At
least, we don't have enough evidence of what the problem actually is to be sure.

Now, given that we don't actually have an ldap server that responds to requests,
I'm guessing that any one of these asserts could trigger.  That is most likely
a bug in the ldap library, but I'll need to reproduce it to find out for sure.
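The `SOCKBUF_VALID()` and `LBER_VALID()` asserts are sanity checks on a tag field inside each structure; in OpenLDAP's `lber-int.h` they compare that field against a magic constant. The sketch below uses invented names (`struct ber_model`, `BER_MODEL_MAGIC`, `ber_model_release`, `ber_model_flush`) to show the pattern and how it could fire: once an element has been released, or was never properly set up, its tag no longer matches, and `assert()` aborts the whole process, which for automount takes every mount thread down with it.

```c
#include <assert.h>
#include <stdlib.h>

/* Invented magic value marking a live, initialised element. */
#define BER_MODEL_MAGIC 0x2U

struct ber_model {
    unsigned int valid;   /* == BER_MODEL_MAGIC while the element is usable */
    char *buf;
};

#define BER_MODEL_VALID(b) ((b)->valid == BER_MODEL_MAGIC)

static struct ber_model *ber_model_alloc(void)
{
    struct ber_model *b = calloc(1, sizeof *b);
    if (b != NULL)
        b->valid = BER_MODEL_MAGIC;
    return b;
}

/* Release the payload and clear the tag so later misuse is detectable. */
static void ber_model_release(struct ber_model *b)
{
    free(b->buf);
    b->buf = NULL;
    b->valid = 0;
}

/* Same shape as the checks at the top of ber_flush2(): a stale or
 * uninitialised element aborts the process here (unless NDEBUG is set). */
static int ber_model_flush(struct ber_model *b)
{
    assert(b != NULL);
    assert(BER_MODEL_VALID(b));
    /* a real implementation would write the encoded data out here */
    return 0;
}
```

With `NDEBUG` defined the asserts compile away and the bad element would be used instead, so the abort only signals that something earlier left the request in a bad state.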

Comment 18 Jeff Moyer 2008-06-06 15:54:46 UTC

Jun  5 08:37:47 chewbacca automount[1845]: parse_server_string: lookup(ldap):
Attempting to parse LDAP information from string "auto_home".
Jun  5 08:37:47 chewbacca automount[1845]: parse_server_string: lookup(ldap):
mapname auto_home
Jun  5 08:37:47 chewbacca automount[1845]: parse_ldap_config: lookup(ldap): ldap
authentication configured with the following options:
Jun  5 08:37:47 chewbacca automount[1845]: parse_ldap_config: lookup(ldap):
use_tls: 0, tls_required: 0, auth_required: 1, sasl_mech: (null)
Jun  5 08:37:47 chewbacca automount[1845]: parse_ldap_config: lookup(ldap):
user: (null), secret: unspecified, client principal: (null) credential cache: (null)
Jun  5 08:37:47 chewbacca automount[1845]: do_bind: lookup(ldap): auth_required:
1, sasl_mech (null)
Jun  5 08:37:47 chewbacca automount[1845]: do_bind: lookup(ldap): ldap anonymous
bind returned 0
Jun  5 08:37:47 chewbacca automount[1845]: get_query_dn: lookup(ldap): query
failed for (&(objectclass=nisMap)(nisMapName=auto_home)): Operations error
Jun  5 08:37:47 chewbacca automount[1845]: get_query_dn: lookup(ldap): query
failed for (&(objectclass=automountMap)(ou=auto_home)): Operations error
Jun  5 08:37:47 chewbacca automount[1845]: get_query_dn: lookup(ldap): query
failed for (&(objectclass=automountMap)(automountMapName=auto_home)): Operations
error
Jun  5 08:37:47 chewbacca automount[1845]: lookup(ldap): failed to find valid
query dn
Jun  5 08:37:47 chewbacca automount[1845]: lookup(ldap): couldn't connect to
server default
Jun  5 08:37:47 chewbacca automount[1845]: do_read_map: lookup module ldap failed

So, an anonymous bind succeeds, but searches fail.  What LDAP server are you
using, and is it configured to disallow anonymous searches?

Comment 19 Nalin Dahyabhai 2008-06-06 17:39:08 UTC

Let's see if we can get a better idea of which part of this puzzle's causing
this crash.  If you start the 'nscd' service with '/sbin/service nscd start',
does this problem go away?  If it does, then there's something wrong in nss_ldap
that's being triggered by autofs.

Thanks!

Comment 20 Jeff Moyer 2008-06-06 20:39:38 UTC

After speaking with Nalin, we think that the anonymous bind may not actually be
passed over the wire, which is causing the Operations Error for the subsequent
search requests.  Ian and Nalin both mentioned that we are likely dealing with
an Active Directory server.  I was not able to reproduce that behaviour using
openldap as the server.  I'll try with FDS, but I'm not sure if I will be able
to reproduce this.

I'd like to note that my suggestion of removing ldap from the automount sources
list may not address your problem.  I forgot that the abort was triggered in the
nss library, not in our lookup module.  However, if your maps aren't in ldap, it
is probably a good idea to remove that from your automount line anyway.

Comment 21 David L. 2008-06-06 21:53:00 UTC

I asked my IT guy but he hasn't gotten back to me yet.  But I'm pretty sure it
is an active directory server like you thought.

Comment 22 Jeff Moyer 2008-06-10 14:57:17 UTC

Still waiting to hear the results from trying the suggestion in comment #19.

Comment 23 David 2008-06-10 15:47:33 UTC

Sorry... I'll boot into f9 today and try it.  Unfortunately, this bug is keeping
me out of f9 most of the time: 
https://bugzilla.redhat.com/show_bug.cgi?id=449460 .

Also, I'm not sure if this is relevant, but I updated openldap from the testing
repository due to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=450017
Specifically, I did this: "yum --enablerepo=updates-testing update openldap"

I haven't seen the autofs die since, but I'm getting a limited amount of f9 test
time because I'm having so many different problems with it.  Could updating
openldap have fixed the autofs problem?  If so, do you want me to try the
suggestion in comment #19 or just run with the updated openldap for a while to
avoid changing too many things at once?

Comment 24 Jeff Moyer 2008-06-10 18:07:00 UTC

(In reply to comment #23)
> Sorry... I'll boot into f9 today and try it.  Unfortunately, this bug is keeping
> me out of f9 most of the time: 
> https://bugzilla.redhat.com/show_bug.cgi?id=449460 .
> 
> Also, I'm not sure if this is relevant, but I updated openldap from the testing
> repository due to this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=450017

ah-ha!

#10 0x00c02660 in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#11 0x00c04028 in abort () at abort.c:88
#12 0x00bfb57e in __assert_fail (assertion=<value optimized out>, 
    file=<value optimized out>, line=<value optimized out>, 
    function=<value optimized out>) at assert.c:78
#13 0x05109c2a in ber_flush2 () from /usr/lib/libexchange-storage-1.2.so.3
#14 0x050ef8c9 in ldap_int_flush_request ()
   from /usr/lib/libexchange-storage-1.2.so.3
#15 0x050efcbf in ldap_send_server_request ()
   from /usr/lib/libexchange-storage-1.2.so.3
#16 0x050efea2 in ldap_send_initial_request ()
   from /usr/lib/libexchange-storage-1.2.so.3
#17 0x050e5e69 in ldap_ntlm_bind () from /usr/lib/libexchange-storage-1.2.so.3
#18 0x050d0740 in connect_ldap (gc=<value optimized out>, 
    op=<value optimized out>, ldap=<value optimized out>)
    at e2k-global-catalog.c:243

Looks like exactly our footprint!  I would say that there is a good chance that
this is the problem we were trying to hunt down.

> I haven't seen the autofs die since, but I'm getting a limited amount of f9 test
> time because I'm having so many different problems with it.  Could updating
> openldap have fixed the autofs problem?  If so, do you want me to try the
> suggestion in comment #19 or just run with the updated openldap for a while to
> avoid changing too many things at once?

Just run with the updated openldap.  I think that should do it.  If there are no
further problems, then we can close this as a duplicate of bug 450017.

Thanks so much for your patience and willingness to report bugs and work with
engineers.  It's people like you who make Fedora a great distribution!

Comment 25 David 2008-06-12 16:59:26 UTC

Created attachment 309100
another core dump backtrace

it seems I'm still having the automount crash even after updating openldap from
testing.  :(

Comment 26 Jeff Moyer 2008-06-12 17:25:57 UTC

OK, I'm updating my F9 box right now and will try to reproduce the problem.  I
suspect that the openldap patch did not actually fix the problem.

Comment 27 Jeff Moyer 2008-06-17 16:32:26 UTC

I managed to uncover another bug in the automounter, which I have subsequently
fixed.  Now I'm trying to reproduce this again.  Sorry for the delay, and I'll
keep you posted.

Comment 28 David 2008-09-03 14:54:21 UTC

I had some problems with MATLAB crashing, and MathWorks tech support said it's an ldap bug and pointed me to this:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=469232

Is this the same bug?  Debian shows it as fixed.

Comment 29 David 2008-11-04 01:25:34 UTC

What's the status on this bug?  It's still biting me several times per week.
I can't install the f10 beta due to unrelated problems, so I can't tell if
it is fixed in f10.

Comment 30 Ian Kent 2008-11-04 06:24:48 UTC

I'm not sure of the status of any possible LDAP problems.
Is the problem you're seeing LDAP related as discussed above?
This bug is logged against autofs so you're not going to see
any LDAP problems fixed here unless we change the component.

Anyway, you might want to try:
https://koji.fedoraproject.org/koji/buildinfo?buildID=67199

It has almost all the patches that are included in F10.

I'm not sure what autofs bug fixes have made it into the F9
kernel either but get the latest release kernel and we'll work
from there.

Ian

Comment 31 David Jansen 2008-11-19 10:51:00 UTC

I have not seen any problems of this kind in Fedora 10 preview, autofs and/or ldap seems stable here.

Comment 32 David 2008-11-19 17:12:19 UTC

So far I haven't seen the problem in f10 preview either, but I've only been
running for a few days due to an installation bug.  That's longer than I
usually go in f9 though, so I'm optimistic that it is squashed in f10.  I'll
wait a few more days to be sure.

Comment 33 Bug Zapper 2009-06-10 01:17:01 UTC

This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 34 David 2009-06-10 02:15:17 UTC

I think this bug was fixed in Fedora 10.  I've been running f11 since alpha and I haven't seen this problem, so I'll go ahead and close it.
