197746 – autofs queries LDAP server too many times for non-existent mount points

Summary: autofs queries LDAP server too many times for non-existent mount points

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	autofs
Sub Component:
Version:	6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Ian Kent
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	180495

Reported:	2006-07-05 22:19 UTC by Jeff Bastian
Modified:	2007-11-30 22:11 UTC
CC List:	2 users
Fixed In Version:	autofs-5.0.1-0.rc2.8
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-10-06 03:30:28 UTC
Type:	---
Embargoed:
Dependent Products:

Description Jeff Bastian 2006-07-05 22:19:11 UTC

Description of problem:
If I try to cd to an automounted directory that doesn't exist, the automount
daemon makes 4 queries to the LDAP server to determine that it really doesn't
exist.  (This is better than autofs-4.x which took 9 queries!)

Running wireshark, I see it makes 4 queries and the only thing it changes is the
filter:

1. (&(objectclass=automount)(automountKey=qwerty))
2. (&(objectclass=automount)(automountKey=/))
3. (&(objectclass=automount)(automountKey=qwerty))
4. (&(objectclass=automount)(automountKey=/))

Why does it query automountKey=/ when the search for the normal key fails?  And
why does it repeat both queries?  It should just quit after the 1st query fails.

Version-Release number of selected component (if applicable):
autofs-5.0.0_beta6-2

How reproducible:
Every time

Steps to Reproduce:
1. Run wireshark and watch traffic to the LDAP server
2. In a shell, run 'cd /automnt/qwerty' where /automnt/qwerty doesn't exist
3. Watch wireshark capture 4 LDAP queries before the automount daemon returns
"No such file or directory"
  
Actual results:
It took the automounter 4 LDAP queries, 2 of which were redundant, to determine
that a certain mount point didn't exist.

Expected results:
It should only take 1 query.

Comment 1 Ian Kent 2006-07-06 02:30:46 UTC

(In reply to comment #0)
> Description of problem:
> If I try to cd to an automounted directory that doesn't exist, the automount
> daemon makes 4 queries to the LDAP server to determine that it really doesn't
> exist.  (This is better than autofs-4.x which took 9 queries!)
> 
> Running wireshark, I see it makes 4 queries and the only thing it changes is the
> filter:
> 
> 1. (&(objectclass=automount)(automountKey=qwerty))
> 2. (&(objectclass=automount)(automountKey=/))
> 3. (&(objectclass=automount)(automountKey=qwerty))
> 4. (&(objectclass=automount)(automountKey=/))
> 
> Why does it query automountKey=/ when the search for the normal key fails?  And
> why does it repeat both queries?  It should just quit after the 1st query fails.
> 

Yes. I've seen this behaviour in several different forms over
time. I'm sure there are still some opportunities for improvement
in all the lookup modules.

First the wildcard.
When we lookup a key and it doesn't find a match in the map there's
still the possibility that there's a wildcard entry that will match
any key. Also there's the possibility that the map may have changed
since the last lookup so we're stuck not knowing. So we need to try
both.

It's done this way because I believe, like NIS, there is no guarantee
as to the order query entries are returned and we require matching a
lookup key against a map key before matching against the wildcard key.

Perhaps you're thinking I could construct a search with an "|". Possibly
I can, but it's not that straightforward as there are a couple of other
cases to allow for. I'll think about it.

The second issue you point out is far more difficult to deal with.
Basically, autofs is at the mercy of applications making system
calls that cause the kernel module to trigger a lookup. The kernel
module attempted to cache negative lookups at one time but that
didn't work properly. I plan on implementing this in the kernel
module in time to come but because of the huge changes with v5 I
want to let that stabilize first. Also I need to think more about
how I'll do it.

Ian

Comment 2 Ian Kent 2006-07-07 12:16:50 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > Running wireshark, I see it makes 4 queries and the only thing it changes is the
> > filter:
> > 
> > 1. (&(objectclass=automount)(automountKey=qwerty))
> > 2. (&(objectclass=automount)(automountKey=/))
> > 3. (&(objectclass=automount)(automountKey=qwerty))
> > 4. (&(objectclass=automount)(automountKey=/))
> > 
> > Why does it query automountKey=/ when the search for the normal key fails?  And
> > why does it repeat both queries?  It should just quit after the 1st query fails.
> > 

snip ...

> First the wildcard.
> When we lookup a key and it doesn't find a match in the map there's
> still the possibility that there's a wildcard entry that will match
> any key. Also there's the possibility that the map may have changed
> since the last lookup so we're stuck not knowing. So we need to try
> both.
> 
> It's done this way because I believe, like NIS, there is no guarantee
> as to the order query entries are returned and we require matching a
> lookup key against a map key before matching against the wildcard key.
> 
> Perhaps you're thinking I could construct a search with an "|". Possibly
> I can, but it's not that straightforward as there are a couple of other
> cases to allow for. I'll think about it.

Just to keep you updated.
Today I finally started work to try and combine these two lookups
into one. It would be good if we can reduce the number of queries
by half, at least to start with.

Ian

Comment 3 Jeff Bastian 2006-07-07 20:25:45 UTC

(In reply to comment #1)
> > 2. (&(objectclass=automount)(automountKey=/))
<snip>
> First the wildcard.
> When we lookup a key and it doesn't find a match in the map there's
> still the possibility that there's a wildcard entry that will match
> any key.

I wasn't aware that '/' was a wildcard in LDAP.  If I manually run
    ldapsearch ... '(&(objectclass=automount)(automountKey=/))'
it doesn't return anything so it doesn't appear to work as a wildcard.  If,
however, I use '*' instead of '/', I get the entire map back:
    ldapsearch ... '(&(objectclass=automount)(automountKey=*))'

Should the '/' be a '*'?


> Also there's the possibility that the map may have changed
> since the last lookup so we're stuck not knowing. So we need to try
> both.

If the search for the actual key doesn't return anything, though, then how will
using a wildcard to get the map help?  If the key doesn't exist when you ask the
LDAP server for it directly, then downloading the entire map and manually
searching for the key isn't going to help - it's still not going to exist.

Or are you looking for a wildcard in the map itself?  For example, an entry like
   *  server:/home/&
I can see why you might want to download the entire map to look for an entry
like this, however, as I mentioned above, the '/' doesn't work as a wildcard if
you're trying to get the entire map.

 

> The second issue you point out is far more difficult to deal with.
> Basically, autofs is at the mercy of applications making system
> calls that cause the kernel module to trigger a lookup. The kernel
> module attempted to cache negative lookups at one time but that
> didn't work properly. I plan on implementing this in the kernel
> module in time to come but because of the huge changes with v5 I
> want to let that stabilize first. Also I need to think more about
> how I'll do it.

I'm not following you here.  Are you saying that if I try
    cd /automnt/qwerty
the shell is somehow going to cause the automounter to query the LDAP server
twice if the first attempt doesn't work?

Comment 4 Ian Kent 2006-07-08 04:38:07 UTC

(In reply to comment #3)
> (In reply to comment #1)
> > > 2. (&(objectclass=automount)(automountKey=/))
> <snip>
> > First the wildcard.
> > When we lookup a key and it doesn't find a match in the map there's
> > still the possibility that there's a wildcard entry that will match
> > any key.
> 
> I wasn't aware that '/' was a wildcard in LDAP.  If I manually run
>     ldapsearch ... '(&(objectclass=automount)(automountKey=/))'
> it doesn't return anything so it doesn't appear to work as a wildcard.  If,
> however, I use '*' instead of '/', I get the entire map back:
>     ldapsearch ... '(&(objectclass=automount)(automountKey=*))'
> 
> Should the '/' be a '*'?

Nope.

The '/' is used as the autofs map wildcard within LDAP autofs
maps.

The '*' is the LDAP match anything (LDAP wildcard) so that can't
be used as an autofs map wildcard as we need to match a specific
LDAP map (autofs wildcard) entry.

> 
> 
> > Also there's the possibility that the map may have changed
> > since the last lookup so we're stuck not knowing. So we need to try
> > both.
> 
> If the search for the actual key doesn't return anything, though, then how will
> using a wildcard to get the map help?  If the key doesn't exist when you ask the
> LDAP server for it directly, then downloading the entire map and manually
> searching for the key isn't going to help - it's still not going to exist.
> 
> Or are you looking for a wildcard in the map itself?  For example, an entry like
>    *  server:/home/&
> I can see why you might want to download the entire map to look for an entry
> like this, however, as I mentioned above, the '/' doesn't work as a wildcard if
> you're trying to get the entire map.

We don't want to download the entire map for this.

I can see your confusion over this but I think the bit you're missing
is that autofs treats the '/' from an LDAP map key to mean '*' internally
within the autofs LDAP lookup module. So the map entry

   *  server:/home/&

is stored in LDAP as

automountKey: /
automountInformation: server:/home/&

so we can recognise it as the autofs wildcard map entry.
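
The matching order described in this thread (a specific key first, the autofs wildcard entry second, regardless of the order in which the server returns entries) can be sketched as follows. This is only an illustration in Python, not the autofs LDAP lookup module itself: the function name `resolve` and the list-of-pairs input are assumptions made for the example.

```python
WILDCARD_KEY = "/"  # how the autofs '*' entry is stored in LDAP, per the comment above


def resolve(key, entries):
    """Return the mount information for key, or None if nothing matches.

    entries is a list of (automountKey, automountInformation) pairs in
    whatever order the server happened to return them (hypothetical input
    shape, chosen for this sketch).
    """
    wildcard_info = None
    for map_key, info in entries:
        if map_key == key:
            # An exact match always wins, even if the wildcard came first.
            return info.replace("&", key)
        if map_key == WILDCARD_KEY:
            wildcard_info = info
    if wildcard_info is not None:
        # Fall back to the wildcard entry; '&' stands for the lookup key.
        return wildcard_info.replace("&", key)
    return None


print(resolve("qwerty", [("/", "server:/home/&")]))
```

With only the wildcard entry in the map, any key resolves through it; with no entries at all, the lookup fails and the caller would report "No such file or directory".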

> > The second issue you point out is far more difficult to deal with.
> > Basically, autofs is at the mercy of applications making system
> > calls that cause the kernel module to trigger a lookup. The kernel
> > module attempted to cache negative lookups at one time but that
> > didn't work properly. I plan on implementing this in the kernel
> > module in time to come but because of the huge changes with v5 I
> > want to let that stabilize first. Also I need to think more about
> > how I'll do it.
> 
> I'm not following you here.  Are you saying that if I try
>     cd /automnt/qwerty
> the shell is somehow going to cause the automounter to query the LDAP server
> twice if the first attempt doesn't work?

Not quite what I meant but that does appear to be what happens at
least for "cd". "ls" otoh appears to be not so persistent.

Remember that what triggers a mount is a system call like open(2)
or opendir(3) or other such call which causes a path lookup in the
VFS which calls the autofs4 filesystem methods. This then leads to
an upcall to the userspace daemon. What I was trying to describe
is that the calls that the VFS makes to autofs4 are very specific
and autofs4 has very little information as to what system call
caused the lookup so all it can do is react by making an upcall
to the daemon.

There is at least one other case in the VFS lookup that can lead
to a second call to the autofs4 module within the same lookup and
I have recently submitted a kernel patch that I hope will remedy
this (but probably not the case here as it relates to browsable
maps). Unfortunately it took some time to come up with a solution
to this case. Once I'm sure that this (fairly straightforward in
the end) patch is functioning correctly I will start thinking about
caching failed mount callbacks to the daemon for some brief time,
probably about 5 seconds, so that multiple system calls resulting
in a failure will not generate multiple upcalls.

Maybe it appears I'm a bit paranoid, leaving the caching till I'm
happy that the version 5 changes are sound. The kernel changes
for version 5 are quite significant (about 40 small patches in all,
including bug fixes) so I didn't want to obscure the base function
with failure caching within the initial implementation.

Clear as mud, yes!

So I know it's a problem but I'm working on it.

In the meantime if I can merge the two lookups in the userspace
LDAP module we can reduce the number of queries by half.

Ian

Comment 5 Ian Kent 2006-07-10 07:38:08 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > Running wireshark, I see it makes 4 queries and the only thing it changes is the
> > > filter:
> > > 
> > > 1. (&(objectclass=automount)(automountKey=qwerty))
> > > 2. (&(objectclass=automount)(automountKey=/))
> > > 3. (&(objectclass=automount)(automountKey=qwerty))
> > > 4. (&(objectclass=automount)(automountKey=/))
> > > 
> > > Why does it query automountKey=/ when the search for the normal key fails?  And
> > > why does it repeat both queries?  It should just quit after the 1st query fails.

Very interesting.
After my first attempt to combine the wildcard and key lookup
I'm getting 2 queries for "ls" on an invalid key. That
shouldn't be happening for this case.

Working on it.
Ian

Comment 6 Ian Kent 2006-07-11 08:50:58 UTC

(In reply to comment #5)
> (In reply to comment #2)
> > > > why does it repeat both queries?  It should just quit after the 1st query fails.
> 
> Very interesting.
> After my first attempt to combine the wildcard and key lookup
> I'm getting 2 queries for "ls" on an invalid key. That
> shouldn't be happening for this case.

First I've combined the LDAP query to lookup the key and check
for the wildcard entry into one so that's done. It is included
in autofs-5.0.0_beta6-7 which I've just now built. So if you
could give this a try when it's available and you have time
that would be great.

I've investigated the multiple daemon upcalls again and have
come to the same conclusion as previously.

An strace of "ls" shows
stat("/ldap/ddddddd", 0x616b68)         = -1 ENOENT (No such file or directory)
lstat("/ldap/ddddddd", 0x616b68)        = -1 ENOENT (No such file or directory)

which, after going through the kernel code path, results in two
callbacks to the daemon.

An strace of "cd" shows
stat("/ldap", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/ldap/dbuast", 0x7fffff80f250)    = -1 ENOENT (No such file or directory)
chdir("/ldap/dbuast")                   = -1 ENOENT (No such file or directory)
chdir("/ldap/dbuast")                   = -1 ENOENT (No such file or directory)

which results in three callbacks to the daemon. Why it tries to
chdir a second time after the first one fails is a mystery to me.

At one time stat calls would not cause a callback but things have
changed a fair bit in the kernel and it's probably better that way
anyway.

But the upshot of this is that implementing the caching of mount
failures in the kernel module needs to be done as soon as possible.
This functionality will be dependent on a patch that is currently
pending in the -mm kernel which seems to have attracted some
reluctance at this stage. The patch is however making its way
into the Rawhide kernel thanks to the efforts of Dave Jones. I'll
let you know how this goes and how the caching of negative
callbacks goes.

Ian

Comment 7 Jeff Bastian 2006-07-11 20:43:13 UTC

(In reply to comment #4)
> So the map entry 
>    *  server:/home/&
> is stored in LDAP as
>    automountKey: /
>    automountInformation: server:/home/&
> so we can recognise it as the autofs wildcard map entry.

Ah-hah!  I wasn't familiar with this detail of LDAP automount maps because we
don't use the wildcards in our maps.  We have many different NFS servers and
paths so a wildcard entry wouldn't work for us so I never even tried to create
one.  I can see, though, that trying to create an automountKey of '*' would be a
problem since that's an LDAP wildcard.

Thanks!  I guess I should go study the RFCs some more.  :)

Jeff

Comment 8 Ian Kent 2006-07-13 09:15:02 UTC

(In reply to comment #6)
> 
> I've investigated the multiple daemon upcalls again and have
> come to the same conclusion as previously.
> 
> An strace of "ls" shows
> stat("/ldap/ddddddd", 0x616b68)         = -1 ENOENT (No such file or directory)
> lstat("/ldap/ddddddd", 0x616b68)        = -1 ENOENT (No such file or directory)
> 
> which, after going through the kernel code path, results in two
> callbacks to the daemon.
> 
> An strace of "cd" shows
> stat("/ldap", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> stat("/ldap/dbuast", 0x7fffff80f250)    = -1 ENOENT (No such file or directory)
> chdir("/ldap/dbuast")                   = -1 ENOENT (No such file or directory)
> chdir("/ldap/dbuast")                   = -1 ENOENT (No such file or directory)
> 
> which results in three callbacks to the daemon. Why it tries to
> chdir a second time after the first one fails is a mystery to me.
> 
> At one time stat calls would not cause a callback but things have
> changed a fair bit in the kernel and it's probably better that way
> anyway.
> 
> But the upshot of this is that implementing the caching of mount
> failures in the kernel module needs to be done as soon as possible.
> This functionality will be dependent on a patch that is currently
> pending in the -mm kernel which seems to have attracted some
> reluctance at this stage. The patch is however making its way
> into the Rawhide kernel thanks to the efforts of Dave Jones. I'll
> let you know how this goes and how the caching of negative
> callbacks goes.

I've done the caching of failed lookups.
Now you should see just one query to the LDAP server.
We may need to tune how long a failed lookup stays cached.
I set it to 10 seconds to start with so please let me know
how it goes.

Unfortunately I couldn't do this in the kernel module which
would have been the best place. I've had to do it in the
userspace daemon. Not quite as efficient but the result is
the same. The change is available in version autofs-5.0.0_beta6-8.
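
The idea of caching failed lookups for a short time in the userspace daemon can be illustrated with a small sketch. It is written in Python for brevity and is not the autofs implementation: the names `NegativeCache` and `lookup`, and the injectable `clock` parameter, are assumptions made for this example. Only the 10-second starting timeout comes from the comment above.

```python
import time


class NegativeCache:
    """Remember keys whose lookup failed, for ttl seconds (hypothetical helper)."""

    def __init__(self, ttl=10.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so the timeout can be tested
        self._expires = {}      # key -> time at which the failure is forgotten

    def record_failure(self, key):
        self._expires[key] = self.clock() + self.ttl

    def is_negative(self, key):
        expires = self._expires.get(key)
        if expires is None:
            return False
        if self.clock() >= expires:
            # The failure has aged out; the next lookup goes to the server.
            del self._expires[key]
            return False
        return True


def lookup(key, cache, query):
    """Call query(key) unless a recent failure for key is still cached."""
    if cache.is_negative(key):
        return None
    result = query(key)
    if result is None:
        cache.record_failure(key)
    return result
```

Under this scheme the repeated stat() and chdir() calls seen in the strace output would still each reach the daemon, but only the first one within the timeout would generate an LDAP query.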

Note that there has been a version change to avoid upgrade
problems as we go forward. No doubt you'll notice as the next
version after the one above is autofs-5.0.1-0.rc1.1.

Ian

Comment 9 Jeff Bastian 2006-07-17 20:43:25 UTC

I've upgraded to autofs-5.0.1-0.rc1.1 and kernel-2.6.17-1.2405.fc6 and I ran
wireshark while running 'ls /automnt/qwerty'.

The combined search filter is working: it's now looking for
  (&(objectclass=automount)(|(automountKey=qwerty)(automountKey=/)))

However, I'm still seeing two LDAP searches go out on the wire (the 2nd search
is about 0.7 seconds after the 1st).

It's getting better!
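
For reference, a combined filter of the form reported above could be built along these lines. This is a hedged sketch in Python rather than the code autofs uses: the helper names are invented for the example, the attribute names are copied from the filter seen on the wire, and the escaping follows the RFC 4515 rules for special characters in filter values.

```python
_SPECIAL = "\\*()\x00"  # characters RFC 4515 requires to be escaped in values


def escape_filter_value(value):
    """Escape a value for use inside an LDAP search filter."""
    out = []
    for ch in value:
        if ch in _SPECIAL:
            out.append("\\%02x" % ord(ch))  # e.g. '*' becomes '\2a'
        else:
            out.append(ch)
    return "".join(out)


def build_lookup_filter(key):
    """One filter matching either the requested key or the '/' wildcard entry."""
    return "(&(objectclass=automount)(|(automountKey=%s)(automountKey=/)))" % (
        escape_filter_value(key),
    )


print(build_lookup_filter("qwerty"))
```

Escaping matters here because a key containing a literal '*' would otherwise be read by the server as the LDAP match-anything wildcard, which is the confusion discussed earlier in this thread.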

Comment 10 Jeff Bastian 2006-10-05 19:54:00 UTC

I upgraded my FC6 boxes to
   autofs-5.0.1-0.rc2.8
   kernel-2.6.18-1.2726.fc6
(among other packages) and I tested the LDAP queries again.  Today I'm only
seeing one LDAP search request for non-existent mount points.  It looks like
this BZ can be closed!

Thanks!
Jeff

Comment 11 Ian Kent 2006-10-06 03:30:28 UTC

(In reply to comment #10)
> I upgraded my FC6 boxes to
>    autofs-5.0.1-0.rc2.8
>    kernel-2.6.18-1.2726.fc6
> (among other packages) and I tested the LDAP queries again.  Today I'm only
> seeing one LDAP search request for non-existent mount points.  It looks like
> this BZ can be closed!

Thanks Jeff.

To be honest it should have been fixed when you last tested it
and no matter how hard I tried I couldn't see why it didn't
function as required.

Ian
