Bug 493789

Summary: autofs NFS mount failures with ldap
Product: Red Hat Enterprise Linux 5 Reporter: Ian Kent <ikent>
Component: autofs Assignee: Ian Kent <ikent>
Status: CLOSED NOTABUG QA Contact: BaseOS QE <qe-baseos-auto>
Severity: medium Docs Contact:
Priority: low    
Version: 5.3 CC: ikent, jmoyer
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 490052 Environment:
Last Closed: 2009-04-03 05:17:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 490052    
Bug Blocks:    

Description Ian Kent 2009-04-03 05:13:11 UTC
+++ This bug was initially created as a clone of Bug #490052 +++

Created an attachment (id=335034)
syslog debug output

Description of problem:
autofs is broken in rawhide for me right now.  I'm trying to
do NFS mounts of home and project directories with intermittent
success.

Version-Release number of selected component (if applicable):
autofs-5.0.4-17.i586
openldap-2.4.15-1.fc11.i586
python-ldap-2.3.5-5.fc11.i586
openldap-clients-2.4.15-1.fc11.i586
apr-util-ldap-1.3.4-3.fc11.i586
openldap-devel-2.4.15-1.fc11.i586
nss_ldap-264-2.fc11.i586


How reproducible:
Almost always

Steps to Reproduce:
1. Try to access directories that mount successfully in Fedora 10 using the same configuration files (which I will attach later)
  
Actual results:
Operations hang or return failure when attempting to access automounted NFS directories.

Expected results:
Successful access to those directories.

Additional info:

--- Additional comment from idht4n on 2009-03-12 20:31:38 EDT ---

Created an attachment (id=335036)
nsswitch.conf

--- Additional comment from idht4n on 2009-03-12 20:34:06 EDT ---

Created an attachment (id=335037)
etc/sysconfig/autofs

--- Additional comment from idht4n on 2009-03-12 20:43:16 EDT ---

Created an attachment (id=335039)
auto.master

#auto_home looks something like this:
user1                                   manaan:/export/home/&
user2					manaan:/export/home/&
user3					manaan:/export/home/&
user4					manaan:/export/home/&
#...
#it's a lot bigger, but I think our user list is considered proprietary
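
For readers unfamiliar with the map syntax above, here is a minimal sketch of how the `&` in each location is replaced by the entry's key. The function name `expand_map_entry` is hypothetical, and the sketch assumes the simple two-field layout shown here (no mount options field, no escaped `&`); it is not autofs's own parser.

```python
def expand_map_entry(line):
    """Return (key, location) for a simple indirect map line, or None.

    Assumes the two-field form "key  location" shown in this report;
    every '&' in the location is replaced by the key.
    """
    stripped = line.strip()
    # Skip blank lines and comments.
    if not stripped or stripped.startswith("#"):
        return None
    parts = stripped.split(None, 1)
    if len(parts) != 2:
        return None
    key, location = parts
    return key, location.replace("&", key)


print(expand_map_entry("user1    manaan:/export/home/&"))
# prints ('user1', 'manaan:/export/home/user1')
```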

--- Additional comment from ikent on 2009-03-12 20:52:55 EDT ---

automount[3414]: do_bind: lookup(ldap): ldap anonymous bind returned 0
automount[3414]: get_query_dn: lookup(ldap): query failed for (&(objectclass=nisMap)(nisMapName=auto_home)): Operations error

When you get this does the automount process disappear?
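
When going through a long debug log, a small script can pull out messages like the two above. This is only a sketch: the regular expression is modelled on the single "query failed" line quoted in this comment, and the names `FAILED_QUERY` and `find_failed_queries` are made up for illustration.

```python
import re

# Pattern modelled on the one "query failed" message quoted in this report;
# other autofs versions may word the message differently.
FAILED_QUERY = re.compile(
    r"automount\[(?P<pid>\d+)\]: (?P<func>\w+): lookup\(ldap\): "
    r"query failed for (?P<filter>.+): (?P<error>[^:]+)$"
)


def find_failed_queries(lines):
    """Return (pid, ldap_filter, error) for each failed LDAP query line."""
    results = []
    for line in lines:
        match = FAILED_QUERY.search(line.strip())
        if match:
            results.append((int(match["pid"]), match["filter"], match["error"]))
    return results
```

It could be fed the lines of the attached syslog output, for example `find_failed_queries(open(path))`, where `path` is wherever the log was saved.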

--- Additional comment from ikent on 2009-03-12 21:32:08 EDT ---

(In reply to comment #4)
> automount[3414]: do_bind: lookup(ldap): ldap anonymous bind returned 0
> automount[3414]: get_query_dn: lookup(ldap): query failed for
> (&(objectclass=nisMap)(nisMapName=auto_home)): Operations error
> 
> When you get this does the automount process disappear?  

I'm guessing it does.
Can you try this package and let me know if it resolves the
issue.

http://kojipkgs.fedoraproject.org/packages/autofs/5.0.4/19

--- Additional comment from idht4n on 2009-03-24 16:59:54 EDT ---

(In reply to comment #5)
> (In reply to comment #4)
> > automount[3414]: do_bind: lookup(ldap): ldap anonymous bind returned 0
> > automount[3414]: get_query_dn: lookup(ldap): query failed for
> > (&(objectclass=nisMap)(nisMapName=auto_home)): Operations error
> > 
> > When you get this does the automount process disappear?  
> 
> I'm guessing it does.
> Can you try this package and let me know if it resolves the
> issue.
> 
> http://kojipkgs.fedoraproject.org/packages/autofs/5.0.4/19  

Before trying the new version, I wanted to make sure I could
reproduce the problem.  The next time I booted, autofs worked.
What I found was that changing my eth0 to be NetworkManager
controlled (using system-config-network) fixed the problem.
I had previously manually started eth0 like this:

/sbin/ifconfig eth0 192.168.117.25 netmask 255.255.252.0

After doing this, I could communicate with the NFS server
and LDAP server, but autofs was not reliable.  When it was
in that state, stopping and starting autofs didn't help.

I think there is something related to a startup sequence
that is causing the problem (if eth0 isn't up at some
time in the boot sequence, autofs has problems).
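
One way to gather evidence for the startup-sequence theory would be to record whether the LDAP server accepts TCP connections at the moment autofs is started. The following is a rough sketch, not part of autofs: the function name `ldap_reachable` is hypothetical, and port 389 is assumed as the usual LDAP port.

```python
import socket


def ldap_reachable(host, port=389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that the network path and the listener are up; it does
    not perform an LDAP bind or query.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Logging the result just before the init script starts automount would show whether the network was already usable at that point in the boot sequence.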

--- Additional comment from ikent on 2009-03-24 21:42:49 EDT ---

(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > automount[3414]: do_bind: lookup(ldap): ldap anonymous bind returned 0
> > > automount[3414]: get_query_dn: lookup(ldap): query failed for
> > > (&(objectclass=nisMap)(nisMapName=auto_home)): Operations error
> > > 
> > > When you get this does the automount process disappear?  
> > 
> > I'm guessing it does.
> > Can you try this package and let me know if it resolves the
> > issue.
> > 
> > http://kojipkgs.fedoraproject.org/packages/autofs/5.0.4/19  
> 
> Before trying the new version, I wanted to make sure I could
> reproduce the problem.  The next time I booted, autofs worked.
> What I found was that changing my eth0 to be NetworkManager
> controlled (using system-config-network) fixed the problem.
> I had previously manually started eth0 like this:
> 
> /sbin/ifconfig eth0 192.168.117.25 netmask 255.255.252.0
> 
> After doing this, I could communicate with the NFS server
> and LDAP server, but autofs was not reliable.  When it was
> in that state, stopping and starting autofs didn't help.
> 
> I think there is something related to a startup sequence
> that is causing the problem (if eth0 isn't up at some
> time in the boot sequence, autofs has problems).  

This description isn't really useful.
You need to provide debug logs along with a problem description.
You need to not change too many things at once (ideally one thing
at a time) as you test.

We know there is a problem with startup ordering due to LSB
changing the init script order. I added an LSB block in rev 20
but I'm not yet sure if that does what we need it to do. It would
be worth using the later version.

--- Additional comment from idht4n on 2009-03-26 09:18:02 EDT ---

Debug logs were already provided in an attachment
as was a problem description.

--- Additional comment from ikent on 2009-03-26 09:38:38 EDT ---

(In reply to comment #8)
> Debug logs were already provided in an attachment
> as was a problem description.  

Those debug logs indicate a problem at a specific place in
the code which I think I have fixed.

If you observe a change in behaviour or change the way you
test you need to collect and post the debug for that also.

Given the information provided so far I don't have anything
new to work with so I can't work out what may be going wrong
in your case.

--- Additional comment from idht4n on 2009-03-26 17:25:55 EDT ---

I backed out the change that masked the bug (disabled
network manager control of eth0) and reproduced the
failure.  Then I updated to:

http://kojipkgs.fedoraproject.org/packages/autofs/5.0.4/19

As far as I can tell, this did fix the problem.  Thanks.

Comment 1 Ian Kent 2009-04-03 05:17:55 UTC
Sorry, I was a bit too quick to clone this.
It's not an issue with RHEL-5; it was introduced in a Rawhide
update.