Bug 444618 - id hangs when ldap used in nsswitch
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: nss_ldap
Version: 9
Hardware: x86_64 Linux
Priority: low
Severity: medium
Assigned To: Nalin Dahyabhai
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-04-29 11:29 EDT by John Hodrien
Modified: 2015-04-20 07:03 EDT
Doc Type: Bug Fix
Last Closed: 2009-07-14 12:03:59 EDT


Attachments:
ldap.conf (874 bytes, application/octet-stream), 2008-04-29 11:29 EDT, John Hodrien
ldap.conf (889 bytes, text/plain), 2008-04-29 11:33 EDT, John Hodrien
stack trace (4.14 KB, text/plain), 2008-04-30 12:26 EDT, John Hodrien
last few lines of id user (3.71 KB, text/plain), 2008-05-06 08:10 EDT, Jan Safranek
End of log using openldap-2.3.39-3.fc8.x86_64 (4.05 KB, text/plain), 2008-05-06 11:29 EDT, John Hodrien

Description John Hodrien 2008-04-29 11:29:02 EDT
Description of problem:

id hangs when nss_paged_results true is set.  This currently makes it
impossible for us to get user information out of LDAP.

Version-Release number of selected component (if applicable):

nss_ldap-259-3.fc9.x86_64
openldap-2.4.8-3.fc9.x86_64

How reproducible:

When nss_paged_results true is set in /etc/ldap.conf, id reliably hangs when
run.  Turning paged results off is not an option because of the size of the AD
domain in use.

Steps to Reproduce:
1.  Enable ldap, using nss_paged_results true
2.  run "id <username>"
  
Actual results:

id doesn't return.

Expected results:

id returns showing group membership.
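
Since the failure is a hang rather than an error, a scripted reproduction needs a timeout. A minimal sketch in Python, assuming only the standard library; the helper name `returns_within` and the time limit are illustrative choices, not part of the original report:

```python
import subprocess


def returns_within(cmd, seconds):
    """Run cmd and report whether it exited before the timeout.

    Hypothetical helper: True means the command finished in time
    (whatever its exit status), False means it had to be killed.
    """
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising,
        # so a hung process is not left behind.
        return False
    return True
```

On an affected system, a call such as `returns_within(["id", "someuser"], 10)` would be expected to return False, where `someuser` stands in for a real account name.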

Additional info:

LDAP source is Active Directory.

strace just shows it stuck on a poll:

...
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, -1) = 1
read(3, "0\204\0\0\0J\2\1", 8)          = 8
read(3, "\2s\204\0\0\0A\4?ldap://foo.domain"..., 72) = 72
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, -1) = 1
read(3, "0\204\0\0\0\\\2\1", 8)         = 8
read(3, "\2s\204\0\0\0S\4Qldap://bar.d"..., 90) = 90
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, -1) = 1
read(3, "0\204\0\0\0\\\2\1", 8)         = 8
read(3, "\2s\204\0\0\0S\4Qldap://baz.d"..., 90) = 90
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, -1) = 1
read(3, "0\204\0\0\0L\2\1", 8)          = 8
read(3, "\2s\204\0\0\0C\4Aldap://qux/C"..., 74) = 74
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, -1) = 1
read(3, "0\204\0\0\0A\2\1", 8)          = 8
read(3, "\2e\204\0\0\0\7\n\1\0\4\0\4\0\240\204\0\0\0+0\204\0\0\0%\4\0261.2."...,
63) = 63
stat("/etc/ldap.conf", {st_mode=S_IFREG|0644, st_size=1376, ...}) = 0
geteuid()                               = 23423
getsockname(3, {sa_family=AF_INET, sin_port=htons(46315),
sin_addr=inet_addr("192.168.0.230")}, [16]) = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(389),
sin_addr=inet_addr("192.168.1.32")}, [68719476752]) = 0
poll(

Output from strace stops there.
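
For reference, the reads shown above can be decoded by hand. The following is a minimal Python sketch, assuming standard BER encoding and the protocol-op tags defined in RFC 4511; the function names are hypothetical, and it parses only the bytes that are visible in the trace (strace truncated the rest):

```python
# Protocol-op tags from RFC 4511; only the ones relevant to a search are listed.
OPS = {
    0x64: "SearchResultEntry",
    0x65: "SearchResultDone",
    0x73: "SearchResultReference",
}


def read_length(buf, pos):
    """Decode a BER length starting at buf[pos]; return (length, next_pos)."""
    first = buf[pos]
    if first < 0x80:                      # short form: the byte is the length
        return first, pos + 1
    count = first & 0x7F                  # long form: next `count` bytes hold it
    return int.from_bytes(buf[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def describe(data):
    """Decode the LDAPMessage envelope from the leading bytes of one message."""
    if data[0] != 0x30:
        raise ValueError("not a BER SEQUENCE")
    msg_len, pos = read_length(data, 1)
    if data[pos] != 0x02:
        raise ValueError("expected INTEGER messageID")
    id_len, pos = read_length(data, pos + 1)
    msg_id = int.from_bytes(data[pos:pos + id_len], "big")
    pos += id_len
    op_tag = data[pos]
    op_len, pos = read_length(data, pos + 1)
    info = {"msg_len": msg_len, "msg_id": msg_id,
            "op": OPS.get(op_tag, hex(op_tag)), "op_len": op_len}
    if op_tag == 0x65 and data[pos] == 0x0A:
        # SearchResultDone begins with an ENUMERATED resultCode (0 = success).
        rc_len, pos = read_length(data, pos + 1)
        info["result_code"] = int.from_bytes(data[pos:pos + rc_len], "big")
    return info


# Bytes copied from the strace output: the 8-byte header read plus the
# visible start of the following read.
REFERRAL = b"0\204\0\0\0J\2\1" + b"\2s\204\0\0\0A\4?ldap://foo.domain"
DONE = (b"0\204\0\0\0A\2\1"
        + b"\2e\204\0\0\0\7\n\1\0\4\0\4\0\240\204\0\0\0+0\204\0\0\0%\4\0261.2.")

print(describe(REFERRAL))
print(describe(DONE))
```

Read this way, the first four messages are search result references, and the last message received before the final poll is a SearchResultDone for message ID 2 with result code 0 (success).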

ltrace shows it in getgrouplist:

getgrouplist(0x7fffd2e81977, 3200, 0, 0x7fffd2e8049c, 0 <unfinished ...>
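
To exercise the same libc call without going through id, something like the following could be used. This is a sketch, assuming Python 3.3 or later on a Unix system, where `os.getgrouplist` wraps getgrouplist(3); the helper name `groups_for` is hypothetical:

```python
import os
import pwd


def groups_for(username):
    """Return the group IDs that username belongs to.

    Both lookups go through the NSS modules configured in
    /etc/nsswitch.conf, so an LDAP backend is consulted if one is enabled.
    """
    entry = pwd.getpwnam(username)                    # passwd lookup
    return os.getgrouplist(username, entry.pw_gid)    # group membership lookup
```

If the hang is in the group lookup, `groups_for` would be expected to block in the same place that ltrace shows for id.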

Rebuilding the rawhide SRPM and using it on Fedora 7 works just fine.  Ideas
for where to look next are welcome ;)
Comment 1 John Hodrien 2008-04-29 11:29:02 EDT
Created attachment 304129 [details]
ldap.conf
Comment 2 John Hodrien 2008-04-29 11:33:48 EDT
Created attachment 304132 [details]
ldap.conf
Comment 3 John Hodrien 2008-04-30 12:25:55 EDT
Hmm, this doesn't appear to be nss_ldap as such.  Building an nss_ldap on Fedora
7, and installing it on Fedora 9 works fine.  Upping Fedora 7 to openldap-2.4.8
and building nss_ldap on that breaks in the same way.  I've attached a stack
trace of where it gets stuck.
Comment 4 John Hodrien 2008-04-30 12:26:52 EDT
Created attachment 304253 [details]
stack trace
Comment 5 Nalin Dahyabhai 2008-04-30 15:05:16 EDT
Adding Jan (the openldap package maintainer) to the CC: list.
Comment 6 Jan Safranek 2008-05-05 08:11:20 EDT
It's hard to guess what's wrong, and I do not have an AD server to test with.
Could you please add "debug -1" to your /etc/ldap.conf and attach the output of
"id user"? It should produce a lot of LDAP logs, which reveal your LDAP database
structure (but no passwords).
Comment 7 Jan Safranek 2008-05-05 08:38:20 EDT
btw, use
Comment 8 Jan Safranek 2008-05-05 08:40:12 EDT
sorry for the last incomplete post...

btw use "nss_paged_results yes" instead of "true", but I doubt it makes any
difference.
Comment 9 Jan Safranek 2008-05-06 08:10:49 EDT
Created attachment 304630 [details]
last few lines of id user

Here are the last few lines from the log provided by email.
Comment 10 Jan Safranek 2008-05-06 09:05:50 EDT
It looks like the LDAP client is waiting for something, although all requests
have already finished successfully... OpenLDAP 2.4.x has a brand new API for
paged results, so the bug is probably there.

I need further assistance - could you please provide the same logs, but with the
OpenLDAP distributed with Fedora 8 (i.e. openldap-2.3.39-3), just to compare the
logs?

As a simple test, try running "ldapsearch -E pr=10/prompt <other params as
needed>" against your AD; it should perform a paged search. If it hangs at the
end, the bug is certainly in openldap and we can rule out nss_ldap.
Comment 11 John Hodrien 2008-05-06 11:29:20 EDT
Created attachment 304643 [details]
End of log using openldap-2.3.39-3.fc8.x86_64

Looks much the same on the F9 box.  Cannot reproduce using ldapsearch as
described, merrily ticks along with no problems.  On F7 with the new openldap
it works fine with an old nss_ldap but breaks with the F9 nss_ldap.  I'm going
to have to find some time to look at this in depth.
Comment 12 Jan Safranek 2008-05-07 08:39:25 EDT
Please try to analyze the LDAP traffic with Wireshark or a similar tool (or
attach a capture here, but beware of passwords!). It seems to me that the AD
sends wrong data to nss_ldap. From the hexdumps you sent I can see that the
connection gets broken and nss_ldap rebinds quite often, but I do not see why
(I'm not THAT good at parsing ASN.1 from a hexdump :). Still, this does not
explain why it hangs with paged results and works otherwise.

Nalin: nss_ldap has its own implementation of the paged results control in
pagectrl.c. This was necessary for openldap-2.3, but openldap-2.4 has its own
(again in pagectrl.c). The configure script of nss_ldap detects that and uses
openldap's version, but the implementation is slightly different. The API is
different too: ldap_parse_page_control is declared deprecated and
ldap_parse_pageresponse_control should be used instead. It seems to me that
nss_ldap evaluates the search result incorrectly and calls ldap_result when all
results have already been returned. That's my first guess at what could be
wrong. Could you please look at it? Or prepare a build that logs all results
(and errno) from all ldap_* calls? It seems to me this is easier to track down
on the nss_ldap side.
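
To illustrate the suspected failure mode: with the paged results control (RFC 2696), a client is expected to issue another request only while the server returns a non-empty cookie, and to stop once the cookie comes back empty. The following Python sketch simulates that loop against a stand-in server. It is not nss_ldap or OpenLDAP code, and all names in it are hypothetical:

```python
def fake_server(entries, page_size, cookie):
    """Stand-in for a directory server: return one page and the next cookie.

    The cookie is opaque to the client; here it simply encodes an offset.
    An empty cookie tells the client that there are no more pages.
    """
    start = int(cookie.decode()) if cookie else 0
    page = entries[start:start + page_size]
    next_start = start + page_size
    next_cookie = str(next_start).encode() if next_start < len(entries) else b""
    return page, next_cookie


def paged_search(entries, page_size):
    """Collect all pages; return (results, number_of_requests_sent)."""
    results, cookie, requests = [], b"", 0
    while True:
        page, cookie = fake_server(entries, page_size, cookie)
        requests += 1
        results.extend(page)
        if not cookie:
            # No further page exists. A client that waited for another
            # response at this point would block forever.
            break
    return results, requests
```

With fewer entries than the page size, `paged_search` sends exactly one request and stops, which is the case where waiting for one more result would produce a hang like the one reported.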

I haven't reproduced the bug yet. I've found a local AD server, but it does not
contain POSIX users/groups, only MS Windows ones and the mapping is not
possible. Do you know of any?
Comment 13 Bug Zapper 2008-05-14 06:22:50 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 14 Jan Safranek 2008-05-15 06:30:19 EDT
Adding some info received by mail from the reporter (next time please post
everything to Bugzilla!):

We locally patch our nss_ldap to do lookups on the RIDs instead,
which removes the need for the Services for Unix attributes.  But none of this
has been done with that, so we're left with an incomplete POSIX mapping (as
some gid entries are present and some are not).
Comment 15 Jan Safranek 2008-05-15 08:06:39 EDT
I also investigated tcpdump captures of the LDAP communication between nss_ldap
and AD and I don't see anything wrong: AD always returns proper results and all
of them are in one "page" (i.e. fewer than 1000 results are always returned and
paging was not used). I think the problem is not in the LDAP library.

Nalin, could you please prepare an nss_ldap build that logs all results (and
errno) from all ldap_* calls and provides some additional logging when it
processes the page control? I really think we should focus there.
Comment 16 Bug Zapper 2009-06-09 20:31:18 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 17 Bug Zapper 2009-07-14 12:03:59 EDT
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
Comment 18 John Hodrien 2015-04-20 07:03:37 EDT
bugzilla, please stop bugging me to provide info on a ticket that was closed over five years ago!
