Bug 744132

Summary: [RFE] return error code when server closes idle connection
Product: Red Hat Enterprise Linux 6 Reporter: Ondrej Valousek <ondrejv>
Component: openldap Assignee: Jan Synacek <jsynacek>
Status: CLOSED WONTFIX QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: low Docs Contact:
Priority: unspecified    
Version: 6.1 CC: dspurek, jhrozek, jplans, jsynacek, omoris, ovasik, sgallagh, syeghiay, tsmetana
Target Milestone: rc Keywords: FutureFeature
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-04 13:24:19 UTC Type: ---

Description Ondrej Valousek 2011-10-07 08:08:42 UTC
Right now, the LDAP libraries return a generic error when an LDAP server closes a connection (for example, one that has been idle for too long). It would be nice if the LDAP library could return a specific error in this case so that we know what happened.

It is replicable using Win 2008 based ldap servers, for example.

Comment 2 Stephen Gallagher 2011-10-07 13:26:10 UTC
Specifically, it appears that we get back a response, but when we call ldap_result(), we get a return code of -1. There is no additional information in the diagnostic message to explain what has actually happened.

It would be better if we could get some information on what the failure actually was, so we could know whether it is safe to retry.

Comment 4 Jan Synacek 2012-03-20 10:01:42 UTC
Just a shot in the dark.. Struct ldap has a member 'ld_errno', which should contain more info about what happened. There are predefined values for that, check 'ldap.h'. There are LDAP_SERVER_DOWN or LDAP_TIMEOUT among others.

Could that be a possible solution?

Comment 5 Stephen Gallagher 2012-03-20 11:43:39 UTC
(In reply to comment #4)
> Just a shot in the dark.. Struct ldap has a member 'ld_errno', which should
> contain more info about what happened. There are predefined values for that,
> check 'ldap.h'. There are LDAP_SERVER_DOWN or LDAP_TIMEOUT among others.
> 
> Could that be a possible solution?

'struct ldap' is privately defined. The SSSD source can't look into its members without a helper routine.

Comment 6 Jan Synacek 2012-03-21 12:35:17 UTC
My point was that the functionality you propose is already there.

You only have to use ldap_get_option like so:
ldap_get_option(ldap, LDAP_OPT_RESULT_CODE, &res);

Then you can compare the value of res with values defined in ldap.h - section commented as 'API Error Codes'.

It's only an unlucky coincidence (or a bad design error) that the value of LDAP_SERVER_DOWN is -1, as is the value for unknown error returned in a different context.

Simple test results in:

(... server killed)
ldap bind failed: Can't contact LDAP server
rc: -1 (== LDAP_SERVER_DOWN)

(... server stopped to simulate a timeout)
ldap bind failed: Timed out
rc: -5 (== LDAP_TIMEOUT)
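
A minimal sketch of the comparison described above, kept free of libldap so it compiles on its own. The `EX_` constants and `ex_api_error_name()` are hypothetical names used only for illustration; the values -1 and -5 are the ones quoted in this comment, and real code would take `rc` from ldap_get_option(ld, LDAP_OPT_RESULT_CODE, &rc) and compare it against the macros in ldap.h instead.

```c
#include <string.h>

/* Values quoted in this thread; real code should use the macros from ldap.h. */
enum {
    EX_LDAP_SERVER_DOWN = -1,
    EX_LDAP_TIMEOUT     = -5
};

/* Map a client-side result code to a short label for logging.
 * Hypothetical helper, not part of libldap. */
const char *ex_api_error_name(int rc)
{
    switch (rc) {
    case EX_LDAP_SERVER_DOWN: return "server down";
    case EX_LDAP_TIMEOUT:     return "timed out";
    case 0:                   return "success";
    default:                  return "other";
    }
}
```

This only shows that the two codes can be told apart once the option value is in hand; it does not say why the server went away.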

Comment 7 Jakub Hrozek 2012-04-17 06:48:17 UTC
(In reply to comment #6)
> It's only an unlucky coincidence (or a bad design error) that the value of
> LDAP_SERVER_DOWN is -1, as is the value for unknown error returned in a
> different context.
> 

Sorry, I still don't get it. When you call ldap_result() followed by a ldap_get_option(ld, LDAP_OPT_RESULT_CODE, &err) how do you distinguish between two meanings of -1? How does ldap_err2string() do that?

Comment 8 Jakub Hrozek 2012-04-25 09:41:41 UTC
I was playing with ldap_result and LDAP_OPT_RESULT_CODE and I'm still not sure this meets all our requirements.

To test, I set up an openldap server and set the olcIdleTimeout to 5 seconds. When a subsequent request comes in after 5 seconds, ldap_err2string only reported:
"Can't contact LDAP server".

The problem is that "Can't contact LDAP server" is not specific, so we can't decide whether to retry the same server or move to the next configured server.

Our result handling looks somewhat like this:

if (ldap_result() == -1) {
   ldap_get_option(ld,  LDAP_OPT_RESULT_CODE, &err);
   log_error("%s\n", ldap_err2string(ret));
}

I also added ldap_get_option(ld, LDAP_OPT_DIAGNOSTIC_MESSAGE, &msg) to get extra information, but that only returned NULL in this case.

Comment 9 Jan Synacek 2012-04-25 10:19:09 UTC
if (ldap_result() == -1) {
   ldap_get_option(ld,  LDAP_OPT_RESULT_CODE, &err);
   log_error("%s\n", ldap_err2string(err));
}

If err == -5, which is the value of LDAP_TIMEOUT, ldap_err2string(err) results in "Timed out".

Notice the usage of err in ldap_err2string.

Comment 10 Jan Synacek 2012-04-25 10:26:01 UTC
> Notice the usage of err in ldap_err2string.
Forgot to emphasize it's the same 'err' you get via ldap_get_option.

Comment 11 Jan Vcelak 2012-04-25 11:03:23 UTC
Jakub is right. When the server cuts off the connection, LDAP_OPT_RESULT_CODE will return LDAP_SERVER_DOWN (-1), which is not useful. LDAP_TIMEOUT (-5) is returned only if the client does not receive the response in time.

Comment 12 Jan Vcelak 2012-04-25 14:21:35 UTC
I was trying to come up with a solution. Jakub, is there some workaround you are using now?

Just an idea: SSSD establishes the connection itself, right? What about storing some flag after successful binding to the server that the server works. And when the connection is dropped try to reconnect if this flag is set.

Maybe the information about dropped connection can be obtained somehow from sockbuf associated with the handle. I haven't tried yet.
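
The flag idea above could be sketched roughly as follows. Everything here is hypothetical (`struct conn_state`, `conn_mark_bound()`, `conn_should_reconnect()`, `SKETCH_SERVER_DOWN`); the only fact taken from this thread is that a dropped connection surfaces as LDAP_SERVER_DOWN (-1). The caller would set the flag after a successful bind and consult it when a later operation fails.

```c
#include <stdbool.h>

/* Value of LDAP_SERVER_DOWN as quoted in this thread (assumption:
 * real code would use the macro from ldap.h). */
#define SKETCH_SERVER_DOWN (-1)

/* Per-connection bookkeeping kept by the caller, not by libldap. */
struct conn_state {
    bool bound_once;  /* true once a bind to this server has succeeded */
};

/* Record that the server answered a bind successfully. */
void conn_mark_bound(struct conn_state *st)
{
    st->bound_once = true;
}

/* A "server down" result on a connection that worked before is treated
 * as a dropped connection worth one reconnect attempt. */
bool conn_should_reconnect(const struct conn_state *st, int rc)
{
    return rc == SKETCH_SERVER_DOWN && st->bound_once;
}
```

The flag cannot tell an idle-timeout close from a server that crashed after the bind, so the reconnect attempt itself still has to be allowed to fail.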

Comment 14 Jakub Hrozek 2012-05-15 14:31:41 UTC
(In reply to comment #12)
> I was trying to come up with a solution. Jakub, is there some workaround you
> are using now?
> 
> Just an idea: SSSD establishes the connection itself, right? What about storing
> some flag after successful binding to the server that the server works. And
> when the connection is dropped try to reconnect if this flag is set.
> 
> Maybe the information about dropped connection can be obtained somehow from
> sockbuf associated with the handle. I haven't tried yet.

Yes, we have a list of recoverable errors after which we retry the connection attempt: we retry each server in the failover list unless we receive a fatal error such as ENOMEM.
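
As a rough illustration of that retry policy (not the actual SSSD code), the loop below walks a failover list and stops on the first success or on a fatal error. The names `connect_fn`, `failover_connect()` and `fake_connect()` are made up for this sketch, the example URIs are placeholders, and treating every non-ENOMEM error as recoverable is a simplifying assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connect callback: returns 0 on success or an errno value. */
typedef int (*connect_fn)(const char *uri);

/* Try each server in order. Returns the index of the first server that
 * connects, or -1 if none did or a fatal error (ENOMEM) was seen.
 * *last_err receives the result of the last attempt made. */
int failover_connect(const char *const *uris, size_t n,
                     connect_fn try_connect, int *last_err)
{
    for (size_t i = 0; i < n; i++) {
        int err = try_connect(uris[i]);
        *last_err = err;
        if (err == 0)
            return (int)i;      /* connected */
        if (err == ENOMEM)
            return -1;          /* fatal: do not try further servers */
        /* anything else is treated as recoverable: try the next server */
    }
    return -1;
}

/* Stand-in for a real connection attempt, for demonstration only:
 * only the second example server accepts connections. */
int fake_connect(const char *uri)
{
    return strcmp(uri, "ldap://b.example") == 0 ? 0 : ECONNRESET;
}
```

Called with the list { "ldap://a.example", "ldap://b.example" } and fake_connect, the loop skips the first server after the recoverable error and settles on the second.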

Comment 15 RHEL Program Management 2012-07-10 08:29:53 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 16 RHEL Program Management 2012-07-11 01:44:48 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 17 RHEL Program Management 2012-12-14 08:27:49 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.