Right now, the LDAP libraries return a generic error when an LDAP server closes a connection (for example, one that has been idle for too long). It would be nice if the LDAP library could return a specific error in this case so we know what has happened. It is reproducible using Win 2008 based LDAP servers, for example.
Specifically, it appears that we get back a response, but when we call ldap_result(), we get a return code of -1. There is no additional information in the diagnostic message to explain what has actually happened. It would be better if we could get some information on what the failure actually was, so we could know whether it is safe to retry.
Just a shot in the dark.. Struct ldap has a member 'ld_errno', which should contain more info about what happened. There are predefined values for that, check 'ldap.h'. There are LDAP_SERVER_DOWN or LDAP_TIMEOUT among others. Could that be a possible solution?
(In reply to comment #4)
> Just a shot in the dark.. Struct ldap has a member 'ld_errno', which should
> contain more info about what happened. There are predefined values for that,
> check 'ldap.h'. There are LDAP_SERVER_DOWN or LDAP_TIMEOUT among others.
>
> Could that be a possible solution?

'struct ldap' is privately defined. The SSSD source can't look into its members without a helper routine.
My point was that the functionality you propose is already there. You only have to use ldap_get_option like so:

    ldap_get_option(ldap, LDAP_OPT_RESULT_CODE, &res);

Then you can compare the value of res with the values defined in ldap.h, in the section commented as 'API Error Codes'.

It's only an unlucky coincidence (or a bad design error) that the value of LDAP_SERVER_DOWN is -1, as is the value for unknown error returned in a different context.

A simple test results in:

    (... server killed)
    ldap bind failed: Can't contact LDAP server
    rc: -1 (== LDAP_SERVER_DOWN)

    (... server stopped to simulate a timeout)
    ldap bind failed: Timed out
    rc: -5 (== LDAP_TIMEOUT)
(In reply to comment #6)
> It's only an unlucky coincidence (or a bad design error) that the value of
> LDAP_SERVER_DOWN is -1, as is the value for unknown error returned in a
> different context.

Sorry, I still don't get it. When you call ldap_result() followed by ldap_get_option(ld, LDAP_OPT_RESULT_CODE, &err), how do you distinguish between the two meanings of -1? How does ldap_err2string() do that?
I was playing with ldap_result and LDAP_OPT_RESULT_CODE and I'm still not sure this meets all our requirements. To test, I set up an OpenLDAP server and set olcIdleTimeout to 5 seconds. When a subsequent request came in after 5 seconds, ldap_err2string only reported: "Can't contact LDAP server".

The problem is that "Can't contact LDAP server" is not specific, so we can't decide whether to retry the same server or move to the next configured server. Our result handling looks somewhat like this:

    if (ldap_result() == -1) {
        ldap_get_option(ld, LDAP_OPT_RESULT_CODE, &err);
        log_error("%s\n", ldap_err2string(ret));
    }

I also added ldap_get_option(ld, LDAP_OPT_DIAGNOSTIC_MESSAGE, &msg) to get extra information, but that only returned NULL in this case.
    if (ldap_result() == -1) {
        ldap_get_option(ld, LDAP_OPT_RESULT_CODE, &err);
        log_error("%s\n", ldap_err2string(err));
    }

If err == -5, which is the value of LDAP_TIMEOUT, ldap_err2string(err) results in "Timed out". Notice the usage of err in ldap_err2string.
> Notice the usage of err in ldap_err2string.

Forgot to emphasize it's the same 'err' you get via ldap_get_option.
Jakub is right. When the server cuts off the connection, LDAP_OPT_RESULT_CODE will return LDAP_SERVER_DOWN (-1), which is not useful. LDAP_TIMEOUT (-5) is returned only if the client does not receive the response in time.
I was trying to come up with some solution. Jakub, is there some workaround you are using now?

Just an idea: SSSD establishes the connection itself, right? What about storing a flag after a successful bind to record that the server works, and then, when the connection is dropped, trying to reconnect if this flag is set?

Maybe the information about the dropped connection can be obtained somehow from the sockbuf associated with the handle. I haven't tried that yet.
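To make the flag idea above concrete, here is a minimal sketch of the decision logic only. It does not call libldap: the struct, the enum, and every `sketch_` name are hypothetical, and the -1 constant simply mirrors the LDAP_SERVER_DOWN value quoted earlier in this thread. The caller is assumed to pass in the code it read back via LDAP_OPT_RESULT_CODE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value quoted in this thread for LDAP_SERVER_DOWN; redefined under a
 * sketch-only name so the example compiles without libldap headers. */
enum { SKETCH_SERVER_DOWN = -1 };

/* Hypothetical per-connection state kept by the caller, not by libldap. */
struct sketch_conn {
    bool bound_ok;   /* set once a bind on this connection has succeeded */
};

enum sketch_action {
    SKETCH_RECONNECT_SAME,  /* server worked before: possibly an idle drop */
    SKETCH_TRY_NEXT,        /* server never worked: move to the next one */
    SKETCH_REPORT           /* not a server-down failure: outside this sketch */
};

/* Record a successful bind so a later failure can be interpreted. */
static void sketch_mark_bound(struct sketch_conn *c)
{
    assert(c != NULL);
    c->bound_ok = true;
}

/* Decide what to do with a failure code read back from the handle. */
static enum sketch_action sketch_on_error(struct sketch_conn *c, int rc)
{
    assert(c != NULL);
    if (rc != SKETCH_SERVER_DOWN)
        return SKETCH_REPORT;
    if (c->bound_ok) {
        /* Clear the flag so a second consecutive failure moves on
         * instead of reconnecting to the same server forever. */
        c->bound_ok = false;
        return SKETCH_RECONNECT_SAME;
    }
    return SKETCH_TRY_NEXT;
}
```

The flag is cleared on the first server-down error, so the same server gets one reconnect attempt per successful bind. Whether one attempt is the right limit is a design choice this sketch does not settle.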
(In reply to comment #12)
> I was trying to come with some solution. Jakub, is there some workaround you
> are using now?
>
> Just an idea: SSSD establishes the connection itself, right? What about storing
> some flag after successful binding to the server that the server works. And
> when the connection is dropped try to reconnect if this flag is set.
>
> Maybe the information about dropped connection can be obtained somehow from
> sockbuf associated with the handle. I haven't tried yet.

Yes, we have a list of recoverable errors after which we retry the connection attempt. We retry each server in the failover list unless we receive a fatal error such as ENOMEM.
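The retry behaviour described in that reply could look roughly like the sketch below. This is not the SSSD code: the callback type, the function names, and the demo connect stub are all hypothetical, and ENOMEM is treated as the only fatal error simply because it is the one example named in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connect callback: returns 0 on success or an errno-style
 * code on failure. Stands in for whatever really opens the connection. */
typedef int (*sketch_connect_fn)(const char *uri);

/* Assumption: ENOMEM is the only fatal code; the real list is not shown. */
static int sketch_is_fatal(int err)
{
    return err == ENOMEM;
}

/* Try each server in order. Returns the index of the first server that
 * connects, or -1 if the list is exhausted or a fatal error occurs.
 * *last_err receives the last error seen (0 on success). */
static int sketch_failover(const char **uris, size_t n,
                           sketch_connect_fn connect_fn, int *last_err)
{
    assert(uris != NULL && connect_fn != NULL && last_err != NULL);
    *last_err = 0;
    for (size_t i = 0; i < n; i++) {
        int err = connect_fn(uris[i]);
        *last_err = err;
        if (err == 0)
            return (int)i;      /* connected */
        if (sketch_is_fatal(err))
            return -1;          /* stop: trying other servers will not help */
        /* recoverable: fall through to the next server in the list */
    }
    return -1;
}

/* Test double for illustration only: one URI connects, one reports
 * ENOMEM, and anything else is treated as unreachable. */
static int sketch_demo_connect(const char *uri)
{
    if (strcmp(uri, "ldap://up.example") == 0)
        return 0;
    if (strcmp(uri, "ldap://oom.example") == 0)
        return ENOMEM;
    return ECONNREFUSED;
}
```

With the stub, a list of "ldap://down.example" followed by "ldap://up.example" returns index 1, while a list starting with "ldap://oom.example" stops immediately and returns -1.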
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.