Bug 784928

Summary: httpd (and php-ldap) fail to create new outgoing TLS sessions
Product: Red Hat Enterprise Linux 6 Reporter: George Shammas <gs1445>
Component: openldapAssignee: Jan Synacek <jsynacek>
Status: CLOSED CURRENTRELEASE QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: high Docs Contact:
Priority: high    
Version: 6.2CC: brian.keffer, ddumas, jsynacek, kdudka, ngupta, prc
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-03-20 18:48:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 782183, 836160, 840699    
Attachments:
Description Flags
php ldap test none

Description George Shammas 2012-01-26 16:59:03 UTC
Description of problem:
Random Apache threads stop being able to create TLS connections. This problem becomes very problematic when trying to contact an ldap server over ssl. And because its not all Apache threads, it intermittent, making users very frustrated.

Uping the php LDAP debug level (ldap_set_option(NULL, LDAP_OPT_DEBUG_LEVEL, 7);)  we can see the threads stop being able to create TLS sessions.

Sniffing traffic to and from the ldap server shows that each time a tcp connection is fully initiated, but in the case of a fail, a FIN is issued as soon as the connection is opened.

Another person seems to have the same problem, https://www.redhat.com/archives/rhelv6-list/2011-September/msg00023.html but the only response to that discussion is invalid. On my server having the problem, I launched 20 proccesses runing the same php script from the cli for 5 minutes staight, and not once did one fail. All the while apache threads still had trouble.

He links it to how many requests a thread has served is the cause of the problem.


Version-Release number of selected component (if applicable):
RHEL6.1 and RHEL6.2
RHEL6.0 and RHEL5.x Seem to be immune. 

How reproducible:
Very



Steps to Reproduce:
1. Install php-ldap and httpd
2. Try contacting a ldap server over ssl
2a. A freshly restarted httpd will take about 20 minutes before symptoms start.
3. Turn on php-ldap debug mode to see the usable error of failing to create tls sessions. With out doing this, the only error you get is failed to bind.
  


Actual results:
ldap_create
ldap_url_parse_ext(ldaps://redacted/)
ldap_bind_s
ldap_simple_bind_s
ldap_sasl_bind_s
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP redacted:636
ldap_new_socket: 14
ldap_prepare_socket: 14
ldap_connect_to_host: Trying redacted:636
ldap_pvt_connect: fd: 14 tm: -1 async: 0
TLS: error: could not initialize moznss security context - error -5925:The one-time function was previously called and failed. Its error code is no longer available
TLS: can't create ssl handle.
ldap_err2string



Expected results:
ldap_create
ldap_url_parse_ext(ldaps://redacted/)
ldap_bind_s
ldap_simple_bind_s
ldap_sasl_bind_s
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP redacted:636
ldap_new_socket: 13
ldap_prepare_socket: 13
ldap_connect_to_host: Trying redacted:636
ldap_pvt_connect: fd: 13 tm: -1 async: 0
TLS: certificate [CN=redacted,OU=ITS,O=New York University,L=New York,ST=New York,C=US] is valid
TLS certificate verification: subject: CN=redacted,OU=ITS,O=New York University,L=New York,ST=New York,C=US, issuer: CN=VeriSign Class 3 Secure Server CA - G2,OU=Terms of use at https://www.verisign.com/rpa (c)09,OU=VeriSign Trust Network,O="VeriSign, Inc.",C=US, cipher: RC4, security level: high, secret key bits: 128, total key bits: 128, cache hits: 0, cache misses: 0, cache not reusable: 0
ldap_open_defconn: successful
ldap_send_server_request
........
ldap_next_attribute
ldap_get_values_len
ldap_next_attribute
ldap_get_values_len
ldap_next_attribute
ldap_get_dn
ldap_msgfree
ldap_free_connection 1 1
ldap_send_unbind
ldap_free_connection: actually freed


Additional info:

Comment 2 George Shammas 2012-01-26 18:19:04 UTC
Rich Graves mailing list thread, which linked to, gave me two more testing ideas, both with interesting results. 

1. I adjusted the MaxRequestsPerChild from the default of 4000 to 200 and waited for the problem to occur. Then I stressed the server to see if when a new thread spawned it would take the problem with it. And it did. After about 10 seconds, all the old apache threads died and respawned and everything even at 400 requests a second no more threads would error.If I waited for the problem again (without restarted httpd) and the same result happened. 

Once a thread reached its maxed requests and respawned, the problem goes away. This leads me to believe that its a time based problem.


2. If I further lower MaxRequestsPerChild from 200 to 10. The problem is not noticeable. Threads respawn too quickly to get diseased by it. However the side effect of this is that the server's max requests per second dropped from 400 to 50. So not an ideal solution.

Comment 3 Joe Orton 2012-01-31 17:14:24 UTC
Given the indication this is a regression since 6.0, and the error message:

""
error: could not initialize moznss security context - error -5925:The
one-time function was previously called and failed. Its error code is no longer
available
""

I'd presume this is a problem with the switch to Mozilla NSS in openldap; re-assigning accordingly.  Can you provide a minimal repro case?

Comment 6 George Shammas 2012-02-01 15:16:59 UTC
Created attachment 558850 [details]
php ldap test

I have not been able to reproduce it outside of Apache with mod_php. Even apache with php-cgi works.

I have attached a simple php script, that avoids being cached, and errors when it doesn't work. Outputs as much data as it can, however the error that ldap returns is '', which isn't very useful.


Also this bug is actually a dup of
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=738456
which also has no solution.

Comment 8 Kamil Dudka 2012-04-02 07:35:42 UTC
(In reply to comment #6)
> Also this bug is actually a dup of
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=738456

Could you please check whether libcurl-7.19.7-26.el6_2.4 fixes this issue?

http://rhn.redhat.com/errata/RHBA-2012-0430.html

Comment 11 Jan Vcelak 2012-07-17 10:11:01 UTC
@George: ping

Comment 12 Jan Vcelak 2012-07-18 18:29:13 UTC
If not, please, can you also try with following scratch build:
http://people.redhat.com/jvcelak/bz707599/

Comment 16 Rich Graves 2013-03-19 21:34:10 UTC
Cleanup: possible duplicate bug https://bugzilla.redhat.com/show_bug.cgi?id=738456 is resolved.

Comment 17 Siddharth Nagar 2013-03-20 18:48:17 UTC
This is issue seems to be resolved as per Comment #16. Closing.