Description of problem:
We have a multi-threaded Java application which makes dozens of connections to the Amazon SQS web service to read messages every 15 minutes. Occasionally -- about once a day or so now -- one of the threads will hang, preventing the application from terminating and blocking subsequent runs until we find out about it and kill the process.
After taking several Java stack dumps while a thread was hung and doing some research, we found that the Apache httpclient library had a bug in which the socket timeout was ignored during the SSL handshake:
https://issues.apache.org/jira/browse/HTTPCLIENT-1478
This was reportedly fixed in version 4.3.6 of httpcomponents-client, but RHEL6 only has commons-httpclient 3.1 which is no longer supported. I don't see anything showing whether the fix was backported to the older library.
Version-Release number of selected component (if applicable):
3.1-0.9.el6_5
How reproducible:
Low -- roughly 0.03% probability
Steps to Reproduce:
1. Create a MultiThreadedHttpConnectionManager.
2. Call getConnectionWithTimeout with a timeout (e.g. 60 seconds).
3. In the HttpConnectionParams, call setConnectionTimeout (60 seconds), setSoTimeout (60 seconds), and setBooleanParameter(STALE_CONNECTION_CHECK, true).
4. Create a TransparentPostMethod and fill it in with the required parameters for an Amazon SQS request. Use the AWSUtils library to sign the request.
5. Open a connection, then execute the request.
Actual results:
Thread has a small chance of hanging indefinitely in the following call (jstack dump):
Thread 15254: (state = IN_NATIVE)
- java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
- java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Compiled frame)
- java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Compiled frame)
- sun.security.ssl.InputRecord.readFully(java.io.InputStream, byte[], int, int) @bci=21, line=442 (Compiled frame)
- sun.security.ssl.InputRecord.read(java.io.InputStream, java.io.OutputStream) @bci=32, line=480 (Interpreted frame)
- sun.security.ssl.SSLSocketImpl.readRecord(sun.security.ssl.InputRecord, boolean) @bci=44, line=927 (Interpreted frame)
- sun.security.ssl.SSLSocketImpl.performInitialHandshake() @bci=84, line=1312 (Interpreted frame)
- sun.security.ssl.SSLSocketImpl.startHandshake(boolean) @bci=13, line=1339 (Interpreted frame)
- sun.security.ssl.SSLSocketImpl.getSession() @bci=10, line=2171 (Interpreted frame)
- org.apache.commons.httpclient.protocol.SSLProtocolSocketFactory.verifyHostName(java.lang.String, javax.net.ssl.SSLSocket) @bci=15, line=213 (Interpreted frame)
- org.apache.commons.httpclient.protocol.SSLProtocolSocketFactory.createSocket(java.lang.String, int, java.net.InetAddress, int, org.apache.commons.httpclient.params.HttpConnectionParams) @bci=88, line=159 (Interpreted frame)
- org.apache.commons.httpclient.HttpConnection.open() @bci=182, line=710 (Interpreted frame)
- org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open() @bci=11, line=1361 (Interpreted frame)
- com.boingo.cloud.aws.sqs.AmazonSimpleQueueServiceImplementation.pollMessagesInternal(int, org.apache.commons.httpclient.HttpClient, long) @bci=246 (Compiled frame)
- com.boingo.cloud.aws.sqs.AmazonSimpleQueueServiceImplementation.pollMessages(int, org.apache.commons.httpclient.HttpClient, long) @bci=57 (Interpreted frame)
- com.boingo.sqs.MessageReader.run() @bci=24, line=59 (Interpreted frame)
Expected results:
Connection times out after 60 seconds, throwing a SocketTimeoutException.
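The expected behavior can be illustrated with the JDK alone. The following is a minimal sketch, not the patch that was applied and not commons-httpclient code: it assumes that a read timeout set on the plain socket before the SSL layer is added also bounds the handshake read. The class name HandshakeTimeoutSketch and the method handshakeTimesOut are hypothetical, and the "silent peer" is a local listener that completes the TCP connection but never sends a byte.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical illustration class; not part of commons-httpclient.
class HandshakeTimeoutSketch {

    /** Returns true if the handshake against a silent peer ended in a read timeout. */
    static boolean handshakeTimesOut(int timeoutMillis) {
        // The listener never calls accept(); the kernel still completes the
        // TCP connection from its backlog, so only the handshake read blocks.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket plain = new Socket()) {
            plain.connect(new InetSocketAddress(silentPeer.getInetAddress(),
                                                silentPeer.getLocalPort()), timeoutMillis);
            // Set the read timeout BEFORE the handshake starts.
            plain.setSoTimeout(timeoutMillis);

            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket tls = (SSLSocket) factory.createSocket(
                    plain, "localhost", silentPeer.getLocalPort(), true)) {
                tls.startHandshake(); // blocks waiting for a reply that never comes
                return false;
            } catch (IOException e) {
                // Some JDK versions may wrap the timeout in an SSLException,
                // so walk the cause chain instead of catching it directly.
                for (Throwable t = e; t != null; t = t.getCause()) {
                    if (t instanceof SocketTimeoutException) {
                        return true;
                    }
                }
                throw e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + handshakeTimesOut(2000));
    }
}
```

In the stack trace above, the blocking read is reached from SSLProtocolSocketFactory.createSocket via verifyHostName and getSession, which appears to be where the configured timeout had not yet taken effect.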
Additional info:
http://insidecoffe.blogspot.com/2011/12/when-timeout-fails-in-threadjoin.html
I’ve applied the patch locally and smoke-tested it in our QA environment; we are pushing it to production this afternoon. We should know in a few days whether it was effective.
The patched library has been in production for a week now, and our application has not hung at all during that time. We're very happy with the result. Thank you for the quick response.
:)
Red Hat Enterprise Linux 6 reached end of Maintenance Support 1 phase. Therefore this vulnerability, due to low severity, is not going to be fixed. I'm closing this bug as NEXTRELEASE as the problem is already fixed in Red Hat Enterprise Linux 8.