Bug 995496 - [Windows C++ client] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-sdk
Version: 2.3
Hardware: All
OS: Windows
Priority: high
Severity: high
Target Milestone: 3.1
Target Release: ---
Assignee: Cliff Jansen
QA Contact: Petra Svobodová
URL:
Whiteboard:
Depends On:
Blocks: 785156
Reported: 2013-08-09 14:23 UTC by Pavel Moravec
Modified: 2019-05-20 11:05 UTC (History)

Fixed In Version: qpid-cpp-0.30-4
Doc Type: Bug Fix
Doc Text:
It was discovered that the Windows C++ client would randomly drop SSL connections while reporting a non-existent resource failure: `An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full`. This depended on timing factors and overall network traffic. qpid-cpp was using too many buffers concurrently and reserving available buffers unnecessarily. The Windows C++ client now uses one fewer buffer for accumulating AMQP frames from encrypted network traffic, and uses all buffers when needed. As a side effect, the qpid-cpp I/O layer now consumes between 64KB and 128KB less memory per connection on all platforms.
Clone Of:
Environment:
Last Closed: 2015-04-14 13:46:53 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Apache JIRA QPID-5033 None None None Never
Red Hat Product Errata RHEA-2015:0805 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 3.1 Release 2015-04-14 17:45:54 UTC

Description Pavel Moravec 2013-08-09 14:23:56 UTC
Description of problem:
When receiving a large number of messages over SSL using a receiver prefetch, the client fails with the exception "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full". This exception seems to originate from the sslDataIn method of the SslAsynchIO class.


Version-Release number of selected component (if applicable):
any (e.g. MRG 2.3 but also in qpid 0.22)


How reproducible:
100%


Steps to Reproduce:
1) Create a large queue on a broker (C++ / Linux)
2) Start feeding messages into the queue using a C++/Linux program (in my case I used approximately 1kB messages)
3) Connect with a receiver (C++/Windows) using SSL and prefetch 1000 (no client authentication, I used username & password)
4) Wait a few seconds to see the error in the receiver

Particular reproducer program: see https://issues.apache.org/jira/secure/attachment/12595257/client.cpp.


Actual results:
The receiver gets stuck and logs:

debug Exception constructed: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.  (C:\some\path\source\qpid\cpp\src\qpid\sys\windows\SslAsynchIO.cpp:350)


Expected results:
No stuck consumer, no error.


Additional info:
1) Decreasing the capacity seems to reduce the frequency with which the problem appears. However, with 1MB messages, even capacity 1 doesn't seem to work.
2) When attempting to reproduce in-house, we ended up with a stuck client but did not see the error (this could be a logging issue, though).
3) Increasing the BufferCount value in AsynchIO.h from 4 to 5 seems to solve the problem, at least in the sense that the error no longer reproduces.
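
As a rough illustration of item 3 (not the actual qpid-cpp code), the following C++ sketch models a fixed pool of I/O buffers that several parts of the I/O layer draw from. The names ToyBufferPool and canHoldConcurrently are hypothetical, and the peak demand of five buffers is an assumption chosen only so that a pool of 4 is exhausted while a pool of 5 is not.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Toy model only: a fixed pool of I/O buffers shared by several users.
// None of these names come from the qpid-cpp sources.
class ToyBufferPool {
public:
    explicit ToyBufferPool(std::size_t count) : free_(count), total_(count) {}

    // Take one buffer, or throw if the pool is exhausted.
    void acquire() {
        if (free_ == 0)
            throw std::runtime_error("no buffer space available");
        --free_;
    }

    // Return one previously acquired buffer to the pool.
    void release() {
        assert(free_ < total_);
        ++free_;
    }

    std::size_t available() const { return free_; }

private:
    std::size_t free_;
    std::size_t total_;
};

// Returns true if `demand` buffers can be held at the same time
// by a pool of `poolSize` buffers, false if the pool runs out first.
bool canHoldConcurrently(std::size_t poolSize, std::size_t demand) {
    ToyBufferPool pool(poolSize);
    try {
        for (std::size_t i = 0; i < demand; ++i)
            pool.acquire();
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

Under these assumptions, canHoldConcurrently(4, 5) is false and canHoldConcurrently(5, 5) is true. Lowering the demand to four, in the spirit of a fix that uses one fewer buffer, also fits in a pool of 4.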

Comment 1 Pavel Moravec 2013-08-09 14:33:13 UTC
See JIRA QPID-5033

Comment 3 Cliff Jansen 2014-09-09 13:39:39 UTC
The JIRA is fixed upstream; detailed comments on the nature of the fix are available there:

  https://issues.apache.org/jira/browse/QPID-5033

Downstream patch available in branch:

  0.22-mrg-cjansen-bz995496

Comment 4 Pavel Moravec 2014-10-09 10:06:41 UTC
Justin, I see the BZ has flag mrg-2.3.x+ : does it mean the fix will be backported both to 0.30-* (MRG-M 3.1) and _also_ to 0.18-* (MRG 2.5.*)?

(a customer is asking about this)

Comment 5 Justin Ross 2014-10-09 11:14:49 UTC
Pavel, I don't know why this has 2.3.x.  That's got to be wrong.  (Remember, don't set multiple version flags!)  This is set to appear in our 3.1 builds (coming soon).

If you want it backported, you need to clone this bug and raise it for 2.5.x.  We're not opposed, just need to look at the scope of the change.

(In reply to Pavel Moravec from comment #4)
> Justin, I see the BZ has flag mrg-2.3.x+ : does it mean the fix will be
> backported both to 0.30-* (MRG-M 3.1) and _also_ to 0.18-* (MRG 2.5.*)?
> 
> (a customer is asking about this)

Comment 7 Petra Svobodová 2015-01-16 08:56:26 UTC
The exception no longer appears in the clients' logs.

Verified on RHEL 6.6 i686 and RHEL 6.6 x86_64 (broker side), with clients on Windows 7 (x86 and x64), Windows 8 (x86 and x64), Windows Server 2008 x64, Windows Server 2008 R2, and Windows Server 2012 R2, using packages qpid-cpp-win-3.30.5.1-1 for MS Visual Studio 2008 and 2010 and qpid-cpp-0.30-6.el6.

--> VERIFIED

Comment 8 Petra Svobodová 2015-01-18 20:01:11 UTC
Also retested with the .NET client; the issue did not occur.

Comment 12 errata-xmlrpc 2015-04-14 13:46:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0805.html

