Bug 995496 - [Windows C++ client] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-sdk
Version: 2.3
Hardware: All Windows
Priority: high Severity: high
Target Milestone: 3.1
Target Release: ---
Assigned To: Cliff Jansen
QA Contact: Petra Svobodová
Keywords: Patch
Depends On:
Blocks: 785156
Reported: 2013-08-09 10:23 EDT by Pavel Moravec
Modified: 2015-04-14 09:46 EDT

See Also:
Fixed In Version: qpid-cpp-0.30-4
Doc Type: Bug Fix
Doc Text:
It was discovered that the Windows C++ client would randomly drop SSL connections while reporting a non-existent resource failure: `An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full`. Whether the failure occurred depended on timing and overall network traffic. The cause was that qpid-cpp used too many buffers concurrently and reserved available buffers unnecessarily. The Windows C++ client now uses one fewer buffer for accumulating AMQP frames from encrypted network traffic, and uses all buffers when needed. As a side effect, the qpid-cpp I/O layer now consumes between 64KB and 128KB less memory per connection on all platforms.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-04-14 09:46:53 EDT
Type: Bug

External Trackers
Tracker      ID         Priority  Status  Summary  Last Updated
Apache JIRA  QPID-5033  None      None    None     Never

Description Pavel Moravec 2013-08-09 10:23:56 EDT
Description of problem:
When receiving a large number of messages over SSL using receiver prefetch, the client fails with the exception "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full". The exception seems to originate from the sslDataIn method of the SslAsynchIO class.


Version-Release number of selected component (if applicable):
any (e.g. MRG 2.3 but also in qpid 0.22)


How reproducible:
100%


Steps to Reproduce:
1) Create a large queue on a broker (C++ / Linux)
2) Start feeding messages into the queue using C++/Linux program (in my case I used approximately 1kB messages)
3) Connect with a receiver (C++/Windows) using SSL and prefetch 1000 (no client authentication, I used username & password)
4) Wait a few seconds to see the error in the receiver

Particular reproducer program: see https://issues.apache.org/jira/secure/attachment/12595257/client.cpp.


Actual results:
The receiver hangs and logs:

debug Exception constructed: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.  (C:\some\path\source\qpid\cpp\src\qpid\sys\windows\SslAsynchIO.cpp:350)


Expected results:
No stuck consumer, no error.


Additional info:
1) Decreasing the capacity seems to reduce the frequency with which the problem appears. However, with 1MB messages, even capacity 1 doesn't seem to work.
2) When attempting to reproduce in-house, we got the client stuck but did not see the error (could be a logging issue, though).
3) Increasing the BufferCount value in AsynchIO.h from 4 to 5 seems to solve the problem, at least in the sense that the error no longer reproduces.
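
Illustration of point 3 (a toy model only, not qpid code): the sketch below shows why a fixed pool of buffers fails once the number of buffers held at the same instant exceeds the pool size. BufferPool and survivesPeakDemand are hypothetical names invented for this example, and the peak demand of 5 is an assumption inferred from the observation that raising BufferCount from 4 to 5 hides the error. It does not reproduce the actual logic in AsynchIO.h or SslAsynchIO.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size buffer pool. It only models a pool whose size is
// decided up front, in the way the BufferCount constant fixes the pool size.
class BufferPool {
public:
    explicit BufferPool(std::size_t count) : capacity_(count), free_(count) {}

    // Take one buffer; throw when the pool is empty (the modelled failure).
    void acquire() {
        if (free_ == 0)
            throw std::runtime_error("no buffer available");
        --free_;
    }

    // Return one buffer to the pool.
    void release() {
        assert(free_ < capacity_);
        ++free_;
    }

    std::size_t available() const { return free_; }

private:
    std::size_t capacity_;
    std::size_t free_;
};

// peakDemand is the assumed number of buffers held simultaneously.
// Returns true if the pool can cover that demand, false if it runs dry.
bool survivesPeakDemand(std::size_t poolSize, std::size_t peakDemand) {
    BufferPool pool(poolSize);
    try {
        for (std::size_t i = 0; i < peakDemand; ++i)
            pool.acquire();
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

Under these assumptions, survivesPeakDemand(4, 5) returns false (the reported failure), survivesPeakDemand(5, 5) returns true (a larger pool, as in point 3), and survivesPeakDemand(4, 4) returns true (the same pool with one fewer buffer held at once).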
Comment 1 Pavel Moravec 2013-08-09 10:33:13 EDT
See JIRA QPID-5033
Comment 3 Cliff Jansen 2014-09-09 09:39:39 EDT
Jira fixed upstream with detailed comments on nature of fix available.

  https://issues.apache.org/jira/browse/QPID-5033

Downstream patch available in branch:

  0.22-mrg-cjansen-bz995496
Comment 4 Pavel Moravec 2014-10-09 06:06:41 EDT
Justin, I see the BZ has flag mrg-2.3.x+ : does it mean the fix will be backported both to 0.30-* (MRG-M 3.1) and _also_ to 0.18-* (MRG 2.5.*)?

(customer is asking around this)
Comment 5 Justin Ross 2014-10-09 07:14:49 EDT
Pavel, I don't know why this has 2.3.x.  That's got to be wrong.  (Remember, don't set multiple version flags!)  This is set to appear in our 3.1 builds (coming soon).

If you want it backported, you need to clone this bug and raise it for 2.5.x.  We're not opposed, just need to look at the scope of the change.

(In reply to Pavel Moravec from comment #4)
> Justin, I see the BZ has flag mrg-2.3.x+ : does it mean the fix will be
> backported both to 0.30-* (MRG-M 3.1) and _also_ to 0.18-* (MRG 2.5.*)?
> 
> (customer is asking around this)
Comment 7 Petra Svobodová 2015-01-16 03:56:26 EST
The exception did not appear in the clients' logs anymore.

Verified on Rhel6.6-i686 and Rhel6.6-x86_64 (on broker side) and clients on Windows 7-x86 and x64, Windows 8-x86 and 64, Windows Server2008-x64 and R2 and Windows Server2012 R2 with packages qpid-cpp-win-3.30.5.1-1 for MS Visual Studio 2008 and 2010 and qpid-cpp-0.30-6.el6.

--> VERIFIED
Comment 8 Petra Svobodová 2015-01-18 15:01:11 EST
Retested also with .NET client; this issue did not occur.
Comment 12 errata-xmlrpc 2015-04-14 09:46:53 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0805.html
