Bug 995496

Summary: [Windows C++ client] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
Product: Red Hat Enterprise MRG
Reporter: Pavel Moravec <pmoravec>
Component: qpid-sdk
Assignee: Cliff Jansen <cjansen>
Status: CLOSED ERRATA
QA Contact: Petra Svobodová <psvobodo>
Severity: high
Priority: high
Version: 2.3
CC: cjansen, iboverma, jross, psvobodo
Target Milestone: 3.1
Keywords: Patch
Hardware: All
OS: Windows
Fixed In Version: qpid-cpp-0.30-4
Doc Type: Bug Fix
Doc Text:
It was discovered that the Windows C++ client would randomly drop SSL connections while reporting a non-existent resource failure: `An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full`. This depended on timing factors and overall network traffic. qpid-cpp was using too many buffers concurrently and reserving available buffers unnecessarily. The Windows C++ client now uses one fewer buffer for accumulating AMQP frames from encrypted network traffic, and uses all buffers when needed. As a side effect, the qpid-cpp I/O layer now consumes between 64KB and 128KB less memory per connection on all platforms.
Last Closed: 2015-04-14 13:46:53 UTC
Type: Bug
Bug Blocks: 785156

Description Pavel Moravec 2013-08-09 14:23:56 UTC
Description of problem:
When receiving a large number of messages over SSL using receiver prefetch, the client fails with the exception "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full". This exception seems to originate from the sslDataIn method of the SslAsynchIO class.


Version-Release number of selected component (if applicable):
any (e.g. MRG 2.3 but also in qpid 0.22)


How reproducible:
100%


Steps to Reproduce:
1) Create a large queue on a broker (C++ / Linux)
2) Start feeding messages into the queue using C++/Linux program (in my case I used approximately 1kB messages)
3) Connect with a receiver (C++/Windows) using SSL and prefetch 1000 (no client authentication, I used username & password)
4) Wait a few seconds to see the error in the receiver

Particular reproducer program: see https://issues.apache.org/jira/secure/attachment/12595257/client.cpp.


Actual results:
The receiver gets stuck and logs:

debug Exception constructed: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.  (C:\some\path\source\qpid\cpp\src\qpid\sys\windows\SslAsynchIO.cpp:350)


Expected results:
No stuck consumer, no error.


Additional info:
1) Decreasing the capacity seems to reduce the frequency with which the problem appears. However, with 1MB messages, even capacity 1 doesn't seem to work.
2) When attempting to reproduce in-house, we managed to get the client stuck, but without the error being logged (this could be a logging issue, though).
3) Increasing the BufferCount value in AsynchIO.h from 4 to 5 seems to solve the problem, at least in the sense that the error no longer reproduces.

Comment 1 Pavel Moravec 2013-08-09 14:33:13 UTC
See JIRA QPID-5033

Comment 3 Cliff Jansen 2014-09-09 13:39:39 UTC
The JIRA is fixed upstream; detailed comments on the nature of the fix are available there.

  https://issues.apache.org/jira/browse/QPID-5033

Downstream patch available in branch:

  0.22-mrg-cjansen-bz995496

Comment 4 Pavel Moravec 2014-10-09 10:06:41 UTC
Justin, I see the BZ has flag mrg-2.3.x+ : does it mean the fix will be backported both to 0.30-* (MRG-M 3.1) and _also_ to 0.18-* (MRG 2.5.*)?

(customer is asking around this)

Comment 5 Justin Ross 2014-10-09 11:14:49 UTC
Pavel, I don't know why this has 2.3.x.  That's got to be wrong.  (Remember, don't set multiple version flags!)  This is set to appear in our 3.1 builds (coming soon).

If you want it backported, you need to clone this bug and raise it for 2.5.x.  We're not opposed, just need to look at the scope of the change.

(In reply to Pavel Moravec from comment #4)
> Justin, I see the BZ has flag mrg-2.3.x+ : does it mean the fix will be
> backported both to 0.30-* (MRG-M 3.1) and _also_ to 0.18-* (MRG 2.5.*)?
> 
> (customer is asking around this)

Comment 7 Petra Svobodová 2015-01-16 08:56:26 UTC
The exception no longer appears in the clients' logs.

Verified with the broker on RHEL 6.6 (i686 and x86_64) and clients on Windows 7 (x86 and x64), Windows 8 (x86 and x64), Windows Server 2008 (x64 and R2), and Windows Server 2012 R2, using packages qpid-cpp-win-3.30.5.1-1 for MS Visual Studio 2008 and 2010 and qpid-cpp-0.30-6.el6.

--> VERIFIED

Comment 8 Petra Svobodová 2015-01-18 20:01:11 UTC
Retested also with .NET client; this issue did not occur.

Comment 12 errata-xmlrpc 2015-04-14 13:46:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0805.html