Description of problem: This is split off from BZ #146797.
This bug exists to track the non-critical memory consumption issues with sctp
(the critical issues were fixed in BZ #146797, kernel 2.6.9-6.43, and should be in RHEL4).
Update from IT:
In SIGTRAN's case, the smallest messages are at least 50 bytes. A realistic
message rate is probably 1000 messages/s in + 1000 messages/s out. We are
required to support over 2000 associations, but in reality there are currently
rarely more than 100 associations. This is lowmem we're talking about here, and
there is always less than one GB of it to go around; that's the only static
limit there is.
The possibility that so many send buffers would fill up at the same time and
exhaust lowmem is totally theoretical at this point in time.
However, with 50-byte messages each socket uses 1.85 MB of lowmem to buffer
about 100KB of outbound data. That doesn't seem reasonable. One would only need
to fill the send buffer of 487 associations to exhaust 900MB of lowmem. That's
very uncomfortable considering we must support over 2000 concurrent
associations. The number of associations used will increase in the future as
server clusters and networks increase in size.
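The arithmetic behind these figures can be checked with a short Python sketch. The inputs are the numbers quoted above (1.85 MB of lowmem per full send buffer, a 900MB lowmem budget, 2000 required associations); the function and constant names are illustrative only. It confirms that 487 full send buffers are enough to reach the budget, and that filling the send buffers of all 2000 required associations would need about 3700 MB, several times more lowmem than the machine has.

```python
import math

# Figures quoted in the IT update. MB is treated as a plain unit, so the
# binary-vs-decimal question does not change the results below.
PER_SOCKET_LOWMEM_MB = 1.85   # lowmem used by one full send buffer
LOWMEM_BUDGET_MB = 900.0      # lowmem budget used in the example
REQUIRED_ASSOCIATIONS = 2000  # associations that must be supported


def associations_to_exhaust(budget_mb: float, per_socket_mb: float) -> int:
    """Smallest number of full send buffers whose combined use reaches the budget."""
    return math.ceil(budget_mb / per_socket_mb)


def worst_case_lowmem_mb(associations: int, per_socket_mb: float) -> float:
    """Lowmem needed if every association filled its send buffer at the same time."""
    return associations * per_socket_mb


if __name__ == "__main__":
    print(associations_to_exhaust(LOWMEM_BUDGET_MB, PER_SOCKET_LOWMEM_MB))
    print(worst_case_lowmem_mb(REQUIRED_ASSOCIATIONS, PER_SOCKET_LOWMEM_MB))
```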
So this is not a problem in realistic situations yet, but it will be later on.
Created attachment 118890
patch to account for ulpevents in receive queue
This patch has been tested and found to improve memory use slightly. It's not
enough by any stretch, but it will likely be part of the final solution.
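What accounting for ulpevents in the receive queue buys can be illustrated with a small user-space model in Python. It is not kernel code and not the attached patch: each buffer queued on a socket is either charged against the socket's receive-buffer limit or not, and only charged buffers are subject to the limit check. The names `ModelSocket`, `rcvbuf`, `rmem_alloc`, `truesize` and `fill` are hypothetical stand-ins loosely modelled on kernel socket fields, and the sizes used are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSocket:
    """Toy socket: rcvbuf is the limit, rmem_alloc the bytes currently charged."""
    rcvbuf: int
    rmem_alloc: int = 0
    queue: list = field(default_factory=list)

    def enqueue(self, truesize: int, charged: bool = True) -> bool:
        """Queue a buffer; charged buffers are refused once the limit would be exceeded."""
        if charged:
            if self.rmem_alloc + truesize > self.rcvbuf:
                return False  # receive buffer full: the caller must drop or back off
            self.rmem_alloc += truesize
        # Uncharged buffers are queued unconditionally and never counted.
        self.queue.append((truesize, charged))
        return True

    def queued_bytes(self) -> int:
        """Memory actually held by the queue, whether it was charged or not."""
        return sum(size for size, _ in self.queue)


def fill(sock: ModelSocket, events: int, size: int, charged: bool) -> int:
    """Try to queue `events` buffers of `size` bytes; return how many were accepted."""
    return sum(sock.enqueue(size, charged) for _ in range(events))
```

With a 4096-byte limit and 256-byte buffers, the charged socket stops accepting after 16 buffers, while the uncharged one queues all 100 (25600 bytes) with `rmem_alloc` still at 0. That invisible, unbounded memory use is the kind of gap that accounting every queued buffer closes.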
Created attachment 119047
updated patch to account for remaining sctp skb allocations
This is an improvement on my previous patch, and seems to clear up all the
missing accounting pieces for me.
The customer reports that the latest patch provides correct accounting. I'll propose
this upstream later today.
Created attachment 119314
Updated patch taking upstream suggestions
This is an updated version of the patch that takes into account some of the
suggestions from upstream. It's functionally equivalent and provides the same
accounting as the previous patch, but uses skb->destructor to do its work,
which in my view is a better solution. It also consolidates receive buffer
accounting so that it no longer needs to occur in both sctp_rcv and
sctp_ulpevent_set_owner, and cleans up an inadvertent double accounting error.
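The destructor-based design can be modelled in user space to show why it keeps the books balanced. This is a Python sketch, not the patch itself: the hypothetical `set_owner` helper (loosely analogous to sctp_ulpevent_set_owner) charges a buffer's `truesize` exactly once and installs a destructor, `rfree`, that uncharges it when the buffer is freed. The hypothetical `double_charge_then_set_owner` reproduces the double accounting pattern: charging once on the receive path (as sctp_rcv did) and again in the owner helper leaves the counter inflated even after every buffer is freed. All class, field and function names here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sock:
    rmem_alloc: int = 0  # bytes currently charged to this socket's receive buffer


@dataclass
class Buf:
    truesize: int
    owner: Optional[Sock] = None
    destructor: Optional[Callable[["Buf"], None]] = None

    def free(self) -> None:
        """Freeing runs the destructor, so uncharging happens in exactly one place."""
        if self.destructor is not None:
            self.destructor(self)
        self.owner = None
        self.destructor = None


def rfree(buf: Buf) -> None:
    """Destructor: give the buffer's truesize back to its owner's receive budget."""
    buf.owner.rmem_alloc -= buf.truesize


def set_owner(buf: Buf, sock: Sock) -> None:
    """Single point of accounting: charge once and arrange the matching uncharge."""
    buf.owner = sock
    sock.rmem_alloc += buf.truesize
    buf.destructor = rfree


def double_charge_then_set_owner(buf: Buf, sock: Sock) -> None:
    """Buggy variant: an extra charge on the receive path with no matching uncharge."""
    sock.rmem_alloc += buf.truesize
    set_owner(buf, sock)
```

Freeing every buffer returns a correctly owned socket's counter to 0, whereas the double-charging variant is left with 1200 bytes charged after four 300-byte buffers are freed. Tying the uncharge to the buffer's own destructor means no code path can free a buffer without releasing its charge.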
Created attachment 120128
latest upstream proposal patch
This is the latest version of the upstream patch. After going around several
times, we've come to the consensus that this is the best solution to the
immediate problem. There are some outstanding issues with receive window size
that still need to be hashed out, but they aren't pertinent to this problem,
they don't violate the RFC, and they have no real performance
impact. This patch passes all my tests, and as soon as I have upstream
acceptance, I'll build a test kernel for you to confirm.
Created attachment 120191
final upstream version of patch
This is the version of the patch that now has a commitment for upstream
inclusion from the sctp maintainer. It's identical to the previous patch, but
with a variable name change per request of the maintainer. I'm going to build
a kernel with this patch against the latest RHEL4 kernel for you to test with,
and post internally for inclusion if it fixes the problem for you (it should;
it passes all the test cases I've been using).
Created attachment 120943
new version of patch
The receive buffer accounting patch uncovered an skb leak in the establishment
of stream-style sockets; the upstream maintainer rolled the fix for it into the
accounting patch. We should pick it up as well. This is the same patch as
before, with the additional leak-fixing bits.
OK, the customer reports that this corrects the remaining memory accounting issues.
Posting the above patch to rhkl.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.