Bug 1717430 - Excessive memory usage when running out of descriptors
Summary: Excessive memory usage when running out of descriptors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: squid
Version: 7.6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Luboš Uhliarik
QA Contact: Branislav Náter
URL:
Whiteboard:
Depends On:
Blocks: 1716962 1754591
 
Reported: 2019-06-05 13:13 UTC by Renaud Métrich
Modified: 2020-03-31 19:48 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-31 19:47:57 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHSA-2020:1068 (last updated 2020-03-31 19:48:15 UTC)

Description Renaud Métrich 2019-06-05 13:13:00 UTC
Description of problem:

NEC reported that a customer saw squid spinning on the CPU until it was restarted. The coredump taken while it was spinning shows that the customer hit a known issue fixed by PR 291 (https://github.com/squid-cache/squid/pull/291):

Coredump analysis:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) bt
#0  0x00005622241ee742 in check (aLine=429, this=0x562225579cc0) at cbdata.cc:106
#1  cbdataInternalUnlock (p=0x562225579cd8) at cbdata.cc:429
#2  0x000056222447a65f in clear (this=0x7f93e4e90440) at ../../src/base/CbcPointer.h:127
#3  CbcPointer<Comm::TcpAcceptor>::operator= (this=0x7f93e4e90440, d=...)
    at ../../src/base/CbcPointer.h:112
#4  0x0000562224479ffc in __copy_m<CbcPointer<Comm::TcpAcceptor>*, CbcPointer<Comm::TcpAcceptor>*> (
    __result=<optimized out>, __last=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:354
#5  __copy_move_a<true, CbcPointer<Comm::TcpAcceptor>*, CbcPointer<Comm::TcpAcceptor>*> (
    __result=<optimized out>, __last=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:390
#6  __copy_move_a2<true, __gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > >, __gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > > > (__result=..., __last=..., __first=...)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:428
#7  move<__gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > >, __gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > > > (__result=..., __last=..., __first=...)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:492
#8  erase (__position=..., this=0x5622249ead30 <Comm::AcceptLimiter::Instance_>)
    at /usr/include/c++/4.8.2/bits/vector.tcc:138
#9  Comm::AcceptLimiter::kick (this=0x5622249ead30 <Comm::AcceptLimiter::Instance_>)
    at AcceptLimiter.cc:59
#10 0x00005622243dab67 in AsyncCall::make (this=0x5622259a9170) at AsyncCall.cc:40
#11 0x00005622243def32 in AsyncCallQueue::fireNext (this=this@entry=0x5622251718e0)
    at AsyncCallQueue.cc:56
#12 0x00005622243df380 in AsyncCallQueue::fire (this=0x5622251718e0) at AsyncCallQueue.cc:42
#13 0x0000562224260b09 in dispatchCalls (this=0x7ffec9497e30) at EventLoop.cc:143
#14 EventLoop::runOnce (this=this@entry=0x7ffec9497e30) at EventLoop.cc:120
#15 0x0000562224260bf8 in EventLoop::run (this=this@entry=0x7ffec9497e30) at EventLoop.cc:82
#16 0x00005622242cbdac in SquidMain (argc=<optimized out>, argv=<optimized out>) at main.cc:1539
#17 0x00005622241c1f5b in SquidMainSafe (argv=<optimized out>, argc=<optimized out>) at main.cc:1263
#18 main (argc=<optimized out>, argv=<optimized out>) at main.cc:1256

(gdb) f 9

(gdb) p deferred_
$2 = std::vector of length 2114119, capacity 4194304 = {{cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
    cbc = 0x562225579cd8, lock = 0x562225579cd8}, {cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
    cbc = 0x562225579cd8, lock = 0x562225579cd8}, {cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
    cbc = 0x562225579cd8, lock = 0x562225579cd8}, {cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

It looks like squid is stuck in a loop deleting the same entry again and again (2114119 times), causing the high CPU usage and also the memory increase.

The corresponding code is shown below:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 47 void
 48 Comm::AcceptLimiter::kick()
 49 {
 50     // TODO: this could be optimized further with an iterator to search
 51     //       looking for first non-NULL, followed by dumping the first N
 52     //       with only one shift()/pop_front operation
 53     //  OR, by reimplementing as a list instead of Vector.
 54 
 55     debugs(5, 5, "size=" << deferred_.size());
 56     while (deferred_.size() > 0 && fdNFree() >= RESERVED_FD) {
 57         /* NP: shift() is equivalent to pop_front(). Giving us a FIFO queue. */
 58         TcpAcceptor::Pointer temp = deferred_.front();
 59         deferred_.erase(deferred_.begin());
 60         if (temp.valid()) {
 61             debugs(5, 5, "doing one.");
 62             -- temp->isLimited;
 63             temp->acceptNext();
 64             break;
 65         }
 66     }
 67 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
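
To illustrate the cost of that loop, here is a minimal standalone sketch (not squid code; the element type and sizes are placeholders). deferred_.erase(deferred_.begin()) on a std::vector moves every remaining element, and in the real code each moved element is a CbcPointer whose assignment goes through the cbdata locking seen in frames #0-#3, so draining N dead entries from the front costs on the order of N^2/2 element moves:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
// Minimal sketch, not squid code: cost of draining a std::vector from the
// front the way Comm::AcceptLimiter::kick() does when every deferred entry
// is already dead. Element type and sizes are illustrative placeholders.
#include <cstdio>
#include <vector>

int main()
{
    // Small vector to demonstrate the pattern; the coredump had 2114119 entries.
    std::vector<int> deferred(50000, 0);           // 0 == "dead" entry
    unsigned long long moves = 0;

    while (!deferred.empty()) {
        int temp = deferred.front();
        moves += deferred.size() - 1;              // elements shifted by erase(begin())
        deferred.erase(deferred.begin());          // O(size) per call
        if (temp != 0)                             // never true here: every entry is dead
            break;
    }
    std::printf("moves for 50000 dead entries: %llu\n", moves);

    // Scaling the same quadratic cost to the 2114119 entries from the coredump:
    const unsigned long long n = 2114119ULL;
    std::printf("approx moves for %llu entries: %llu\n", n, n * (n - 1) / 2);  // ~2.2e12
    return 0;
}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Roughly 2.2 trillion element moves for the coredump's queue size, with each real move also doing the cbdata lock bookkeeping visible in frames #0-#3, is enough to look like an endless spin, presumably while new connections keep being deferred onto the same vector, which would also account for the memory growth.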


Version-Release number of selected component (if applicable):

squid-3.5.20-xxx


How reproducible:

Always (xmalloc failure and spinning reproduce, but I cannot get the exact same backtrace for now)


Steps to Reproduce:

1. Limit the number of file descriptors to make the issue easier to reproduce

  # systemctl edit squid
  >> editor opens
  [Service]
  LimitNOFILE=1024
  >> save and exit

2. Start squid

  # systemctl start squid

3. Connect to squid in a loop until no FDs are available

  # for i in $(seq 1 1000); do ncat --recv-only localhost 3128 & done

Actual results:

squid starts spinning and eating memory after some time. The journal shows the following, then squid dies:

(squid-1)[3379]: xmalloc: Unable to allocate 2147483648 bytes!
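
For scale (a rough calculation, not from the original report): 2147483648 bytes is 2 GiB, and each deferred_ entry in the dump is a pair of pointers (the cbc/lock pair, i.e. 16 bytes on x86_64), so a single 2 GiB allocation corresponds to the vector trying to grow its capacity to 2^27 = 134,217,728 deferred entries before xmalloc gives up.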


Expected results:

No issue


Additional info:

The customer was not seeing squid die, only spin and eat memory.

Comment 10 errata-xmlrpc 2020-03-31 19:47:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1068

