Bug 1717430

Summary: Excessive memory usage when running out of descriptors
Product: Red Hat Enterprise Linux 7
Component: squid
Version: 7.6
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Patch, Triaged
Reporter: Renaud Métrich <rmetrich>
Assignee: Luboš Uhliarik <luhliari>
QA Contact: Branislav Náter <bnater>
CC: bgollahe, bnater, hmatsumo, jorton
Target Milestone: rc
Last Closed: 2020-03-31 19:47:57 UTC
Type: Bug
Bug Blocks: 1716962, 1754591

Description Renaud Métrich 2019-06-05 13:13:00 UTC
Description of problem:

NEC reported that a customer saw squid spinning on the CPU until it was restarted. A coredump taken while squid was spinning shows that the customer hit a known issue fixed by upstream PR 291 (https://github.com/squid-cache/squid/pull/291):

Coredump analysis:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) bt
#0  0x00005622241ee742 in check (aLine=429, this=0x562225579cc0) at cbdata.cc:106
#1  cbdataInternalUnlock (p=0x562225579cd8) at cbdata.cc:429
#2  0x000056222447a65f in clear (this=0x7f93e4e90440) at ../../src/base/CbcPointer.h:127
#3  CbcPointer<Comm::TcpAcceptor>::operator= (this=0x7f93e4e90440, d=...)
    at ../../src/base/CbcPointer.h:112
#4  0x0000562224479ffc in __copy_m<CbcPointer<Comm::TcpAcceptor>*, CbcPointer<Comm::TcpAcceptor>*> (
    __result=<optimized out>, __last=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:354
#5  __copy_move_a<true, CbcPointer<Comm::TcpAcceptor>*, CbcPointer<Comm::TcpAcceptor>*> (
    __result=<optimized out>, __last=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:390
#6  __copy_move_a2<true, __gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > >, __gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > > > (__result=..., __last=..., __first=...)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:428
#7  move<__gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > >, __gnu_cxx::__normal_iterator<CbcPointer<Comm::TcpAcceptor>*, std::vector<CbcPointer<Comm::TcpAcceptor> > > > (__result=..., __last=..., __first=...)
    at /usr/include/c++/4.8.2/bits/stl_algobase.h:492
#8  erase (__position=..., this=0x5622249ead30 <Comm::AcceptLimiter::Instance_>)
    at /usr/include/c++/4.8.2/bits/vector.tcc:138
#9  Comm::AcceptLimiter::kick (this=0x5622249ead30 <Comm::AcceptLimiter::Instance_>)
    at AcceptLimiter.cc:59
#10 0x00005622243dab67 in AsyncCall::make (this=0x5622259a9170) at AsyncCall.cc:40
#11 0x00005622243def32 in AsyncCallQueue::fireNext (this=this@entry=0x5622251718e0)
    at AsyncCallQueue.cc:56
#12 0x00005622243df380 in AsyncCallQueue::fire (this=0x5622251718e0) at AsyncCallQueue.cc:42
#13 0x0000562224260b09 in dispatchCalls (this=0x7ffec9497e30) at EventLoop.cc:143
#14 EventLoop::runOnce (this=this@entry=0x7ffec9497e30) at EventLoop.cc:120
#15 0x0000562224260bf8 in EventLoop::run (this=this@entry=0x7ffec9497e30) at EventLoop.cc:82
#16 0x00005622242cbdac in SquidMain (argc=<optimized out>, argv=<optimized out>) at main.cc:1539
#17 0x00005622241c1f5b in SquidMainSafe (argv=<optimized out>, argc=<optimized out>) at main.cc:1263
#18 main (argc=<optimized out>, argv=<optimized out>) at main.cc:1256

(gdb) f 9

(gdb) p deferred_
$2 = std::vector of length 2114119, capacity 4194304 = {{cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
    cbc = 0x562225579cd8, lock = 0x562225579cd8}, {cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
    cbc = 0x562225579cd8, lock = 0x562225579cd8}, {cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
    cbc = 0x562225579cd8, lock = 0x562225579cd8}, {cbc = 0x562225579cd8, lock = 0x562225579cd8}, {
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The deferred_ vector holds 2114119 entries that all reference the same cbdata object. Squid keeps erasing the front entry, and each erase shifts every remaining element (unlocking and re-locking its cbdata, as seen in frames #0-#8 above), which explains both the high CPU usage and the memory growth.

The corresponding code is shown below:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 47 void
 48 Comm::AcceptLimiter::kick()
 49 {
 50     // TODO: this could be optimized further with an iterator to search
 51     //       looking for first non-NULL, followed by dumping the first N
 52     //       with only one shift()/pop_front operation
 53     //  OR, by reimplementing as a list instead of Vector.
 54 
 55     debugs(5, 5, "size=" << deferred_.size());
 56     while (deferred_.size() > 0 && fdNFree() >= RESERVED_FD) {
 57         /* NP: shift() is equivalent to pop_front(). Giving us a FIFO queue. */
 58         TcpAcceptor::Pointer temp = deferred_.front();
 59         deferred_.erase(deferred_.begin());
 60         if (temp.valid()) {
 61             debugs(5, 5, "doing one.");
 62             -- temp->isLimited;
 63             temp->acceptNext();
 64             break;
 65         }
 66     }
 67 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
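Line 59 above (deferred_.erase(deferred_.begin())) is the expensive part: erasing the front of a std::vector shifts every remaining element, so draining the queue one entry at a time costs O(n^2) element moves, and each move of a CbcPointer also does cbdata lock/unlock work. A minimal sketch (not squid code; Counter is a hypothetical stand-in for CbcPointer that merely counts move-assignments) contrasting erase-at-front on a vector with pop_front on a deque, the alternative the TODO comment in the excerpt suggests:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Stand-in element that counts move-assignments, representing the
// per-element lock/unlock work CbcPointer performs when it is moved.
struct Counter {
    static std::size_t moves;
    Counter() = default;
    Counter(Counter&&) noexcept { ++moves; }
    Counter& operator=(Counter&&) noexcept { ++moves; return *this; }
};
std::size_t Counter::moves = 0;

// Drain an n-element FIFO implemented as vector + erase(begin()).
// Each erase shifts all remaining elements: O(n) moves per erase,
// O(n^2) in total.
std::size_t drainVector(std::size_t n) {
    std::vector<Counter> q(n);
    Counter::moves = 0;
    while (!q.empty())
        q.erase(q.begin());
    return Counter::moves;  // n*(n-1)/2
}

// Same FIFO as a deque: pop_front() is O(1) and moves no elements.
std::size_t drainDeque(std::size_t n) {
    std::deque<Counter> q(n);
    Counter::moves = 0;
    while (!q.empty())
        q.pop_front();
    return Counter::moves;  // 0
}
```

With 2114119 queued entries, the quadratic variant performs on the order of 2*10^12 element moves, consistent with squid appearing to spin.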


Version-Release number of selected component (if applicable):

squid-3.5.20-xxx


How reproducible:

Always (the xmalloc failure and the spinning reproduce reliably, although I could not yet obtain the exact same backtrace)


Steps to Reproduce:

1. Limit the number of file descriptors to ease reproducing

  # systemctl edit squid
  >> editor opens
  [Service]
  LimitNOFILE=1024
  >> save and exit

2. Start squid

  # systemctl start squid

3. Connect to squid in loop until no FD is available

  # for i in $(seq 1 1000); do ncat --recv-only localhost 3128 & done
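
While the loop runs, the descriptor exhaustion and memory growth can be watched from /proc. A hypothetical helper (not part of the report; the service name "squid" comes from the steps above) that prints a PID's open-descriptor count and resident memory:

```shell
# fd_rss PID: print the open file descriptor count and VmRSS (kB) of PID.
fd_rss() {
  pid=$1
  fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  rss=$(awk '/VmRSS/ {print $2}' "/proc/$pid/status")
  printf 'fds=%s rss_kB=%s\n' "$fds" "$rss"
}

# Example: sample squid's main process every 5 seconds
# while :; do fd_rss "$(systemctl show -p MainPID --value squid)"; sleep 5; done
```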

Actual results:

After some time squid starts spinning and consuming memory. The journal then shows the following message, after which squid dies:

(squid-1)[3379]: xmalloc: Unable to allocate 2147483648 bytes!


Expected results:

No CPU spinning or excessive memory usage; squid recovers once file descriptors become available again.


Additional info:

The customer did not observe squid dying, only spinning and consuming memory.

Comment 10 errata-xmlrpc 2020-03-31 19:47:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1068