Bug 768137

Summary: squid consumes 100% CPU looping on read()
Product: Red Hat Enterprise Linux 6
Component: squid
Version: 6.2
Status: CLOSED CURRENTRELEASE
Reporter: Benjamin Gilbert <bgilbert>
Assignee: Michal Luscon <mluscon>
QA Contact: BaseOS QE Security Team <qe-baseos-security>
CC: prc
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-07-24 12:29:45 UTC

Description Benjamin Gilbert 2011-12-15 20:58:56 UTC
Description of problem:
squid sometimes begins consuming 100% CPU and continues until killed.  While doing this, it responds normally to requests.

Version-Release number of selected component (if applicable):
3.1.10-1.el6_2.1

How reproducible:
Has occurred several times on multiple machines, but no reproducer has been found.

Additional info:
stracing an affected squid process produces:

epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256)           = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256)           = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256)           = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256)           = -1 EAGAIN (Resource temporarily unavailable)
[...]

fd 21 is a pipe:

# ls -l /proc/32128/fd
total 0
lrwx------ 1 squid squid 64 Dec 15 15:15 0 -> /dev/null
lrwx------ 1 squid squid 64 Dec 15 15:15 1 -> /dev/null
l-wx------ 1 squid squid 64 Dec 15 15:15 10 -> /usr7/squid/swap.state
l-wx------ 1 squid squid 64 Dec 15 15:15 12 -> pipe:[169475]
lrwx------ 1 squid squid 64 Dec 15 15:15 15 -> socket:[169678]
lr-x------ 1 squid squid 64 Dec 15 15:15 19 -> pipe:[351636]
lrwx------ 1 squid squid 64 Dec 15 15:15 2 -> /dev/null
l-wx------ 1 squid squid 64 Dec 15 15:15 20 -> pipe:[351636]
lr-x------ 1 squid squid 64 Dec 15 15:15 21 -> pipe:[351637]
l-wx------ 1 squid squid 64 Dec 15 15:15 22 -> pipe:[351637]
lrwx------ 1 squid squid 64 Dec 15 15:15 3 -> /var/log/squid/cache.log-20111211 (deleted)
lrwx------ 1 squid squid 64 Dec 15 15:15 4 -> /dev/null
lrwx------ 1 squid squid 64 Dec 15 15:15 5 -> /var/log/squid/cache.log
lrwx------ 1 squid squid 64 Dec 15 15:15 6 -> anon_inode:[eventpoll]
lrwx------ 1 squid squid 64 Dec 15 15:15 7 -> socket:[169469]
lrwx------ 1 squid squid 64 Dec 15 15:15 8 -> socket:[169470]
lr-x------ 1 squid squid 64 Dec 15 15:15 9 -> pipe:[169474]

gdb backtraces:
#0  0x00007f8c76f94d03 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f8c79706cf2 in comm_select (msec=<value optimized out>) at comm_epoll.cc:266
#2  0x00007f8c797c6c8e in CommSelectEngine::checkEvents (this=<value optimized out>, timeout=<value optimized out>) at comm.cc:2683
#3  0x00007f8c7971cda4 in EventLoop::checkEngine (this=0x7fff977f6fa0, engine=0x7fff977f7010, primary=<value optimized out>)
    at EventLoop.cc:50
#4  0x00007f8c7971cf1d in EventLoop::runOnce (this=0x7fff977f6fa0) at EventLoop.cc:124
#5  0x00007f8c7971d008 in EventLoop::run (this=0x7fff977f6fa0) at EventLoop.cc:94
#6  0x00007f8c7976aa25 in SquidMain (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1399
#7  0x00007f8c7976aff6 in SquidMainSafe (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1159
#8  main (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1151

#0  0x00007f8c78fcc4ed in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f8c797214ec in read (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>)
    at /usr/include/bits/unistd.h:45
#2  default_read_method (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at fd.cc:157
#3  0x00007f8c797bde66 in CommIO::FlushPipe () at comm.cc:2395
#4  0x00007f8c797be408 in CommIO::NULLFDHandler (fd=19, data=<value optimized out>) at comm.cc:2401
#5  0x00007f8c79706e16 in comm_select (msec=<value optimized out>) at comm_epoll.cc:307
#6  0x00007f8c797c6c8e in CommSelectEngine::checkEvents (this=<value optimized out>, timeout=<value optimized out>) at comm.cc:2683
#7  0x00007f8c7971cda4 in EventLoop::checkEngine (this=0x7fff977f6fa0, engine=0x7fff977f7010, primary=<value optimized out>)
    at EventLoop.cc:50
#8  0x00007f8c7971cf1d in EventLoop::runOnce (this=0x7fff977f6fa0) at EventLoop.cc:124
#9  0x00007f8c7971d008 in EventLoop::run (this=0x7fff977f6fa0) at EventLoop.cc:94
#10 0x00007f8c7976aa25 in SquidMain (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1399
#11 0x00007f8c7976aff6 in SquidMainSafe (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1159
#12 main (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1151

squid.conf:

acl localhost src 127.0.0.1/32 ::1
http_access allow localhost
http_access deny all
http_port localhost:3128
access_log none
cache_dir aufs /usr7/squid 225000 16 256
coredump_dir /var/spool/squid
hosts_file /etc/squid/hosts
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

Comment 2 Jiri Skala 2012-01-12 16:11:56 UTC
(In reply to comment #0)
> #0  0x00007f8c78fcc4ed in read () at ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f8c797214ec in read (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at /usr/include/bits/unistd.h:45
> #2  default_read_method (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at fd.cc:157
> 
> squid.conf:
> 
> cache_dir aufs /usr7/squid 225000 16 256

As described, the difficulty is with a read operation, so I'd like to ask about /usr7. Is it a mount point?

Thanks, Jiri

Comment 3 Benjamin Gilbert 2012-01-12 18:24:45 UTC
/usr7 is a mountpoint, yes.

Comment 4 Jiri Skala 2012-01-13 07:32:51 UTC
How is /usr7 mounted? Have you tried another location for the cache_dir option? I suspect it could be the source of the issue.

Comment 5 Benjamin Gilbert 2012-01-13 19:49:27 UTC
/usr7 is ext2 with no special mount options.  Since the file descriptor being read is a pipe rather than a regular file, the filesystem is likely not the issue.

Comment 6 Jiri Skala 2012-01-17 09:42:17 UTC
You've mentioned 'Has occurred several times on multiple machines...'. Have you compared these machines' settings and squid configurations?

How often do you observe the 100% CPU consumption? Does only a service restart fix it?

Regarding /usr7: could you please test another cache_dir location? Is this parameter the same on the different machines?

Comment 7 Benjamin Gilbert 2012-01-19 00:41:24 UTC
The machines are all identically configured.  Once the CPU consumption starts, it continues until squid is restarted.

I moved the cache_dir to /var/cache/squid (ext4) and was able to reproduce the problem almost immediately.

I found <http://bugs.squid-cache.org/show_bug.cgi?id=3189>.  Comments 3 and 5 (not the rest of the bug) look exactly like what I'm seeing.

Comment 9 Jiri Skala 2012-02-14 08:57:46 UTC
I've tested squid for a couple of days with automated requests to generate adequate load, and I also followed the 'how to configure squid to fail' recommendation from Squid's BZ#3189.

Unfortunately, I was not able to reproduce this issue. I suspect something unique in your hardware or software configuration is behind this unwanted CPU load. Please provide more details about your system (private mail is also fine with me if you have security concerns).

Comment 10 Suzanne Logcher 2012-02-14 23:24:44 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 11 Benjamin Gilbert 2012-02-17 19:57:22 UTC
The request load originates from an in-house multithreaded application running on localhost, and the remote HTTP server is another in-house application.  squid is not listening on any external address.

After some effort, I have been unable to reliably reproduce the problem.  If it recurs, I can attach gdb to the squid process and run some commands for you if you'd like.

Comment 12 Michal Luscon 2013-04-13 13:37:55 UTC
Are you still observing high CPU utilization with the latest version of squid (3.1.10-16)?

Comment 13 Benjamin Gilbert 2013-04-15 19:43:06 UTC
I haven't seen any recurrences recently.

Comment 14 Michal Luscon 2013-07-24 12:29:45 UTC
OK, I suppose this was fixed in the meantime. Feel free to reopen if the high CPU utilization occurs again.