Description of problem:

squid sometimes begins consuming 100% CPU and continues until killed. While doing this, it responds normally to requests.

Version-Release number of selected component (if applicable):

3.1.10-1.el6_2.1

How reproducible:

Has occurred several times on multiple machines, but no reproducer has been found.

Additional info:

stracing an affected squid process produces:

epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=19, u64=19}}}, 16384, 205) = 1
read(21, 0x7fff1135c210, 256) = -1 EAGAIN (Resource temporarily unavailable)
[...]

fd 21 is a pipe:

# ls -l /proc/32128/fd
total 0
lrwx------ 1 squid squid 64 Dec 15 15:15 0 -> /dev/null
lrwx------ 1 squid squid 64 Dec 15 15:15 1 -> /dev/null
l-wx------ 1 squid squid 64 Dec 15 15:15 10 -> /usr7/squid/swap.state
l-wx------ 1 squid squid 64 Dec 15 15:15 12 -> pipe:[169475]
lrwx------ 1 squid squid 64 Dec 15 15:15 15 -> socket:[169678]
lr-x------ 1 squid squid 64 Dec 15 15:15 19 -> pipe:[351636]
lrwx------ 1 squid squid 64 Dec 15 15:15 2 -> /dev/null
l-wx------ 1 squid squid 64 Dec 15 15:15 20 -> pipe:[351636]
lr-x------ 1 squid squid 64 Dec 15 15:15 21 -> pipe:[351637]
l-wx------ 1 squid squid 64 Dec 15 15:15 22 -> pipe:[351637]
lrwx------ 1 squid squid 64 Dec 15 15:15 3 -> /var/log/squid/cache.log-20111211 (deleted)
lrwx------ 1 squid squid 64 Dec 15 15:15 4 -> /dev/null
lrwx------ 1 squid squid 64 Dec 15 15:15 5 -> /var/log/squid/cache.log
lrwx------ 1 squid squid 64 Dec 15 15:15 6 -> anon_inode:[eventpoll]
lrwx------ 1 squid squid 64 Dec 15 15:15 7 -> socket:[169469]
lrwx------ 1 squid squid 64 Dec 15 15:15 8 -> socket:[169470]
lr-x------ 1 squid squid 64 Dec 15 15:15 9 -> pipe:[169474]

gdb backtraces:

#0  0x00007f8c76f94d03 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f8c79706cf2 in comm_select (msec=<value optimized out>) at comm_epoll.cc:266
#2  0x00007f8c797c6c8e in CommSelectEngine::checkEvents (this=<value optimized out>, timeout=<value optimized out>) at comm.cc:2683
#3  0x00007f8c7971cda4 in EventLoop::checkEngine (this=0x7fff977f6fa0, engine=0x7fff977f7010, primary=<value optimized out>) at EventLoop.cc:50
#4  0x00007f8c7971cf1d in EventLoop::runOnce (this=0x7fff977f6fa0) at EventLoop.cc:124
#5  0x00007f8c7971d008 in EventLoop::run (this=0x7fff977f6fa0) at EventLoop.cc:94
#6  0x00007f8c7976aa25 in SquidMain (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1399
#7  0x00007f8c7976aff6 in SquidMainSafe (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1159
#8  main (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1151

#0  0x00007f8c78fcc4ed in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f8c797214ec in read (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at /usr/include/bits/unistd.h:45
#2  default_read_method (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at fd.cc:157
#3  0x00007f8c797bde66 in CommIO::FlushPipe () at comm.cc:2395
#4  0x00007f8c797be408 in CommIO::NULLFDHandler (fd=19, data=<value optimized out>) at comm.cc:2401
#5  0x00007f8c79706e16 in comm_select (msec=<value optimized out>) at comm_epoll.cc:307
#6  0x00007f8c797c6c8e in CommSelectEngine::checkEvents (this=<value optimized out>, timeout=<value optimized out>) at comm.cc:2683
#7  0x00007f8c7971cda4 in EventLoop::checkEngine (this=0x7fff977f6fa0, engine=0x7fff977f7010, primary=<value optimized out>) at EventLoop.cc:50
#8  0x00007f8c7971cf1d in EventLoop::runOnce (this=0x7fff977f6fa0) at EventLoop.cc:124
#9  0x00007f8c7971d008 in EventLoop::run (this=0x7fff977f6fa0) at EventLoop.cc:94
#10 0x00007f8c7976aa25 in SquidMain (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1399
#11 0x00007f8c7976aff6 in SquidMainSafe (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1159
#12 main (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1151

squid.conf:

acl localhost src 127.0.0.1/32 ::1
http_access allow localhost
http_access deny all
http_port localhost:3128
access_log none
cache_dir aufs /usr7/squid 225000 16 256
coredump_dir /var/spool/squid
hosts_file /etc/squid/hosts
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
(In reply to comment #0)
> #0 0x00007f8c78fcc4ed in read () at ../sysdeps/unix/syscall-template.S:82
> #1 0x00007f8c797214ec in read (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at /usr/include/bits/unistd.h:45
> #2 default_read_method (fd=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at fd.cc:157
>
> squid.conf:
>
> cache_dir aufs /usr7/squid 225000 16 256

As described above, there are difficulties with the read operation, so I'd like to ask about /usr7. Is it a mount point?

Thanks,
Jiri
/usr7 is a mountpoint, yes.
How is /usr7 mounted? Have you tested another location for the cache_dir option? I suspect it could be the source of the issue.
/usr7 is ext2 with no special mount options. Since the file descriptor being read is a pipe rather than a regular file, the filesystem is likely not the issue.
You've mentioned 'Has occurred several times on multiple machines...'. How do these machines compare in settings and squid configuration? How often do you observe the 100% CPU consumption? Is restarting the service the only fix? Regarding the /usr7 inquiry: could you please test another cache_dir location? And what is this parameter set to on the different machines?
The machines are all identically configured. Once the CPU consumption starts, it continues until squid is restarted. I moved the cache_dir to /var/cache/squid (ext4) and was able to reproduce the problem almost immediately. I found <http://bugs.squid-cache.org/show_bug.cgi?id=3189>. Comments 3 and 5 (not the rest of the bug) look exactly like what I'm seeing.
I've tested squid for a couple of days with automated requests to generate adequate load. I also followed the 'how to configure squid to fail' recommendation from Squid's BZ#3189. Unfortunately, I was not able to reproduce this issue. I suspect there must be something unique in your hardware or software configuration behind this unwanted CPU load. Please try to provide more details about your system (private mail is also fine with me if you are worried about security).
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
The request load originates from an in-house multithreaded application running on localhost, and the remote HTTP server is another in-house application. squid is not listening on any external address. After some effort, I have been unable to reliably reproduce the problem. If it recurs, I can attach gdb to the squid process and run some commands for you if you'd like.
Are you still observing high CPU utilization with the latest version of squid (3.1.10-16)?
I haven't seen any recurrences recently.
OK, I suppose this was fixed in the meantime. Feel free to reopen if the high CPU utilization occurs again.