Description of problem: A small local installation of squid (one sibling, aufs storage) works just fine across multiple days, startups and shutdowns, until, on one of the startups, the cache rebuild process starts, goes wild and fills the disk. This is what is in the log (cache.log): 2012/09/06 19:17:33 kid1| Startup: Initializing Authentication Schemes ... 2012/09/06 19:17:33 kid1| Startup: Initialized Authentication Scheme 'basic' 2012/09/06 19:17:33 kid1| Startup: Initialized Authentication Scheme 'digest' 2012/09/06 19:17:33 kid1| Startup: Initialized Authentication Scheme 'negotiate' 2012/09/06 19:17:33 kid1| Startup: Initialized Authentication Scheme 'ntlm' 2012/09/06 19:17:33 kid1| Startup: Initialized Authentication. 2012/09/06 19:17:33 kid1| Processing Configuration File: /etc/squid/squid.conf (depth 0) 2012/09/06 19:17:33 kid1| Initializing https proxy context 2012/09/06 19:17:33 kid1| Logfile: opening log daemon:/var/log/squid/access.log 2012/09/06 19:17:33 kid1| Logfile Daemon: opening log /var/log/squid/access.log 2012/09/06 19:17:33 kid1| Squid plugin modules loaded: 0 2012/09/06 19:17:33 kid1| Adaptation support is off. 2012/09/06 19:17:33 kid1| Store logging disabled 2012/09/06 19:17:33 kid1| DNS Socket created at [::], FD 10 2012/09/06 19:17:33 kid1| DNS Socket created at 0.0.0.0, FD 11 2012/09/06 19:17:33 kid1| Adding domain tmatmouci.cz from /etc/resolv.conf 2012/09/06 19:17:33 kid1| Adding domain tmatmouci.cz from /etc/resolv.conf 2012/09/06 19:17:33 kid1| Adding nameserver 192.168.0.1 from /etc/resolv.conf 2012/09/06 19:17:33 kid1| HTCP Disabled. 2012/09/06 19:17:33 kid1| Configuring Sibling 192.168.0.1/3128/3130 2012/09/06 19:17:33 kid1| Loaded Icons. 2012/09/06 19:17:33 kid1| Accepting HTTP Socket connections at local=127.0.0.1:3128 remote=[::] FD 13 flags=9 2012/09/06 19:17:33 kid1| Accepting ICP messages on [::]:3130 2012/09/06 19:17:33 kid1| Sending ICP messages from [::]:3130 2012/09/06 19:17:38 kid1| WARNING: Disk space over limit: 610715120.00 KB > 67108864 KB 2012/09/06 19:17:42 kid1| Store rebuilding is 963997.35% complete 2012/09/06 19:17:49 kid1| WARNING: Disk space over limit: 615944012.00 KB > 67108864 KB 2012/09/06 19:17:57 kid1| Store rebuilding is 975840.41% complete 2012/09/06 19:18:00 kid1| WARNING: Disk space over limit: 621642344.00 KB > 67108864 KB 2012/09/06 19:18:11 kid1| WARNING: Disk space over limit: 626752920.00 KB > 67108864 KB 2012/09/06 19:18:12 kid1| Store rebuilding is 987658.22% complete ... 2012/09/06 19:32:27 kid1| WARNING: Disk space over limit: 835379880.00 KB > 67108864 KB 2012/09/06 19:32:30 kid1| Store rebuilding is 1316233.70% complete 2012/09/06 19:32:38 kid1| WARNING: Disk space over limit: 835691972.00 KB > 67108864 KB 2012/09/06 19:32:45 kid1| Store rebuilding is 1317622.55% complete ... The store configuration is: maximum_object_size 1 GB cache_dir aufs /var/spool/squid 65536 16 256 Some additional observations. The content of the storage directory (minus swap.state) is not really over the limit, du reports around 1.5 GB occupied. The disk is filled by the growing swap.state file, in this particular occurence it grew to over 30 GB and kept growing. Shutting down squid, deleting the swap.state file and starting up squid solves the problem (until next occurence). Prior to the problem, squid was shut down cleanly (and likely there was no unclean shutdown before that either). Version-Release number of selected component (if applicable): squid-3.2.0.16-1.fc17.x86_64 all other packages current with Fedora 17 quad core machine How reproducible: On this machine, the problem happens once every few weeks, slightly more often than once a month.
This is known to happen when swap.state gets corrupted for some reason. But swap.state corruption is not easily seen unless the machine as such crashes, so this is a quite unexpected.
I've scanned the logs several months back and see no unclean shutdown. Is it possible to manually detect whether swap.state is corrupted, so that I can perhaps track down when the corruption happens (sooner than when I see the disk getting filled to maximum capacity) ?
I see the same thing with F17, systemd doesn't seem to honor squids *long* shutdown time (up to 30 seconds) and after every reboot I get warnings like: 2012/09/30 01:57:51 kid1| WARNING: Disk space over limit: 5278968.00 KB > 209715 2 KB in /var/log/squid/cache.log. Stopping squid, removing /var/spool/squid/swap.state and then restarting squid again helped but that's just a workaround.
I confirm the observation above. Another pass through the logs suggests that while clean shutdown was always initiated, it sometimes did not run to completion (the log says "Preparing for shutdown" but other common shutdown messages do not follow). What is strange is that default systemd shutdown timeout for the squid service should be 90 seconds, well above the 30 seconds mentioned above. I'm now trying with "TimeoutSec=0" added to /lib/systemd/system/squid.service to see if that helps.
For a large Squid cache 90 seconds may not be enough. The 30 second mentioned is how log Squid waits for existing connections to finish before it shuts down for real. After that is the actual shutdown procedure which can take quite some time if either the cache is large or your system have swapped out significant parts of Squid.
Summary of the problem: Squid shutdown easily takes more than one minute, even if it is not particularly active. Systemd does not wait for Squid to shut down properly. When killed, Squid can leave its cache files in an inconsistent state, which cause it to go wild and use up all disk space on some of the next starts. Current solution that seems to work for me: Wait for proper Squid shutdown in Systemd. Change the ExecStop line and add the next two lines in /lib/systemd/system/squid.service: ExecStop=/bin/sh -c "/usr/sbin/squid -k shutdown && while kill -0 $MAINPID ; do sleep 1 ; done" PIDFile=/run/squid.pid TimeoutSec=300 What I do not like on the current solution: The active wait for the Squid process to disappear. I could not find any better way to do this in Systemd, but I could easily be missing something. Squid itself apparently does not offer any way of shutting down synchronously (if it did, it could simply be used in ExecStop instead of the asynchronous shutdown command). Finally, there is a SQUID_SHUTDOWN_TIMEOUT variable in /etc/sysconfig/squid, which is apparently not used anywhere. Perhaps we could do something like TimeoutSec=$SQUID_SHUTDOWN_TIMEOUT in the Systemd service file, but I'm not sure if Systemd expands shell variables in other than the Exec lines.
This bug is still present in the current Squid package in Fedora 18 (ExecStop command in /lib/systemd/system/squid.service kills Squid rather than shutting it down properly).
This message is a reminder that Fedora 18 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 18. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '18'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 18's end of life. Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 18 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior to Fedora 18's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
Resolved in the current rawhide version of squid -> timeout option is now set to zero(unlimited) as it can't be predicted in advance.