Description of problem: Docker does not properly rotate the logs when maxfile is > 1 and logs file descriptor are open (i.e., being watching through the console) at the rotation moment. Then if the process continues for several days, /var/lib/docker will cause an "OutOfSpace" error (reported by kubelet). If stopping watching the log from the GUI, it will not free up the indicated "deleted" files. As a workaround to recover disk space in /var/lib/docker, we can restart docker, which is not sustainable on the long term. Version-Release number of selected component (if applicable): Docker 1.10.3-59 Openshift 3.2.1-15 How reproducible: 100% Steps to Reproduce: Current configuration: /usr/bin/docker-current daemon --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --selinux-enabled --log-driver=json-file --log-opt max-size=50m --log-opt max-file=3 --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true -b=lbr0 --mtu=1450 --add-registry registry.access.redhat.com Procedure to replicate the issue: I have the following logs: # ls -il /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log* 787851 -r--------. 1 root root 40233476 Jul 24 11:54 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log 787243 -r--------. 1 root root 50000052 Jul 24 11:49 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.1 788262 -r--------. 1 root root 50000039 Jul 24 11:43 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 Now start watching the related pod log from Openshift web console GUI: # ls -il /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log* 787851 -r--------. 1 root root 49874783 Jul 24 11:55 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log 787243 -r--------. 1 root root 50000052 Jul 24 11:49 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.1 788262 -r--------. 1 root root 50000039 Jul 24 11:43 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 Since, the rotation occured, now seing one 1st "deleted" file, but that one still hold disk space even if no more accessible except from Docker: # lsof -a +L1 | grep /var/lib/docker/ docker-cu 2624 root 55r REG 253,1 50000039 0 788262 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 (deleted) # ls -il /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log* 787878 -r--------. 1 root root 15362832 Jul 24 11:57 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log 787851 -r--------. 1 root root 50000187 Jul 24 11:55 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.1 787243 -r--------. 1 root root 50000052 Jul 24 11:49 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 This is about to rotate again: # ls -il /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log* 787878 -r--------. 1 root root 48125666 Jul 24 12:01 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log 787851 -r--------. 1 root root 50000187 Jul 24 11:55 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.1 787243 -r--------. 1 root root 50000052 Jul 24 11:49 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 A rotation occured: # ls -il /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log* 788260 -r--------. 1 root root 1820555 Jul 24 12:01 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log 787878 -r--------. 1 root root 50000501 Jul 24 12:01 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.1 787851 -r--------. 1 root root 50000187 Jul 24 11:55 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 Now there are 2 handles on log files left opened: # lsof -a +L1 | grep /var/lib/docker/ docker-cu 2624 root 55r REG 253,1 50000039 0 788262 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 (deleted) docker-cu 2624 root 70r REG 253,1 50000052 0 787243 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 (deleted) Etc.: # lsof -a +L1 | grep /var/lib/docker/ docker-cu 2624 root 55r REG 253,1 50000039 0 788262 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 (deleted) docker-cu 2624 root 70r REG 253,1 50000052 0 787243 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 (deleted) docker-cu 2624 root 71r REG 253,1 50000187 0 787851 /var/lib/docker/containers/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c/5891494323dcef09e051473e2f4e1172683f4a655e82c32d189ca9f6f973707c-json.log.2 (deleted) Actual results: Open files are not being removed on next Docker log rotation. They keep occupying space on the FS and potentially fill the /var/lib/docker Expected results: The open files are removed during a docker log rotation when they appear as (deleted) on lsof output. Additional info:
This looks like expected VFS behavior. If you hold open a FD and then delete the file, the block remain used until the FS is closed. You have asserted: "If stopping watching the log from the GUI, it will not free up the indicated "deleted" files." This is odd. It suggests that stopping the GUI watch doesn't actually close the FD. We should get to the bottom of that if it is, indeed happening. Can you reproduce with a series of lsof outputs both before and _after_ the GUI log watch process is closed?
@Ian, I'm on the way of reproducing it, I will let you know my findings shortly.
Ok, I could reproduce it: --- About to rotate: # tail rot.logs -n3 -rw-------. 1 root root 1000159 Aug 3 21:35 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 -rw-r--r--. 1 root root 1000165 Aug 4 05:37 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.1 -rw-r--r--. 1 root root 994780 Aug 4 13:36 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log If I check the open files that have been unlinked: # lsof -a +L1 | grep bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69 Shows nothing. However, if I run (just to show that, at least, there are open files with 1 link): # lsof -a +L2 | grep bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69 docker-cu 125850 root 10w REG 253,0 992706 1 17725690 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log docker-cu 125850 root 28r REG 0,25 0 1 21312546 /sys/fs/cgroup/memory/system.slice/docker-bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69.scope/memory.oom_control docker-cu 125850 root 37r REG 253,0 1000159 1 17725702 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 docker-cu 125850 root 47r REG 253,0 1000165 1 17924919 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.1 docker-cu 125850 root 50r REG 253,0 992706 1 17725690 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log fluentd 129272 root 16r REG 253,0 992706 1 17725690 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log --- Rotated (with console' watch process open): # tail rot.logs -n3 -rw-r--r--. 1 root root 1000165 Aug 4 05:37 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 -rw-r--r--. 1 root root 1000141 Aug 4 13:39 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.1 -rw-r--r--. 1 root root 11242 Aug 4 13:44 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log # lsof -a +L1 | grep containers docker-cu 125850 root 37r REG 253,0 1000159 0 17725702 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 (deleted) --- After log out of the console, the file descriptor still there: # lsof -a +L1 | grep bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69 docker-cu 125850 root 37r REG 253,0 1000159 0 17725702 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 (deleted)
As a workaround, changing docker config on nodes to start with "max-file=1" (instead of "max-file=3") in order to always have the same log file descriptor seems to avoid this bug.
Tested it in OCP 3.6 and the issue is gone. Closing this BZ as current release.