Bug 1477486
Summary: | [DOCKER] /var/lib/docker gets full over time when docker uses maxfile > 1 and while OpenShift web console keeps watching logs | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Nicolas Nosenzo <nnosenzo> |
Component: | Containers | Assignee: | Nalin Dahyabhai <nalin> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | DeShuai Ma <dma> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 3.2.1 | CC: | amurdaca, aos-bugs, imcleod, jokerman, mmccomas, nnosenzo, subhat |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-09-09 14:31:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Nicolas Nosenzo
2017-08-02 08:40:15 UTC
This looks like expected VFS behavior. If you hold open a FD and then delete the file, the blocks remain in use until the FD is closed.

You have asserted: "If stopping watching the log from the GUI, it will not free up the indicated "deleted" files." This is odd. It suggests that stopping the GUI watch doesn't actually close the FD. We should get to the bottom of that if it is indeed happening. Can you reproduce with a series of lsof outputs both before and _after_ the GUI log watch process is closed?

@Ian, I'm in the process of reproducing it, I will let you know my findings shortly.

Ok, I could reproduce it:

--- About to rotate:

# tail rot.logs -n3
-rw-------. 1 root root 1000159 Aug 3 21:35 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2
-rw-r--r--. 1 root root 1000165 Aug 4 05:37 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.1
-rw-r--r--. 1 root root 994780 Aug 4 13:36 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log

If I check the open files that have been unlinked:

# lsof -a +L1 | grep bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69

Shows nothing.

However, if I run (just to show that, at least, there are open files with 1 link):

# lsof -a +L2 | grep bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69
docker-cu 125850 root 10w REG 253,0 992706 1 17725690 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log
docker-cu 125850 root 28r REG 0,25 0 1 21312546 /sys/fs/cgroup/memory/system.slice/docker-bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69.scope/memory.oom_control
docker-cu 125850 root 37r REG 253,0 1000159 1 17725702 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2
docker-cu 125850 root 47r REG 253,0 1000165 1 17924919 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.1
docker-cu 125850 root 50r REG 253,0 992706 1 17725690 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log
fluentd 129272 root 16r REG 253,0 992706 1 17725690 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log

--- Rotated (with the console's watch process open):

# tail rot.logs -n3
-rw-r--r--. 1 root root 1000165 Aug 4 05:37 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2
-rw-r--r--. 1 root root 1000141 Aug 4 13:39 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.1
-rw-r--r--. 1 root root 11242 Aug 4 13:44 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log

# lsof -a +L1 | grep containers
docker-cu 125850 root 37r REG 253,0 1000159 0 17725702 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 (deleted)

--- After logging out of the console, the file descriptor is still there:

# lsof -a +L1 | grep bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69
docker-cu 125850 root 37r REG 253,0 1000159 0 17725702 /var/lib/docker/containers/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69/bfc9b78e9d2bbd7a3bc39d5900e28850c837646dfd87201cfa5a100190707a69-json.log.2 (deleted)

As a workaround, changing the docker config on nodes to start with "max-file=1" (instead of "max-file=3"), so that the same log file descriptor is always used, seems to avoid this bug.

Tested it in OCP 3.6 and the issue is gone. Closing this BZ as current release.
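For reference, the VFS behavior mentioned in the first comment (an unlinked file keeps its data for as long as some process holds an open FD on it) can be reproduced in isolation. The following is a minimal sketch in Python, assuming a Linux host; the function name `demo_unlinked_file_keeps_data` and the temporary file are hypothetical and have nothing to do with docker's own log rotation code.

```python
import os
import tempfile


def demo_unlinked_file_keeps_data(payload: bytes = b"x" * 4096) -> dict:
    """Write a temp file, unlink it while the FD is still open, and report
    what the kernel says about it afterwards (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        links_before = os.fstat(fd).st_nlink

        # Remove the directory entry; our FD stays open, just like the
        # reader FD that lsof reported as "(deleted)" above.
        os.unlink(path)

        st = os.fstat(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return {
            "links_before": links_before,
            "links_after": st.st_nlink,
            "path_exists": os.path.exists(path),
            "size_after": st.st_size,
            "data_intact": data == payload,
        }
    finally:
        # Only when the last open FD is closed can the filesystem
        # release the blocks.
        os.close(fd)


if __name__ == "__main__":
    for key, value in demo_unlinked_file_keeps_data().items():
        print(f"{key}: {value}")
```

The link count dropping to 0 while the size stays unchanged corresponds to the state `lsof +L1` selects: the path is gone, but the space is not reclaimed until the holder closes the descriptor.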