| Summary: | Node becomes NotReady - Container Install | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> |
| Component: | Containers | Assignee: | Vivek Goyal <vgoyal> |
| Status: | CLOSED WORKSFORME | QA Contact: | DeShuai Ma <dma> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.3.0 | CC: | agoldste, aos-bugs, jhonce, jokerman, mmccomas, vgoyal, vlaad, wmeng |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-19 14:00:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Vikas Laad 2016-08-16 18:18:40 UTC
Please provide details on which commands you used for pruning.

    oadm prune deployments --orphans --keep-complete=5 --keep-failed=1 --keep-younger-than=60m
    oadm prune builds --orphans --keep-complete=5 --keep-failed=1 --keep-younger-than=60m
    oadm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm

Pruning removes data from etcd related to builds, deployments, and images. It also removes image layers from the registry's storage. It does **not** remove anything from the docker daemon's thin pool, which is what is apparently having an issue. Although, if you look at the output from 'docker info', it appears that everything should be OK. Sorry I can't be more helpful here. Perhaps vgoyal could?

One minor clarification: if 'oadm prune' deletes a build or a deployment and containers exist for the associated pods, those containers will be deleted, and anything in the containers' COW space will be deleted as well, which would come out of the thin pool.

Is there any other cleanup recommended to make sure this does not happen?

I'm still confused as to why you got this error. Your output from 'docker info' appears to show plenty of space. How soon after you got the initial openvswitch error did you run 'docker info'? Kube/OpenShift will automatically prune non-running containers as needed, and there are settings you can tweak for when that kicks in. It will also automatically prune images if it is running low on space.

When I ran 'docker info', openvswitch was still not starting. After lowering dm.min_free_space, things started working.

Vikas, can you run docker on this system and, while docker is running, run the "dmsetup status" and "lvs -a" commands and paste the output here?

Hi Vivek, I do not have this cluster around. I am going to start the app reliability tests today and will update this bug when I hit this problem again.

Andy, I think you are talking about the following settings; we have them on all the nodes. I will stop doing pruning, I guess, because these settings should take care of pruning automatically.

    image-gc-high-threshold:
    - '80'
    image-gc-low-threshold:
    - '70'
    max-pods:
    - '250'
    maximum-dead-containers:
    - '20'
    maximum-dead-containers-per-container:
    - '1'
    minimum-container-ttl-duration:
    - 10s

Please let me know if there is anything else I should do.

(In reply to Andy Goldstein from comment #7)
> I'm still confused as to why you got this error. Your output from 'docker
> info' appears to show plenty of space. How soon after you got the initial
> openvswitch error did you run 'docker info'?
>
> Kube/OpenShift will automatically prune non-running containers as needed,
> and there are settings you can tweak for when that kicks in. It will also
> automatically prune images if it is running low on space.

Yes, but you should continue to run 'oadm prune' so it can get rid of completed builds and deployments and their associated pods/containers.

@vikas, have you been able to reproduce the problem?

I think your thin pool just filled up, and that's why docker refused to start new containers. Lowering min_free_space just allows you to go a little further until you fill the last remaining free space. So there should be a good mechanism in OpenShift to keep track of free space and keep cleaning images/containers, to make sure sufficient free space remains in the thin pool. After that, either stop sending jobs to that node or add more storage to that node.

@Vivek, I am still running the tests and will update the bug when/if I encounter the issue. If not, I guess we will close this bug.
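Editorial note: for anyone landing here with a similar failure, below is a minimal sketch of the checks requested in the thread, assuming the docker devicemapper (thin pool) storage driver on a RHEL/Atomic-style install. Pool and volume group names vary per host, and the dm.min_free_space value shown is illustrative only.

```sh
# Docker's view of the thin pool: look at the
# "Data Space Used/Total/Available" and "Metadata Space" lines.
docker info | grep -i space

# Device-mapper view of the same pool: the thin-pool status line
# reports used/total blocks for data and metadata.
dmsetup status

# LVM view: the Data% and Meta% columns of the thin pool LV
# (often named docker-pool) show how full the pool really is.
lvs -a

# dm.min_free_space is the fraction of the pool docker keeps in
# reserve before refusing to create new containers. On RHEL it is
# typically set through /etc/sysconfig/docker-storage, e.g.:
#   DOCKER_STORAGE_OPTIONS="--storage-opt dm.min_free_space=10%"
```

If Data% in `lvs` is close to 100%, the pool is genuinely full, and, as noted in the thread, lowering dm.min_free_space only postpones the failure rather than fixing it.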
I am not able to reproduce this issue; I have had a couple of reliability runs on the container install. Closing this bug.
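Editorial note: the garbage-collection settings quoted in the thread are kubelet arguments; on OpenShift 3.x nodes they typically live in the node's node-config.yaml (commonly /etc/origin/node/node-config.yaml). A sketch of that layout, using the values from the comment above, is shown here for reference:

```yaml
# node-config.yaml (excerpt) -- illustrative only
kubeletArguments:
  image-gc-high-threshold:        # start image GC when disk usage exceeds 80%
    - '80'
  image-gc-low-threshold:         # GC back down to 70% usage
    - '70'
  max-pods:
    - '250'
  maximum-dead-containers:        # keep at most 20 exited containers per node
    - '20'
  maximum-dead-containers-per-container:
    - '1'
  minimum-container-ttl-duration: # never GC containers younger than 10s
    - '10s'
```

These settings drive the kubelet's automatic image and container garbage collection mentioned by Andy; they do not replace 'oadm prune', which cleans up build/deployment records in etcd and image layers in the registry.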