Bug 1337470
| Summary: | Pods pending on Node with unknown reason | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> | 
| Component: | Node | Assignee: | Derek Carr <decarr> | 
| Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> | 
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.1.0 | CC: | agoldste, aos-bugs, bleanhar, jkaur, jokerman, mmccomas, tdawson, wmeng | 
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Enhancement | |
| Doc Text: | Feature: 
Ability to define eviction thresholds for imagefs
Reason: 
Evicts pods when node is running low on disk
Result: 
Disk is reclaimed and node remains stable. | Story Points: | --- | 
| Clone Of: | Environment: | ||
| Last Closed: | 2017-01-18 12:41:02 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| 
        
          Description (Jaspreet Kaur, 2016-05-19 09:29:33 UTC)
        
Typically, when pods are stuck in Pending, it is because the node has asked Docker to pull an image and the pull "hangs" for some reason. Because image pulling is done serially, a hung pull leaves all subsequent attempts to run pods on that node stuck in Pending until the pull finishes. One improvement could be to add an event when a request to pull an image is queued. That way, you would at least know that queuing was the last operation for the pod, and it would be easy to tell that pulling was hanging.

Regarding the docker pool: OSE 3.2 added support for correctly reporting the docker pool usage on devicemapper systems (i.e. RHEL), so I would expect to see improvements in 3.2 that aren't in 3.1. We are also working on proactively evicting pods from nodes when the node determines that it's running low on memory or disk. Is there anything else you're looking for?

OCP 3.4 has rebased on Kube 1.4, which has the support for disk eviction policies we added upstream (see http://kubernetes.io/docs/admin/out-of-resource/). Moving this to ON_QA.

Tested on openshift v3.4.0.15+9c963ec; disk pressure works as expected. Details are in the card: https://trello.com/c/3LvGAHr3/371-5-kubelet-evicts-pods-when-low-on-disk-node-reliability Verify this bug.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066 |
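
To illustrate the serial-pull behaviour discussed in the comments, here is a minimal toy model in Python. It is only a sketch, not the kubelet's implementation: the class `SerialPuller`, its methods, and the event strings (`PullQueued`, `Pulling`, `Pulled`) are hypothetical names chosen for this example. It shows how one hung pull blocks every pod queued behind it, and how a "queued" event would make that visible.

```python
from collections import deque


class SerialPuller:
    """Toy model of a node that pulls images one at a time (hypothetical)."""

    def __init__(self):
        self.queue = deque()   # pending (pod, image) pull requests, in order
        self.events = {}       # pod name -> list of event strings

    def request_pull(self, pod, image):
        # Proposed improvement: record an event as soon as the pull is queued.
        self.events.setdefault(pod, []).append(f"PullQueued {image}")
        self.queue.append((pod, image))

    def run(self, hangs_on):
        """Process pulls in order; return the pod whose pull hangs, if any."""
        while self.queue:
            pod, image = self.queue[0]
            self.events[pod].append(f"Pulling {image}")
            if image in hangs_on:
                # The hung pull stays at the head of the queue and blocks
                # everything behind it.
                return pod
            self.events[pod].append(f"Pulled {image}")
            self.queue.popleft()
        return None


puller = SerialPuller()
puller.request_pull("pod-a", "img-a")
puller.request_pull("pod-b", "img-b")
puller.request_pull("pod-c", "img-c")
stuck = puller.run(hangs_on={"img-b"})
```

In this model, `pod-c` never gets past its queued event, so its last recorded event tells you it is waiting on the pull queue rather than failing for some unknown reason.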