Bug 1570145
| Summary: | Build pod stuck in Unknown state and node stuck in NotReady | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> |
| Component: | Containers | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Vikas Laad <vlaad> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | amurdaca, aos-bugs, dwalsh, jhonce, jokerman, mifiedle, mmccomas, mpatel, vlaad, wsun |
| Target Milestone: | --- | | |
| Target Release: | 3.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | aos-scalability-310 | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: The grpc client buffer was too small (4MB). Consequence: The kubelet errored out and was unable to handle requests from the remote runtime. Fix: The client buffer was increased. Result: The kubelet no longer reports errors when talking to the runtime over grpc. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-11 18:34:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Vikas Laad
2018-04-20 17:51:49 UTC
Created attachment 1424631 [details]
node yaml
Created attachment 1424632 [details]
node logs
Nodes do not become Ready even after restarting node and docker, and rebooting the node itself. Cleaning everything under /var/lib/docker fixed the problem.

The error is ResourceExhausted, related to the "grpc message size". This happens when there are large numbers of built images in /var/lib/docker under OCP 3.10 and docker 1.13.

This looks like the grpc return message size is being exceeded, I would guess. Antonio and Mrunal, WDYT?

This is probably because the number of containers/images on the node is overflowing the max response size of the grpc client in the kubelet. This PR https://github.com/kubernetes/kubernetes/pull/63977 increases the size and should fix this issue. I would like to hear more from the pod team though (and we'll need a backport anyway).

Hi Vikas, please check whether it has been fixed.

Tried multiple runs in version 3.10.0-0.63.0; it did not happen again.
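
For anyone trying to reason about why a node with many built images hits this, here is a rough Python model of the failure mode discussed above. It is only a sketch and not the kubelet's code: the function names, the `ResourceExhausted` stand-in class, the 512 bytes per list entry, and the 16 MiB raised limit are all assumptions made for illustration. Only the 4MB figure comes from this report.

```python
# Toy model of a client-side receive limit; NOT the kubelet or grpc code.
DEFAULT_MAX_RECV_BYTES = 4 * 1024 * 1024  # the 4MB buffer named in this report


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the ResourceExhausted status seen in the logs."""


def list_response_size(item_count, bytes_per_item):
    """Approximate size of a list response (assumed to grow linearly)."""
    return item_count * bytes_per_item


def check_response_size(response_bytes, max_recv_bytes=DEFAULT_MAX_RECV_BYTES):
    """Return the size if it fits the limit, otherwise raise ResourceExhausted."""
    if response_bytes > max_recv_bytes:
        raise ResourceExhausted(
            "response of %d bytes exceeds limit of %d bytes"
            % (response_bytes, max_recv_bytes)
        )
    return response_bytes


def max_items(max_recv_bytes, bytes_per_item):
    """Largest item count whose list response still fits the limit."""
    return max_recv_bytes // bytes_per_item


if __name__ == "__main__":
    per_item = 512  # assumed average bytes per image entry, for illustration only
    print("items that fit in the default limit:",
          max_items(DEFAULT_MAX_RECV_BYTES, per_item))
    try:
        check_response_size(list_response_size(8193, per_item))
    except ResourceExhausted as err:
        print("default limit:", err)
    # A larger client limit (illustrative value) accepts the same response.
    raised = 16 * 1024 * 1024
    print("raised limit accepts:",
          check_response_size(list_response_size(8193, per_item), raised))
```

Under these assumed numbers, 8192 entries fit in the default limit and one more entry tips the response over it. That would be consistent with the observation that cleaning /var/lib/docker clears the condition, and with a larger client limit avoiding the error without removing images.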