Bug 1708442
Summary: [3.11] Symlinks under /var/lib/containers/storage/overlay/l are lost on reboot

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> |
| Component: | Containers | Assignee: | Urvashi Mohnani <umohnani> |
| Status: | CLOSED ERRATA | QA Contact: | weiwei jiang <wjiang> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | adeshpan, aos-bugs, dwalsh, eparis, jokerman, mmccomas, umohnani, wjiang |
| Target Milestone: | --- | | |
| Target Release: | 3.11.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1704410 | Environment: | |
| Last Closed: | 2019-06-26 09:08:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1704410 | | |
| Bug Blocks: | | | |
Comment 1
Urvashi Mohnani
2019-05-10 09:30:59 UTC
(In reply to Urvashi Mohnani from comment #1)

> New build is available at https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21597045

Hi, are we missing some important containers/storage changes in this build? One node got this error for a pod:

```
Events:
  Type     Reason   Age                From                                           Message
  ----     ------   ---                ----                                           -------
  Warning  Failed   10m                kubelet, qe-wjiang-311-node-registry-router-1  Failed to pull image "brewregistry.stage.redhat.io/openshift3/ose-node:v3.11": rpc error: code = Unknown desc = Error writing blob: error storing blob to file "/var/tmp/storage729637586/1": unexpected EOF
  Warning  Failed   10m                kubelet, qe-wjiang-311-node-registry-router-1  Error: ErrImagePull
  Normal   BackOff  10m                kubelet, qe-wjiang-311-node-registry-router-1  Back-off pulling image "brewregistry.stage.redhat.io/openshift3/ose-node:v3.11"
  Warning  Failed   10m                kubelet, qe-wjiang-311-node-registry-router-1  Error: ImagePullBackOff
  Normal   Pulling  10m (x2 over 16m)  kubelet, qe-wjiang-311-node-registry-router-1  pulling image "brewregistry.stage.redhat.io/openshift3/ose-node:v3.11"
```

Nope, nothing in containers/storage changed; we just cherry-picked the symlink fixes onto the containers/storage version already being used by cri-o 1.11. Are you seeing this error on multiple pods? Did it eventually fix itself, or was it stuck in this state? Did you try killing the pod and letting it start up again?

My customer who is hitting the issue "rebuilds" the node to fix it (removing and reinstalling components), but that is a very heavyweight workaround and not ideal. Curious whether anyone has a less intrusive one.

The storing-the-layer-blob-to-a-file logic comes from containers/image, not containers/storage. If this issue keeps happening, please open another bz for it. It shouldn't block this bz, since the symlink fixes went into containers/storage.

Hi, do you mean it requires rebuilding only if it does not resolve by deleting pods, etc.?

@Steven yeah, does deleting the pod resolve the issue? Also, how often is the customer seeing this happen? If possible, can I get cri-o and kubelet logs from the cluster as well?

Checked with 1.11.14 and rebooted the whole cluster 5 times; did not hit this issue, so moving to VERIFIED.

```
# oc get nodes -o wide
NAME                                  STATUS  ROLES    AGE  VERSION          INTERNAL-IP  EXTERNAL-IP  OS-IMAGE                                     KERNEL-VERSION              CONTAINER-RUNTIME
qe-wjiang-311-master-etcd-1           Ready   master   48m  v1.11.0+d4cacc0  10.0.76.16   <none>       Red Hat Enterprise Linux Server 7.6 (Maipo)  3.10.0-957.12.1.el7.x86_64  cri-o://1.11.14-1.rhaos3.11.gitd56660e.el7
qe-wjiang-311-node-1                  Ready   compute  45m  v1.11.0+d4cacc0  10.0.77.60   <none>       Red Hat Enterprise Linux Server 7.6 (Maipo)  3.10.0-957.12.1.el7.x86_64  cri-o://1.11.14-1.rhaos3.11.gitd56660e.el7
qe-wjiang-311-node-registry-router-1  Ready   <none>   45m  v1.11.0+d4cacc0  10.0.76.72   <none>       Red Hat Enterprise Linux Server 7.6 (Maipo)  3.10.0-957.12.1.el7.x86_64  cri-o://1.11.14-1.rhaos3.11.gitd56660e.el7
```

For the "Error writing blob: error storing blob to file" issue, I tried twice but did not hit it. I will keep an eye on it and open a new bug if I hit it again.

@Urvashi Hm, the issue seems to occur with new pods, so this might not apply. I opened a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1710124

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605
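As a triage aid for the symptom in the summary (symlinks under `/var/lib/containers/storage/overlay/l` lost on reboot), the sketch below is a minimal diagnostic, not something from the bug report itself: the function name, the parameterized storage root, and the `links=N broken=M` output format are my own. It only counts how many entries exist in the overlay `l` link directory and how many of them are dangling, which is less intrusive than rebuilding the node just to see whether it is affected.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch (not from this bug report): count the
# symlinks in the overlay storage "l" directory and how many are dangling.
check_overlay_links() {
    # Default to the path named in the bug summary; allow an override for testing.
    storage_root="${1:-/var/lib/containers/storage}"
    link_dir="$storage_root/overlay/l"
    total=0
    broken=0
    if [ ! -d "$link_dir" ]; then
        echo "links=0 broken=0"
        return 0
    fi
    for link in "$link_dir"/*; do
        # Skip non-symlinks (including the literal glob when the dir is empty).
        [ -L "$link" ] || continue
        total=$((total + 1))
        # -e dereferences the symlink, so a dangling link fails this check.
        [ -e "$link" ] || broken=$((broken + 1))
    done
    echo "links=$total broken=$broken"
}
```

On a node hit by this bug one would expect the count to drop to zero (or show dangling entries) after a reboot, e.g. `check_overlay_links /var/lib/containers/storage` reporting `broken` greater than 0.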