Bug 1451110
Summary: | Pods stuck in ContainerCreating with CNI errors in node logs | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle>
Component: | Networking | Assignee: | Dan Williams <dcbw>
Status: | CLOSED DUPLICATE | QA Contact: | Meng Bo <bmeng>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 3.6.0 | CC: | aloughla, aos-bugs, atragler, bbennett, jeder, mifiedle, wabouham, zzhao
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-06-09 20:29:22 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Mike Fiedler
2017-05-15 20:00:46 UTC
If you see this again, please:

1) oc get netnamespace -o wide
2) after that, modify the atomic-openshift-node systemd service file in /etc/systemd/system/atomic-openshift-node.service and set --loglevel=5 and restart. Then wait for the problem to appear again.
3) Or better yet, provision the cluster with --loglevel=5 on all the nodes.

In any case, it's very likely that some earlier errors are causing the "failed to find netid for namespace", and we should figure out what those are.

(In reply to Dan Williams from comment #3)
> If you see this again, please:
>
> 1) oc get netnamespace -o wide
> 2) after that, modify the atomic-openshift-node systemd service file in
> /etc/systemd/system/atomic-openshift-node.service and set --loglevel=5 and
> restart. Then wait for the problem to appear again.
> 3) Or better yet, provision the cluster with --loglevel=5 on all the nodes.
>
> In any case, it's very likely some errors earlier are causing the "failed to
> find netid for namespace", and we should figure out what those are.

So, having debugged the "failed to find netid for namespace" issue as part of bug 1451902, this is very likely due to kubelet being blocked by docker and the SDN code not being given time to run, so some events come out of order.

I'm going to dupe this bug to that one for now. If we solve the docker blockage issue and find the same "failed to find netid" still happening, then we can un-dupe and proceed.

*** This bug has been marked as a duplicate of bug 1451902 ***

*** Bug 1461370 has been marked as a duplicate of this bug. ***
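Step 2 of the debugging request (set --loglevel=5 in the node service file) could be scripted roughly as follows. This is only a sketch: `set_loglevel` is a hypothetical helper, not part of OpenShift, and the sample unit text is made up for illustration. It assumes the `--loglevel=N` flag appears literally in the text being edited; on a given install the flag may instead be set in a separate options file, in which case the helper raises an error rather than silently doing nothing.

```python
import re

# Matches an existing log-level flag such as --loglevel=2.
LOGLEVEL_RE = re.compile(r"--loglevel=\d+")


def set_loglevel(unit_text: str, level: int = 5) -> str:
    """Return unit_text with every --loglevel=N replaced by --loglevel=<level>.

    Hypothetical helper for illustration only. Raises ValueError if no
    --loglevel flag is present, since the flag may be configured elsewhere.
    """
    if not LOGLEVEL_RE.search(unit_text):
        raise ValueError("no --loglevel flag found in the given text")
    return LOGLEVEL_RE.sub(f"--loglevel={level}", unit_text)


# Made-up unit file content; a real service file will look different.
sample = (
    "[Service]\n"
    "ExecStart=/usr/bin/example-node --loglevel=2\n"
)

updated = set_loglevel(sample)
print(updated)
```

After writing the modified text back to /etc/systemd/system/atomic-openshift-node.service, systemd has to re-read the unit (`systemctl daemon-reload`) before the service is restarted (`systemctl restart atomic-openshift-node`) for the new log level to take effect.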