It looks like sometimes we wait and sometimes we don't, but we always return the same error. Even if we can't work out precisely what the error is, we should improve the logging here so we have a chance to identify where it comes from. At the very least, we should make sure the errors returned from the two paths are distinct.
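To illustrate the suggestion above, here is a minimal sketch of returning distinct, context-carrying errors from the waiting and non-waiting paths. All names here (errNetIDNotFound, getVNID, waitForVNID, the plain map lookup) are hypothetical stand-ins for illustration, not the actual openshift-sdn code:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Shared sentinel so callers can still match on the underlying condition.
var errNetIDNotFound = errors.New("failed to find netid for namespace")

// getVNID is the non-waiting path: fail immediately, but say so in the error.
func getVNID(netIDs map[string]uint32, namespace string) (uint32, error) {
	id, ok := netIDs[namespace]
	if !ok {
		return 0, fmt.Errorf("getVNID: %v: %q (not yet populated locally)", errNetIDNotFound, namespace)
	}
	return id, nil
}

// waitForVNID is the waiting path: retry, log each attempt, and return an
// error message distinct from getVNID's so logs show which path failed.
func waitForVNID(netIDs map[string]uint32, namespace string, retries int) (uint32, error) {
	for i := 0; i < retries; i++ {
		if id, ok := netIDs[namespace]; ok {
			return id, nil
		}
		log.Printf("waitForVNID: netid for %q not found, retry %d/%d", namespace, i+1, retries)
	}
	return 0, fmt.Errorf("waitForVNID: %v: %q after %d retries", errNetIDNotFound, namespace, retries)
}
```

With distinct messages like these, a "failed to find netid" line in the node log immediately tells us whether the lookup ever waited, which is exactly the information missing from the reports below.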
NetID/VNID assignment is done on project creation. If this is happening on every pod creation, then it can't be related to a race between the master assigning the netid and the node populating the netid in memory. We could get this error if the master failed to assign a netid to the namespace, or if the node failed to receive the netnamespace event. Can you attach the relevant master and node logs that would provide some more information? Do you have steps to reproduce this issue locally on a dind cluster?
Thanks @Justin Pierce. I'm also trying to reproduce this issue in a local environment; hopefully it can be reproduced.
@zhaozhanqi The issue you found on AWS (comment#6) is not the same as the one you filed. You might have hit https://github.com/openshift/origin/issues/14601 or https://github.com/moby/moby/issues/33603. For the original issue, the attached logs were not helpful: log info is available from 'Jun 14 16:19', but the issue happened around 'Jun 14 08:49'. I didn't see the 'failed to find netid for namespace' symptom in the available logs. Maybe the logs were rotated? This seems like https://bugzilla.redhat.com/show_bug.cgi?id=1451110, but I couldn't confirm that without additional evidence. Can you reproduce it again on free-int and attach the output from the sdn debug script (https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh)?
@Ravi Sankar Last time, all pods scheduled to node ip-172-31-59-87.ec2.internal failed with this error. Today I ran a test scheduling a pod to that node, and it was created successfully, so perhaps the free-int upgrade fixed this issue.

25s  25s  1  default-scheduler                       -                                Normal  Scheduled  Successfully assigned nodeselect-pod to ip-172-31-59-87.ec2.internal
10s  10s  1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Pulling    pulling image "openshift/hello-openshift"
8s   8s   1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Pulled     Successfully pulled image "openshift/hello-openshift"
8s   8s   1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Created    Created container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05
7s   7s   1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Started    Started container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05

So it may now be difficult to reproduce on free-int. I will keep an eye on this.
zhaozhanqi: Can you re-test this please to see if it still happens, now that https://bugzilla.redhat.com/show_bug.cgi?id=1456138 has been merged? Thanks
OK, there are two problems here:

1) "failed to find netid for namespace" appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1451110
2) "Can't set cookie dm_task_set_cookie failed" is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1456138

#1 seems to be caused by delays waiting for responses to requests to docker. If more evidence arises suggesting that is not the case, and that it is not a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1451110, then we can re-open and investigate. If #2 is not resolved, please open it as a separate bug.

*** This bug has been marked as a duplicate of bug 1451110 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days