Bug 1461370 - [free-int] pod cannot created successfully [NEEDINFO]
Summary: [free-int] pod cannot created successfully
Status: CLOSED DUPLICATE of bug 1451110
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: All
OS: All
Target Milestone: ---
Assignee: Ravi Sankar
QA Contact: Meng Bo
Depends On:
Reported: 2017-06-14 10:21 UTC by zhaozhanqi
Modified: 2017-06-22 19:17 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2017-06-22 19:17:14 UTC
bbennett: needinfo? (zzhao)


System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1456138 None CLOSED devicemapper error dm_task_set_cookie failed 2019-04-12 13:23:29 UTC

Internal Links: 1456138

Comment 2 Ben Bennett 2017-06-14 15:16:33 UTC
It looks like sometimes we wait and sometimes we don't but we always return the same error.  Even if we can't work out precisely what the error is, we should improve the logging here so we have a chance to identify where the error comes from.  At the very least we should make sure the errors returned are different.

Comment 3 Ravi Sankar 2017-06-14 20:22:19 UTC
NetID/VNID assignment is done on project creation. If this is happening on every pod creation, then it can't be related to a race between the master assigning the netid and the node populating the netid in memory. We could get this error if the master failed to assign a netid to the namespace or the node failed to receive the netnamespace event.

Can you attach the relevant master and node logs, which should provide some more information? Do you have steps to reproduce this issue locally on a dind cluster?

Comment 5 zhaozhanqi 2017-06-15 03:03:03 UTC
Thanks @Justin Pierce

I'm also trying to reproduce this issue in a local environment. Hopefully it can be reproduced.

Comment 7 Ravi Sankar 2017-06-15 22:39:34 UTC
The issue you found on AWS (comment#6) is not the same as the one you filed. You might have hit https://github.com/openshift/origin/issues/14601 or https://github.com/moby/moby/issues/33603

For the original issue, the attached logs were not helpful. Log info is available from 'Jun 14 16:19', but the issue happened around 'Jun 14 08:49'. I didn't see the 'failed to find netid for namespace' symptom in the available logs. Maybe the logs were rotated? This seems like https://bugzilla.redhat.com/show_bug.cgi?id=1451110, but I couldn't confirm that without additional evidence.
Can you reproduce it again on free-int and attach the output from the sdn debug script (https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh)?

Comment 8 zhaozhanqi 2017-06-16 08:17:51 UTC
@Ravi Sankar

Last time, all pods scheduled to node ip-172-31-59-87.ec2.internal failed with the error from this bug. Today I tested scheduling a pod to that node, and it was created successfully. Perhaps the free-int upgrade fixed this issue.

  25s		25s		1	default-scheduler							Normal		Scheduled	Successfully assigned nodeselect-pod to ip-172-31-59-87.ec2.internal
  10s		10s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Pulling		pulling image "openshift/hello-openshift"
  8s		8s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Pulled		Successfully pulled image "openshift/hello-openshift"
  8s		8s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Created		Created container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05
  7s		7s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Started		Started container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05

So it may now be a little difficult to reproduce on free-int. I will keep an eye on this.

Comment 10 Ben Bennett 2017-06-22 13:59:07 UTC
zhaozhanqi: Can you re-test this please to see if it still happens since https://bugzilla.redhat.com/show_bug.cgi?id=1456138 has been merged?  Thanks

Comment 11 Ben Bennett 2017-06-22 19:17:14 UTC
Ok, there are two problems here:

1) "failed to find netid for namespace" appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1451110

2) "Can't set cookie dm_task_set_cookie failed" is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1456138

#1 seems to be caused by >1 delays waiting for responses to requests to docker.  If more evidence arises to suggest that is not the case, and that it is not a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1451110, then we can re-open and investigate.

If #2 is not resolved, please open that as a separate bug.

*** This bug has been marked as a duplicate of bug 1451110 ***
