It looks like sometimes we wait and sometimes we don't, but we always return the same error. Even if we can't work out precisely what the error is, we should improve the logging here so we have a chance to identify where it comes from. At the very least, we should make sure the errors returned from the two paths are distinct.
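To illustrate the suggestion above, here is a minimal sketch of returning distinct, context-carrying errors from the waiting and non-waiting paths. All names here (errNetIDNotFound, getVNID, waitForVNID, the plain map lookup) are hypothetical stand-ins for illustration, not the actual openshift-sdn code:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Shared sentinel so callers can still match on the underlying condition.
var errNetIDNotFound = errors.New("failed to find netid for namespace")

// getVNID is the non-waiting path: fail immediately, but say so in the error.
func getVNID(netIDs map[string]uint32, namespace string) (uint32, error) {
	id, ok := netIDs[namespace]
	if !ok {
		return 0, fmt.Errorf("getVNID: %v: %q (not yet populated locally)", errNetIDNotFound, namespace)
	}
	return id, nil
}

// waitForVNID is the waiting path: retry, log each attempt, and return an
// error message distinct from getVNID's so logs show which path failed.
func waitForVNID(netIDs map[string]uint32, namespace string, retries int) (uint32, error) {
	for i := 0; i < retries; i++ {
		if id, ok := netIDs[namespace]; ok {
			return id, nil
		}
		log.Printf("waitForVNID: netid for %q not found, retry %d/%d", namespace, i+1, retries)
	}
	return 0, fmt.Errorf("waitForVNID: %v: %q after %d retries", errNetIDNotFound, namespace, retries)
}
```

With distinct messages like these, a "failed to find netid" line in the node log immediately tells us whether the lookup ever waited, which is exactly the information missing from the reports below.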
NetID/VNID assignment is done on project creation. If this is happening on every pod creation, then it can't be related to a race between the master assigning the netid and the node populating the netid in memory. We could get this error if the master failed to assign a netid to the namespace, or if the node failed to receive the netnamespace event. Can you attach the relevant master and node logs that would provide some more information? Do you have steps to reproduce this issue locally on a dind cluster?
Thanks @Justin Pierce. I'm also trying to reproduce this issue in a local environment; hopefully it can be reproduced.
@zhaozhanqi The issue you found on AWS (comment#6) is not the same as the one you filed. You might have hit https://github.com/openshift/origin/issues/14601 or https://github.com/moby/moby/issues/33603. For the original issue, the attached logs were not helpful: log info is available from 'Jun 14 16:19', but the issue happened around 'Jun 14 08:49'. I didn't see the 'failed to find netid for namespace' symptom in the available logs. Maybe the logs were rotated? This seems like https://bugzilla.redhat.com/show_bug.cgi?id=1451110, but I couldn't confirm that without additional evidence. Can you reproduce it again on free-int and attach the output from the sdn debug script (https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh)?
@Ravi Sankar Last time, all pods scheduled to node ip-172-31-59-87.ec2.internal failed with this error. Today I ran a test scheduling a pod to that node, and it was created successfully, so perhaps the free-int upgrade fixed this issue.

25s  25s  1  default-scheduler                       -                                Normal  Scheduled  Successfully assigned nodeselect-pod to ip-172-31-59-87.ec2.internal
10s  10s  1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Pulling    pulling image "openshift/hello-openshift"
8s   8s   1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Pulled     Successfully pulled image "openshift/hello-openshift"
8s   8s   1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Created    Created container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05
7s   7s   1  kubelet, ip-172-31-59-87.ec2.internal  spec.containers{nodeselect-pod}  Normal  Started    Started container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05

So it may now be difficult to reproduce on free-int. I will keep an eye on this.
zhaozhanqi: Can you re-test this please to see if it still happens, now that https://bugzilla.redhat.com/show_bug.cgi?id=1456138 has been merged? Thanks
OK, there are two problems here:

1) "failed to find netid for namespace" appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1451110
2) "Can't set cookie dm_task_set_cookie failed" is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1456138

#1 seems to be caused by delays waiting for responses to requests to docker. If more evidence arises suggesting that is not the case, and that it is not a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1451110, then we can re-open and investigate. If #2 is not resolved, please open it as a separate bug.

*** This bug has been marked as a duplicate of bug 1451110 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days