Bug 1461370 - [free-int] pod cannot created successfully [NEEDINFO]
Status: CLOSED DUPLICATE of bug 1451110
Product: OpenShift Online
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: All
OS: All
Priority: high
Severity: high
Assigned To: Ravi Sankar
QA Contact: Meng Bo
Keywords: OpsBlocker
Reported: 2017-06-14 06:21 EDT by zhaozhanqi
Modified: 2017-06-22 15:17 EDT
Last Closed: 2017-06-22 15:17:14 EDT
Type: Bug
Flags: bbennett: needinfo? (zzhao)


Comment 2 Ben Bennett 2017-06-14 11:16:33 EDT
It looks like sometimes we wait and sometimes we don't but we always return the same error.  Even if we can't work out precisely what the error is, we should improve the logging here so we have a chance to identify where the error comes from.  At the very least we should make sure the errors returned are different.
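
As an illustration of returning different errors for the two paths, here is a minimal Python sketch. It is not the openshift-sdn code: the function `get_netid`, the exception names, and the `attempts`/`delay` parameters are all hypothetical, and the cache is assumed to be a plain dictionary.

```python
import time


class NetIDError(Exception):
    """Base class for the hypothetical lookup failures below."""


class NetIDNotFound(NetIDError):
    """The namespace was absent and no waiting was attempted."""


class NetIDWaitTimeout(NetIDError):
    """The namespace was still absent after waiting for it to appear."""


def get_netid(cache, namespace, wait=False, attempts=3, delay=0.01):
    # Fast path: the namespace is already known.
    if namespace in cache:
        return cache[namespace]
    if not wait:
        # Path 1: we did not wait, and the error says so.
        raise NetIDNotFound(f"no netid for namespace {namespace!r} (no wait)")
    for _ in range(attempts):
        time.sleep(delay)
        if namespace in cache:
            return cache[namespace]
    # Path 2: we waited, and the error records how long.
    raise NetIDWaitTimeout(
        f"no netid for namespace {namespace!r} after {attempts} attempts"
    )
```

With separate exception types and messages, a log line alone is enough to tell which path produced the failure.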
Comment 3 Ravi Sankar 2017-06-14 16:22:19 EDT
NetID/VNID assignment is done on project creation. If this is happening on every pod creation, then it can't be related to a race between the master assigning the netid and the node populating the netid in memory. We could get this error if the master failed to assign a netid to the namespace or the node failed to receive the netnamespace event.

Can you attach the relevant master and node logs, which should provide some more information? Do you have steps to reproduce this issue locally on a dind cluster?
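
The two failure modes described above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the `Master` and `Node` classes, the `assign` flag, and the starting id of 100 are invented for the example and do not reflect the real openshift-sdn implementation.

```python
class Master:
    """Toy master: assigns a netid once, when the project is created."""

    def __init__(self):
        self._next_id = 100  # arbitrary starting value for illustration
        self.netids = {}
        self.subscribers = []  # callbacks standing in for netnamespace watches

    def create_project(self, namespace, assign=True):
        if not assign:
            return  # failure mode 1: the master never assigns a netid
        self.netids[namespace] = self._next_id
        self._next_id += 1
        for deliver in self.subscribers:
            deliver(namespace, self.netids[namespace])


class Node:
    """Toy node: keeps an in-memory map filled by netnamespace events."""

    def __init__(self):
        self.cache = {}

    def on_netnamespace(self, namespace, netid):
        self.cache[namespace] = netid

    def create_pod(self, namespace):
        if namespace not in self.cache:
            raise LookupError(f"failed to find netid for namespace: {namespace}")
        return self.cache[namespace]


master = Master()
node = Node()
deaf_node = Node()  # failure mode 2: never subscribed, so it misses events
master.subscribers.append(node.on_netnamespace)
master.create_project("demo")
```

Here `node.create_pod("demo")` succeeds, while `deaf_node.create_pod("demo")` fails on every attempt, which matches the observation that a persistent failure points away from a startup race.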
Comment 5 zhaozhanqi 2017-06-14 23:03:03 EDT
Thanks @Justin Pierce

I'm also trying to reproduce this issue in a local environment. I hope it can be reproduced.
Comment 7 Ravi Sankar 2017-06-15 18:39:34 EDT
@zhaozhanqi
The issue you found on AWS (comment#6) is not the same as the one you filed. You might have hit https://github.com/openshift/origin/issues/14601 or https://github.com/moby/moby/issues/33603

For the original issue, the attached logs were not helpful. Log info is available from 'Jun 14 16:19', but the issue happened around 'Jun 14 08:49'. I didn't see the similar symptom 'failed to find netid for namespace' in the available logs. Maybe the logs were rotated? This seems like https://bugzilla.redhat.com/show_bug.cgi?id=1451110 but I couldn't confirm that without additional evidence.
Can you reproduce this again on free-int and attach the output from the sdn debug script (https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh)?
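
A quick way to check whether an attached log reaches back far enough is to compare its first timestamp with the incident time. The Python sketch below uses the two timestamps quoted above; the helper names `parse_stamp` and `log_covers_start` are hypothetical, and the year is an assumption supplied explicitly because syslog-style stamps do not carry one.

```python
from datetime import datetime


def parse_stamp(text, year=2017):
    # Syslog-style stamps such as "Jun 14 16:19" have no year,
    # so the caller provides one (assumed here to be 2017).
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M")


def log_covers_start(first_entry, incident, year=2017):
    # True only if the log begins at or before the incident time.
    return parse_stamp(first_entry, year) <= parse_stamp(incident, year)


print(log_covers_start("Jun 14 16:19", "Jun 14 08:49"))  # False
```

The result is False because the log starts roughly seven and a half hours after the incident, so the relevant entries cannot be in it.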
Comment 8 zhaozhanqi 2017-06-16 04:17:51 EDT
@Ravi Sankar

Last time, all pods scheduled to node ip-172-31-59-87.ec2.internal failed with the error from this bug. Today I ran a test that scheduled a pod to that node, and it was created successfully. Perhaps the free-int upgrade fixed this issue.

	-------
  25s		25s		1	default-scheduler							Normal		Scheduled	Successfully assigned nodeselect-pod to ip-172-31-59-87.ec2.internal
  10s		10s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Pulling		pulling image "openshift/hello-openshift"
  8s		8s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Pulled		Successfully pulled image "openshift/hello-openshift"
  8s		8s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Created		Created container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05
  7s		7s		1	kubelet, ip-172-31-59-87.ec2.internal	spec.containers{nodeselect-pod}	Normal		Started		Started container with id a73e6daf5ed5c114a416cd07610de93579fc4373282dc96bce539d0007f34d05

So it may now be a little difficult to reproduce on free-int. I will keep an eye on this.
Comment 10 Ben Bennett 2017-06-22 09:59:07 EDT
zhaozhanqi: Can you please re-test this to see if it still happens now that https://bugzilla.redhat.com/show_bug.cgi?id=1456138 has been merged? Thanks.
Comment 11 Ben Bennett 2017-06-22 15:17:14 EDT
Ok, there are two problems here:

1) "failed to find netid for namespace" appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1451110

2) "Can't set cookie dm_task_set_cookie failed" is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1456138


#1 seems to be caused by >1 delays waiting for responses to requests to docker. If more evidence arises to suggest that this is not the case, and that it is not a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1451110, then we can re-open and investigate.

If #2 is not resolved, please open that as a separate bug.

*** This bug has been marked as a duplicate of bug 1451110 ***
