Bug 1449373 - Completed build pod has no IP address
Summary: Completed build pod has no IP address
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Dan Williams
QA Contact: Vikas Laad
URL:
Whiteboard:
Duplicates: 1406338
Depends On:
Blocks:
 
Reported: 2017-05-09 20:04 UTC by Vikas Laad
Modified: 2018-07-25 06:43 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-25 13:02:19 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
pod json (7.55 KB, text/plain)
2017-05-09 20:07 UTC, Vikas Laad
desc pod (5.16 KB, text/plain)
2017-05-09 20:07 UTC, Vikas Laad
Testing log (34.53 KB, text/plain)
2017-05-11 21:32 UTC, Weibin Liang


Links
System ID: Red Hat Product Errata RHBA-2017:3049 (normal, SHIPPED_LIVE)
Summary: OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update
Last Updated: 2017-10-25 15:57:15 UTC

Description Vikas Laad 2017-05-09 20:04:16 UTC
Description of problem:
After running many builds, one of the build pods shows <none> for the IP address. "oc describe pod" also shows nothing in the IP field.

proj27      cakephp-mysql-example-1-build   0/1       Completed   0          12m       <none>         ip-172-31-14-153.us-west-2.compute.internal


Version-Release number of selected component (if applicable):
openshift v3.6.65
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

Docker versions: I have seen the problem with all of the following:
docker-common-1.12.6-17
docker-common-1.12.6-18
docker-common-1.12.6-19

Steps to Reproduce:
1. Create an OpenShift cluster (1 master, 1 infra, 2 worker nodes)
2. Start concurrent builds
3. One of the pods shows no IP address after the build completes

Actual results:
No IP address associated with build pod

Expected results:
The IP address should be shown for the completed pod

Additional info:

Comment 1 Vikas Laad 2017-05-09 20:07:00 UTC
Created attachment 1277441 [details]
pod json

Comment 2 Vikas Laad 2017-05-09 20:07:22 UTC
Created attachment 1277442 [details]
desc pod

Comment 4 Dan Williams 2017-05-10 03:57:32 UTC
Could you add the exact commands you're using to recreate the issue? I've tried doing a build pod for the cake-php project, but I'm running into:

$ oc process -f examples/quickstarts/cakephp-mysql.json  > /tmp/build.json
$ oc create -f /tmp/build.json
$ oc start-build cakephp-mysql-example
The ImageStreamTag "php:7.0" is invalid: from: Error resolving ImageStreamTag php:7.0 in namespace openshift: imagestreams.image.openshift.io "php" not found

Also, can you paste in the *entire* output of the build pod status you reference in the first comment?  I think you're doing "oc get builds", right?

Comment 5 Vikas Laad 2017-05-10 14:48:22 UTC
The first command was:

oc get pods --all-namespaces -o wide

There were 300 pods in total; that's why I created the BZ with just the erroring one.

To create the BuildConfig I used the template from the SVT repo:

https://raw.githubusercontent.com/openshift/svt/master/openshift_scalability/content/quickstarts/cakephp/cakephp-build.json

One thing I forgot to mention while creating the bug: I was running a concurrent build test. That means I created 30 build configs and ran them concurrently. I saw this issue with 20 pods out of 300 total.

Please let me know if you need anything else.

Comment 6 Ben Bennett 2017-05-10 18:38:32 UTC
Weibin, can you try to reproduce this please?

Comment 7 Weibin Liang 2017-05-10 19:42:33 UTC
(In reply to Vikas Laad from comment #5)
> [...]

Hi Vikas,

Could you let me know how to use the above template to create 300 pods? Which oc commands do I need to use? Thanks!

Comment 8 Vikas Laad 2017-05-10 19:58:11 UTC
Hi Weibin,

Here is how we create it

oc process -f https://raw.githubusercontent.com/openshift/svt/master/openshift_scalability/content/quickstarts/cakephp/cakephp-build.json | oc create -f -

This will create the build config. After creating it, I repeatedly run:

oc start-build cakephp-mysql-example

To create multiple build configs, I do this in multiple projects.

Comment 9 Weibin Liang 2017-05-11 21:32:09 UTC
Created attachment 1278037 [details]
Testing log

Comment 10 Weibin Liang 2017-05-11 21:33:26 UTC
Hi Vikas,

I created 300 pods under the default namespace and all of the pods got IP addresses. In my testing env it takes several hours for all 300 pods to get their IP addresses.

The test log is attached; about half of the pods got IP addresses from the 10.128.0.0 subnet, and the other half from the 10.129.0.0 subnet.

Comment 11 Mike Fiedler 2017-05-12 00:44:52 UTC
Marking this as a regression for now. In previous OCP releases, there were no instances of non-Error, non-Failed pods that did not receive IPs (or at least did not have IPs reported in the pod details).

Comment 12 Meng Bo 2017-05-12 02:21:41 UTC
I can recreate the issue with 100 completed pods

# for i in `seq 100` ; do curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/completed-pod.json | sed s/completed-pod/cpod-$i/g | oc create -f - ; done

It takes a while for all the pods to complete, and 6 of them do not have an IP.

Pod list: http://pastebin.test.redhat.com/483650

Comment 13 Mike Fiedler 2017-05-12 11:58:06 UTC
I think the reproducer in comment 12 is the most straightforward.   Thanks, Meng Bo!

Comment 16 Dan Williams 2017-05-15 04:56:20 UTC
Thanks, will try to reproduce this coming week.

Comment 17 Dan Williams 2017-06-09 15:53:08 UTC
Fairly easy to reproduce locally with even 20 build pods.  It appears to be a race with kubelet pod status reporting since the pods complete so quickly.

Comment 18 Dan Williams 2017-06-09 19:00:08 UTC
1) PLEG puts pod status into a cache.
2) PLEG then sends an event to kubelet
3) kubelet reads PLEG events in syncLoopIteration()
4) that calls HandlePodSyncs(), which calls dispatchWork() for the event
5) dispatchWork() calls podWorkers.UpdatePod()
6) UpdatePod() starts a goroutine that calls managePodLoop()

<< race: PLEG runs again and reads pod status; pod has terminated and thus network status is no longer available.  (1) runs again and puts a new status with PodIP:"" into the cache >>

7) that goroutine will be scheduled at some arbitrary time in the future
8) managePodLoop() (in a goroutine) reads the PLEG cache for pod status
9) the status is sent to kubelet's syncPod() function, which converts the status to API status and sends to the api-server (again, from the goroutine started by UpdatePod())

The race is that, for pods that end up without a saved PodIP, the cache is updated again by PLEG with a "" PodIP between (6) and (8). For pods that don't hit the race, PLEG hasn't yet overwritten the cache with a "" PodIP by the time (8) reads it.
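
To make the interleaving concrete, here is a minimal, self-contained Go sketch of the race (all names are hypothetical; the real logic lives in kubelet's PLEG and pod workers, this is just the shape of the bug):

package main

import (
	"fmt"
	"sync"
	"time"
)

// statusCache stands in for the PLEG status cache, keyed by pod UID.
type statusCache struct {
	mu     sync.Mutex
	podIPs map[string]string
}

func (c *statusCache) set(uid, ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.podIPs[uid] = ip
}

func (c *statusCache) get(uid string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.podIPs[uid]
}

func main() {
	cache := &statusCache{podIPs: map[string]string{}}

	// (1)-(2): PLEG inspects the pod and caches its status, IP included.
	cache.set("pod-1", "10.128.0.14")

	// (3)-(6): kubelet dispatches the event; UpdatePod() starts a goroutine
	// (managePodLoop) that will read the cache at some later point.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(10 * time.Millisecond) // (7): goroutine scheduling delay
		// (8)-(9): read the cached status and "send" it to the apiserver.
		fmt.Printf("status sent to apiserver: PodIP=%q\n", cache.get("pod-1"))
	}()

	// << the race: PLEG relists before the goroutine runs; the pod has
	// exited, network status is gone, and the cache entry is overwritten
	// with an empty IP >>
	cache.set("pod-1", "")

	wg.Wait() // prints PodIP="" -- the IP is lost, exactly as in this bug
}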

This raises the question: why is the PodIP *ever* shown after the pod has exited? The pod is dead, the IP is no longer valid, and the IP may well be recycled to another pod. It doesn't make much sense to show the "current pod IP" when the pod is dead and no longer has that IP.

The IP should be saved somewhere, but I'm not sure it's correct to show it in 'oc describe' as the PodIP or in "-o wide" as the current PodIP.

My current feeling is to either (a) close this as NOTABUG, or (b) create a PR to ensure the PodIP="" when the container is dead, so that the IP never shows up after container death.

Comment 19 Mike Fiedler 2017-06-09 19:06:10 UTC
As stated in comment 18, the pod IP should be saved after the pod has exited. It is frequently useful for debug/forensic purposes.

Comment 20 Dan Williams 2017-06-09 19:34:30 UTC
Filed upstream issue https://github.com/kubernetes/kubernetes/issues/47265 for clarification on intended behavior.

Comment 21 Dan Williams 2017-06-09 19:35:29 UTC
(In reply to Mike Fiedler from comment #19)
> As stated in comment 18 the pod IP should be saved after the pod has exited.
> It is frequently useful for debug/forensic purposes.

Yes, it should be saved, but it doesn't need to be saved as the overall PodIP in the Pod Status.  It could be pushed as an event too, so that after pod death it's still available to see.

Comment 22 Dan Williams 2017-06-20 22:30:09 UTC
Upstream PR here: https://github.com/kubernetes/kubernetes/pull/47806
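
I haven't reproduced the PR's diff here, but conceptually the fix is for kubelet to fall back to the last reported pod IP when the fresh runtime status no longer carries one (because the sandbox is gone). A minimal sketch of that idea, with hypothetical names:

package main

import "fmt"

// apiPodIP sketches the fallback: when building the API pod status, keep
// the previously reported IP instead of overwriting it with "".
func apiPodIP(previousIP, runtimeIP string) string {
	if runtimeIP == "" {
		// The sandbox is gone (e.g. the pod completed), so the runtime no
		// longer reports an IP; preserve the one we already reported.
		return previousIP
	}
	return runtimeIP
}

func main() {
	fmt.Println(apiPodIP("10.128.0.14", "")) // completed pod keeps its IP
}

This also addresses the debug/forensic concern from comment 19: a completed pod keeps reporting the IP it had while running.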

Comment 23 Ben Bennett 2017-06-23 12:40:40 UTC
*** Bug 1406338 has been marked as a duplicate of this bug. ***

Comment 24 Dan Williams 2017-09-28 17:23:33 UTC
This will be merged into origin as https://github.com/openshift/origin/pull/16464

Comment 25 Dan Williams 2017-09-29 22:05:55 UTC
This has been merged into origin.

Comment 29 Yan Du 2017-10-10 05:16:10 UTC
Tested on the latest OCP:
openshift v3.7.0-0.146.0
kubernetes v1.7.6+a08f5eeb62

Created 100 completed pods; all completed pods have IP addresses. The issue has been fixed.
http://pastebin.test.redhat.com/523078

Comment 31 errata-xmlrpc 2017-10-25 13:02:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

