Description of problem:
After running many builds, one of the build pods shows <none> for its IP address. "oc describe pod" also has nothing in the IP field.

proj27   cakephp-mysql-example-1-build   0/1   Completed   0   12m   <none>   ip-172-31-14-153.us-west-2.compute.internal

Version-Release number of selected component (if applicable):
openshift v3.6.65
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

I have seen the problem with all of the following docker versions:
docker-common-1.12.6-17
docker-common-1.12.6-18
docker-common-1.12.6-19

Steps to Reproduce:
1. Create an OpenShift cluster (1 master, 1 infra, 2 worker nodes)
2. Start concurrent builds
3. One of the pods shows no IP address after the build completes

Actual results:
No IP address associated with the build pod

Expected results:
The IP address should show up

Additional info:
Created attachment 1277441 [details] pod json
Created attachment 1277442 [details] desc pod
Could you add the exact commands you're using to recreate the issue? I've tried doing a build pod for the cakephp project, but I'm running into:

$ oc process -f examples/quickstarts/cakephp-mysql.json > /tmp/build.json
$ oc create -f /tmp/build.json
$ oc start-build cakephp-mysql-example
The ImageStreamTag "php:7.0" is invalid: from: Error resolving ImageStreamTag php:7.0 in namespace openshift: imagestreams.image.openshift.io "php" not found

Also, can you paste in the *entire* output of the build pod status you reference in the first comment? I assume you're running "oc get builds", right?
The first command was:

oc get pods --all-namespaces -o wide

There were 300 pods in total, which is why I only included the failing one in the bug. To create the build config I used the template from the SVT repo:

https://raw.githubusercontent.com/openshift/svt/master/openshift_scalability/content/quickstarts/cakephp/cakephp-build.json

One thing I forgot to mention while creating the bug: I was running a concurrent build test, meaning I created 30 build configs and ran them concurrently. I saw this issue on about 20 pods out of the 300 total.

Please let me know if you need anything else.
Weibin, can you try to reproduce this please?
(In reply to Vikas Laad from comment #5)
> first command was
>
> oc get pods --all-namespaces -o wide
>
> There were 300 pods in total, which is why I only included the failing one in the bug.
>
> To create the build config I used the template from the SVT repo:
>
> https://raw.githubusercontent.com/openshift/svt/master/openshift_scalability/content/quickstarts/cakephp/cakephp-build.json
>
> One thing I forgot to mention while creating the bug: I was running a
> concurrent build test, meaning I created 30 build configs and ran them
> concurrently. I saw this issue on about 20 pods out of the 300 total.
>
> Please let me know if you need anything else.

Hi Vikas,

Could you let me know how to use the above template to create 300 pods? Which oc commands do I need to use?

Thanks!
Hi Weibin,

Here is how we create it:

oc process -f https://raw.githubusercontent.com/openshift/svt/master/openshift_scalability/content/quickstarts/cakephp/cakephp-build.json | oc create -f -

This creates the build config. After creating it, I repeatedly run:

oc start-build cakephp-mysql-example

To create multiple build configs, I do the same in multiple projects.
Created attachment 1278037 [details] Testing log
Hi Vikas,

I created 300 pods under the default namespace and all of the pods got IP addresses. In my testing env it took several hours for the 300 pods to get their IP addresses. The test log is attached; about half of the pods got an IP address from 10.128.0.0 and the other half from 10.129.0.0.
Marking this as a regression for now. In previous OCP releases there were no instances of non-Error, non-Failed pods that did not receive IPs (or at least that did not have their IPs reported in the pod details).
I can recreate the issue with 100 completed pods:

# for i in `seq 100` ; do curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/completed-pod.json | sed s/completed-pod/cpod-$i/g | oc create -f - ; done

It takes a while for all the pods to complete; 6 of them do not have an IP.

Pod list: http://pastebin.test.redhat.com/483650
I think the reproducer in comment 12 is the most straightforward. Thanks, Meng Bo!
Thanks, will try to reproduce this coming week.
Fairly easy to reproduce locally with even 20 build pods. It appears to be a race with kubelet pod status reporting since the pods complete so quickly.
1) PLEG puts pod status into a cache.
2) PLEG then sends an event to kubelet.
3) kubelet reads PLEG events in syncLoopIteration().
4) That calls HandlePodSyncs(), which calls dispatchWork() for the event.
5) dispatchWork() calls podWorkers.UpdatePod().
6) UpdatePod() starts a goroutine that calls managePodLoop().
   << race: PLEG runs again and reads pod status; the pod has terminated, so network status is no longer available. (1) runs again and puts a new status with PodIP:"" into the cache >>
7) That goroutine will be scheduled at some arbitrary time in the future.
8) managePodLoop() (in the goroutine) reads the PLEG cache for the pod status.
9) The status is sent to kubelet's syncPod() function, which converts it to API status and sends it to the apiserver (again, from the goroutine started by UpdatePod()).

The race is that, for pods that lose the PodIP, the cache is updated again by PLEG with a "" PodIP between (6) and (8). For pods that don't hit the race, PLEG hasn't yet updated the cache with the "" PodIP when (8) runs.

This begs the question: why is the PodIP *ever* shown after the pod has exited? The pod is dead, the IP is no longer valid, and the IP may well be recycled to another pod. It doesn't make much sense to show the "current pod IP" when the pod no longer has that IP. The IP should be saved somewhere, but I'm not sure it's correct to show it in 'oc describe' or in "-o wide" as the current PodIP.

My current feeling is to either (a) close this as NOTABUG, or (b) create a PR to ensure PodIP="" when the container is dead, so the IP never shows up after container death.
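To make the ordering concrete, below is a minimal, runnable Go sketch of the race. The cache, status type, and worker are simplified stand-ins for the real kubelet pieces (the PLEG cache, podWorkers.UpdatePod(), managePodLoop()), not the actual kubelet types or APIs:

package main

import (
	"fmt"
	"sync"
	"time"
)

// podStatus stands in for the runtime status PLEG caches for each pod.
type podStatus struct{ PodIP string }

// statusCache mimics the PLEG-owned cache: PLEG writes it, pod workers read it.
type statusCache struct {
	mu     sync.Mutex
	status podStatus
}

func (c *statusCache) set(s podStatus) { c.mu.Lock(); c.status = s; c.mu.Unlock() }
func (c *statusCache) get() podStatus  { c.mu.Lock(); defer c.mu.Unlock(); return c.status }

func main() {
	cache := &statusCache{}

	// Steps (1)-(2): PLEG observes the running pod and caches its status.
	cache.set(podStatus{PodIP: "10.128.0.42"})

	// Steps (3)-(6): kubelet dispatches the sync; UpdatePod() starts a
	// goroutine (managePodLoop) that will read the cache later, not now.
	done := make(chan struct{})
	go func() {
		// Step (7): arbitrary scheduling delay before the worker runs.
		time.Sleep(10 * time.Millisecond)
		// Steps (8)-(9): the worker reads the cache and reports the
		// result to the apiserver.
		fmt.Printf("status sent to apiserver: PodIP=%q\n", cache.get().PodIP)
		close(done)
	}()

	// Meanwhile PLEG relists: the container has exited, the runtime no longer
	// reports a network status, and the cache is overwritten with an empty IP.
	cache.set(podStatus{PodIP: ""})

	<-done // prints PodIP="", the IP was lost before it was ever reported
}

Whether the worker reports the real IP or the empty string depends only on whether the PLEG relist lands before or after the cache read in step (8), which is why only some of the fast-completing build pods lose their IP.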
As stated in comment 18, the pod IP should be saved after the pod has exited. It is frequently useful for debug/forensic purposes.
Filed upstream issue https://github.com/kubernetes/kubernetes/issues/47265 for clarification on intended behavior.
(In reply to Mike Fiedler from comment #19)
> As stated in comment 18, the pod IP should be saved after the pod has exited.
> It is frequently useful for debug/forensic purposes.

Yes, it should be saved, but it doesn't need to be saved as the overall PodIP in the pod status. It could be pushed as an event too, so that it is still available after pod death.
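To sketch what that event-based alternative could look like, here is a hedged example using client-go's EventRecorder. recordPodIPAtTermination, its call site, and the "PodIPReleased" reason are hypothetical names for illustration, not existing kubelet code:

package podip

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// recordPodIPAtTermination emits the last-known pod IP as an event when the
// pod's sandbox is torn down. Events outlive the status field going stale,
// so the IP stays visible in `oc describe pod` after the pod has exited.
func recordPodIPAtTermination(recorder record.EventRecorder, pod *v1.Pod, lastIP string) {
	if lastIP == "" {
		return // no IP was observed before teardown
	}
	recorder.Eventf(pod, v1.EventTypeNormal, "PodIPReleased",
		"pod IP %s released at container termination", lastIP)
}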
Upstream PR here: https://github.com/kubernetes/kubernetes/pull/47806
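For readers of this bug, a minimal sketch of the general shape such a fix could take: keep the previously reported IP when the fresh runtime status no longer carries one for a terminated pod. This is an assumption about the approach, not a copy of the upstream patch (see the PR above for the real change):

package main

import "fmt"

// apiPodStatus is a simplified stand-in for the pod status sent to the apiserver.
type apiPodStatus struct {
	Phase string
	PodIP string
}

// mergePodIP returns the status to report, preserving the last-known IP when
// the fresh status has none (e.g. the pod exited and lost its network status).
func mergePodIP(old, fresh apiPodStatus) apiPodStatus {
	if fresh.PodIP == "" && old.PodIP != "" {
		fresh.PodIP = old.PodIP
	}
	return fresh
}

func main() {
	old := apiPodStatus{Phase: "Running", PodIP: "10.128.0.42"}
	fresh := apiPodStatus{Phase: "Succeeded", PodIP: ""} // runtime reports no IP after exit
	fmt.Printf("%+v\n", mergePodIP(old, fresh))          // the IP survives termination
}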
*** Bug 1406338 has been marked as a duplicate of this bug. ***
This will be merged into origin as https://github.com/openshift/origin/pull/16464
This has been merged into origin.
Tested on the latest OCP:

openshift v3.7.0-0.146.0
kubernetes v1.7.6+a08f5eeb62

Created 100 completed pods and all of them have IP addresses; the issue has been fixed.

http://pastebin.test.redhat.com/523078
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3049