Bug 1543727

Summary:	DaemonSet quickly recreates pods
Product:	OpenShift Container Platform	Reporter:	DeShuai Ma <dma>
Component:	Installer	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED DUPLICATE	QA Contact:	DeShuai Ma <dma>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.9.0	CC:	aos-bugs, dma, jokerman, mfojtik, mmccomas, sdodson
Target Milestone:	---	Keywords:	Reopened
Target Release:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-03-13 18:55:54 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Comment 1 DeShuai Ma 2018-02-09 06:10:52 UTC

The comment contains internal IP addresses, so it has been made private.

Comment 2 Tomáš Nožička 2018-02-12 13:12:44 UTC

Is the environment still running? I was hoping to encounter it again. The most important step is to find out why the pod failed, since the restart policy is "Always". If you could capture the YAML for the failed pod, that would be great.

Comment 3 DeShuai Ma 2018-02-23 01:58:23 UTC

Sorry, the environment no longer exists.

Comment 4 Tomáš Nožička 2018-02-27 13:55:38 UTC

There should be just two cases where a pod with `restartPolicy: Always` can fail: eviction and failing to match the node selector (MatchNodeSelector). I think this is a consequence of one of those scenarios and not an actual DaemonSet bug.

Feel free to re-open if you encounter it again, but without additional info there is nothing to be done here.
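
For reference, the MatchNodeSelector case comes down to a label check: a pod fits a node only if every key/value pair in its nodeSelector appears among the node's labels. A minimal sketch of that check follows; the label names and values are assumptions chosen for illustration, not values from this cluster.

```python
def matches_node_selector(node_selector, node_labels):
    """Return True if every selector key/value pair is present in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical labels, chosen only for illustration.
compute_node = {"kubernetes.io/hostname": "node-1", "role": "compute"}
master_node = {"kubernetes.io/hostname": "master-1", "role": "master"}
selector = {"role": "compute"}

print(matches_node_selector(selector, compute_node))  # True
print(matches_node_selector(selector, master_node))   # False
print(matches_node_selector({}, master_node))         # True: an empty selector fits any node
```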

Comment 5 DeShuai Ma 2018-03-12 09:12:54 UTC

Reproduced on OCP 3.9.7, because PR https://github.com/openshift/openshift-ansible/pull/7364 added projectConfig.defaultNodeSelector.

If I create a DaemonSet with `restartPolicy: Always` (the default policy), it quickly recreates the pod over and over.

[root@host-172-16-120-96 ~]# oc version
oc v3.9.7
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.96:8443
openshift v3.9.7

[root@host-172-16-120-96 ~]# oc adm new-project dma
Created project dma
[root@host-172-16-120-96 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/daemonset/daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@host-172-16-120-96 ~]# oc get ds -n dma
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
hello-daemonset   2         2         0         2            0           <none>          4s
[root@host-172-16-120-96 ~]# oc get po -n dma
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-2d9z5   1/1       Running   0          9s
hello-daemonset-vgjdl   0/1       Pending   0          1s
[root@host-172-16-120-96 ~]# oc get po -n dma -w
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-2d9z5   1/1       Running   0          12s
hello-daemonset-ftbcm   0/1       Pending   0          0s
hello-daemonset-ftbcm   0/1       MatchNodeSelector   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-559l4   0/1       Pending   0         0s
hello-daemonset-559l4   0/1       MatchNodeSelector   0         1s
hello-daemonset-559l4   0/1       Terminating   0         1s
hello-daemonset-559l4   0/1       Terminating   0         1s
hello-daemonset-7xfs5   0/1       Pending   0         0s
hello-daemonset-7xfs5   0/1       MatchNodeSelector   0         0s
hello-daemonset-7xfs5   0/1       Terminating   0         0s
hello-daemonset-7xfs5   0/1       Terminating   0         0s
hello-daemonset-hs4wg   0/1       Pending   0         0s
hello-daemonset-hs4wg   0/1       MatchNodeSelector   0         1s
hello-daemonset-hs4wg   0/1       Terminating   0         1s
hello-daemonset-hs4wg   0/1       Terminating   0         1s
hello-daemonset-qwlfn   0/1       Pending   0         0s
hello-daemonset-qwlfn   0/1       MatchNodeSelector   0         1s
hello-daemonset-qwlfn   0/1       Terminating   0         1s
hello-daemonset-qwlfn   0/1       Terminating   0         1s
hello-daemonset-sz2vv   0/1       Pending   0         0s
hello-daemonset-sz2vv   0/1       MatchNodeSelector   0         0s
hello-daemonset-sz2vv   0/1       Terminating   0         0s
hello-daemonset-sz2vv   0/1       Terminating   0         0s
hello-daemonset-hhpzs   0/1       Pending   0         0s
hello-daemonset-hhpzs   0/1       MatchNodeSelector   0         1s
hello-daemonset-hhpzs   0/1       Terminating   0         1s
hello-daemonset-hhpzs   0/1       Terminating   0         1s
hello-daemonset-f58q7   0/1       Pending   0         0s
hello-daemonset-f58q7   0/1       MatchNodeSelector   0         1s
hello-daemonset-f58q7   0/1       Terminating   0         1s
hello-daemonset-f58q7   0/1       Terminating   0         1s
hello-daemonset-ptw29   0/1       Pending   0         0s
hello-daemonset-ptw29   0/1       MatchNodeSelector   0         0s
hello-daemonset-ptw29   0/1       Terminating   0         0s
hello-daemonset-ptw29   0/1       Terminating   0         0s
hello-daemonset-khh5p   0/1       Pending   0         0s
hello-daemonset-khh5p   0/1       MatchNodeSelector   0         1s
hello-daemonset-khh5p   0/1       Terminating   0         1s
hello-daemonset-khh5p   0/1       Terminating   0         1s
hello-daemonset-zjspp   0/1       Pending   0         0s
hello-daemonset-zjspp   0/1       MatchNodeSelector   0         0s
hello-daemonset-zjspp   0/1       Terminating   0         0s
hello-daemonset-zjspp   0/1       Terminating   0         0s
hello-daemonset-p99h7   0/1       Pending   0         0s

Comment 6 DeShuai Ma 2018-03-12 09:28:03 UTC

We need to limit the interval/rate at which the pod is recreated.
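
One way to bound the recreation rate would be an exponential backoff keyed per node, so that each consecutive failure doubles the wait before the next pod is created. The sketch below only illustrates that idea. The `RecreateBackoff` class, its parameters, and the 1 s / 60 s limits are hypothetical and are not taken from the DaemonSet controller.

```python
class RecreateBackoff:
    """Per-key exponential backoff: the delay doubles on each failure, up to a cap."""

    def __init__(self, base_seconds=1.0, max_seconds=60.0):
        self.base_seconds = base_seconds
        self.max_seconds = max_seconds
        self._failures = {}  # key (e.g. node name) -> consecutive failure count

    def record_failure(self, key):
        """Register one more failed pod for this key and return the delay to wait."""
        count = self._failures.get(key, 0) + 1
        self._failures[key] = count
        return min(self.base_seconds * 2 ** (count - 1), self.max_seconds)

    def reset(self, key):
        """Clear the history once a pod for this key runs successfully."""
        self._failures.pop(key, None)


backoff = RecreateBackoff()
delays = [backoff.record_failure("node-1") for _ in range(8)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Keying the counter by node keeps a healthy node from being slowed down by failures on another one.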

Comment 7 DeShuai Ma 2018-03-12 09:32:59 UTC

As a short-term fix, we need to revert https://github.com/openshift/openshift-ansible/pull/7364

Comment 8 Tomáš Nožička 2018-03-12 20:38:11 UTC

dma, good catch with the project default node selectors and Ansible ;)

Project defaultNodeSelectors are incompatible with DaemonSets and we should avoid setting them.

Detailed explanation here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1501514#c9
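
As a rough model of the conflict (a simplification, not the actual admission or controller code): the DaemonSet targets every node, the project default selector is merged into each pod's nodeSelector, and any node whose labels do not satisfy the merged selector rejects its pod with MatchNodeSelector. The controller then replaces the failed pod and the cycle repeats. Node names and label values below are hypothetical, and the handling of conflicting keys is not modelled.

```python
def merge_selectors(project_default, pod_selector):
    """Combine the project default selector with the pod's own nodeSelector.

    Simplification: on a key conflict the pod's value wins here; real
    conflict handling is not modelled.
    """
    merged = dict(project_default)
    merged.update(pod_selector)
    return merged


def rejecting_nodes(nodes, project_default, pod_selector):
    """Return names of nodes where a DaemonSet pod would fail with MatchNodeSelector."""
    merged = merge_selectors(project_default, pod_selector)
    return [
        name
        for name, labels in nodes.items()
        if any(labels.get(key) != value for key, value in merged.items())
    ]


# Hypothetical two-node cluster; only one node carries the default selector's label.
nodes = {
    "node-a": {"role": "compute"},
    "node-b": {"role": "infra"},
}
project_default = {"role": "compute"}  # assumed value of projectConfig.defaultNodeSelector

print(rejecting_nodes(nodes, project_default, {}))  # ['node-b']
print(rejecting_nodes(nodes, {}, {}))               # []
```

In this model, a pod on any node listed by `rejecting_nodes` is rejected every time it is recreated, which matches the Pending, MatchNodeSelector, Terminating loop seen in the watch output.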

Comment 9 Scott Dodson 2018-03-13 18:55:54 UTC


*** This bug has been marked as a duplicate of bug 1501514 ***