Bug 1543727

Summary: DaemonSet quickly recreates pods in a loop
Product: OpenShift Container Platform Reporter: DeShuai Ma <dma>
Component: Installer Assignee: Scott Dodson <sdodson>
Status: CLOSED DUPLICATE QA Contact: DeShuai Ma <dma>
Severity: high Docs Contact:
Priority: high    
Version: 3.9.0 CC: aos-bugs, dma, jokerman, mfojtik, mmccomas, sdodson
Target Milestone: --- Keywords: Reopened
Target Release: 3.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-13 18:55:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 DeShuai Ma 2018-02-09 06:10:52 UTC
The original comment contains some internal IP addresses, so I have made it private.

Comment 2 Tomáš Nožička 2018-02-12 13:12:44 UTC
Is the environment still running? I was hoping to encounter it again. The most important step is to find out why the pod failed, given that the restart policy is "Always". If you could capture the YAML for the failed pod, that would be great.

Comment 3 DeShuai Ma 2018-02-23 01:58:23 UTC
Sorry, the environment no longer exists.

Comment 4 Tomáš Nožička 2018-02-27 13:55:38 UTC
There should be just two cases where a pod with `restartPolicy: Always` can fail: eviction and failing the MatchNodeSelector check. I think this is a consequence of one of those scenarios and not an actual DaemonSet bug.
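
For illustration, the MatchNodeSelector case can be sketched as a plain label comparison. This is a minimal sketch and not the kubelet's actual code; the function name and the labels are hypothetical.

```python
def matches_node_selector(node_labels, node_selector):
    """Return True when every key/value pair in node_selector is present in node_labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical node labels, for illustration only.
node = {"kubernetes.io/hostname": "node-1", "role": "infra"}

print(matches_node_selector(node, {}))                   # empty selector matches any node
print(matches_node_selector(node, {"role": "infra"}))    # label present with the same value
print(matches_node_selector(node, {"role": "compute"}))  # value differs, so the pod would be rejected
```

A pod bound to a node whose labels do not satisfy its selector is the situation reported as MatchNodeSelector.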

Feel free to re-open if you encounter it again, but without additional info there is nothing to be done here.

Comment 5 DeShuai Ma 2018-03-12 09:12:54 UTC
Reproduced on OCP 3.9.7, where projects now get projectConfig.defaultNodeSelector, added in PR https://github.com/openshift/openshift-ansible/pull/7364

If I create a DaemonSet with 'restartPolicy: Always' (the default policy), it quickly recreates the pod.

[root@host-172-16-120-96 ~]# oc version
oc v3.9.7
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.96:8443
openshift v3.9.7

[root@host-172-16-120-96 ~]# oc adm new-project dma
Created project dma
[root@host-172-16-120-96 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/daemonset/daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@host-172-16-120-96 ~]# oc get ds -n dma
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
hello-daemonset   2         2         0         2            0           <none>          4s
[root@host-172-16-120-96 ~]# oc get po -n dma
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-2d9z5   1/1       Running   0          9s
hello-daemonset-vgjdl   0/1       Pending   0          1s
[root@host-172-16-120-96 ~]# oc get po -n dma -w
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-2d9z5   1/1       Running   0          12s
hello-daemonset-ftbcm   0/1       Pending   0          0s
hello-daemonset-ftbcm   0/1       MatchNodeSelector   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-559l4   0/1       Pending   0         0s
hello-daemonset-559l4   0/1       MatchNodeSelector   0         1s
hello-daemonset-559l4   0/1       Terminating   0         1s
hello-daemonset-559l4   0/1       Terminating   0         1s
hello-daemonset-7xfs5   0/1       Pending   0         0s
hello-daemonset-7xfs5   0/1       MatchNodeSelector   0         0s
hello-daemonset-7xfs5   0/1       Terminating   0         0s
hello-daemonset-7xfs5   0/1       Terminating   0         0s
hello-daemonset-hs4wg   0/1       Pending   0         0s
hello-daemonset-hs4wg   0/1       MatchNodeSelector   0         1s
hello-daemonset-hs4wg   0/1       Terminating   0         1s
hello-daemonset-hs4wg   0/1       Terminating   0         1s
hello-daemonset-qwlfn   0/1       Pending   0         0s
hello-daemonset-qwlfn   0/1       MatchNodeSelector   0         1s
hello-daemonset-qwlfn   0/1       Terminating   0         1s
hello-daemonset-qwlfn   0/1       Terminating   0         1s
hello-daemonset-sz2vv   0/1       Pending   0         0s
hello-daemonset-sz2vv   0/1       MatchNodeSelector   0         0s
hello-daemonset-sz2vv   0/1       Terminating   0         0s
hello-daemonset-sz2vv   0/1       Terminating   0         0s
hello-daemonset-hhpzs   0/1       Pending   0         0s
hello-daemonset-hhpzs   0/1       MatchNodeSelector   0         1s
hello-daemonset-hhpzs   0/1       Terminating   0         1s
hello-daemonset-hhpzs   0/1       Terminating   0         1s
hello-daemonset-f58q7   0/1       Pending   0         0s
hello-daemonset-f58q7   0/1       MatchNodeSelector   0         1s
hello-daemonset-f58q7   0/1       Terminating   0         1s
hello-daemonset-f58q7   0/1       Terminating   0         1s
hello-daemonset-ptw29   0/1       Pending   0         0s
hello-daemonset-ptw29   0/1       MatchNodeSelector   0         0s
hello-daemonset-ptw29   0/1       Terminating   0         0s
hello-daemonset-ptw29   0/1       Terminating   0         0s
hello-daemonset-khh5p   0/1       Pending   0         0s
hello-daemonset-khh5p   0/1       MatchNodeSelector   0         1s
hello-daemonset-khh5p   0/1       Terminating   0         1s
hello-daemonset-khh5p   0/1       Terminating   0         1s
hello-daemonset-zjspp   0/1       Pending   0         0s
hello-daemonset-zjspp   0/1       MatchNodeSelector   0         0s
hello-daemonset-zjspp   0/1       Terminating   0         0s
hello-daemonset-zjspp   0/1       Terminating   0         0s
hello-daemonset-p99h7   0/1       Pending   0         0s
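
In the watch output above, eleven distinct pods reach MatchNodeSelector before the capture ends. A rough way to count them from a saved copy of the output is sketched below; the helper name is hypothetical and the sample is a short excerpt of the lines above.

```python
def rejected_pods(watch_output):
    """Return the names of pods that reported MatchNodeSelector, in first-seen order."""
    seen = []
    for line in watch_output.splitlines():
        fields = line.split()
        # Columns are NAME, READY, STATUS, RESTARTS, AGE.
        if len(fields) >= 3 and fields[2] == "MatchNodeSelector" and fields[0] not in seen:
            seen.append(fields[0])
    return seen


# Excerpt copied from the transcript above.
sample = """\
hello-daemonset-2d9z5   1/1       Running   0          12s
hello-daemonset-ftbcm   0/1       Pending   0          0s
hello-daemonset-ftbcm   0/1       MatchNodeSelector   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-559l4   0/1       Pending   0         0s
hello-daemonset-559l4   0/1       MatchNodeSelector   0         1s
"""

print(rejected_pods(sample))  # ['hello-daemonset-ftbcm', 'hello-daemonset-559l4']
```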

Comment 6 DeShuai Ma 2018-03-12 09:28:03 UTC
We need to limit the interval/rate at which the pod is recreated.
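
One common way to limit such a rate is a capped exponential backoff between recreations. The sketch below only illustrates the idea; it is not the DaemonSet controller's implementation, and the function name and default values are assumptions.

```python
def backoff_delays(failures, base=1.0, factor=2.0, cap=300.0):
    """Return the delay in seconds to wait before each of the first `failures` recreations."""
    delays = []
    delay = base
    for _ in range(failures):
        # Never wait longer than `cap`, however many failures have accumulated.
        delays.append(min(delay, cap))
        delay *= factor
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

With a scheme like this, a pod that keeps failing is retried less and less often instead of every second.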

Comment 7 DeShuai Ma 2018-03-12 09:32:59 UTC
As a short-term fix for the issue, we need to revert https://github.com/openshift/openshift-ansible/pull/7364

Comment 8 Tomáš Nožička 2018-03-12 20:38:11 UTC
dma, good catch with the project default node selectors and Ansible ;)

Project defaultNodeSelectors are incompatible with DaemonSets and we should avoid setting them.

Detailed explanation here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1501514#c9
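
A minimal sketch of how the incompatibility can play out. It rests on two assumptions that are not spelled out in this bug: the DaemonSet controller picks target nodes using only the DaemonSet's own node selector, and the project default selector is merged into each pod's nodeSelector at admission. All names and labels are hypothetical.

```python
def satisfies(labels, selector):
    """Return True when every key/value pair in selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def simulate(nodes, ds_selector, project_default):
    """Return {node name: outcome} for every node the DaemonSet targets."""
    # Assumption: the two selectors have no conflicting keys, so a plain merge is enough.
    merged = {**ds_selector, **project_default}
    outcome = {}
    for name, labels in nodes.items():
        if not satisfies(labels, ds_selector):
            continue  # the DaemonSet never targets this node
        # The pod is bound to the node, but the node checks the merged selector.
        outcome[name] = "Running" if satisfies(labels, merged) else "MatchNodeSelector"
    return outcome


# Hypothetical two-node cluster: only one node carries the project default label.
nodes = {
    "node-a": {"role": "compute"},
    "node-b": {"role": "master"},
}

print(simulate(nodes, ds_selector={}, project_default={"role": "compute"}))
# {'node-a': 'Running', 'node-b': 'MatchNodeSelector'}
```

Under these assumptions, the pod on the unlabeled node fails every time and the DaemonSet replaces it, which would match the one Running pod and the looping pod in comment 5.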

Comment 9 Scott Dodson 2018-03-13 18:55:54 UTC

*** This bug has been marked as a duplicate of bug 1501514 ***