Bug 1543727 - Met daemonset quickly recreate pod issue
Summary: Met daemonset quickly recreate pod issue
Keywords:
Status: CLOSED DUPLICATE of bug 1501514
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Scott Dodson
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-09 06:09 UTC by DeShuai Ma
Modified: 2018-03-13 18:55 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-13 18:55:54 UTC
Target Upstream Version:
Embargoed:


Comment 1 DeShuai Ma 2018-02-09 06:10:52 UTC
As it contains some internal IP addresses, I have made the comment private.

Comment 2 Tomáš Nožička 2018-02-12 13:12:44 UTC
Is the environment still running? I was hoping to encounter it again. The most important step is to find out why the pod failed, given that the restart policy is "Always". If you could capture the YAML for the failed pod, that would be great.
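
Because the failed pods are deleted within a second or so, one way to capture their YAML is to stream every pod update to a file while the problem is occurring. A minimal sketch, assuming the client accepts `-w` together with `-o yaml`; the function name, namespace, and output file are placeholders, not taken from this report:

```shell
# capture_pod_yaml NAMESPACE OUTFILE
# Streams the full YAML of every pod update in NAMESPACE into OUTFILE.
# The watch runs until it is interrupted (Ctrl-C).
capture_pod_yaml() {
    oc get pods -n "$1" -w -o yaml > "$2"
}

# Example with hypothetical names:
#   capture_pod_yaml dma failed-pods.yaml
```

The resulting file can then be searched for `phase: Failed` to find the status block of the rejected pod.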

Comment 3 DeShuai Ma 2018-02-23 01:58:23 UTC
Sorry, the environment no longer exists.

Comment 4 Tomáš Nožička 2018-02-27 13:55:38 UTC
There should be just 2 cases where a pod with `restartPolicy: Always` can fail: eviction and failing to match the node selector (MatchNodeSelector). I think this is a consequence of one of those scenarios and not an actual DS bug.

Feel free to re-open if you encounter it again but without additional info there is nothing to be done here.
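
To tell the two cases apart, the pod's `status.reason` field can be inspected; to my knowledge it reads `Evicted` for an eviction and `MatchNodeSelector` for a node selector mismatch. A small sketch, where the function name and the namespace argument are assumptions for illustration:

```shell
# pod_failure_reasons NAMESPACE
# Prints one line per pod: name, phase, and status.reason
# (the reason column is empty for pods that have not failed).
pod_failure_reasons() {
    oc get pods -n "$1" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.reason}{"\n"}{end}'
}

# Example with a hypothetical namespace:
#   pod_failure_reasons dma
```

Since a failed DaemonSet pod may be deleted almost immediately, the listing may have to be repeated a few times before it catches one.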

Comment 5 DeShuai Ma 2018-03-12 09:12:54 UTC
Reproduced on OCP 3.9.7, since projectConfig.defaultNodeSelector was added in PR https://github.com/openshift/openshift-ansible/pull/7364

If I create a DaemonSet with `restartPolicy: Always` (the default policy), it recreates the pod over and over in quick succession.

[root@host-172-16-120-96 ~]# oc version
oc v3.9.7
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.96:8443
openshift v3.9.7

[root@host-172-16-120-96 ~]# oc adm new-project dma
Created project dma
[root@host-172-16-120-96 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/daemonset/daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@host-172-16-120-96 ~]# oc get ds -n dma
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
hello-daemonset   2         2         0         2            0           <none>          4s
[root@host-172-16-120-96 ~]# oc get po -n dma
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-2d9z5   1/1       Running   0          9s
hello-daemonset-vgjdl   0/1       Pending   0          1s
[root@host-172-16-120-96 ~]# oc get po -n dma -w
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-2d9z5   1/1       Running   0          12s
hello-daemonset-ftbcm   0/1       Pending   0          0s
hello-daemonset-ftbcm   0/1       MatchNodeSelector   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-ftbcm   0/1       Terminating   0         0s
hello-daemonset-559l4   0/1       Pending   0         0s
hello-daemonset-559l4   0/1       MatchNodeSelector   0         1s
hello-daemonset-559l4   0/1       Terminating   0         1s
hello-daemonset-559l4   0/1       Terminating   0         1s
hello-daemonset-7xfs5   0/1       Pending   0         0s
hello-daemonset-7xfs5   0/1       MatchNodeSelector   0         0s
hello-daemonset-7xfs5   0/1       Terminating   0         0s
hello-daemonset-7xfs5   0/1       Terminating   0         0s
hello-daemonset-hs4wg   0/1       Pending   0         0s
hello-daemonset-hs4wg   0/1       MatchNodeSelector   0         1s
hello-daemonset-hs4wg   0/1       Terminating   0         1s
hello-daemonset-hs4wg   0/1       Terminating   0         1s
hello-daemonset-qwlfn   0/1       Pending   0         0s
hello-daemonset-qwlfn   0/1       MatchNodeSelector   0         1s
hello-daemonset-qwlfn   0/1       Terminating   0         1s
hello-daemonset-qwlfn   0/1       Terminating   0         1s
hello-daemonset-sz2vv   0/1       Pending   0         0s
hello-daemonset-sz2vv   0/1       MatchNodeSelector   0         0s
hello-daemonset-sz2vv   0/1       Terminating   0         0s
hello-daemonset-sz2vv   0/1       Terminating   0         0s
hello-daemonset-hhpzs   0/1       Pending   0         0s
hello-daemonset-hhpzs   0/1       MatchNodeSelector   0         1s
hello-daemonset-hhpzs   0/1       Terminating   0         1s
hello-daemonset-hhpzs   0/1       Terminating   0         1s
hello-daemonset-f58q7   0/1       Pending   0         0s
hello-daemonset-f58q7   0/1       MatchNodeSelector   0         1s
hello-daemonset-f58q7   0/1       Terminating   0         1s
hello-daemonset-f58q7   0/1       Terminating   0         1s
hello-daemonset-ptw29   0/1       Pending   0         0s
hello-daemonset-ptw29   0/1       MatchNodeSelector   0         0s
hello-daemonset-ptw29   0/1       Terminating   0         0s
hello-daemonset-ptw29   0/1       Terminating   0         0s
hello-daemonset-khh5p   0/1       Pending   0         0s
hello-daemonset-khh5p   0/1       MatchNodeSelector   0         1s
hello-daemonset-khh5p   0/1       Terminating   0         1s
hello-daemonset-khh5p   0/1       Terminating   0         1s
hello-daemonset-zjspp   0/1       Pending   0         0s
hello-daemonset-zjspp   0/1       MatchNodeSelector   0         0s
hello-daemonset-zjspp   0/1       Terminating   0         0s
hello-daemonset-zjspp   0/1       Terminating   0         0s
hello-daemonset-p99h7   0/1       Pending   0         0s

Comment 6 DeShuai Ma 2018-03-12 09:28:03 UTC
We need to limit the interval/rate at which the pod is recreated.
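
One common way to limit such a rate is a capped exponential backoff between attempts. The following is only an illustrative sketch of that idea, not the DaemonSet controller's behavior; the function name and the defaults (1 second base, 300 second cap) are arbitrary assumptions:

```shell
# backoff_delay ATTEMPT [BASE_SECONDS] [MAX_SECONDS]
# Prints the delay before retry number ATTEMPT (0-based):
# BASE * 2^ATTEMPT, capped at MAX.
backoff_delay() {
    attempt=$1
    delay=${2:-1}
    max=${3:-300}
    i=0
    # Double the delay once per previous attempt, stopping early at the cap.
    while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt "$max" ]; then
        delay=$max
    fi
    echo "$delay"
}

# Delays for the first five retries with the default settings.
for n in 0 1 2 3 4; do
    backoff_delay "$n"
done
```

With a scheme like this, a pod that keeps failing on the same node would be retried less and less often instead of once per second.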

Comment 7 DeShuai Ma 2018-03-12 09:32:59 UTC
As a short-term fix for the issue, we need to revert https://github.com/openshift/openshift-ansible/pull/7364

Comment 8 Tomáš Nožička 2018-03-12 20:38:11 UTC
dma, good catch with the project default node selectors and Ansible ;)

Project defaultNodeSelectors are incompatible with DaemonSets and we should avoid setting them.

detailed explanation here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1501514#c9
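
Where a cluster-wide default node selector has to stay in place, a per-project override is possible: as far as I know, an empty `openshift.io/node-selector` annotation on the namespace stops the default from being applied to that project's pods. A sketch, with the function name and namespace argument as assumptions:

```shell
# clear_project_node_selector NAMESPACE
# Sets an empty openshift.io/node-selector annotation on NAMESPACE so that
# the cluster-wide projectConfig.defaultNodeSelector is not applied to its pods.
clear_project_node_selector() {
    oc patch namespace "$1" \
        -p '{"metadata": {"annotations": {"openshift.io/node-selector": ""}}}'
}

# Example with a hypothetical namespace:
#   clear_project_node_selector dma
```

For a new project the same effect should be achievable at creation time with `oc adm new-project <name> --node-selector=""`.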

Comment 9 Scott Dodson 2018-03-13 18:55:54 UTC

*** This bug has been marked as a duplicate of bug 1501514 ***

