Bug 1308824 - [platformmanagement_public_596] Deleting a daemonset always takes a long time
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Pod
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Paul Weil
QA Contact: DeShuai Ma
Depends On:
Blocks:
Reported: 2016-02-16 03:51 EST by DeShuai Ma
Modified: 2016-05-12 13:10 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 13:10:10 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description DeShuai Ma 2016-02-16 03:51:56 EST
Description of problem:
1. Deleting a daemonset always takes a long time; the daemonset should be deleted immediately.
2. When a node label is changed, it takes a long time for the pod to be deleted, so "Number of Nodes Misscheduled" stays non-zero in the daemonset's describe output (see the sketch below).
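As an illustration of item 2 (a sketch only; the node name is a placeholder, and the label matches the daemon=yes node selector shown in the additional info below), removing the label required by the daemonset's node selector should cause the pod on that node to be deleted promptly, but it lingers and the node keeps being counted as misscheduled:

$ oc label node <node-name> daemon-
$ oc describe daemonset hello-daemonset -n dma | grep Misscheduled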

Version-Release number of selected component (if applicable):
openshift v1.1.2-276-gabe8291-dirty
kubernetes v1.2.0-origin
etcd 2.2.2+git

How reproducible:
Always

Steps to Reproduce:
1.Create a daemonset
$ cat daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  selector:
    name: hello-daemonset
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
$ oc create -f daemonset.yaml -n dma

2.Check the daemonset
$ oc get daemonset -n dma

3.Delete the daemonset
$ oc delete daemonset hello-daemonset -n dma

Actual results:
3. The delete always waits a long time before the daemonset is removed.

Expected results:
3. The daemonset should be deleted immediately.

Additional info:
[root@ip-172-18-2-109 sample-app]# oc describe daemonset hello-daemonset -n dma
Name:		hello-daemonset
Image(s):	openshift/hello-openshift
Selector:	
Node-Selector:	daemon=yes
Labels:		name=hello-daemonset
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 1
Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No events.
Comment 1 DeShuai Ma 2016-02-18 03:55:11 EST
Deleting the daemonset times out. There were two pods, but only one pod was deleted successfully when the daemonset was deleted; Misscheduled is 1.
[root@dhcp-128-7 dma]# oc delete daemonset hello-daemonset -n dma1
error: timed out waiting for the condition
[root@dhcp-128-7 dma]# 
[root@dhcp-128-7 dma]# oc get daemonset -n dma1
NAME              CONTAINER(S)   IMAGE(S)                    SELECTOR   NODE-SELECTOR
hello-daemonset   registry       openshift/hello-openshift              0ccb5de4-d61c-11e5-b46d-4439c48d4f6b=0ccb5e2c-d61c-11e5-b46d-4439c48d4f6b
[root@dhcp-128-7 dma]# oc get pod -n dma1
NAME                       READY     STATUS    RESTARTS   AGE
ruby-hello-world-1-npcyj   1/1       Running   0          4m
[root@dhcp-128-7 dma]# oc describe daemonset hello-daemonset -n dma1
Name:		hello-daemonset
Image(s):	openshift/hello-openshift
Selector:	
Node-Selector:	0ccb5de4-d61c-11e5-b46d-4439c48d4f6b=0ccb5e2c-d61c-11e5-b46d-4439c48d4f6b
Labels:		name=hello-daemonset
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 1
Pods Status:	1 Running / 0 Waiting / 0 Succeeded / 0 Failed
No events.
Comment 2 Paul Weil 2016-02-18 17:57:17 EST
I've debugged this.  The client must currently wait for the daemonset controller to clean out all the pods before issuing a delete.  Deleting the DS first results in the pods being left in a running state.

The controller itself is designed to synchronize a DS one at a time so that it does not race with multiple commands for the same DS. It may be possible to give a DS a terminating state, like a namespace, and have the system delete the DS once all of its pods are cleaned up, but that is a larger change.

Will discuss with the ClusterInfra team.

https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/k8s.io/kubernetes/pkg/kubectl/stop.go#L275
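As a rough sketch of the sequence described above (illustrative commands, not the kubectl code; names and namespace follow the reproduction steps), the delete blocks while the controller drains the pods, and only after that does the DaemonSet object itself disappear:

$ oc delete daemonset hello-daemonset -n dma &     # blocks until the controller has cleaned up the pods
$ oc get pods -n dma -l name=hello-daemonset       # the pods terminate first...
$ oc get daemonset -n dma                          # ...then the DS object is removed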
Comment 3 Paul Weil 2016-02-18 17:57:41 EST
Andy, do you have an opinion on the above comment?
Comment 4 Andy Goldstein 2016-02-19 10:59:58 EST
I can confirm that I have seen deleting the above DS, with 1 pod on a 1-node all-in-one setup, take anywhere from 1-2 seconds (which I'd say is the average) to 30 seconds or more. One time it seemed to hang indefinitely, and I got impatient and just hit ctrl-c. I think this deserves some more investigation. It's possible there's some sort of race between the kube and origin finalizers.

Removing UpcomingRelease as there might be a bad bug somewhere in there.
Comment 5 Paul Weil 2016-02-19 13:37:34 EST
Amending my opinion.  It appears that this is caused by the invalid selector in the yaml:

selector:
    name: hello-daemonset

It is not in the correct format for unversioned.LabelSelector, and defaulting will add the selector back for you if you omit it. Please remove this selector and retest to see whether you can still reproduce the long wait times for a delete.

After changing the yaml I timed this for 100 runs, all of which stayed steady between 2 and 5 seconds.

https://gist.github.com/pweil-/459fe5e77236d3a91f1a
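The gist is not reproduced here, but a timing loop along these lines (a sketch; the file name follows comment 6 and the namespace follows the reproduction steps) is enough to spot outliers across repeated create/delete cycles:

$ for i in $(seq 1 100); do
>   oc create -f daemonset2.yaml -n dma > /dev/null
>   time oc delete ds/hello-daemonset -n dma
> done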
Comment 6 Paul Weil 2016-02-19 13:40:48 EST
for reference here is the exact yaml I was using:

[pweil@localhost origin]$ cat paul_temp/daemonset2.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
Comment 7 DeShuai Ma 2016-02-21 22:00:31 EST
I verified this bug using the new selector in the yaml.
[root@ip-172-18-6-122 sample-app]# openshift version
openshift v1.1.3-172-g2f499da-dirty
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5

Steps:
[root@ip-172-18-6-122 sample-app]# oc create -f daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@ip-172-18-6-122 sample-app]# oc get pod -n dma
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-wgbr4   1/1       Running   0          4s
[root@ip-172-18-6-122 sample-app]# oc get ds -n dma
NAME              CONTAINER(S)   IMAGE(S)                    SELECTOR                    NODE-SELECTOR
hello-daemonset   registry       openshift/hello-openshift   name in (hello-daemonset)   <none>
[root@ip-172-18-6-122 sample-app]# time oc delete ds/hello-daemonset -n dma
daemonset "hello-daemonset" deleted

real	0m1.075s
user	0m0.063s
sys	0m0.009s
[root@ip-172-18-6-122 sample-app]# cat daemonset.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  selector:
      matchLabels:
        name: hello-daemonset
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
