Bug 1308824
| Summary: | [platformmanagement_public_596]Delete daemonset always need wait a long time | | |
|---|---|---|---|
| Product: | OKD | Reporter: | DeShuai Ma <dma> |
| Component: | Pod | Assignee: | Paul Weil <pweil> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | DeShuai Ma <dma> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.x | CC: | agoldste, aos-bugs, dma, mmccomas, pweil |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-12 17:10:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
DeShuai Ma, 2016-02-16 08:51:56 UTC
Deleting a daemonset times out: there are two pods, but only one pod is deleted successfully when the daemonset is deleted, and the number of nodes misscheduled is 1.

```
[root@dhcp-128-7 dma]# oc delete daemonset hello-daemonset -n dma1
error: timed out waiting for the condition
[root@dhcp-128-7 dma]# oc get daemonset -n dma1
NAME              CONTAINER(S)   IMAGE(S)                    SELECTOR   NODE-SELECTOR
hello-daemonset   registry       openshift/hello-openshift              0ccb5de4-d61c-11e5-b46d-4439c48d4f6b=0ccb5e2c-d61c-11e5-b46d-4439c48d4f6b
[root@dhcp-128-7 dma]# oc get pod -n dma1
NAME                       READY     STATUS    RESTARTS   AGE
ruby-hello-world-1-npcyj   1/1       Running   0          4m
[root@dhcp-128-7 dma]# oc describe daemonset hello-daemonset -n dma1
Name:           hello-daemonset
Image(s):       openshift/hello-openshift
Selector:
Node-Selector:  0ccb5de4-d61c-11e5-b46d-4439c48d4f6b=0ccb5e2c-d61c-11e5-b46d-4439c48d4f6b
Labels:         name=hello-daemonset
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 1
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
No events.
```

I've debugged this. The client must currently wait for the daemonset controller to clean out all the pods before issuing a delete; deleting the DS first results in the pods being left in a running state. The controller itself is designed to synchronize a DS one at a time so that it does not race with multiple commands for the same DS. It may be possible to give a DS a terminating state, like a namespace, and let the DS be deleted by the system once all pods are cleaned up, but that is a larger change. Will discuss with the ClusterInfra team.

https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/k8s.io/kubernetes/pkg/kubectl/stop.go#L275

Andy, do you have an opinion on the above comment?

I can confirm that I have seen deleting the above DS, with 1 pod on a 1-node all-in-one setup, take anywhere from 1-2 seconds (which I'd say is the average) to 30 seconds or more. One time it seemed to hang indefinitely, and I got impatient and just hit ctrl-c. I think this deserves some more investigation. It's possible there's some sort of race between the kube and origin finalizers, maybe. Removing UpcomingRelease as there might be a bad bug somewhere in there.

Amending my opinion. It appears that this is caused by the invalid selector in the yaml:

```
selector:
  name: hello-daemonset
```

It is not in the correct format for unversioned.LabelSelector, and defaulting will add the selector back for you if you omit it. Please remove this selector and retest to see if you can reproduce long wait times for a delete. After changing the yaml I timed this for 100 runs, which all stayed steady between 2-5 seconds: https://gist.github.com/pweil-/459fe5e77236d3a91f1a

For reference, here is the exact yaml I was using:

```
[pweil@localhost origin]$ cat paul_temp/daemonset2.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
```
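To make the selector change concrete: the fix comes down to nesting the labels under matchLabels (or dropping spec.selector so defaulting fills it in from the pod template labels). The snippet below is a minimal sketch of the two forms; the invalid form is the one quoted above, and the matchLabels form matches the daemonset.yaml used in the verification that follows.

```
# Invalid: not in the correct format for unversioned.LabelSelector.
selector:
  name: hello-daemonset
---
# Valid: labels nested under matchLabels (or omit spec.selector entirely
# and let defaulting derive it from the pod template labels).
selector:
  matchLabels:
    name: hello-daemonset
```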
Used the new selector in the yaml and verified this bug.

```
[root@ip-172-18-6-122 sample-app]# openshift version
openshift v1.1.3-172-g2f499da-dirty
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5
```

Steps:

```
[root@ip-172-18-6-122 sample-app]# oc create -f daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@ip-172-18-6-122 sample-app]# oc get pod -n dma
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-wgbr4   1/1       Running   0          4s
[root@ip-172-18-6-122 sample-app]# oc get ds -n dma
NAME              CONTAINER(S)   IMAGE(S)                    SELECTOR                    NODE-SELECTOR
hello-daemonset   registry       openshift/hello-openshift   name in (hello-daemonset)   <none>
[root@ip-172-18-6-122 sample-app]# time oc delete ds/hello-daemonset -n dma
daemonset "hello-daemonset" deleted

real    0m1.075s
user    0m0.063s
sys     0m0.009s
[root@ip-172-18-6-122 sample-app]# cat daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  selector:
    matchLabels:
      name: hello-daemonset
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
```
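For anyone repeating the 100-run timing check mentioned in the earlier comment, a rough loop along the following lines would do. It is only a sketch (not the exact gist linked above) and assumes the corrected daemonset.yaml and the dma project from the verification steps.

```
# Sketch of a repeated delete-timing check; assumes the corrected
# daemonset.yaml and the dma project shown in the verification above.
for i in $(seq 1 100); do
  oc create -f daemonset.yaml -n dma >/dev/null
  sleep 2                                   # let the controller schedule the pod
  time oc delete ds/hello-daemonset -n dma
done
```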