| Summary: | [platformmanagement_public_596] Deleting a daemonset always takes a long time | ||
|---|---|---|---|
| Product: | OKD | Reporter: | DeShuai Ma <dma> |
| Component: | Pod | Assignee: | Paul Weil <pweil> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | DeShuai Ma <dma> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.x | CC: | agoldste, aos-bugs, dma, mmccomas, pweil |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-12 17:10:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Deleted the daemonset with a timeout: there are two pods, but only one pod is deleted successfully when deleting the daemonset. Misscheduled is 1.

```
[root@dhcp-128-7 dma]# oc delete daemonset hello-daemonset -n dma1
error: timed out waiting for the condition
[root@dhcp-128-7 dma]# oc get daemonset -n dma1
NAME              CONTAINER(S)   IMAGE(S)                    SELECTOR   NODE-SELECTOR
hello-daemonset   registry       openshift/hello-openshift              0ccb5de4-d61c-11e5-b46d-4439c48d4f6b=0ccb5e2c-d61c-11e5-b46d-4439c48d4f6b
[root@dhcp-128-7 dma]# oc get pod -n dma1
NAME                       READY     STATUS    RESTARTS   AGE
ruby-hello-world-1-npcyj   1/1       Running   0          4m
[root@dhcp-128-7 dma]# oc describe daemonset hello-daemonset -n dma1
Name:          hello-daemonset
Image(s):      openshift/hello-openshift
Selector:
Node-Selector: 0ccb5de4-d61c-11e5-b46d-4439c48d4f6b=0ccb5e2c-d61c-11e5-b46d-4439c48d4f6b
Labels:        name=hello-daemonset
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 1
Pods Status:   1 Running / 0 Waiting / 0 Succeeded / 0 Failed
No events.
```

I've debugged this. The client must currently wait for the daemonset controller to clean out all the pods before issuing a delete; deleting the DS first results in the pods being left in a running state. The controller itself is designed to synchronize a DS one at a time so that it does not race with multiple commands for the same DS. It may be possible to let a DS have a terminating state, like a namespace, and allow the DS to be deleted by the system once all pods are cleaned up, but this is a larger change. Will discuss with the ClusterInfra team.

https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/k8s.io/kubernetes/pkg/kubectl/stop.go#L275

Andy, do you have an opinion on the above comment?

I can confirm that I have seen deleting the above DS, with 1 pod on a 1-node all-in-one setup, take anywhere from 1-2 seconds (which I'd say is the average) to 30 seconds or more. One time it seemed to hang indefinitely, and I got impatient and just hit ctrl-c. I think this deserves some more investigation; it's possible there's some sort of race between the kube and origin finalizers, maybe. Removing UpcomingRelease as there might be a bad bug somewhere in there.

Amending my opinion. It appears that this is caused by the invalid selector in the yaml:
```yaml
selector:
  name: hello-daemonset
```

This is not the correct format for unversioned.LabelSelector, and defaulting will add the selector back for you if you omit it. Please remove this selector and retest to see if you can reproduce long wait times for a delete.
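For reference, the valid unversioned.LabelSelector form of that selector (the same `matchLabels` form used in the verification yaml later in this bug) looks like this:

```yaml
spec:
  selector:
    matchLabels:
      name: hello-daemonset
```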
After changing the yaml, I timed this for 100 runs, all of which stayed steady between 2-5 seconds.
https://gist.github.com/pweil-/459fe5e77236d3a91f1a
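For reference, the client-side delete flow in the stop.go link above amounts to "poll until the controller has cleaned out all pods, then delete the DS object". The following is only a Python sketch of that behavior, with a plain dict standing in for the DS status and illustrative names and timeouts, not the real Go code:

```python
import time

def wait_for_condition(check, timeout_s, interval_s=0.2):
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

def delete_daemonset(ds, delete_fn, timeout_s=120):
    """Sketch of the delete flow described above: the client waits for
    the daemonset controller to clean out all pods before deleting the
    DS object itself."""
    # This wait is what produces "timed out waiting for the condition"
    # when the controller never removes the pods.
    if not wait_for_condition(lambda: ds["pods"] == 0, timeout_s):
        raise TimeoutError("timed out waiting for the condition")
    # Only now is the DS object deleted; deleting it first would leave
    # its pods orphaned in a running state.
    delete_fn(ds)
```

If the selector in the DS never matches the pods the controller created, the controller cannot clean them up, which would explain the poll running all the way to its timeout.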
For reference, here is the exact yaml I was using:
```
[pweil@localhost origin]$ cat paul_temp/daemonset2.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
```
Used the new selector in the yaml and verified this bug.
```
[root@ip-172-18-6-122 sample-app]# openshift version
openshift v1.1.3-172-g2f499da-dirty
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5
```
Steps:
```
[root@ip-172-18-6-122 sample-app]# oc create -f daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@ip-172-18-6-122 sample-app]# oc get pod -n dma
NAME                    READY     STATUS    RESTARTS   AGE
hello-daemonset-wgbr4   1/1       Running   0          4s
[root@ip-172-18-6-122 sample-app]# oc get ds -n dma
NAME              CONTAINER(S)   IMAGE(S)                    SELECTOR                    NODE-SELECTOR
hello-daemonset   registry       openshift/hello-openshift   name in (hello-daemonset)   <none>
[root@ip-172-18-6-122 sample-app]# time oc delete ds/hello-daemonset -n dma
daemonset "hello-daemonset" deleted

real    0m1.075s
user    0m0.063s
sys     0m0.009s
```
```
[root@ip-172-18-6-122 sample-app]# cat daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  selector:
    matchLabels:
      name: hello-daemonset
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
```
Description of problem:
1. Deleting a daemonset always waits a long time; the daemonset should be deleted immediately.
2. After changing a node label, it always takes a long time to delete the pod, so "Number of Nodes Misscheduled" stays non-zero when describing the daemonset.

Version-Release number of selected component (if applicable):
openshift v1.1.2-276-gabe8291-dirty
kubernetes v1.2.0-origin
etcd 2.2.2+git

How reproducible:
Always

Steps to Reproduce:
1. Create a daemonset:
```
$ cat daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  selector:
    name: hello-daemonset
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      serviceAccount: default
      terminationGracePeriodSeconds: 1
$ oc create -f daemonset.yaml -n dma
```
2. Check the daemonset:
```
$ oc get daemonset -n dma
```
3. Delete the daemonset:
```
$ oc delete daemonset hello-daemonset -n dma
```

Actual results:
3. The delete always waits a long time before the daemonset is deleted.

Expected results:
3. The daemonset should be deleted immediately.

Additional info:
```
[root@ip-172-18-2-109 sample-app]# oc describe daemonset hello-daemonset -n dma
Name:          hello-daemonset
Image(s):      openshift/hello-openshift
Selector:
Node-Selector: daemon=yes
Labels:        name=hello-daemonset
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 1
Pods Status:   0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No events.
```