Bug 1571111

Summary: The desiredNumberScheduled of DS is incorrect
Product: OpenShift Container Platform
Reporter: DeShuai Ma <dma>
Component: Master
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED NOTABUG
QA Contact: Wang Haoran <haowang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.10.0
CC: aos-bugs, deads, jliggitt, jokerman, mmccomas, wmeng
Target Milestone: ---
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-03 20:36:23 UTC
Type: Bug

Description DeShuai Ma 2018-04-24 06:34:49 UTC
Description of problem:
The desiredNumberScheduled of the DS is incorrect; ref: https://bugzilla.redhat.com/show_bug.cgi?id=1501514#c25

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.27.0
kubernetes v1.10.0+b81c8f8

How reproducible:
Always

Steps to Reproduce:
1. Create a ds and check the desiredNumberScheduled
[root@ip-172-18-9-197 ~]# oc get no
NAME                            STATUS    ROLES     AGE       VERSION
ip-172-18-11-225.ec2.internal   Ready     compute   5h        v1.10.0+b81c8f8
ip-172-18-12-238.ec2.internal   Ready     compute   5h        v1.10.0+b81c8f8
ip-172-18-9-197.ec2.internal    Ready     master    5h        v1.10.0+b81c8f8
[root@ip-172-18-9-197 ~]# oc get ds -n dma
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
hello-daemonset   3         2         2         2            2           <none>          1h
[root@ip-172-18-9-197 ~]# oc get ds hello-daemonset -n dma -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: 2018-04-24T05:13:28Z
  generation: 1
  labels:
    name: hello-daemonset
  name: hello-daemonset
  namespace: dma
  resourceVersion: "37816"
  selfLink: /apis/extensions/v1beta1/namespaces/dma/daemonsets/hello-daemonset
  uid: 3935778e-477e-11e8-8311-0e11fb53aa4e
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: hello-daemonset
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: hello-daemonset
    spec:
      containers:
      - image: openshift/hello-openshift
        imagePullPolicy: Always
        name: registry
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 10
  templateGeneration: 1
  updateStrategy:
    type: OnDelete
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 3
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  numberUnavailable: 1
  observedGeneration: 1
  updatedNumberScheduled: 2

Actual results:
1. desiredNumberScheduled is 3

Expected results:
1. desiredNumberScheduled is 2

Additional info:
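For reference, a minimal way to see where the DESIRED=3 vs. CURRENT=2 gap comes from, using the namespace and DaemonSet names from the reproduction above (the node selector value itself is cluster-specific and may come from the cluster-wide project default rather than the namespace annotation):

# Show the project node selector annotation on the namespace, if one is set there
oc get namespace dma -o yaml | grep node-selector

# Compare with the node labels: only the two compute nodes carry the compute role,
# the master does not, so only two pods can be placed
oc get nodes --show-labels

# The DS spec itself has no nodeSelector, so the status counts all three nodes as desired
oc get ds hello-daemonset -n dma -o jsonpath='{.status.desiredNumberScheduled}'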

Comment 1 Wang Haoran 2018-04-25 02:42:23 UTC
Upstream tracking issue:
https://github.com/kubernetes/kubernetes/issues/53023

Comment 2 Tomáš Nožička 2018-04-25 05:35:33 UTC
Not sure that's it. I think in this case it is caused by the carry patch we have for targeting the right nodes when a project default node selector is present; we didn't patch the part that counts the status, but I'd have to check.

Comment 3 Jordan Liggitt 2018-05-03 15:49:10 UTC
> I think in this case it is caused by the carry patch we have for targeting the right nodes when a project default node selector is present; we didn't patch the part that counts the status, but I'd have to check

Adding David.

There was no node selector on the DS, indicating it wanted to run on all nodes, which is not allowed by the project's node selector.

I think the current behavior is accurate.
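For comparison, a sketch of what it would look like if the DS spec expressed the same restriction itself, assuming the compute nodes carry the node-role.kubernetes.io/compute=true label (as the ROLES column in the reproduction suggests):

# Hypothetical: give the DS an explicit nodeSelector matching only the compute nodes
oc patch ds hello-daemonset -n dma \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/compute":"true"}}}}}'

# With the intent narrowed in the spec, desiredNumberScheduled should drop to 2
# and match currentNumberScheduled
oc get ds hello-daemonset -n dma -o jsonpath='{.status.desiredNumberScheduled}'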

Comment 4 Jordan Liggitt 2018-05-03 15:51:12 UTC
If we wanted to act as though the DS limited itself to the nodes allowed by the project selector, we could do this:

if matches, matchErr := dsc.namespaceNodeSelectorMatches(node, ds); matchErr != nil {
  return false, false, false, matchErr
} else if !matches {
-  shouldSchedule = false
-  shouldContinueRunning = false
+  // This matches the behavior in the ErrNodeSelectorNotMatch case above
+  return false, false, false, nil
}


But that would make the status not accurately reflect the intent expressed in the DS spec.

Comment 5 David Eads 2018-05-03 16:25:01 UTC
The current behavior seems reasonable to me.  You tried to place yourself on every node and only got two.  I don't think I'm concerned enough about revealing the number of nodes in the cluster to adjust it.