Description of problem:
The BalancedResourceAllocation scheduler priority only takes CPU and memory into account; it does not include the attached volume count.

Version-Release number of selected component (if applicable):
oc v3.11.0-0.10.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
Always

Steps to Reproduce:
1. Modify /etc/origin/master/master-config.yaml, adding:

kubernetesMasterConfig:
  ....
  schedulerArguments:
    feature-gates:
    - BalanceAttachedNodeVolumes=true

2. Modify /etc/origin/master/scheduler.json as follows:

{
  "apiVersion": "v1",
  "kind": "Policy",
  "predicates": [
    { "name": "GeneralPredicates" }
  ],
  "priorities": [
    { "name": "BalancedResourceAllocation", "weight": 1 }
  ]
}

3. Modify /etc/origin/master/master.env:

DEBUG_LOGLEVEL=10

4. Restart the master controllers:

# master-restart controllers

5. Create a PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  persistentVolumeReclaimPolicy: Retain

6. Create a pod and print the controllers log:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    name: frontendhttp
spec:
  containers:
    - name: myfrontend
      image: jhou/hello-openshift
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/tmp"
          name: aws
  volumes:
    - name: aws
      persistentVolumeClaim:
        claimName: ebs

# master-logs controllers controllers 1>temp.txt 2>&1

temp.txt is as follows:

I0803 02:45:29.720581 1 factory.go:1181] About to try and schedule pod mypod
I0803 02:45:29.720603 1 scheduler.go:447] Attempting to schedule pod: default/mypod
I0803 02:45:29.720805 1 graph_builder.go:586] GraphBuilder process object: v1/Pod, namespace default, name mypod, uid 48b33eff-96c7-11e8-99f4-42010af0000c, event type add
I0803 02:45:29.720934 1 disruption.go:328] addPod called on pod "mypod"
I0803 02:45:29.720964 1 taint_manager.go:396] Noticed pod update: types.NamespacedName{Namespace:"default", Name:"mypod"}
I0803 02:45:29.720994 1 disruption.go:403] No PodDisruptionBudgets found for pod mypod, PodDisruptionBudget controller will avoid syncing.
I0803 02:45:29.721000 1 pvc_protection_controller.go:284] Got event on pod default/mypod
I0803 02:45:29.721003 1 disruption.go:331] No matching pdb for pod "mypod"
I0803 02:45:29.721041 1 pvc_protection_controller.go:145] Processing PVC default/ebs
I0803 02:45:29.721076 1 pvc_protection_controller.go:148] Finished processing PVC default/ebs (6.247µs)
I0803 02:45:29.721321 1 resource_allocation.go:66] mypod -> qe-minmli-311-node-2: BalancedResourceAllocation, capacity 4000 millicores 15495647232 memory bytes, total request 4850 millicores 17785737216 memory bytes, score 9
I0803 02:45:29.721348 1 resource_allocation.go:66] mypod -> qe-minmli-311-node-registry-router-1: BalancedResourceAllocation, capacity 4000 millicores 15495647232 memory bytes, total request 4850 millicores 17248866304 memory bytes, score 9
I0803 02:45:29.721358 1 resource_allocation.go:66] mypod -> qe-minmli-311-master-etcd-1: BalancedResourceAllocation, capacity 4000 millicores 15495647232 memory bytes, total request 5300 millicores 17938829312 memory bytes, score 8
I0803 02:45:29.721366 1 resource_allocation.go:66] mypod -> qe-minmli-311-node-1: BalancedResourceAllocation, capacity 4000 millicores 15495639040 memory bytes, total request 5150 millicores 17760563200 memory bytes, score 8
I0803 02:45:29.721378 1 generic_scheduler.go:676] Host qe-minmli-311-node-2 => Score 9
I0803 02:45:29.721385 1 generic_scheduler.go:676] Host qe-minmli-311-node-registry-router-1 => Score 9
I0803 02:45:29.721391 1 generic_scheduler.go:676] Host qe-minmli-311-master-etcd-1 => Score 8
I0803 02:45:29.721397 1 generic_scheduler.go:676] Host qe-minmli-311-node-1 => Score 8
I0803 02:45:29.721453 1 scheduler_binder.go:194] AssumePodVolumes for pod "default/mypod", node "qe-minmli-311-node-registry-router-1"
I0803 02:45:29.721480 1 scheduler_binder.go:331] PVC "default/ebs" is fully bound to PV "pvc-ac5db247-96c6-11e8-99f4-42010af0000c"
I0803 02:45:29.721491 1 scheduler_binder.go:197] AssumePodVolumes for pod "default/mypod", node "qe-minmli-311-node-registry-router-1": all PVCs bound and nothing to do
I0803 02:45:29.721525 1 factory.go:1407] Attempting to bind mypod to qe-minmli-311-node-registry-router-1

[notice: the scheduler algorithm only takes CPU and MEM into account; it does not include the volume count.]

7. Create a pod with 2 PVCs and print the controllers log.
8. Create a pod with 3 PVCs and print the controllers log.

Actual results:

Expected results:
In steps 6-8, the scheduler algorithm should take CPU, MEM, and volume count into account.

Additional info:
When "feature-gates: - BalanceAttachedNodeVolumes=true" is not added to master-config.yaml (i.e. step 1 is skipped) and the other steps are the same as above, I get a similar scheduling result (in steps 6-8, each pod is scheduled to the same node). So I think the BalancedResourceAllocation scheduler policy does not include the volume count.
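For context, the behavior reported above can be modeled with the two-resource formula this priority uses when BalanceAttachedNodeVolumes is off: a node scores higher the closer its CPU and memory utilization fractions are to each other, and volumes play no part. This is a minimal Go sketch of that idea, not the real scheduler code; the capacities and requests in main are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// fraction returns requested/capacity, capped at 1 for over-committed nodes.
func fraction(requested, capacity float64) float64 {
	f := requested / capacity
	if f > 1 {
		f = 1
	}
	return f
}

// balancedScore models the two-resource BalancedResourceAllocation idea:
// the smaller the gap between the CPU and memory utilization fractions,
// the higher the score (0..10). Volume count is not considered at all.
func balancedScore(cpuReq, cpuCap, memReq, memCap float64) int {
	cpuFrac := fraction(cpuReq, cpuCap)
	memFrac := fraction(memReq, memCap)
	diff := math.Abs(cpuFrac - memFrac)
	return int((1 - diff) * 10)
}

func main() {
	// Hypothetical node: 4000m CPU and 16Gi memory capacity.
	fmt.Println(balancedScore(2000, 4000, 8<<30, 16<<30)) // evenly used -> 10
	fmt.Println(balancedScore(3000, 4000, 4<<30, 16<<30)) // skewed -> 5
}
```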
I have tested that whenever the alpha feature BalanceAttachedNodeVolumes is enabled, the scheduler does take volumes into account. You are not seeing the effect because you created just one PV/PVC, and that is not enough to see the impact of BalanceAttachedNodeVolumes. To really test this priority function, you will have to create more volumes and more nodes, and then check whether the scheduler spreads pods by taking the even distribution of CPU, memory, and volumes into account. Anyway, I have sent the following PR upstream, which logs volumes (capacity and requested) whenever the alpha feature BalanceAttachedNodeVolumes is enabled. This PR also fixes an issue I found during this testing, where the total requested values of CPU and memory were being reported incorrectly: https://github.com/kubernetes/kubernetes/pull/67094
After the above PR, the output will look like the following, which includes volumes too:

I0807 19:33:17.810565 1 resource_allocation.go:69] mypod -> ip-172-18-0-4.ec2.internal: BalancedResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 600 millicores 1480589312 memory bytes 1 volumes, score 9
I0807 19:33:17.810613 1 resource_allocation.go:69] mypod -> ip-172-18-1-197.ec2.internal: BalancedResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 400 millicores 943718400 memory bytes 1 volumes, score 9
I0807 19:33:17.810663 1 resource_allocation.go:69] mypod -> ip-172-18-1-197.ec2.internal: LeastResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 400 millicores 943718400 memory bytes 1 volumes, score 8
I0807 19:33:17.810762 1 resource_allocation.go:69] mypod -> ip-172-18-0-7.ec2.internal: BalancedResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 900 millicores 1782579200 memory bytes 1 volumes, score 9
I0807 19:33:17.810810 1 resource_allocation.go:69] mypod -> ip-172-18-0-7.ec2.internal: LeastResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 900 millicores 1782579200 memory bytes 1 volumes, score 6
I0807 19:33:17.810720 1 resource_allocation.go:69] mypod -> ip-172-18-0-4.ec2.internal: LeastResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 600 millicores 1480589312 memory bytes 1 volumes, score 7
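For reference, with the feature gate enabled the balanced score is based on how close the CPU, memory, and volume utilization fractions are to their common mean (low variance scores higher), rather than on a single CPU-vs-memory gap. A minimal Go sketch of that math, assuming the variance-based formula from the upstream sources; the fractions in main are taken from the first log line above (600/2000 millicores, 1480589312/8095694848 memory bytes, 1/39 volumes):

```go
package main

import "fmt"

// balancedScoreWithVolumes sketches the three-resource variant of the
// balanced score: the closer the CPU, memory, and volume utilization
// fractions are to their mean (i.e. the lower their variance), the higher
// the score on a 0..10 scale.
func balancedScoreWithVolumes(cpuFrac, memFrac, volFrac float64) int {
	mean := (cpuFrac + memFrac + volFrac) / 3
	variance := ((cpuFrac-mean)*(cpuFrac-mean) +
		(memFrac-mean)*(memFrac-mean) +
		(volFrac-mean)*(volFrac-mean)) / 3
	return int((1 - variance) * 10)
}

func main() {
	// Fractions from the first log line above for ip-172-18-0-4.ec2.internal.
	cpu := 600.0 / 2000.0
	mem := 1480589312.0 / 8095694848.0
	vol := 1.0 / 39.0
	fmt.Println(balancedScoreWithVolumes(cpu, mem, vol)) // → 9, matching the logged score
}
```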
Here is the pick for origin: https://github.com/openshift/origin/pull/20603
Can you describe how to see the BalanceAttachedNodeVolumes effect? For example, I want to see volume info included in the log output. For a simple scenario, I created 3 PVCs and created a pod that only needs one PVC. This way, I only saw CPU and memory in the log, but no volumes. Or should "volume" refer to some resource other than a PVC?
(In reply to MinLi from comment #6)
> Can you describe how to see BalanceAttachedNodeVolumes effect? for example
> I want to see log-output include volumes info.
> For a simple scenario,I create 3 pvc, and create a pod which only need one
> pvc. in this way, I only saw cpu and memory in log, but no volumn. Or volume
> should be other resource not pvc?

Are you testing with a build that has this commit: https://github.com/openshift/origin/pull/20603?
Is there any update?
I need to read https://github.com/openshift/origin/pull/20603 carefully. I will reply to you later.
https://github.com/aveshagarwal/origin/blob/d4a7fe442b583c8da4e46dbe833296759cb3de31/vendor/k8s.io/kubernetes/pkg/scheduler/algorithm/priorities/resource_allocation.go

In the above file, in func PriorityMap, the line after the comment "Check if the pod has volumes and this could be added to scorer function for balanced resource allocation" is:

if len(pod.Spec.Volumes) >= 0 && utilfeature.DefaultFeatureGate.Enabled(features.BalanceAttachedNodeVolumes) && nodeInfo.TransientInfo != nil

Question: what does nodeInfo.TransientInfo mean? How can I tell whether a node's nodeInfo.TransientInfo is not nil? Could you explain it in detail? Thank you.
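For what it's worth, my understanding (not authoritative) is that TransientInfo is per-scheduling-cycle scratch data attached to the scheduler's cached NodeInfo; it carries the node's allocatable and requested volume counts gathered earlier in the cycle, and it is only populated when the BalanceAttachedNodeVolumes gate is on. Note also that len(pod.Spec.Volumes) >= 0 is always true in Go, so the effective gate is the other two conditions. A standalone sketch of that guard; the types here are simplified stand-ins, not the real scheduler API:

```go
package main

import "fmt"

// transientInfo is a stand-in for the per-cycle volume bookkeeping that the
// scheduler attaches to a cached node when the feature gate is enabled.
type transientInfo struct {
	allocatableVolumesCount int
	requestedVolumes        int
}

// nodeInfo is a stand-in for the scheduler's cached node entry.
type nodeInfo struct {
	transient *transientInfo // nil unless the feature gate populated it
}

// useVolumeBalancing mirrors the guard quoted above: the volume term is only
// added when the gate is on AND the transient data exists. Since a length is
// never negative, numPodVolumes >= 0 always holds and does not filter anything.
func useVolumeBalancing(numPodVolumes int, gateEnabled bool, ni *nodeInfo) bool {
	return numPodVolumes >= 0 && gateEnabled && ni.transient != nil
}

func main() {
	n := &nodeInfo{} // transient info not populated -> volumes ignored
	fmt.Println(useVolumeBalancing(1, true, n)) // false
	n.transient = &transientInfo{allocatableVolumesCount: 39, requestedVolumes: 1}
	fmt.Println(useVolumeBalancing(1, true, n)) // true
}
```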
I can see volume info in the log in the following version:
oc v3.11.0-0.25.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Verified!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days