Bug 1611988

Summary: scheduler priority BalancedResourceAllocation does not include volume count
Product: OpenShift Container Platform
Reporter: MinLi <minmli>
Component: Node
Assignee: Avesh Agarwal <avagarwa>
Status: CLOSED ERRATA
QA Contact: MinLi <minmli>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, avagarwa, jokerman, minmli, mmccomas, sjenning
Target Milestone: ---
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 07:23:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description MinLi 2018-08-03 08:03:00 UTC
Description of problem:
The BalancedResourceAllocation scheduler priority only includes CPU and memory; it does not include volume count.

Version-Release number of selected component (if applicable):
oc v3.11.0-0.10.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
always

Steps to Reproduce:
1. Modify /etc/origin/master/master-config.yaml, adding:
kubernetesMasterConfig:
....
  schedulerArguments:
    feature-gates:
    - BalanceAttachedNodeVolumes=true

2. Modify /etc/origin/master/scheduler.json as follows:
{
    "apiVersion": "v1",
    "kind": "Policy",
    "predicates": [
        {
            "name": "GeneralPredicates"
        }
    ],
    "priorities": [
        {
            "name": "BalancedResourceAllocation",
            "weight": 1
        }
    ]
}
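As a quick sanity check before restarting the controllers, the policy file can be parsed to catch malformed JSON. This is a sketch using Python's stdlib json module; the contents are exactly those from step 2 (normally they would be read from /etc/origin/master/scheduler.json):

```python
import json

# Scheduler policy from step 2 above.
policy_json = """
{
    "apiVersion": "v1",
    "kind": "Policy",
    "predicates": [
        {"name": "GeneralPredicates"}
    ],
    "priorities": [
        {"name": "BalancedResourceAllocation", "weight": 1}
    ]
}
"""

policy = json.loads(policy_json)  # raises ValueError on malformed JSON
assert policy["kind"] == "Policy"
assert any(p["name"] == "BalancedResourceAllocation" for p in policy["priorities"])
print("policy OK")  # prints: policy OK
```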

3. Modify /etc/origin/master/master.env:
DEBUG_LOGLEVEL=10

4. Restart the master controllers:
# master-restart controllers

5. Create a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  persistentVolumeReclaimPolicy: Retain

6. Create a pod and print the controllers log:
kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    name: frontendhttp
spec:
  containers:
    - name: myfrontend
      image: jhou/hello-openshift
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/tmp"
          name: aws
  volumes:
    - name: aws
      persistentVolumeClaim:
        claimName: ebs

# master-logs controllers controllers 1>temp.txt 2>&1
temp.txt is as follows:
I0803 02:45:29.720581       1 factory.go:1181] About to try and schedule pod mypod
I0803 02:45:29.720603       1 scheduler.go:447] Attempting to schedule pod: default/mypod
I0803 02:45:29.720805       1 graph_builder.go:586] GraphBuilder process object: v1/Pod, namespace default, name mypod, uid 48b33eff-96c7-11e8-99f4-42010af0000c, event type add
I0803 02:45:29.720934       1 disruption.go:328] addPod called on pod "mypod"
I0803 02:45:29.720964       1 taint_manager.go:396] Noticed pod update: types.NamespacedName{Namespace:"default", Name:"mypod"}
I0803 02:45:29.720994       1 disruption.go:403] No PodDisruptionBudgets found for pod mypod, PodDisruptionBudget controller will avoid syncing.
I0803 02:45:29.721000       1 pvc_protection_controller.go:284] Got event on pod default/mypod
I0803 02:45:29.721003       1 disruption.go:331] No matching pdb for pod "mypod"
I0803 02:45:29.721041       1 pvc_protection_controller.go:145] Processing PVC default/ebs
I0803 02:45:29.721076       1 pvc_protection_controller.go:148] Finished processing PVC default/ebs (6.247µs)
I0803 02:45:29.721321       1 resource_allocation.go:66] mypod -> qe-minmli-311-node-2: BalancedResourceAllocation, capacity 4000 millicores 15495647232 memory bytes, total request 4850 millicores 17785737216 memory bytes, score 9
I0803 02:45:29.721348       1 resource_allocation.go:66] mypod -> qe-minmli-311-node-registry-router-1: BalancedResourceAllocation, capacity 4000 millicores 15495647232 memory bytes, total request 4850 millicores 17248866304 memory bytes, score 9
I0803 02:45:29.721358       1 resource_allocation.go:66] mypod -> qe-minmli-311-master-etcd-1: BalancedResourceAllocation, capacity 4000 millicores 15495647232 memory bytes, total request 5300 millicores 17938829312 memory bytes, score 8
I0803 02:45:29.721366       1 resource_allocation.go:66] mypod -> qe-minmli-311-node-1: BalancedResourceAllocation, capacity 4000 millicores 15495639040 memory bytes, total request 5150 millicores 17760563200 memory bytes, score 8
I0803 02:45:29.721378       1 generic_scheduler.go:676] Host qe-minmli-311-node-2 => Score 9
I0803 02:45:29.721385       1 generic_scheduler.go:676] Host qe-minmli-311-node-registry-router-1 => Score 9
I0803 02:45:29.721391       1 generic_scheduler.go:676] Host qe-minmli-311-master-etcd-1 => Score 8
I0803 02:45:29.721397       1 generic_scheduler.go:676] Host qe-minmli-311-node-1 => Score 8
I0803 02:45:29.721453       1 scheduler_binder.go:194] AssumePodVolumes for pod "default/mypod", node "qe-minmli-311-node-registry-router-1"
I0803 02:45:29.721480       1 scheduler_binder.go:331] PVC "default/ebs" is fully bound to PV "pvc-ac5db247-96c6-11e8-99f4-42010af0000c"
I0803 02:45:29.721491       1 scheduler_binder.go:197] AssumePodVolumes for pod "default/mypod", node "qe-minmli-311-node-registry-router-1": all PVCs bound and nothing to do
I0803 02:45:29.721525       1 factory.go:1407] Attempting to bind mypod to qe-minmli-311-node-registry-router-1

[Notice: the scheduler algorithm only takes CPU and memory into account; volume count is not included.]

7. Create a pod with 2 PVCs and print the controllers log.

8. Create a pod with 3 PVCs and print the controllers log.

Actual results:
In steps 6~8, the scheduler log shows only CPU and memory being taken into account; volume count does not appear.

Expected results:
In steps 6~8, the scheduler algorithm should take CPU, memory, and volume count into account.

Additional info:
When "feature-gates: - BalanceAttachedNodeVolumes=true" is not added to master-config.yaml (i.e. step 1 is skipped) and the other steps are the same as above, I get a similar scheduling result (in steps 6~8, each pod is scheduled to the same node).
So I think the BalancedResourceAllocation scheduler priority does not include volume count.

Comment 2 Avesh Agarwal 2018-08-07 20:01:41 UTC
I have tested that whenever the alpha feature BalanceAttachedNodeVolumes is enabled, it takes volumes into account. You are not seeing the effect because you have created just one PV/PVC, which is not enough to see the impact of BalanceAttachedNodeVolumes. To really test this priority function, you will have to create more volumes and more nodes, and then check whether the scheduler spreads pods by taking even distribution of CPU, memory, and volumes into account.

Anyway, I have sent the following PR upstream, which shows volumes (capacity and requested) whenever the alpha feature BalanceAttachedNodeVolumes is enabled. This PR also fixes an issue I found during this testing, where the total requested values of CPU and memory were being reported incorrectly:

https://github.com/kubernetes/kubernetes/pull/67094

Comment 3 Avesh Agarwal 2018-08-07 20:03:06 UTC
After the above PR, the output will look like the following, which includes volumes too:

I0807 19:33:17.810565       1 resource_allocation.go:69] mypod -> ip-172-18-0-4.ec2.internal: BalancedResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 600 millicores 1480589312 memory bytes 1 volumes, score 9
I0807 19:33:17.810613       1 resource_allocation.go:69] mypod -> ip-172-18-1-197.ec2.internal: BalancedResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 400 millicores 943718400 memory bytes 1 volumes, score 9
I0807 19:33:17.810663       1 resource_allocation.go:69] mypod -> ip-172-18-1-197.ec2.internal: LeastResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 400 millicores 943718400 memory bytes 1 volumes, score 8
I0807 19:33:17.810762       1 resource_allocation.go:69] mypod -> ip-172-18-0-7.ec2.internal: BalancedResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 900 millicores 1782579200 memory bytes 1 volumes, score 9
I0807 19:33:17.810810       1 resource_allocation.go:69] mypod -> ip-172-18-0-7.ec2.internal: LeastResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 900 millicores 1782579200 memory bytes 1 volumes, score 6
I0807 19:33:17.810720       1 resource_allocation.go:69] mypod -> ip-172-18-0-4.ec2.internal: LeastResourceAllocation, capacity 2000 millicores 8095694848 memory bytes, 39 volumes, total request 600 millicores 1480589312 memory bytes 1 volumes, score 7
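For reference, the BalancedResourceAllocation scores in the log above can be reproduced by hand. Below is a minimal sketch of the variance-based scoring, an approximation of upstream's balancedResourceScorer when BalanceAttachedNodeVolumes is enabled; the function name and structure here are my own, not the vendored code:

```python
def balanced_score(req_cpu, cap_cpu, req_mem, cap_mem,
                   req_volumes=None, cap_volumes=None, max_priority=10):
    """Approximation of the BalancedResourceAllocation scorer.

    Without volumes, the score is (1 - |cpuFraction - memFraction|) * 10.
    With BalanceAttachedNodeVolumes enabled, it becomes
    (1 - variance(cpuFraction, memFraction, volumeFraction)) * 10,
    so nodes where the three utilization fractions are closest
    together score highest.
    """
    cpu_frac = req_cpu / cap_cpu
    mem_frac = req_mem / cap_mem
    if req_volumes is not None and cap_volumes:
        vol_frac = req_volumes / cap_volumes
        if cpu_frac >= 1 or mem_frac >= 1 or vol_frac >= 1:
            return 0  # over-committed resource: worst score
        mean = (cpu_frac + mem_frac + vol_frac) / 3
        variance = sum((f - mean) ** 2
                       for f in (cpu_frac, mem_frac, vol_frac)) / 3
        return int((1 - variance) * max_priority)
    if cpu_frac >= 1 or mem_frac >= 1:
        return 0
    return int((1 - abs(cpu_frac - mem_frac)) * max_priority)

# Numbers from the first log line above (ip-172-18-0-4.ec2.internal):
# capacity 2000 millicores / 8095694848 bytes / 39 volumes,
# request 600 millicores / 1480589312 bytes / 1 volume.
print(balanced_score(600, 2000, 1480589312, 8095694848, 1, 39))  # 9
```

Plugging in the numbers for ip-172-18-1-197.ec2.internal (400m / 943718400 bytes / 1 volume) likewise yields 9, matching the log.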

Comment 5 Avesh Agarwal 2018-08-10 14:55:46 UTC
Here is the pick for origin: https://github.com/openshift/origin/pull/20603

Comment 6 MinLi 2018-08-15 09:43:52 UTC
Can you describe how to see the effect of BalanceAttachedNodeVolumes? For example, I want to see log output that includes volume info.
For a simple scenario, I created 3 PVCs and a pod which only needs one PVC. In this way, I only saw CPU and memory in the log, but no volumes. Or should "volume" mean some other resource, not a PVC?

Comment 7 Avesh Agarwal 2018-08-15 13:40:39 UTC
(In reply to MinLi from comment #6)
> Can you describe how to see the effect of BalanceAttachedNodeVolumes? For
> example, I want to see log output that includes volume info.
> For a simple scenario, I created 3 PVCs and a pod which only needs one PVC.
> In this way, I only saw CPU and memory in the log, but no volumes. Or should
> "volume" mean some other resource, not a PVC?

Are you testing with a build that has this PR: https://github.com/openshift/origin/pull/20603?

Comment 8 Avesh Agarwal 2018-08-17 12:55:17 UTC
Is there any update?

Comment 9 MinLi 2018-08-23 05:09:13 UTC
I need to read https://github.com/openshift/origin/pull/20603 carefully. I will reply to you later.

Comment 11 MinLi 2018-08-24 03:40:34 UTC
https://github.com/aveshagarwal/origin/blob/d4a7fe442b583c8da4e46dbe833296759cb3de31/vendor/k8s.io/kubernetes/pkg/scheduler/algorithm/priorities/resource_allocation.go

In the above file, in func PriorityMap, the line after the comment "Check if the pod has volumes and this could be added to scorer function for balanced resource allocation" reads:

if len(pod.Spec.Volumes) >= 0 && utilfeature.DefaultFeatureGate.Enabled(features.BalanceAttachedNodeVolumes) && nodeInfo.TransientInfo != nil

Question: what does nodeInfo.TransientInfo mean? How can I tell whether a node's nodeInfo.TransientInfo is not nil?
Could you explain it in detail? Thank you.

Comment 12 MinLi 2018-08-30 06:59:40 UTC
I can see volume info in the log in the version below:
oc v3.11.0-0.25.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

verified!

Comment 14 errata-xmlrpc 2018-10-11 07:23:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

Comment 15 Red Hat Bugzilla 2023-09-14 04:32:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days