Bug 1535940 - Cluster capacity can not run after update for kube rebase 1.9
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Avesh Agarwal
QA Contact: weiwei jiang
Docs Contact:
Depends On:
Blocks:
Reported: 2018-01-18 04:56 EST by weiwei jiang
Modified: 2018-03-28 10:20 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When running in a pod, no kubeconfig is supplied because the in-cluster config is used, but cluster-capacity incorrectly exited on the missing kubeconfig. Consequence: Cluster capacity stopped working in a pod. Fix: The absence of a kubeconfig is now ignored when running in a pod, since the in-cluster config is used instead. Result: Cluster capacity now works in a pod.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 10:20:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker: Red Hat Product Errata RHBA-2018:0489 (last updated 2018-03-28 10:20 EDT)

Description weiwei jiang 2018-01-18 04:56:13 EST
Description of problem:
Checked with cluster capacity and found it always fails with "Failed to set default scheduler config: Error in opening default scheduler config file: open : no such file or directory", even when I give an existing path to --default-config:

/bin/sh -ec /bin/cluster-capacity --default-config=/test-s/scheduler.json --podspec=/test-pod/pod.yaml --verbose

$ ls /test-*
/test-pod:
pod.yaml

/test-s:
scheduler.json


Version-Release number of selected component (if applicable):
atomic-openshift-cluster-capacity-3.9.0-0.21.0.git.0.2a50d06.el7.x86_64
# openshift version 
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

How reproducible:
always

Steps to Reproduce:
1. Create a podspec as a configmap
oc create configmap cluster-capacity-configmap --from-file=pod.yaml=pod.yaml -n default
# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: cluster-capacity-stub-container
  namespace: cluster-capacity
spec:
  containers:
  - image: gcr.io/google_containers/pause:2.0
    imagePullPolicy: Always
    name: cluster-capacity-stub-container
    resources:
      limits:
        cpu: 200m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 80Mi
  dnsPolicy: Default
  nodeSelector:
    load: high
    region: hpc
  restartPolicy: OnFailure
  schedulerName: default-scheduler
status: {}

2. oc create secret generic sf --from-file=scheduler.json=/etc/origin/master/scheduler.json  -n default
3. create a rc for cluster-capacity
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2018-01-18T07:30:32Z
  generation: 5
  labels:
    run: cluster-capacity
  name: cluster-capacity
  namespace: default
  resourceVersion: "22043"
  selfLink: /api/v1/namespaces/default/replicationcontrollers/cluster-capacity
  uid: 778a86ed-fc21-11e7-80da-fa163ead188e
spec:
  replicas: 1
  selector:
    run: cluster-capacity
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: cluster-capacity
    spec:
      containers:
      - command:
        - /bin/sh
        - -ec
        - |
          /bin/cluster-capacity --default-config=/test-s/scheduler.json --podspec=/test-pod/pod.yaml --verbose;while true;do sleep 10;done
        env:
        - name: CC_INCLUSTER
          value: "true"
        image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-cluster-capacity
        imagePullPolicy: Always
        name: cluster-capacity
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /test-pod
          name: test-volume
        - mountPath: /test-s
          name: ss
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cluster-capacity-sa
      serviceAccountName: cluster-capacity-sa
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: cluster-capacity-configmap
        name: test-volume
      - name: ss
        secret:
          defaultMode: 420
          secretName: sf


Actual results:
# oc logs -n default -f cluster-capacity-bgjh5
Failed to set default scheduler config: Error in opening default scheduler config file: open : no such file or directory

Expected results:
cluster-capacity should work well

Additional info:
Comment 1 Avesh Agarwal 2018-01-19 10:57:03 EST
(In reply to weiwei jiang from comment #0)
> [...]
> 2. oc create secret generic sf
> --from-file=scheduler.json=/etc/origin/master/scheduler.json  -n default

Why are you creating a secret from the scheduler config file?


> 3. create a rc for cluster-capacity
> [...]
>           /bin/cluster-capacity --default-config=/test-s/scheduler.json

Why are you mounting the scheduler.json config file as a secret?

> --podspec=/test-pod/pod.yaml --verbose;while true;do sleep 10;done
> [...]

Also in general, you need not provide scheduler.json unless your cluster is using some customized default scheduler.
Comment 2 Avesh Agarwal 2018-01-19 10:58:02 EST
Just want to correct my comment:
Also in general, you need NOT provide scheduler.json unless your cluster is 
using some customized default scheduler.
Comment 3 Avesh Agarwal 2018-01-19 11:06:11 EST
Also, can you show me your scheduler.json? I just tested --default-config locally and it seemed to work. I will try in a pod now.
Comment 4 Avesh Agarwal 2018-01-19 12:53:43 EST
I can reproduce the issue. I am working on a fix.
Comment 5 Avesh Agarwal 2018-01-19 16:17:15 EST
PR: https://github.com/openshift/origin/pull/18198

In case you want to test it before, you could use this image: docker.io/aveshagarwal/cluster-capacity
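The behavior change described here (ignoring a missing kubeconfig when running inside a pod, where the in-cluster service-account config is used) can be sketched as follows. This is a minimal illustration; the function name and signature are hypothetical, not the actual code from the PR:

```go
package main

import (
	"errors"
	"fmt"
)

// clientConfigSource decides where client credentials come from.
// Inside a pod the in-cluster config is always available via the
// service account, so an empty kubeconfig path is not an error there;
// before the fix, cluster-capacity exited on the empty path anyway.
func clientConfigSource(inCluster bool, kubeconfig string) (string, error) {
	if inCluster {
		// Missing kubeconfig is ignored: the in-cluster config wins.
		return "in-cluster", nil
	}
	if kubeconfig == "" {
		return "", errors.New("kubeconfig is required outside a cluster")
	}
	return kubeconfig, nil
}

func main() {
	src, _ := clientConfigSource(true, "")
	fmt.Println(src) // prints: in-cluster
}
```

This matches the CC_INCLUSTER=true environment variable set in the ReplicationController above, which is how the pod signals that the in-cluster path should be taken.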
Comment 6 weiwei jiang 2018-01-26 04:54:32 EST
Checked with atomic-openshift-cluster-capacity-3.9.0-0.24.0.git.0.fc8ad63.el7.x86_64, and it works now.

# oc logs -f cluster-capacity-g9xfc
cluster-capacity-stub-container pod requirements:
	- CPU: 100m
	- Memory: 80Mi
	- NodeSelector: load=high,region=hpc

The cluster can schedule 0 instance(s) of the pod cluster-capacity-stub-container.

Termination reason: Unschedulable: 0/5 nodes are available: 1 NodeUnschedulable, 5 MatchNodeSelector.
Comment 8 weiwei jiang 2018-01-28 21:17:29 EST
According to https://bugzilla.redhat.com/show_bug.cgi?id=1535940#c6, move to verified.
Comment 11 errata-xmlrpc 2018-03-28 10:20:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
