Bug 1861642
| Field | Value |
| --- | --- |
| Summary | baremetal: Cluster Autoscaler Operator doesn't expose max-node-provision-time arg of CA |
| Product | OpenShift Container Platform |
| Component | Cloud Compute |
| Sub component | BareMetal Provider |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Version | 4.6 |
| Target Release | 4.6.0 |
| Reporter | Daniel <dmaizel> |
| Assignee | Steven Hardy <shardy> |
| QA Contact | Daniel <dmaizel> |
| CC | beth.white, dmaizel, mimccune, rbartal, stbenjam |
| Keywords | Triaged, UpcomingSprint |
| Type | Bug |
| Doc Type | Enhancement |
| Last Closed | 2020-10-27 16:21:20 UTC |

Doc Text: A new interface, `maxNodeProvisionTime`, was added to the ClusterAutoscaler resource. It can be used to control how long the cluster autoscaler waits for a new machine to be provisioned before considering the provisioning attempt failed.
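A minimal sketch of the new field on the ClusterAutoscaler resource (the `45m` value is illustrative, not a default; baremetal provisioning commonly needs more than the upstream 15-minute wait discussed below):

```yaml
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  # How long to wait for a requested machine to become a node before
  # the autoscaler treats the provisioning attempt as failed.
  maxNodeProvisionTime: 45m
```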
Description
Daniel
2020-07-29 07:00:11 UTC
Link to must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/must-gather-bz1861642.zip

---

Ok, I tested this (4.6.0-0.ci-2020-07-21-114552), and I believe it's a problem with the test case rather than any issue with the autoscaling itself. I'm not completely clear on the criteria used for making the scaling decision, but it seems that having a pod doing nothing (I tried the httpd example and a sleeping busybox container) while there are pending pods is not sufficient to trigger the scale-up.

Instead, I created a new container which runs the "stress" tool to simulate memory pressure. The Dockerfile looks like:

```
$ cat Dockerfile
FROM docker.io/centos:centos8
RUN dnf install -y epel-release && dnf install -y stress && dnf clean all
```

I built this and pushed it to my local registry. I then applied the autoscaler and machineautoscaler manifests (the yaml files are available at https://gist.github.com/hardys/41a77adb69661d6c97e722905c0db169):

```
$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.ostest.test.metalkube.org:6443".
$ oc get machineset -n openshift-machine-api
NAME                    DESIRED   CURRENT   READY   AVAILABLE   AGE
ostest-wb5t4-worker-0   2         2         2       2           15h
$ oc apply -f autoscaler.yaml
clusterautoscaler.autoscaling.openshift.io/default created
$ oc apply -f machine_as.yaml
machineautoscaler.autoscaling.openshift.io/scale-automatic created
```

Then I switched to a new project and created a deployment running the stress container:

```
$ oc new-project auto-scaling
Now using project "auto-scaling" on server "https://api.ostest.test.metalkube.org:6443"
$ oc apply -f stress.yaml
deployment.apps/stress-deployment created
$ oc get pods
NAME                                 READY   STATUS    RESTARTS   AGE
stress-deployment-77c4dd6786-fdk56   1/1     Running   0          10s
stress-deployment-77c4dd6786-tr9s5   1/1     Running   0          10s
$ oc get machineset -n openshift-machine-api
NAME                    DESIRED   CURRENT   READY   AVAILABLE   AGE
ostest-wb5t4-worker-0   2         2         2       2           15h
```
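The exact manifests used for this test are in the gist linked above; purely as an illustration, a stress deployment of roughly this shape would drive the scale-up (the image path, stress arguments, and memory request here are assumptions, not the values actually used):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-deployment
  namespace: auto-scaling
spec:
  replicas: 2
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - name: stress
        # Hypothetical registry path; the reproduction used an image built
        # from the Dockerfile above and pushed to a local registry.
        image: local-registry.example.com/stress:latest
        # Continuously allocate memory so the pods exert real pressure.
        command: ["stress", "--vm", "1", "--vm-bytes", "1G"]
        resources:
          requests:
            # The scheduler counts requests, not usage: sized so that
            # additional replicas become Pending on the existing workers.
            memory: 2Gi
```

For what it's worth, this also bears on the test-case question above: the upstream cluster autoscaler scales up in response to Pending pods that fail scheduling due to insufficient resources, so it is the declared resource request, rather than the actual load a pod generates, that drives the decision.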
I then scaled up the deployment:

```
$ oc scale deployment --replicas=5 stress-deployment
deployment.apps/stress-deployment scaled
$ oc get pods
NAME                                 READY   STATUS    RESTARTS   AGE
stress-deployment-77c4dd6786-fdk56   1/1     Running   0          38s
stress-deployment-77c4dd6786-mp7mb   0/1     Pending   0          5s
stress-deployment-77c4dd6786-nql92   0/1     Pending   0          5s
stress-deployment-77c4dd6786-szh2s   0/1     Pending   0          5s
stress-deployment-77c4dd6786-tr9s5   1/1     Running   0          38s
```

We see the machineset scale up and a new machine in the "Provisioning" state:

```
$ oc get machineset -n openshift-machine-api
NAME                    DESIRED   CURRENT   READY   AVAILABLE   AGE
ostest-wb5t4-worker-0   3         3         2       2           15h
$ oc get machines -n openshift-machine-api
NAME                          PHASE          TYPE   REGION   ZONE   AGE
ostest-wb5t4-master-0         Running                               15h
ostest-wb5t4-master-1         Running                               15h
ostest-wb5t4-master-2         Running                               15h
ostest-wb5t4-worker-0-nfkxn   Running                               15h
ostest-wb5t4-worker-0-rfwvl   Provisioning                          21s
ostest-wb5t4-worker-0-z8gbd   Running                               15h
```

A short time later (after adding an extra BMH resource), we see the machine is associated with a BMH and marked as Provisioned:

```
$ oc get machines -n openshift-machine-api | grep ostest-wb5t4-worker-0-rfwvl
ostest-wb5t4-worker-0-rfwvl   Provisioned   11m
$ oc get bmh -n openshift-machine-api | grep ostest-wb5t4-worker-0-rfwvl
ostest-extra-worker-0   OK   inspecting   ostest-wb5t4-worker-0-rfwvl   ipmi://[fd2e:6f44:5dd8:c956::1]:6235
```

However, it takes some time for the BMH resource to be provisioned and for the node to join the cluster, which appears to result in the Machine getting deleted:

```
$ oc get machines -n openshift-machine-api
NAME                          PHASE         TYPE   REGION   ZONE   AGE
ostest-wb5t4-master-0         Running                              16h
ostest-wb5t4-master-1         Running                              16h
ostest-wb5t4-master-2         Running                              16h
ostest-wb5t4-worker-0-6rvjs   Provisioned                          18m
ostest-wb5t4-worker-0-nfkxn   Running                              15h
ostest-wb5t4-worker-0-rfwvl   Deleting                             45m
ostest-wb5t4-worker-0-srrg2   Deleting                             19m
ostest-wb5t4-worker-0-z8gbd   Running                              15h
```

So to make this work correctly, we have to ensure that whatever triggers that machine deletion waits longer. I'm not clear whether the scale-down timeouts are relevant here; there doesn't seem to be any other interface in the docs that could influence this behavior: https://docs.openshift.com/container-platform/4.1/machine_management/applying-autoscaling.html#cluster-autoscaler-cr_applying-autoscaling

---

Ok, so it seems that the cluster autoscaler defaults to waiting only 15 minutes for a node after a machine is created: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca

The max-node-provision-time argument appears to control this, but AFAICS it's not yet supported by openshift/cluster-autoscaler-operator, so we'll have to add it to enable a longer waiting time for baremetal deployments.

---

I looked at Steven's patch for the cluster-autoscaler-operator today. It looks mostly good, and I feel we can probably merge once a few details are worked out. I'd also like to get a few reviews from other team members since we are modifying the CRD.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
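For completeness, a quick way to confirm the new knob is wired through on a cluster carrying the fix (a sketch: the deployment name assumes the operator's `cluster-autoscaler-<name>` naming for a ClusterAutoscaler called `default`, and `--max-node-provision-time` is the upstream flag the field should map to):

```
$ oc patch clusterautoscaler default --type merge -p '{"spec":{"maxNodeProvisionTime":"45m"}}'
$ oc -n openshift-machine-api get deployment cluster-autoscaler-default \
    -o jsonpath='{.spec.template.spec.containers[0].args}' | tr ',' '\n' | grep max-node-provision-time
```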