Bug 1034684 - [heat] potential autoscaling headroom remains unused
Summary: [heat] potential autoscaling headroom remains unused
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assignee: Eoghan Glynn
QA Contact: Kevin Whitney
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-26 10:51 UTC by Eoghan Glynn
Modified: 2014-02-02 22:41 UTC (History)
9 users (show)

Fixed In Version: openstack-heat-engine-2013.2-2.0.el6ost
Doc Type: Bug Fix
Doc Text:
The Orchestration engine used counterintuitive logic when calculating autoscaling changes to server group size. Specifically, autoscaling only ever applied the full configured scaling increment, and an adjustment that would overshoot the configured maximum or minimum group size was not applied at all. As a result, certain scaling increment settings prevented the autoscaling feature from ever reaching the minimum or maximum group size. For example, with a scale-up increment of 2 and a configured maximum group size of 5, a group growing from an even size could reach at most 4. With this release, the autoscaling feature truncates scaling adjustments to the upper or lower bound in case of an overshoot. This allows the Orchestration engine to scale automatically to the maximum and minimum group sizes, regardless of the configured scaling increments.
Clone Of:
Environment:
Last Closed: 2013-12-20 00:39:08 UTC
Target Upstream Version:


Attachments
Heat template that reproduces the issue. (3.69 KB, application/x-yaml)
2013-11-26 10:52 UTC, Eoghan Glynn


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1254796 | 0 | None | None | None | Never
OpenStack gerrit | 58343 | 0 | None | None | None | Never
OpenStack gerrit | 58552 | 0 | None | None | None | Never
Red Hat Product Errata | RHEA-2013:1859 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory | 2013-12-21 00:01:48 UTC

Description Eoghan Glynn 2013-11-26 10:51:00 UTC
Description of problem:

Potential headroom for an autoscaling group to grow or shrink will remain unused if the adjustment doesn't *exactly* hit the max or min size respectively.

Take for example an instance group with:

 * MinSize=1
 * MaxSize=6
 * ScaleupPolicy ScalingAdjustment=2
 * ScaledownPolicy ScalingAdjustment=-3

When the under-scaled alarm fires, the group will grow in increments of 2 from 1->3->5 and then grow no further, even if the under-scaled alarm condition persists. So the max group size is never reached.

If the over-scaled alarm subsequently fires, the group will shrink in one decrement of 3 from 5->2 and then shrink no further, even if the over-scaled alarm condition persists. So the min group size is never reached again.
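
The walk-through above can be reproduced with a small model. This is only a sketch of the behavior as described in this report, not Heat's actual code: it assumes that an adjustment which would overshoot a bound is discarded outright, and the function names are made up for illustration.

```python
def adjust_without_truncation(capacity, adjustment, min_size, max_size):
    """Model of the reported behavior: an overshooting adjustment is dropped."""
    new_capacity = capacity + adjustment
    if new_capacity < min_size or new_capacity > max_size:
        return capacity  # adjustment ignored, group size unchanged
    return new_capacity


def run_alarm(capacity, adjustment, min_size, max_size, firings):
    """Apply the same policy once per alarm firing and record each size."""
    sizes = [capacity]
    for _ in range(firings):
        capacity = adjust_without_truncation(
            capacity, adjustment, min_size, max_size)
        sizes.append(capacity)
    return sizes


# MinSize=1, MaxSize=6, scale-up by 2, then scale-down by 3.
scale_up = run_alarm(1, 2, min_size=1, max_size=6, firings=4)
scale_down = run_alarm(scale_up[-1], -3, min_size=1, max_size=6, firings=3)
print(scale_up)    # [1, 3, 5, 5, 5]
print(scale_down)  # [5, 2, 2, 2]
```

The group stalls at 5 on the way up and at 2 on the way down, matching the sequence described above.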

This may seem like an edge case, but it is actually quite likely to be hit, especially if the adjustment type is set to PercentChangeInCapacity, in which case it's non-trivial to choose min and max size such that a compounded application of the percentage delta always lands exactly on the upper and lower bounds.
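
To illustrate the percentage case, here is a rough sketch with hypothetical numbers (starting size 2, +50% per alarm, maximum 10). The rounding rule, dropping the fractional part of the delta, is an assumption made for this example and may not match what Heat does; the helper names are likewise invented.

```python
def percent_change(capacity, percent):
    """Assumed rounding: the fractional part of the delta is dropped."""
    return capacity + int(capacity * percent / 100.0)


def grow_until_stuck(capacity, percent, max_size):
    """Repeat a percentage scale-up until the next step would overshoot."""
    sizes = [capacity]
    while True:
        next_capacity = percent_change(capacity, percent)
        # Stop when the step overshoots max_size or makes no progress.
        if next_capacity > max_size or next_capacity == capacity:
            return sizes
        capacity = next_capacity
        sizes.append(capacity)


sizes = grow_until_stuck(2, 50, max_size=10)
print(sizes)  # [2, 3, 4, 6, 9]
```

Under these assumptions the next step from 9 would be 13, so without truncation the group stays at 9 and never uses the last slot below the maximum of 10.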

More intuitive behavior would be to truncate the adjustment to the upper or lower bound in the case of an over-shoot.
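
A minimal sketch of the proposed truncation, using the example group from this report (MinSize=1, MaxSize=6, +2 and -3). It illustrates the idea of clamping to the bounds and is not the upstream patch itself; the function names are hypothetical.

```python
def adjust_with_truncation(capacity, adjustment, min_size, max_size):
    """Clamp the adjusted size to [min_size, max_size] on overshoot."""
    new_capacity = capacity + adjustment
    return max(min_size, min(max_size, new_capacity))


def run_alarm(capacity, adjustment, min_size, max_size, firings):
    """Apply the same policy once per alarm firing and record each size."""
    sizes = [capacity]
    for _ in range(firings):
        capacity = adjust_with_truncation(
            capacity, adjustment, min_size, max_size)
        sizes.append(capacity)
    return sizes


grown = run_alarm(1, 2, min_size=1, max_size=6, firings=4)
shrunk = run_alarm(grown[-1], -3, min_size=1, max_size=6, firings=3)
print(grown)   # [1, 3, 5, 6, 6]
print(shrunk)  # [6, 3, 1, 1]
```

With clamping, the final scale-up step is shortened from +2 to +1 and the final scale-down step from -3 to -2, so both bounds are reached.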

Version-Release number of selected component (if applicable):

openstack-heat-engine-2013.2-1.0.el6ost.noarch


How reproducible:

100%


Steps to Reproduce:

0. Install OpenStack, including Heat and Ceilometer. Ensure that the Ceilometer compute agent is measuring cpu_util at a reasonable frequency (every minute as opposed to the default 10 mins):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart


1. Upload the cirros images if not already present in glance:

  sudo yum install -y wget
  wget http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz
  tar zxvf cirros-0.3.0-x86_64-uec.tar.gz
  glance add name=cirros-aki is_public=true container_format=aki disk_format=aki < cirros-0.3.0-x86_64-vmlinuz
  glance add name=cirros-ari is_public=true container_format=ari disk_format=ari < cirros-0.3.0-x86_64-initrd
  glance add name=cirros-ami is_public=true container_format=ami disk_format=ami \
     "kernel_id=$(glance index | awk '/cirros-aki/ {print $1}')" \
     "ramdisk_id=$(glance index | awk '/cirros-ari/ {print $1}')" < cirros-0.3.0-x86_64-blank.img


2. Add a UserKey if not already present in nova:

  nova keypair-add --pub_key ~/.ssh/id_rsa.pub userkey


3. Create a stack with the attached template ($CIRROS_AMI_IMAGE is assumed to identify the cirros-ami image uploaded in step 1):

  heat stack-create test_stack --template-file=template.yaml --parameters="KeyName=userkey;InstanceType=m1.tiny;ImageId=$CIRROS_AMI_IMAGE"


4. Wait for the stack creation to complete:

  watch "heat stack-show test_stack | grep status"


5. Check that the high and low CPU alarms transition into the alarm and ok states respectively within a couple of minutes:

  watch "ceilometer alarm-list | grep test_stack"


6. Verify that the peak number of servers never goes beyond 5 (whereas the declared MaxSize is 6):

  watch "nova list | grep ServerGroup | wc -l" 



Actual results:

The autoscaling group will max out at 5 instances, regardless of how long the under-scaled alarm persists.



Expected results:

The autoscaling group should max out at 6 instances.


Additional info:

This issue would also occur with native cloudwatch-style alarming, as opposed to ceilometer alarming.

Comment 1 Eoghan Glynn 2013-11-26 10:52:50 UTC
Created attachment 829191
Heat template that reproduces the issue.

Comment 2 Eoghan Glynn 2013-11-26 10:54:51 UTC
Fix proposed to master upstream:

  https://review.openstack.org/58343

Comment 4 Eoghan Glynn 2013-11-26 15:36:36 UTC
Fix landed on master upstream:

  http://github.com/openstack/heat/commit/2c25616e

Comment 5 Eoghan Glynn 2013-11-26 15:46:47 UTC
Fix proposed to stable/havana upstream:

  https://review.openstack.org/58552

Comment 6 Eoghan Glynn 2013-11-27 12:34:02 UTC
Fix landed on stable/havana upstream:

  https://github.com/openstack/heat/commit/a8c0b110

Comment 7 Eoghan Glynn 2013-11-27 12:43:21 UTC
Fix backported to internal rhos-4.0-rhel-6-patches branch:

  https://code.engineering.redhat.com/gerrit/16394

Comment 14 errata-xmlrpc 2013-12-20 00:39:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

