The Orchestration engine used counterintuitive truncation logic when calculating server group size changes during autoscaling. Specifically, autoscaling always applied the full configured scaling increment, regardless of the configured maximum or minimum group size.
This allowed certain scaling increment settings to prevent the autoscaling feature from ever reaching the minimum or maximum group size. For example, with a scale-up increment of 2 and a configured maximum group size of 5, a group starting at an even size could never grow past 4.
With this release, the autoscaling feature truncates scaling adjustments to the upper or lower bound in case of an overshoot. This allows the Orchestration engine to scale to the maximum and minimum group sizes automatically, regardless of the configured scaling increments.
Description of problem:
Potential headroom for autoscaling group growth or shrinkage will remain unused if the adjustment doesn't *exactly* hit the maximum or minimum size, respectively.
Take for example an instance group with:
* MinSize=1
* MaxSize=6
* ScaleupPolicy ScalingAdjustment=2
* ScaledownPolicy ScalingAdjustment=-3
When the under-scaled alarm fires, the group will grow in increments of 2 from 1->3->5 and then grow no further, even if the under-scaled alarm condition persists. So the max group size is never reached.
Then if the over-scaled alarm fires subsequently, the group will shrink in one decrement of 3 from 5->2 and then shrink no further, even if the over-scaled alarm condition persists. So the min group size is never resumed.
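The stuck scaling sequences above can be sketched as follows. This is an illustrative simulation of the old behavior (a fixed adjustment that is skipped entirely whenever it would cross a bound), not Heat's actual code:

```python
# Simulate the pre-fix behavior: each alarm applies the fixed
# ScalingAdjustment, but the change is rejected outright if the
# resulting size would fall outside [MinSize, MaxSize].
MIN_SIZE, MAX_SIZE = 1, 6

def adjust_old(size, delta):
    new = size + delta
    if new < MIN_SIZE or new > MAX_SIZE:
        return size  # overshoot: adjustment rejected, size unchanged
    return new

# Scale up by 2 starting from MinSize: 1 -> 3 -> 5, then stuck at 5.
size = 1
history = [size]
for _ in range(4):
    size = adjust_old(size, 2)
    history.append(size)
print(history)  # [1, 3, 5, 5, 5] -- MaxSize=6 is never reached

# Scale down by 3 starting from 5: 5 -> 2, then stuck at 2.
size = 5
down = [size]
for _ in range(3):
    size = adjust_old(size, -3)
    down.append(size)
print(down)  # [5, 2, 2, 2] -- MinSize=1 is never resumed
```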
This may seem like an edge case, but it is actually quite likely to be hit, especially if the adjustment type is set to PercentChangeInCapacity; in that case it is non-trivial to choose minimum and maximum sizes such that compounded applications of the percentage delta always land exactly on the upper and lower bounds.
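To see why the percentage case is worse, consider a hypothetical +50% scale-up policy under the same reject-on-overshoot logic (the bounds, the rounding via ceil, and the policy itself are assumptions chosen for illustration):

```python
import math

# Hypothetical group bounds and a +50% PercentChangeInCapacity policy.
MIN_SIZE, MAX_SIZE = 1, 10

def scale_up_percent(size, pct):
    # Assumed rounding choice: round the percentage delta up.
    delta = math.ceil(size * pct / 100.0)
    new = size + delta
    return size if new > MAX_SIZE else new  # reject overshoot (old behavior)

size, seen = 2, [2]
for _ in range(5):
    size = scale_up_percent(size, 50)
    seen.append(size)
print(seen)  # [2, 3, 5, 8, 8, 8] -- compounding never lands on 10
```

Because the delta compounds with the current size, the sequence jumps from 8 straight past 10 and stalls, two instances short of the declared maximum.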
More intuitive behavior would be to truncate the adjustment to the upper or lower bound in the case of an overshoot.
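The proposed truncation amounts to clamping the adjusted size into the configured bounds. A minimal sketch (not Heat's actual implementation):

```python
# Clamp the adjusted size to [MinSize, MaxSize] instead of rejecting
# an adjustment that overshoots a bound.
MIN_SIZE, MAX_SIZE = 1, 6

def adjust_truncated(size, delta):
    return max(MIN_SIZE, min(MAX_SIZE, size + delta))

print(adjust_truncated(5, 2))   # 6 -- reaches MaxSize instead of sticking at 5
print(adjust_truncated(2, -3))  # 1 -- reaches MinSize instead of sticking at 2
```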
Version-Release number of selected component (if applicable):
openstack-heat-engine-2013.2-1.0.el6ost.noarch
How reproducible:
100%
Steps to Reproduce:
0. Install openstack, including heat & ceilometer. Ensure that the ceilometer compute agent is measuring cpu_util at a reasonable frequency (every minute as opposed to the default 10 mins):
sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
sudo service openstack-ceilometer-compute restart
1. Upload the cirros images if not already present in glance:
sudo yum install -y wget
wget http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz
tar zxvf cirros-0.3.0-x86_64-uec.tar.gz
glance add name=cirros-aki is_public=true container_format=aki disk_format=aki < cirros-0.3.0-x86_64-vmlinuz
glance add name=cirros-ari is_public=true container_format=ari disk_format=ari < cirros-0.3.0-x86_64-initrd
glance add name=cirros-ami is_public=true container_format=ami disk_format=ami \
"kernel_id=$(glance index | awk '/cirros-aki/ {print $1}')" \
"ramdisk_id=$(glance index | awk '/cirros-ari/ {print $1}')" < cirros-0.3.0-x86_64-blank.img
2. Add a UserKey if not already present in nova:
nova keypair-add --pub_key ~/.ssh/id_rsa.pub userkey
3. Create stack with the attached template:
heat stack-create test_stack --template-file=template.yaml --parameters="KeyName=userkey;InstanceType=m1.tiny;ImageId=$CIRROS_AMI_IMAGE"
4. Wait for the stack creation to complete:
watch "heat stack-show test_stack | grep status"
5. Check that the high and low CPU alarms transition into the alarm and ok states respectively within a couple of minutes:
watch "ceilometer alarm-list | grep test_stack"
6. Verify that the peak number of servers never goes beyond 5 (whereas the declared MaxSize is 6):
watch "nova list | grep ServerGroup | wc -l"
Actual results:
The autoscaling group will max out at 5 instances, regardless of how long the under-scaled alarm condition persists.
Expected results:
The autoscaling group should max out at 6 instances.
Additional info:
This issue would also occur with native cloudwatch-style alarming, as opposed to ceilometer alarming.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2013-1859.html