Description of problem:

Potential headroom for an autoscaling group growth or shrinkage will remain unused if the adjustment doesn't *exactly* hit the max or min size respectively.

Take for example an instance group with:

* MinSize=1
* MaxSize=6
* ScaleupPolicy ScalingAdjustment=2
* ScaledownPolicy ScalingAdjustment=-3

When the under-scaled alarm fires, the group will grow in increments of 2 from 1->3->5 and then grow no further, even if the under-scaled alarm condition persists. So the max group size is never reached.

Then if the over-scaled alarm fires subsequently, the group will shrink in one decrement of 3 from 5->2 and then shrink no further, even if the over-scaled alarm condition persists. So the min group size is never resumed.

This may seem like an edge case, but is actually quite likely to be hit, especially if the adjustment type is set to PercentChangeInCapacity, in which case it's non-trivial to choose min and max size such that a compounded application of the percentage delta always lands exactly on the upper and lower bounds.

More intuitive behavior would be to truncate the adjustment to the upper or lower bound in the case of an over-shoot.

Version-Release number of selected component (if applicable):

openstack-heat-engine-2013.2-1.0.el6ost.noarch

How reproducible:

100%

Steps to Reproduce:

0. Install openstack, including heat & ceilometer. Ensure that the ceilometer compute agent is measuring cpu_util at a reasonable frequency (every minute as opposed to the default 10 mins):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart

1. Upload the cirros images if not already present in glance:

  sudo yum install -y wget
  wget http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz
  tar zxvf cirros-0.3.0-x86_64-uec.tar.gz
  glance add name=cirros-aki is_public=true container_format=aki disk_format=aki < cirros-0.3.0-x86_64-vmlinuz
  glance add name=cirros-ari is_public=true container_format=ari disk_format=ari < cirros-0.3.0-x86_64-initrd
  glance add name=cirros-ami is_public=true container_format=ami disk_format=ami \
    "kernel_id=$(glance index | awk '/cirros-aki/ {print $1}')" \
    "ramdisk_id=$(glance index | awk '/cirros-ari/ {print $1}')" < cirros-0.3.0-x86_64-blank.img

2. Add a UserKey if not already present in nova:

  nova keypair-add --pub_key ~/.ssh/id_rsa.pub userkey

3. Create stack with the attached template:

  heat stack-create test_stack --template-file=template.yaml --parameters="KeyName=userkey;InstanceType=m1.tiny;ImageId=$CIRROS_AMI_IMAGE"

4. Wait for the stack creation to complete:

  watch "heat stack-show test_stack | grep status"

5. Check that the high and low CPU alarms transition into the alarm and ok states respectively within a couple of minutes:

  watch "ceilometer alarm-list | grep test_stack"

6. Verify that the peak number of servers never goes beyond 5 (whereas the declared MaxSize is 6):

  watch "nova list | grep ServerGroup | wc -l"

Actual results:

The autoscaling group will max out at 5 instances, regardless of how long the underscaled alarm persists for.

Expected results:

The autoscaling group should max out at 6 instances.

Additional info:

This issue would also occur with native cloudwatch-style alarming, as opposed to ceilometer alarming.
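To make the difference between the reported and the expected behavior concrete, here is a minimal Python sketch of the arithmetic for the example group above (MinSize=1, MaxSize=6, adjustments +2 and -3). It is only an illustration, not the Heat engine code or the upstream fix: the function names `apply_adjustment` and `run_until_stable` and the `truncate` flag are hypothetical, and the "reported" branch assumes that an adjustment which would overshoot a bound is simply dropped, as the 1->3->5 and 5->2 sequences in the description suggest.

```python
def apply_adjustment(current, adjustment, min_size, max_size, truncate):
    """Return the group size after one fixed-step capacity adjustment.

    truncate=False models the behavior reported in this bug (assumption:
    an out-of-range adjustment is ignored). truncate=True models the
    proposed behavior: clamp the result to [min_size, max_size].
    """
    target = current + adjustment
    if min_size <= target <= max_size:
        return target
    if not truncate:
        # Overshoot: the adjustment is dropped and the size is unchanged.
        return current
    # Overshoot: truncate the adjustment to the nearest bound.
    return max(min_size, min(max_size, target))


def run_until_stable(start, adjustment, min_size, max_size, truncate):
    """Apply the same adjustment repeatedly while the alarm persists.

    Returns the list of group sizes visited, ending at the size where
    further adjustments no longer change anything.
    """
    sizes = [start]
    while True:
        nxt = apply_adjustment(sizes[-1], adjustment, min_size, max_size, truncate)
        if nxt == sizes[-1]:
            return sizes
        sizes.append(nxt)


if __name__ == "__main__":
    # Scale-up from the minimum with ScalingAdjustment=2.
    print(run_until_stable(1, 2, 1, 6, truncate=False))   # [1, 3, 5]
    print(run_until_stable(1, 2, 1, 6, truncate=True))    # [1, 3, 5, 6]
    # Scale-down with ScalingAdjustment=-3.
    print(run_until_stable(5, -3, 1, 6, truncate=False))  # [5, 2]
    print(run_until_stable(6, -3, 1, 6, truncate=True))   # [6, 3, 1]
```

With truncation the group reaches both MaxSize and MinSize, whereas without it the group stalls at 5 on the way up and at 2 on the way down, matching the actual and expected results listed above.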
Created attachment 829191
Heat template that reproduces the issue.
Fix proposed to master upstream: https://review.openstack.org/58343
Fix landed on master upstream: http://github.com/openstack/heat/commit/2c25616e
Fix proposed to stable/havana upstream: https://review.openstack.org/58552
Fix landed on stable/havana upstream: https://github.com/openstack/heat/commit/a8c0b110
Fix backported to internal rhos-4.0-rhel-6-patches branch: https://code.engineering.redhat.com/gerrit/16394
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1859.html