Description of problem: If the AdjustmentType is set to PercentChangeInCapacity, then depending on the choice of the instance group MinSize and the scale-up policy ScalingAdjustment, no autoscaling actions may occur even when the under-scaled alarm fires. This situation occurs if (MinSize * ScalingAdjustment / 100.0) < 1.0, in which case the group never get gets scaled up even if the under-scaled alarm stays in the alarm state forever. It sounds like an edge-case, but would probably be common enough in reality, e.g. MinSize = 3, ScalingAdjustment = 33% or MinSize = 4, ScalingAdjustment = 20% etc. The problem is that AutoScaling.adjust() does not pay enough attention to rounding issues. It should instead follow the rounding rules used by AWS Autoscaling: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html i.e. round the adjustment up if abs(adjustment) < 1.0, otherwise round down Version-Release number of selected component (if applicable): RHOS 4.0 / Havana How reproducible: 100% Steps to Reproduce: 0. Ensure that the heat and ceilometer services are running (including the openstack-ceilometer-alarm-evaluator and openstack-ceilometer-alarm-notifier) 1. Create an autoscaled stack template with the following scale-up config: my_template.yaml: ... ServerGroup: Type: AWS::AutoScaling::AutoScalingGroup Properties: AvailabilityZones: {'Fn::GetAZs': ''} LaunchConfigurationName: {Ref: LaunchConfig} MinSize: '1' MaxSize: '5' Tags: - {Key: metering.server_group, Value: ServerGroup} ServerScaleUpPolicy: Type: AWS::AutoScaling::ScalingPolicy Properties: AdjustmentType: PercentChangeInCapacity AutoScalingGroupName: {Ref: ServerGroup} Cooldown: '60' ScalingAdjustment: '33' CPUAlarmHigh: Type: OS::Metering::Alarm Properties: description: Scale-up if the average CPU > 1% for 1 minute meter_name: cpu_util statistic: avg period: '60' evaluation_periods: '1' threshold: '0.01' alarm_actions: - {"Fn::GetAtt": [ServerScaleUpPolicy, AlarmUrl]} matching_metadata: {'metadata.user_metadata.server_group': 'ServerGroup'} comparison_operator: gt repeat_actions: True ... etc. 2. Create a stack from this template and wait for its state to transition to CREATE_COMPLETE: $ heat stack-create --template-file my_template.yaml my_stack $ watch "heat stack-show my_stack | grep status" 3. Note that scale-up action does not occur, even though the cpu_util for the single resources clearly exceeds 0.01% and the under-scaled alarm has fired: $ INSTANCE_ID=$(nova list | awk '/my_stack/ {print $2}') $ ceilometer statistics -m cpu_util -p 60 -q "resource_id=$INSTANCE_ID" # note the Avg statistic exceeds 0.01% $ ceilometer alarm-list | grep my_stack-CPUAlarmHigh # note the current state is 'alarm' $ watch "nova list | grep my_stack-ServerGroup | wc -l" # note the autoscaling group never grows beyond the initial size of 1 Actual results: $ $ nova list | grep ServerGroup | wc -l 1 Expected results: $ $ nova list | grep ServerGroup | wc -l 5 Additional info: The same problem will occur will with watch-rules as opposed to ceilometer alarms.
Fix proposed on master upstream: https://review.openstack.org/56281
Landed on master upstream: https://github.com/openstack/heat/commit/5720e6d6
Proposed to stable/havana upstream: https://review.openstack.org/56360
Landed on stable/havana upstream: https://github.com/openstack/heat/commit/f33297d7f13eedc8e8aa5e5db294d3a725679974
Backport landed in internal repo: https://code.engineering.redhat.com/gerrit/16394
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1859.html