Bug 1030064 - [heat] no autoscaling action occurs for percentage adjustment, depending on initial size & adjustment step size
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 4.0
Assignee: Eoghan Glynn
QA Contact: Kevin Whitney
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-13 20:49 UTC by Eoghan Glynn
Modified: 2014-02-02 22:40 UTC

Fixed In Version: openstack-heat-engine-2013.2-2.0.el6ost
Doc Type: Bug Fix
Doc Text:
The Orchestration engine did not apply proper rounding logic when using PercentChangeInCapacity to autoscale server group sizes. Specifically, autoscaling did not round up when (MinSize x ScalingAdjustment / 100.00) < 1.0. As a result, certain combinations of MinSize and ScalingAdjustment could prevent under-scaled groups from scaling up. This fix adds the necessary rounding logic to PercentChangeInCapacity, which now scales up as needed regardless of the MinSize and ScalingAdjustment values.
Clone Of:
Environment:
Last Closed: 2013-12-20 00:35:29 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1251007 | 0 | None | None | None | Never
OpenStack gerrit | 16394 | 0 | None | None | None | Never
OpenStack gerrit | 56281 | 0 | None | None | None | Never
OpenStack gerrit | 56360 | 0 | None | None | None | Never
Red Hat Product Errata | RHEA-2013:1859 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory | 2013-12-21 00:01:48 UTC

Description Eoghan Glynn 2013-11-13 20:49:10 UTC
Description of problem:

If the AdjustmentType is set to PercentChangeInCapacity, then depending on the choice of the instance group MinSize and the scale-up policy ScalingAdjustment, no autoscaling actions may occur even when the under-scaled alarm fires.

This situation occurs if (MinSize * ScalingAdjustment / 100.0) < 1.0, in which case the group never gets scaled up, even if the under-scaled alarm stays in the alarm state forever.

It sounds like an edge-case, but would probably be common enough in reality, e.g. MinSize = 3, ScalingAdjustment = 33% or MinSize = 4, ScalingAdjustment = 20% etc.
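
To make the arithmetic concrete, here is a minimal Python sketch of the condition above for the two example configurations. The helper name raw_adjustment is hypothetical, and modelling the pre-fix behaviour with int() is an assumption (the fractional part being discarded), not a quote from the heat source.

```python
def raw_adjustment(capacity, percent):
    """Unrounded PercentChangeInCapacity delta: capacity * percent / 100."""
    return capacity * percent / 100.0


# (MinSize, ScalingAdjustment) pairs taken from the examples in this report.
for min_size, percent in [(3, 33), (4, 20)]:
    delta = raw_adjustment(min_size, percent)
    # Assumption: the fractional delta is truncated, so anything below 1.0
    # becomes a step of 0 and the group size never changes.
    print("MinSize=%d ScalingAdjustment=%d%% delta=%.2f truncated=%d"
          % (min_size, percent, delta, int(delta)))
```

In both cases the unrounded delta is below 1.0, so a truncating calculation yields a step of zero on every alarm.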

The problem is that AutoScaling.adjust() does not pay enough attention to rounding issues. It should instead follow the rounding rules used by AWS Autoscaling:

http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html

i.e. round the adjustment up in magnitude if abs(adjustment) < 1.0 (so 0.33 becomes 1 and -0.33 becomes -1), otherwise round down toward zero
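
The rounding rule can be sketched in Python as follows. This is an illustration only: the function name round_adjustment is hypothetical, and treating "round up" for negative values as "round away from zero" is an interpretation of the AWS rule rather than something stated explicitly in this report.

```python
import math


def round_adjustment(delta):
    """Round a fractional capacity delta to a whole number of instances."""
    if abs(delta) < 1.0:
        # Small deltas are rounded away from zero so that the group
        # still changes by one instance (0.0 stays 0).
        return int(math.ceil(delta)) if delta > 0.0 else int(math.floor(delta))
    # Larger deltas are rounded toward zero.
    return int(math.floor(delta)) if delta > 0.0 else int(math.ceil(delta))
```

For example, a delta of 0.33 gives 1 and a delta of 1.32 also gives 1, whereas plain truncation would give 0 and 1 respectively.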


Version-Release number of selected component (if applicable):

RHOS 4.0 / Havana


How reproducible:

100%


Steps to Reproduce:

0. Ensure that the heat and ceilometer services are running (including the openstack-ceilometer-alarm-evaluator and openstack-ceilometer-alarm-notifier)


1. Create an autoscaled stack template with the following scale-up config:

my_template.yaml:
...
  ServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: {'Fn::GetAZs': ''}
      LaunchConfigurationName: {Ref: LaunchConfig}
      MinSize: '1'
      MaxSize: '5'
      Tags:
      - {Key: metering.server_group, Value: ServerGroup}
  ServerScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: PercentChangeInCapacity
      AutoScalingGroupName: {Ref: ServerGroup}
      Cooldown: '60'
      ScalingAdjustment: '33'
  CPUAlarmHigh:
    Type: OS::Metering::Alarm
    Properties:
      description: Scale-up if the average CPU > 0.01% for 1 minute
      meter_name: cpu_util
      statistic: avg
      period: '60'
      evaluation_periods: '1'
      threshold: '0.01'
      alarm_actions:
      - {"Fn::GetAtt": [ServerScaleUpPolicy, AlarmUrl]}
      matching_metadata: {'metadata.user_metadata.server_group': 'ServerGroup'}
      comparison_operator: gt
      repeat_actions: True
... etc.


2. Create a stack from this template and wait for its state to transition to CREATE_COMPLETE:

  $ heat stack-create --template-file my_template.yaml my_stack
  $ watch "heat stack-show my_stack | grep status"


3. Note that no scale-up action occurs, even though the cpu_util for the single resource clearly exceeds 0.01% and the under-scaled alarm has fired:

  $ INSTANCE_ID=$(nova list | awk '/my_stack/ {print $2}')
  $ ceilometer statistics -m cpu_util -p 60 -q "resource_id=$INSTANCE_ID"
  # note the Avg statistic exceeds 0.01%
 
  $ ceilometer alarm-list | grep my_stack-CPUAlarmHigh
  # note the current state is 'alarm'

  $ watch "nova list | grep my_stack-ServerGroup | wc -l"
  # note the autoscaling group never grows beyond the initial size of 1


Actual results:

  $ nova list | grep ServerGroup | wc -l
  1


Expected results:

  $ nova list | grep ServerGroup | wc -l
  5
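
As a sanity check on the expected value, the following Python sketch steps a group from MinSize 1 with ScalingAdjustment 33 and MaxSize 5, assuming the alarm keeps firing and each cooldown elapses. The function next_capacity is hypothetical and is not heat's API; it only applies the AWS-style rounding rule and the MaxSize cap.

```python
import math


def next_capacity(current, percent, max_size):
    """One scale-up step with AWS-style rounding, capped at max_size."""
    delta = current * percent / 100.0
    if abs(delta) < 1.0:
        step = math.ceil(delta) if delta > 0.0 else math.floor(delta)
    else:
        step = math.floor(delta) if delta > 0.0 else math.ceil(delta)
    return min(current + int(step), max_size)


capacity = 1
history = [capacity]
while True:
    new_capacity = next_capacity(capacity, 33, 5)
    if new_capacity == capacity:
        break  # MaxSize reached, no further growth
    capacity = new_capacity
    history.append(capacity)

print(history)  # [1, 2, 3, 4, 5]
```

Each step adds one instance (deltas of 0.33, 0.66, 0.99 and 1.32), so the group reaches the MaxSize of 5 after four scale-up actions.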


Additional info:

The same problem will occur with watch-rules as opposed to ceilometer alarms.

Comment 2 Eoghan Glynn 2013-11-13 21:00:10 UTC
Fix proposed on master upstream:

  https://review.openstack.org/56281

Comment 3 Eoghan Glynn 2013-11-14 07:34:35 UTC
Landed on master upstream:

  https://github.com/openstack/heat/commit/5720e6d6

Comment 4 Eoghan Glynn 2013-11-14 10:53:53 UTC
Proposed to stable/havana upstream:

  https://review.openstack.org/56360

Comment 5 Eoghan Glynn 2013-11-15 07:13:22 UTC
Landed on stable/havana upstream:

  https://github.com/openstack/heat/commit/f33297d7f13eedc8e8aa5e5db294d3a725679974

Comment 6 Eoghan Glynn 2013-12-04 15:31:28 UTC
Backport landed in internal repo:

  https://code.engineering.redhat.com/gerrit/16394

Comment 10 errata-xmlrpc 2013-12-20 00:35:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

