Bug 1030064 - [heat] no autoscaling action occurs for percentage adjustment, depending on initial size & adjustment step size
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 4.0
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: rc
Target Release: 4.0
Assigned To: Eoghan Glynn
QA Contact: Kevin Whitney
Keywords: OtherQA, Triaged
Depends On:
Blocks:
Reported: 2013-11-13 15:49 EST by Eoghan Glynn
Modified: 2014-02-02 17:40 EST

See Also:
Fixed In Version: openstack-heat-engine-2013.2-2.0.el6ost
Doc Type: Bug Fix
Doc Text:
The Orchestration engine did not use proper rounding logic when using PercentChangeInCapacity to autoscale server group sizes. Specifically, autoscaling did not correctly round up if (MinSize x ScalingAdjustment / 100.00) < 1.0. As a result, certain combinations of MinSize and ScalingAdjustment could incorrectly prevent under-scaled groups from scaling up. This fix adds the necessary rounding logic to PercentChangeInCapacity, which now correctly scales up as needed, regardless of the MinSize and ScalingAdjustment.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-19 19:35:29 EST
Type: Bug


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1251007 None None None Never
OpenStack gerrit 16394 None None None Never
OpenStack gerrit 56281 None None None Never
OpenStack gerrit 56360 None None None Never

Description Eoghan Glynn 2013-11-13 15:49:10 EST
Description of problem:

If the AdjustmentType is set to PercentChangeInCapacity, then depending on the choice of the instance group MinSize and the scale-up policy ScalingAdjustment, no autoscaling actions may occur even when the under-scaled alarm fires.

This situation occurs if (MinSize * ScalingAdjustment / 100.0) < 1.0, in which case the group never gets scaled up even if the under-scaled alarm stays in the alarm state forever.

It sounds like an edge-case, but would probably be common enough in reality, e.g. MinSize = 3, ScalingAdjustment = 33% or MinSize = 4, ScalingAdjustment = 20% etc.
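The condition above can be checked directly. The following is a minimal sketch, not Heat code: the helper name `is_stuck` is hypothetical, and it assumes that a fractional adjustment below one instance results in no change to the group, as observed in this report.

```python
def is_stuck(size, scaling_adjustment):
    """Return True if a percentage scale-up yields less than one instance.

    Hypothetical helper for illustration only. `size` is the current
    group size and `scaling_adjustment` is the policy percentage.
    """
    return (size * scaling_adjustment / 100.0) < 1.0


# The two combinations from the description, plus one that is not affected.
for size, pct in [(3, 33), (4, 20), (4, 25)]:
    print(size, pct, is_stuck(size, pct))
```

The first two combinations give 0.99 and 0.8 instances respectively, so both are affected; 4 instances at 25% gives exactly 1.0 and is not.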

The problem is that AutoScaling.adjust() does not pay enough attention to rounding issues. It should instead follow the rounding rules used by AWS Autoscaling:

http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html

i.e. round the adjustment away from zero if abs(adjustment) < 1.0 (up for a scale-up, down for a scale-down), otherwise round it toward zero
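A minimal sketch of that rounding rule follows. It is an illustration under assumptions, not the actual change to AutoScaling.adjust(): the function names `round_adjustment` and `new_capacity` are hypothetical, and clamping to MinSize/MaxSize and cooldown handling are left out.

```python
import math


def round_adjustment(delta):
    """Round a fractional capacity change (hypothetical helper).

    Magnitudes below 1.0 are rounded away from zero, so a non-zero
    adjustment always changes the group by at least one instance.
    Larger magnitudes are rounded toward zero.
    """
    if delta == 0:
        return 0
    if abs(delta) < 1.0:
        return int(math.ceil(delta)) if delta > 0 else int(math.floor(delta))
    return int(math.floor(delta)) if delta > 0 else int(math.ceil(delta))


def new_capacity(current, percent):
    """Apply a PercentChangeInCapacity-style adjustment to `current`."""
    return current + round_adjustment(current * percent / 100.0)
```

With this rule, a group of 3 with a 33% adjustment grows to 4 (0.99 rounds up to 1), while a group of 10 with a 25% adjustment grows to 12 (2.5 rounds down to 2).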


Version-Release number of selected component (if applicable):

RHOS 4.0 / Havana


How reproducible:

100%


Steps to Reproduce:

0. Ensure that the heat and ceilometer services are running (including the openstack-ceilometer-alarm-evaluator and openstack-ceilometer-alarm-notifier)


1. Create an autoscaled stack template with the following scale-up config:

my_template.yaml:
...
  ServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: {'Fn::GetAZs': ''}
      LaunchConfigurationName: {Ref: LaunchConfig}
      MinSize: '1'
      MaxSize: '5'
      Tags:
      - {Key: metering.server_group, Value: ServerGroup}
  ServerScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: PercentChangeInCapacity
      AutoScalingGroupName: {Ref: ServerGroup}
      Cooldown: '60'
      ScalingAdjustment: '33'
  CPUAlarmHigh:
    Type: OS::Metering::Alarm
    Properties:
      description: Scale-up if the average CPU > 1% for 1 minute
      meter_name: cpu_util
      statistic: avg
      period: '60'
      evaluation_periods: '1'
      threshold: '0.01'
      alarm_actions:
      - {"Fn::GetAtt": [ServerScaleUpPolicy, AlarmUrl]}
      matching_metadata: {'metadata.user_metadata.server_group': 'ServerGroup'}
      comparison_operator: gt
      repeat_actions: True
... etc.


2. Create a stack from this template and wait for its state to transition to CREATE_COMPLETE:

  $ heat stack-create --template-file my_template.yaml my_stack
  $ watch "heat stack-show my_stack | grep status"


3. Note that the scale-up action does not occur, even though the cpu_util for the single resource clearly exceeds 0.01% and the under-scaled alarm has fired:

  $ INSTANCE_ID=$(nova list | awk '/my_stack/ {print $2}')
  $ ceilometer statistics -m cpu_util -p 60 -q "resource_id=$INSTANCE_ID"
  # note the Avg statistic exceeds 0.01%
 
  $ ceilometer alarm-list | grep my_stack-CPUAlarmHigh
  # note the current state is 'alarm'

  $ watch "nova list | grep my_stack-ServerGroup | wc -l"
  # note the autoscaling group never grows beyond the initial size of 1


Actual results:

  $ nova list | grep ServerGroup | wc -l
  1


Expected results:

  $ nova list | grep ServerGroup | wc -l
  5
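The expected count of 5 can be traced with a small simulation of the template's values (initial size 1, ScalingAdjustment 33, MaxSize 5). This is only a sketch: the helper names are hypothetical, it covers scale-up only, and it assumes the alarm keeps firing while ignoring the cooldown period.

```python
import math


def scale_up(size, percent, max_size):
    """One scale-up step using the rounding proposed in this report.

    Hypothetical helper: ignores cooldown and alarm evaluation.
    """
    delta = size * percent / 100.0
    if 0 < delta < 1.0:
        step = 1  # round a sub-unit increase up to one instance
    else:
        step = int(math.floor(delta))  # otherwise round down
    return min(size + step, max_size)


def sizes_until_stable(size, percent, max_size):
    """Return the sequence of group sizes while the alarm keeps firing."""
    sizes = [size]
    while True:
        nxt = scale_up(sizes[-1], percent, max_size)
        if nxt == sizes[-1]:
            return sizes
        sizes.append(nxt)


print(sizes_until_stable(1, 33, 5))
```

Under these assumptions the group grows one instance at a time until it reaches MaxSize, which matches the expected result above.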


Additional info:

The same problem will also occur with watch-rules as opposed to ceilometer alarms.
Comment 2 Eoghan Glynn 2013-11-13 16:00:10 EST
Fix proposed on master upstream:

  https://review.openstack.org/56281
Comment 3 Eoghan Glynn 2013-11-14 02:34:35 EST
Landed on master upstream:

  https://github.com/openstack/heat/commit/5720e6d6
Comment 4 Eoghan Glynn 2013-11-14 05:53:53 EST
Proposed to stable/havana upstream:

  https://review.openstack.org/56360
Comment 5 Eoghan Glynn 2013-11-15 02:13:22 EST
Landed on stable/havana upstream:

  https://github.com/openstack/heat/commit/f33297d7f13eedc8e8aa5e5db294d3a725679974
Comment 6 Eoghan Glynn 2013-12-04 10:31:28 EST
Backport landed in internal repo:

  https://code.engineering.redhat.com/gerrit/16394
Comment 10 errata-xmlrpc 2013-12-19 19:35:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html
