Description of problem:

Heat is not able to maintain the load balancer minimum member count. The customer is using two templates, one without load-balancer configuration and another with load-balancer configuration. When a stack is created from the template *without* a load balancer, with a minimum count of 2 nodes, and an instance is deleted manually using the "nova delete <instanceid>" command, Heat spins up a new instance to maintain the minimum instance count. However, the same does not happen when the stack is created from the template *with* a load balancer.

Here is the heartbeat definition used in the templates.

From the template without a load balancer:

~~~
heartbeat_alarm:
  type: OS::Ceilometer::Alarm
  properties:
    comparison_operator: lt
    evaluation_periods: '1'
    meter_name: instance
    period: '60'
    statistic: count
    threshold: {get_param: desired_capacity}
    alarm_actions:
      - {get_attr: [java_server_scaleup_policy, alarm_url]}
    matching_metadata: {'metadata.user_metadata.stack': {get_param: "OS::stack_id"}}
~~~

From the template with a load balancer:

~~~
heartbeat_alarm:
  type: OS::Ceilometer::Alarm
  properties:
    comparison_operator: lt
    evaluation_periods: '2'
    meter_name: instance
    period: '60'
    statistic: count
    threshold: {get_param: desired_capacity}
    alarm_actions:
      - {get_attr: [web_server_scaleup_policy, alarm_url]}
    matching_metadata: {'metadata.user_metadata.stack': {get_param: "OS::stack_id"}}
~~~

Version-Release number of selected component (if applicable):

RHEL OSP 9

# awk '/heat/ {print $1}' installed-rpms
openstack-heat-api-6.1.0-1.el7.noarch
openstack-heat-api-cfn-6.1.0-1.el7.noarch
openstack-heat-api-cloudwatch-6.1.0-1.el7.noarch
openstack-heat-common-6.1.0-1.el7.noarch
openstack-heat-engine-6.1.0-1.el7.noarch
python-heatclient-1.2.0-1.el7ost.noarch

How reproducible:

Every time for Cu.

Steps to Reproduce:
1. Create a heat stack with load balancer configuration and add two members to it.
2. Delete one of the members.
3. Heat is not able to spawn a new instance to maintain the minimum count defined in the template.

Actual results:

Heat is not spawning a new instance to maintain the min. count.

Expected results:

Heat should spawn a new instance to maintain the min. count.

Additional info:
stack-show on the nested (autoscaling group) stack will tell you why it failed. You can get the uuid of the nested stack by doing "openstack stack resource show WebServer-Stack WebServerASG", it's listed as the physical_resource_id.
So the error is:

resources.tdhf2cznzqnd: StackValidationFailed: resources.member: Property error: member.Properties.address: Error validating value ''

So it looks like the scaling unit is a nested stack, and that it contains a resource named 'member' with a property called 'address', and the address is resolving to an empty string when it needs to be a valid IP address. This could be a problem with the template, or it could be a bug in Heat. (At the validation stage, intrinsic functions like {get_attr: } don't return valid values, but Heat ought to cope with that gracefully.)

Could you attach the lb_server.yaml template?
Here is the content of the lb_server.yaml template.

~~~
heat_template_version: 2013-05-23
description: A load-balancer server
parameters:
  image:
    type: string
    description: Image used for servers
  key_name:
    type: string
    description: SSH key to connect to the servers
  flavor:
    type: string
    description: flavor used by the servers
  pool_id:
    type: string
    description: Pool to contact
  user_data:
    type: string
    description: Server user_data
  metadata:
    type: json
  network:
    type: string
    description: Network used by the server
resources:
  server:
    type: OS::Nova::Server
    properties:
      flavor: {get_param: flavor}
      image: {get_param: image}
      key_name: {get_param: key_name}
      metadata: {get_param: metadata}
      user_data: {get_param: user_data}
      user_data_format: RAW
      networks: [{network: {get_param: network} }]
  member:
    type: OS::Neutron::PoolMember
    properties:
      pool_id: {get_param: pool_id}
      address: {get_attr: [server, first_address]}
      protocol_port: 80
outputs:
  server_ip:
    description: IP Address of the load-balanced server.
    value: { get_attr: [server, first_address] }
  lb_member:
    description: LB member details.
    value: { get_attr: [member, show] }
~~~
OK, that looks like a Heat bug then... {get_attr: [server, first_address]} probably returns an empty string during validation, and the validation ought to be able to handle that but apparently it's complaining. I wonder how it managed to create the autoscaling group in the first place without running into this issue for the initial members...
I suspect it may be failing to get the IP addresses of the _existing_ members. For resources that aren't created yet, we should always get None returned for their attribute values without even asking the resource. An empty string (which is returned by the resource itself) suggests that the resource was in the created state but getting the server's address failed for some reason. That also explains how the group could be created initially but updating it fails.

The first_address attribute is deprecated. You should probably replace that line with:

address: {get_attr: [server, networks, {get_param: network}, 0]}

That might actually resolve the problem.
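How that attribute path resolves can be sketched in Python. This is only an illustration, not Heat's actual code: it assumes the server's `networks` attribute is a dict mapping network names to lists of addresses, and the helper `select_path`, the network name `private-net`, and the addresses are all hypothetical.

```python
def select_path(attribute, path):
    """Walk nested dicts/lists along `path`; return None if any step is missing."""
    current = attribute
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# Hypothetical attribute value for a server that exists in Nova.
networks = {"private-net": ["10.0.0.5", "10.0.0.6"]}

# Equivalent of {get_attr: [server, networks, <network>, 0]}: the first address.
print(select_path(networks, ["private-net", 0]))

# If the server cannot be found, there is nothing to index into.
print(select_path({}, ["private-net", 0]))
```

With a live server the path picks the first address on the named network; with no network data the lookup has nothing to return, so the property would end up without a value.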
Thanks Zane. Suggested the same to Cu. Awaiting Cu. response.
Zane, as per the latest update from Cu. they are still hitting the issue after making suggested change.
Hello Zane,

Here's the updated template received from the customer:

heat_template_version: 2013-05-23
description: A load-balancer instance
parameters:
  image:
    type: string
    description: Image used for instances
  key_name:
    type: string
    description: SSH key to connect to the instances
  flavor:
    type: string
    description: flavor used by the instances
  pool_id:
    type: string
    description: Pool to contact
  user_data:
    type: string
    description: Server user_data
  metadata:
    type: json
  network:
    type: string
    description: Network used by the instance
  #security_groups:
  #  type: string
  #  description: Webinstance Security group
resources:
  server:
    type: OS::Nova::Server
    properties:
      flavor: {get_param: flavor}
      image: {get_param: image}
      key_name: {get_param: key_name}
      metadata: {get_param: metadata}
      user_data: {get_param: user_data}
      #security_groups: webserverSG
      #security_groups: [{security_groups: {get_param: security_groups}}]
      user_data_format: RAW
      networks: [{network: {get_param: network} }]
  member:
    type: OS::Neutron::PoolMember
    properties:
      pool_id: {get_param: pool_id}
      address: {get_attr: [server, networks, {get_param: network}, 0]}
      #address: {get_attr: [instance, first_address]}
      protocol_port: 80
outputs:
  # instance_ip:
  #   description: IP Address of the load-balanced instance.
  #   value: { get_attr: [instance, first_address] }
  lb_member:
    description: LB member details.
    value: { get_attr: [member, show] }

As mentioned by vikrant, the customer is still hitting the same issue.

Thanks.
OK, after reading more carefully here, I see the cause of the problem. You're deleting a server from Nova manually, but Heat doesn't know that it's missing, so when it comes to validate the template the server is not found. This causes it to return a default value for the IP address (an empty string, as it happens), and the pool member rejects that because it is not a valid IP address.

I'm not sure why this would have worked in Liberty but not Mitaka. Possibly the validation became more robust in Mitaka.

One thing you can do is mark the server resource that you've deleted as FAILED using the 'resource mark-unhealthy' command. That should convince the autoscaling template generator to remove that resource from the template.

I'll continue investigating to see if there's a way we can avoid the error in this case.
Hello Zane,

Many thanks for suggesting the resource mark-unhealthy command. This suggestion has worked for the customer. After marking the deleted resource as unhealthy, a new instance was spawned automatically.

Could you please advise if there is a way in which this can be incorporated in the heat template itself?

Thanks and Regards,
Punit
This should get fixed in Pike by https://review.openstack.org/#/c/422983/ when it merges. For current releases... we _could_ fix the {get_attr: [instance, first_address]} attribute by changing the default value that it returns to something that will pass the IP address constraint, i.e. '0.0.0.0' instead of ''. But this attribute is already deprecated. There's no sane way to make {get_attr: [server, networks, {get_param: network}, 0]} not return None. I haven't seen the error message for this case, but I suspect it's failing in a different spot - it's a required property with a value of None (which reads as nothing specified) at a time when Heat is expecting to have resolved the real value. I can't think of anything we can do to resolve that.
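The difference between the three candidate values can be shown with a small Python sketch. It uses the standard-library `ipaddress` module as a stand-in for the IP address constraint; this is an assumption for illustration and not Heat's actual validation code, and `is_valid_ip` is a hypothetical helper.

```python
import ipaddress


def is_valid_ip(value):
    """Stand-in for an IP address constraint (not Heat's implementation)."""
    if value is None:
        # None reads as "nothing specified", so there is nothing to validate.
        return False
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


# '' is the current default, '0.0.0.0' is the proposed default,
# None is what the networks-based path yields when the server is missing.
for candidate in ("", "0.0.0.0", None):
    print(repr(candidate), is_valid_ip(candidate))
```

Under this stand-in, only '0.0.0.0' passes: the empty string is not a parseable address, and None never reaches the address check at all, which matches the suspicion that the second case fails in a different spot.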
Configured the following stack templates.

The server template (referenced below as server.yaml):

heat_template_version: pike
resources:
  server:
    type: OS::Nova::Server
    properties:
      image: cirros-0.3.5-x86_64-disk
      flavor: m1.nano
      networks:
        - network: heat-net
        - subnet: heat-subnet
  value:
    type: OS::Heat::Value
    properties:
      value: {get_attr: [server, first_address]}
      type: string

The autoscaling stack template:

heat_template_version: pike
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      resource:
        type: server.yaml
      min_size: 2
      desired_capacity: 3
      max_size: 5
  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1
  scale_dn_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: '-1'
outputs:
  scale_up_url:
    value: {get_attr: [scale_up_policy, alarm_url]}
  scale_dn_url:
    value: {get_attr: [scale_dn_policy, alarm_url]}

A server was deleted, and by creating this stack 3 new servers were scaled with the server stack attributes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462