Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1073408

Summary: samples from notifications for autoscaled instances cannot be persisted in mongo
Product: Red Hat OpenStack Reporter: Eoghan Glynn <eglynn>
Component: openstack-ceilometer Assignee: Eoghan Glynn <eglynn>
Status: CLOSED ERRATA QA Contact: Ami Jeain <ajeain>
Severity: high Docs Contact:
Priority: high    
Version: 4.0 CC: ajeain, ddomingo, eglynn, jruzicka, pbrady, sclewis, sdake, yeylon
Target Milestone: z4 Keywords: OtherQA, ZStream
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-ceilometer-2013.2.2-5.el6ost Doc Type: Bug Fix
Doc Text:
Previously, the structure of user metadata in notifications for autoscaled instances was unsuitable for persisting in MongoDB. As a result, any samples generated from such notifications were effectively dropped. The mapping logic for pollster-originated samples ensures that user metadata is suitable for persisting in MongoDB. With this release, the same mapping logic is applied to the user metadata of samples derived from instance-related notifications. This, in turn, ensures that the data persists in MongoDB, thereby preventing such samples from being dropped.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-05-29 19:58:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Autoscaling template to reproduce issue. none

Description Eoghan Glynn 2014-03-06 11:46:10 UTC
Created attachment 871361 [details]
Autoscaling template to reproduce issue.

Description of problem:

For polled samples related to autoscaled instances, we apply a mapping to the user metadata to ensure that the keys contain no embedded periods.

However, the same user metadata will also be present in notifications related to an autoscaled instance, but the mapping is not applied in that case.

This causes the mongo driver to fail with not okForStorage when the sample derived from the notification is dispatched.

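The mapping applied to polled samples rewrites metadata keys so that MongoDB, which rejects field names containing periods, will accept the document. A minimal sketch of such a mapping (the function name and the ':' replacement character are illustrative assumptions, not the actual ceilometer implementation):

```python
def sanitize_keys(metadata):
    """Recursively rewrite dict keys so they contain no embedded
    periods, which MongoDB rejects with 'not okForStorage'."""
    if not isinstance(metadata, dict):
        return metadata
    return {k.replace('.', ':'): sanitize_keys(v)
            for k, v in metadata.items()}

# Keys such as 'metering.server_group' (seen in the notification
# payload below) would otherwise cause pymongo to raise
# OperationFailure: not okForStorage on upsert.
user_metadata = {'metering.server_group': 'Group_A',
                 'AutoScalingGroupName': 'tyky-Group_A-wste7bd63rkl'}
print(sanitize_keys(user_metadata))
```

The bug is that this sanitization runs only on the pollster path, so notification-derived samples reach the mongo driver with the raw dotted keys intact.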
For example, when the sample derived from a compute.instance.delete.end is processed:

  ceilometer-ceilometer.openstack.common.rpc.amqp DEBUG: received {u'_context_request_id': u'req-1fa1edc3-7cc2-482a-8526-7a308daacbe0', u'args': {u'data': [{u'counter_name': u'disk.root.size', u'user_id': u'ef4e983291ef4ad1b88eb1f776bd52b6', u'resource_id': u'0b900e89-cf10-44a8-a79c-f2f68f55fc07', u'timestamp': u'2014-02-24 11:59:55.295790', u'message_signature': u'49ea1df965fd09b6ce3c84812977e9c21474030006510ef402afd6b2e12c9f90', u'resource_metadata': {u'state_description': u'', u'event_type': u'compute.instance.delete.end', u'availability_zone': None, u'terminated_at': u'2014-02-24T11:59:54.965388', u'ephemeral_gb': 0, u'instance_type_id': 2, u'deleted_at': u'', u'reservation_id': u'r-8pz6nwrl', u'instance_id': u'0b900e89-cf10-44a8-a79c-f2f68f55fc07', u'user_id': u'ef4e983291ef4ad1b88eb1f776bd52b6', u'hostname': u'tyky-group-a-wste7bd63rkl-group-a-1-xzgy6fkgqvhc', u'state': u'deleted', u'launched_at': u'2014-02-24T11:58:16.000000', u'metadata': {u'metering.server_group': u'Group_A', u'AutoScalingGroupName': u'tyky-Group_A-wste7bd63rkl', u'assign_floating_ip': u'true'}, u'node': u'node-17', u'ramdisk_id': u'', u'access_ip_v6': None, u'disk_gb': 1, u'access_ip_v4': None, u'kernel_id': u'', u'host': u'compute.node-17', u'display_name': u'tyky-Group_A-wste7bd63rkl-Group_A-1-xzgy6fkgqvhc', u'image_ref_url': u'http://192.168.100.4:9292/images/11848cbf-a428-4dfb-8818-2f0a981f540b', u'root_gb': 1, u'tenant_id': u'efcca4ba425c4beda73eb31a54df931a', u'created_at': u'2014-02-24 11:58:08+00:00', u'memory_mb': 512, u'instance_type': u'm1.tiny', u'vcpus': 1, u'image_meta': {u'min_disk': u'1', u'container_format': u'bare', u'min_ram': u'0', u'disk_format': u'qcow2', u'base_image_ref': u'11848cbf-a428-4dfb-8818-2f0a981f540b'}, u'architecture': None, u'os_type': None, u'instance_flavor_id': u'1'}, u'source': u'openstack', u'counter_unit': u'GB', u'counter_volume': 1, u'project_id': u'efcca4ba425c4beda73eb31a54df931a', u'message_id': u'2d03ed96-9d4b-11e3-b0e0-080027e519cb', 
u'counter_type': u'gauge'}]}, u'_context_auth_token': '<SANITIZED>', u'_context_show_deleted': False, u'_context_tenant': None, u'_unique_id': u'6bf474db7f9f4bc3a3a4342508c54218', u'_context_is_admin': True, u'version': u'1.0', u'_context_read_only': False, u'_context_user': None, u'method': u'record_metering_data'}
<47>Feb 24 11:59:55 node-16 ceilometer-ceilometer.openstack.common.rpc.amqp DEBUG: unpacked context: {'read_only': False, 'show_deleted': False, 'auth_token': '<SANITIZED>', 'is_admin': True, 'user': None, 'request_id': u'req-1fa1edc3-7cc2-482a-8526-7a308daacbe0', 'tenant': None}
<47>Feb 24 11:59:55 node-16 ceilometer-ceilometer.collector.dispatcher.database DEBUG: metering data disk.root.size for 0b900e89-cf10-44a8-a79c-f2f68f55fc07 @ 2014-02-24 11:59:55.295790: 1
<47>Feb 24 11:59:55 node-16 ceilometer-amqp DEBUG: Channel open
<43>Feb 24 11:59:55 node-16 ceilometer-ceilometer.collector.dispatcher.database ERROR: Failed to record metering data: not okForStorage
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ceilometer/collector/dispatcher/database.py", line 65, in record_metering_data
    self.storage_conn.record_metering_data(meter)
  File "/usr/lib/python2.7/dist-packages/ceilometer/storage/impl_mongodb.py", line 417, in record_metering_data
    upsert=True,
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 487, in update
    check_keys, self.__uuid_subtype), safe)
  File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 969, in _send_message
    rv = self.__check_response_to_last_error(response)
  File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 911, in __check_response_to_last_error
    raise OperationFailure(details["err"], details["code"])
OperationFailure: not okForStorage


Version-Release number of selected component (if applicable):

openstack-ceilometer-2013.2.1-2.el6ost


How reproducible:

100%


Steps to Reproduce:

0. Install ceilometer in the usual way, e.g. with packstack allinone, including the Heat services also.


1. Upload the cirros images if not already present in glance:

  sudo yum install -y wget
  wget http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz
  tar zxvf cirros-0.3.0-x86_64-uec.tar.gz 
  glance add name=cirros-aki is_public=true container_format=aki disk_format=aki < cirros-0.3.0-x86_64-vmlinuz 
  glance add name=cirros-ari is_public=true container_format=ari disk_format=ari < cirros-0.3.0-x86_64-initrd 
  glance add name=cirros-ami is_public=true container_format=ami disk_format=ami \
     "kernel_id=$(glance index | awk '/cirros-aki/ {print $1}')" \
     "ramdisk_id=$(glance index | awk '/cirros-ari/ {print $1}')" < cirros-0.3.0-x86_64-blank.img  


2. Add a UserKey if not already present in nova:

  nova keypair-add --pub_key ~/.ssh/id_rsa.pub userkey


3. Create stack with the attached template:

  heat stack-create test_stack --template-file=template.yaml --parameters="KeyName=userkey;InstanceType=m1.tiny;ImageId=$CIRROS_AMI_IMAGE"


4. Wait for the stack creation to complete:

  watch "heat stack-show test_stack | grep status"


5. Verify that a server from the configured group has become active:

  watch "nova list | grep ServerGroup"


6. Check for the error in the ceilometer collector logs:

  grep "OperationFailure: not okForStorage" /var/log/ceilometer/collector.log


Actual results:

Error is seen indicating that metering message could not be persisted.


Expected results:

Error should not be seen.

Comment 1 Eoghan Glynn 2014-03-06 11:50:17 UTC
Fix proposed to master upstream:

  https://review.openstack.org/77959

Comment 2 Steven Dake 2014-03-26 14:58:36 UTC
Adjusting priority/severity to indicate this is something that we should be working on.

Comment 3 Eoghan Glynn 2014-04-02 16:25:11 UTC
Fix landed on master upstream:

  https://github.com/openstack/ceilometer/commit/ddeb54bb

and proposed to stable/havana upstream:

  https://review.openstack.org/84096

also backported internally:

  https://code.engineering.redhat.com/gerrit/22363

Comment 4 Eoghan Glynn 2014-04-03 16:34:31 UTC
Internal backport has landed on rhos-4.0-rhel-6-patches with SHA:

  1e4963cd6fef1c1e9fdec6336baab6f1f37f3fed

Comment 7 Eoghan Glynn 2014-04-04 11:18:19 UTC
OtherQA
=======

1. Download rebuilt RPMs:

   https://brewweb.devel.redhat.com/buildinfo?buildID=348179

2. Upgrade ceilometer packages:

   $ SUFF=2013.2.2-5.el6ost.noarch.rpm
   $ for p in openstack-ceilometer-alarm openstack-ceilometer-api openstack-ceilometer-central openstack-ceilometer-collector openstack-ceilometer-common openstack-ceilometer-compute python-ceilometer
     do
         sudo yum upgrade -y $p-$SUFF
     done

3. Stop the ceilometer services:

   $ CEILO_SVCS='compute central collector api alarm-evaluator alarm-notifier'
   $ for svc in $CEILO_SVCS ; do sudo service openstack-ceilometer-$svc stop ; done

4. Archive the collector log file:

   $ sudo gzip -S $(date "+%FT%T").gz /var/log/ceilometer/collector.log

5. Restart the ceilometer services:

   $ CEILO_SVCS='compute central collector api alarm-evaluator alarm-notifier'
   $ for svc in $CEILO_SVCS ; do sudo service openstack-ceilometer-$svc start ; done

6. Create a Heat stack with autoscaling enabled and wait for creation to complete:

   $ heat stack-create test_stack --template-file=./template.yaml --parameters="KeyName=userkey;ImageId=$CIRROS_AMI"
   $ watch "heat stack-list"
   
7. Wait for newly spun-up instances to become ACTIVE:

   $ watch "nova list | grep ACTIVE"

8. Ensure the "not okForStorage" error is not emitted:

   $ sudo grep "not okForStorage" /var/log/ceilometer/collector.log

Comment 10 errata-xmlrpc 2014-05-29 19:58:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0577.html