Bug 1127526 - [RFE][ceilometer]: Central Agent work-load partitioning [NEEDINFO]
Summary: [RFE][ceilometer]: Central Agent work-load partitioning
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: 6.0 (Juno)
Assignee: Eoghan Glynn
QA Contact: Amit Ugol
URL: https://blueprints.launchpad.net/ceil...
Whiteboard: upstream_milestone_juno-3 upstream_de...
Depends On:
Blocks: 1184668
 
Reported: 2014-08-07 04:02 UTC by RHOS Integration
Modified: 2015-02-09 14:59 UTC (History)

Fixed In Version: openstack-ceilometer-2014.2-2.el7ost
Doc Type: Enhancement
Doc Text:
In previous releases, the accuracy and timeliness of Telemetry samples could be negatively impacted if the central agent became overloaded by a large number of resources. To mitigate this, the Telemetry service now features workload partitioning; this feature allows the central agent to scale horizontally, with each instance polling a disjoint set of resources. To do this, the 'tooz' library coordinates group membership across multiple central agents that share polling of resources.
Clone Of:
Clones: 1184668
Environment:
Last Closed: 2015-02-09 14:59:44 UTC
ddomingo: needinfo? (eglynn)



Links
System: Red Hat Product Errata
ID: RHEA-2015:0149
Priority: normal
Status: SHIPPED_LIVE
Summary: openstack-ceilometer enhancement advisory
Last Updated: 2015-02-09 19:53:01 UTC

Description RHOS Integration 2014-08-07 04:02:58 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/ceilometer/+spec/central-agent-partitioning.

Description:

Provide a mechanism to allow the central agent to be horizontally scaled out, such that each agent polls a disjoint subset of resources.
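
As a rough illustration of what "disjoint subset" means here, the sketch below assigns each resource name to one of N agents by hashing it. This is a toy under stated assumptions, not ceilometer's or tooz's actual code: the function names partition_for and my_subset, and the use of cksum as the hash, are hypothetical choices made for the example.

```shell
# Toy illustration only: map a resource name to one of N agents by hash.
# partition_for and the cksum-based hash are assumptions for this sketch,
# not the algorithm ceilometer or tooz actually uses.
partition_for() {
  local resource=$1 agents=$2 sum
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  sum=$(printf '%s' "$resource" | cksum | cut -d' ' -f1)
  echo $(( sum % agents ))
}

# Agent number $1 (0-based) out of $2 agents keeps only the resources
# that hash to its own index, so no resource is polled by two agents.
my_subset() {
  local me=$1 agents=$2 r
  shift 2
  for r in "$@"; do
    [ "$(partition_for "$r" "$agents")" -eq "$me" ] && echo "$r"
  done
  return 0
}
```

With two agents, "my_subset 0 2 NAMES..." and "my_subset 1 2 NAMES..." between them return every name exactly once, because each name hashes to exactly one index.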

Specification URL (additional information):

None

Comment 2 Eoghan Glynn 2014-11-26 19:34:50 UTC
Here's an approach to verifying central agent partitioning based on
tooz, by examining the way swift polling is partitioned by tenant.

Ensure the tooz and redis packages are installed and the redis service
is running on the controller host:

  $ sudo yum install -y python-tooz python-redis redis
  $ sudo service redis restart

(note that these packages are not all yet available for RHEL7 via the
LPC channel, so for now you could use the Fedora/EPEL & RDO packages
to get started)

Configure the tooz backend as redis with the following setting in the
/etc/ceilometer/ceilometer.conf:

  [coordination]
  backend_url = redis://CONTROLLER_HOSTNAME:6379

Accelerate the polling interval, so that we see a steady stream of
samples:

  $ sudo sed -i 's/interval: 600$/interval: 60/' \
      /etc/ceilometer/pipeline.yaml

Restart the central agent and check the subset of swift tenants that
this agent is taking as its partition (swift polling is partitioned
per-tenant):

  $ grep 'My subset.*Tenant' /var/log/ceilometer/central.log  | head

This subset should initially only comprise the stock tenants such as
admin and services, assuming a fresh installation.

We then proceed to create 10 new tenants, and post a swift container
for each:

  $ for i in {1..10} ; do
      keystone tenant-create --name swift-tenant-$i \
        --description "swift tenant $i"
      keystone user-create --name swift-user-$i --tenant swift-tenant-$i \
        --pass swift-pass-$i
      keystone user-role-add --user swift-user-$i --role ResellerAdmin \
        --tenant swift-tenant-$i
      swift --os-project-name swift-tenant-$i --os-username swift-user-$i \
        --os-password swift-pass-$i post swift-container-$i
    done

Now we check that all these tenants are being taken care of by the
single central agent:

  $ function swift_tenants_in_subset {
      grep 'My subset' $1 | awk '/swift-tenant/ \
        {count=0;
         for (i = 1; i <= NF; i++) {
           if ($i ~ "swift-tenant-") {
             printf("%s ", $i);
             count++;
           }
           if (i == NF) printf(" count: %d\n", count);
           }
         }' | sed "s/u'/'/g"
    }
  $ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log

The count in this case indicates that all 10 new tenants are included
in the partition for the single central agent (i.e. a trivial
partitioning).

We also ensure that the same number of storage.objects.containers
samples has been submitted for each tenant over the past 5 minutes:

  $ function count_samples_per_tenant {
      FIVE_MINS_AGO=$(date -u +"%Y-%m-%dT%H:%M:%SZ" -d '-5mins')
      ceilometer statistics -a count -g project_id -m $1 \
        -q "timestamp>=$FIVE_MINS_AGO"
    }
  $ count_samples_per_tenant storage.objects.containers

Either install & run openstack-ceilometer-central on a second node, or
as a shortcut simply launch a second process on the controller:

  $ sudo /usr/bin/python /usr/bin/ceilometer-agent-central \
        --logfile /var/log/ceilometer/central-2.log &

Check that an approximately even split of swift tenants is allocated
to each central agent:

  $ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log
  $ swift_tenants_in_subset /var/log/ceilometer/central-2.log
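
One way to make the split check mechanical is to compare the two tenant lists directly. The helper below is only a sketch: the name overlap is hypothetical, and it assumes each argument is a single-line, whitespace-separated list of tenant names that the caller has already extracted from each agent's log.

```shell
# Print the names that appear in both whitespace-separated lists.
# An empty result means the two partitions are disjoint.
# overlap is a hypothetical helper; extracting the lists from the
# agent logs is left to the caller.
overlap() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
    { for (i = 1; i <= NF; i++) if ($i in seen) print $i }'
}
```

If overlap prints nothing for the two agents' lists, no tenant is being polled by both agents.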

Ensure that there is no duplication in the samples collected
per-tenant:

  $ count_samples_per_tenant storage.objects.containers

Then kill the additional central agent and check that all tenants
revert to the original agent:

  $ sudo kill $(ps -fe | grep '[c]entral-2' | awk '{print $2}')
  $ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log
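
The fixed 60-second sleeps above assume the agents have re-balanced within one polling interval. Where that is not guaranteed, a small retry loop may be more robust. The following is a sketch only: wait_until is a hypothetical helper, and the condition to wait for is whatever check suits the environment.

```shell
# Retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper; arguments are
# <attempts> <delay-seconds> <command...>.
wait_until() {
  local attempts=$1 delay=$2 n=1
  shift 2
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    n=$((n + 1))
    # Sleep only if another attempt is still to come.
    [ "$n" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, "wait_until 6 10 some_check" would run some_check up to six times, ten seconds apart, and stop at the first success.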

Comment 4 Eoghan Glynn 2014-12-18 09:55:25 UTC
Note that debug logging must be enabled for the testing approach described in comment #2, by ensuring the debug & verbose config options are set in /etc/ceilometer/ceilometer.conf and restarting the central agent if necessary.
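
A quick way to confirm the two options are actually set before looking for 'My subset' lines is sketched below. It assumes the options live in the [DEFAULT] section of an ini-style file; option_enabled is a hypothetical helper name, not part of any ceilometer tooling.

```shell
# Succeed if <option> is set to true in the [DEFAULT] section of an
# ini-style file. option_enabled is a hypothetical helper for this
# sketch; usage: option_enabled <file> <option>
option_enabled() {
  awk -v opt="$2" '
    # Track whether the current section is [DEFAULT].
    /^\[/ { in_default = ($0 == "[DEFAULT]"); next }
    # Uncommented "opt = value" lines only; commented lines do not match.
    in_default && $0 ~ "^" opt "[ \t]*=" {
      sub(/^[^=]*=[ \t]*/, "")
      sub(/[ \t]+$/, "")
      value = tolower($0)
    }
    END { exit !(value == "true") }' "$1"
}
```

For example, "option_enabled /etc/ceilometer/ceilometer.conf debug && option_enabled /etc/ceilometer/ceilometer.conf verbose" succeeds only when both are set to true.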

Comment 6 Chris Dent 2015-01-21 14:14:45 UTC
Please see the testing description in the comments above for the configuration settings needed to enable the feature.

Comment 8 errata-xmlrpc 2015-02-09 14:59:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0149.html

