1127526 – [RFE][ceilometer]: Central Agent work-load partitioning

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-ceilometer
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	Upstream M3
Target Release:	6.0 (Juno)
Assignee:	Eoghan Glynn
QA Contact:	Amit Ugol
Docs Contact:
URL:	https://blueprints.launchpad.net/ceil...
Whiteboard:	upstream_milestone_juno-3 upstream_de...
Depends On:
Blocks:	1184668

Reported:	2014-08-07 04:02 UTC by RHOS Integration
Modified:	2023-09-14 02:50 UTC
CC List:	9 users
Fixed In Version:	openstack-ceilometer-2014.2-2.el7ost
Doc Type:	Enhancement
Doc Text:	In previous releases, the accuracy and timeliness of Telemetry samples could be negatively impacted if the central agent became overloaded by a large number of resources. To mitigate this, the Telemetry service now features workload partitioning, which allows the central agent to scale horizontally, with each instance polling a disjoint set of resources. To do this, the 'tooz' library coordinates group membership across the multiple central agents that share polling of resources.
Clone Of:
Clones:	1184668
Environment:
Last Closed:	2015-02-09 14:59:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-28555	0	None	None	None	2023-09-14 02:48:13 UTC
Red Hat Product Errata	RHEA-2015:0149	0	normal	SHIPPED_LIVE	openstack-ceilometer enhancement advisory	2015-02-09 19:53:01 UTC

Description RHOS Integration 2014-08-07 04:02:58 UTC

Cloned from launchpad blueprint https://blueprints.launchpad.net/ceilometer/+spec/central-agent-partitioning.

Description:

Provide a mechanism to allow the central agent to be horizontally scaled out, such that each agent polls a disjoint subset of resources.
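
As a purely illustrative sketch of what "disjoint subset" means here (not necessarily how ceilometer implements it), each agent could keep only the resources whose hashed name, modulo the number of agents, equals its own index. The function names agent_index_for and my_subset are hypothetical, and the use of the cksum CRC as the hash is an assumption made for the example:

```shell
# Illustrative only: assign each resource name to one of N agents by
# hashing the name (cksum CRC) modulo the number of agents.
agent_index_for() {
  # $1 = resource name, $2 = number of agents
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( crc % $2 ))
}

my_subset() {
  # $1 = this agent's index, $2 = number of agents; resource names on stdin.
  # Prints only the names that hash to this agent's index.
  while IFS= read -r name; do
    [ "$(agent_index_for "$name" "$2")" -eq "$1" ] && printf '%s\n' "$name"
  done
  return 0
}
```

Because every agent applies the same hash to the same list, each resource lands in exactly one subset, and the subsets change only when the number of agents changes.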

Specification URL (additional information):

None

Comment 2 Eoghan Glynn 2014-11-26 19:34:50 UTC

Here's an approach to verifying central agent partitioning based on
tooz, by examining the way swift polling is partitioned by tenant.

Ensure the tooz and redis packages are installed and the redis service
is running on the controller host:

  $ sudo yum install -y python-tooz python-redis redis
  $ sudo service redis restart

(note that these packages are not all yet available for RHEL7 via the
LPC channel, so for now you could use the Fedora/EPEL & RDO packages
to get started)

Configure the tooz backend as redis with the following setting in the
/etc/ceilometer/ceilometer.conf:

  [coordination]
  backend_url = redis://CONTROLLER_HOSTNAME:6379
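
To confirm the setting landed in the right section, it can be read back from the file. The helper name coordination_backend_url is hypothetical, and this is only a minimal sketch assuming a plain INI layout with one "key = value" pair per line:

```shell
# Hypothetical helper: print backend_url from the [coordination] section
# of an INI-style file (assumes simple "key = value" lines).
coordination_backend_url() {
  awk '
    # Trim leading and trailing whitespace on every line
    { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
    # Track whether the current section is [coordination]
    /^\[/ { in_section = ($0 == "[coordination]"); next }
    # Inside that section, strip the key and "=" and print the value
    in_section && /^backend_url[ \t]*=/ {
      sub(/^backend_url[ \t]*=[ \t]*/, "")
      print
    }
  ' "$1"
}
```

For example, `coordination_backend_url /etc/ceilometer/ceilometer.conf` should print the redis URL configured above; no output suggests the option is missing or sits under a different section.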

Accelerate the polling interval, so that we see a steady stream of
samples:

  $ sudo sed -i 's/interval: 600$/interval: 60/' \
      /etc/ceilometer/pipeline.yaml
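
Since sed -i exits successfully even when nothing matched, it may be worth confirming that no source is still on the 600s interval. The helper name remaining_default_intervals is hypothetical; this sketch assumes the interval lines look exactly like the pattern used in the sed command above:

```shell
# Hypothetical helper: count lines still ending in "interval: 600".
# A result of 0 means every matching interval was rewritten.
remaining_default_intervals() {
  # grep -c prints the count but exits non-zero when it is 0,
  # so the exit status is deliberately ignored here.
  grep -c 'interval: 600$' "$1" || true
}
```

Running `remaining_default_intervals /etc/ceilometer/pipeline.yaml` after the sed command should print 0.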

Restart the central agent and check the subset of swift tenants that
this agent is taking as its partition (swift polling is partitioned
per-tenant):

  $ grep 'My subset.*Tenant' /var/log/ceilometer/central.log  | head

This subset should initially only comprise the stock tenants such as
admin and services, assuming a fresh installation.

We then proceed to create 10 new tenants, and post a swift container
for each:

  $ for i in {1..10} ; do
      keystone tenant-create --name swift-tenant-$i \
        --description "swift tenant $i"
      keystone user-create --name swift-user-$i --tenant swift-tenant-$i \
        --pass swift-pass-$i
      keystone user-role-add --user swift-user-$i --role ResellerAdmin \
        --tenant swift-tenant-$i
      swift --os-project-name swift-tenant-$i --os-username swift-user-$i \
        --os-password swift-pass-$i post swift-container-$i
    done

Now we check that all these tenants are being taken care of by the
single central agent:

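  $ function swift_tenants_in_subset {
      # For each 'My subset' log line that mentions swift tenants,
      # print those tenants followed by how many there are.
      grep 'My subset' "$1" | awk '/swift-tenant/ {
          count = 0
          for (i = 1; i <= NF; i++) {
            if ($i ~ /swift-tenant-/) {
              printf("%s ", $i)
              count++
            }
          }
          # Print the count once per line, after all fields are scanned
          printf(" count: %d\n", count)
        }' | sed "s/u'/'/g"
    }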
  $ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log

The count in this case indicates that all 10 new tenants are included
in the partition for the single central agent (i.e. a trivial
partitioning).

We also ensure that the same number of storage.objects.containers
samples has been submitted for each tenant over the past 5 minutes:

  $ function count_samples_per_tenant {
      FIVE_MINS_AGO=$(date -u +"%Y-%m-%dT%H:%M:%SZ" -d '-5mins')
      ceilometer statistics -a count -g project_id -m $1 \
        -q "timestamp>=$FIVE_MINS_AGO"
    }
  $ count_samples_per_tenant storage.objects.containers

Either install & run openstack-ceilometer-central on a second node, or
as a shortcut simply launch a second process on the controller:

  $ sudo /usr/bin/python /usr/bin/ceilometer-agent-central \
        --logfile /var/log/ceilometer/central-2.log &

Check that an approximately even split of swift tenants is allocated
to each central agent:

  $ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log
  $ swift_tenants_in_subset /var/log/ceilometer/central-2.log
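
Beyond eyeballing the two lists, the partitions can be compared mechanically. The helper names below are hypothetical, and the sketch assumes that the most recent 'My subset' line mentioning swift tenants in each log reflects that agent's current partition, and that tenant names appear literally as swift-tenant-N:

```shell
# Hypothetical helpers; assume each agent logs its partition on lines
# containing 'My subset' and that tenant names appear as swift-tenant-N.
latest_swift_subset() {
  # Tenants named on the most recent 'My subset' line mentioning swift tenants
  grep 'My subset' "$1" | grep 'swift-tenant-' | tail -n 1 \
    | grep -o 'swift-tenant-[0-9][0-9]*' | sort -u
}

overlapping_swift_tenants() {
  # Print tenants claimed by both agents; no output means disjoint partitions
  subset_a=$(mktemp) ; subset_b=$(mktemp)
  latest_swift_subset "$1" > "$subset_a"
  latest_swift_subset "$2" > "$subset_b"
  comm -12 "$subset_a" "$subset_b"
  rm -f "$subset_a" "$subset_b"
}
```

Calling `overlapping_swift_tenants /var/log/ceilometer/central.log /var/log/ceilometer/central-2.log` should print nothing once the agents have settled; any tenant it does print is being polled by both agents.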

Ensure that there is no duplication in the samples collected
per-tenant:

  $ count_samples_per_tenant storage.objects.containers

Then kill the additional central agent and check that all tenants
revert to the original agent:

  $ sudo kill $(pgrep -f central-2)
  $ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log

Comment 4 Eoghan Glynn 2014-12-18 09:55:25 UTC

Note that debug logging must be enabled for the testing approach described in comment #2. Ensure the debug and verbose config options are set in /etc/ceilometer/ceilometer.conf, and restart the central agent if necessary.

Comment 6 Chris Dent 2015-01-21 14:14:45 UTC

Please see the testing description in the comments above for the configuration settings needed to enable the feature.

Comment 8 errata-xmlrpc 2015-02-09 14:59:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0149.html

Comment 9 Red Hat Bugzilla 2023-09-14 02:45:17 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days