Description of problem:
It looks like agent-notification is calculating the rate transformation on any controllers and it often has no "previous" value stored in its cache. That means that the rates are not really calculated.

Version-Release number of selected component (if applicable):
openstack-ceilometer-notification-7.1.0-4

How reproducible:
All the time

Actual results:
agent-notification.log-20171129:2017-11-28 15:22:55.626 273838 DEBUG ceilometer.pipeline [-] Pipeline disk_sink: Transform sample <name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:22:54.105361> from 0 transformer _publish_samples /usr/lib/python2.7/site-packages/ceilometer/pipeline.py:486
agent-notification.log-20171129:2017-11-28 15:22:55.642 273838 DEBUG ceilometer.transformer.conversions [-] handling sample <name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:22:54.105361> handle_sample /usr/lib/python2.7/site-packages/ceilometer/transformer/conversions.py:186
agent-notification.log-20171129:2017-11-28 15:22:55.663 273838 DEBUG ceilometer.transformer.conversions [-] converted to: <name: disk.device.allocation, volume: 0.0, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:22:54.105361> handle_sample /usr/lib/python2.7/site-packages/ceilometer/transformer/conversions.py:220
agent-notification.log-20171129:2017-11-28 15:32:55.283 82546 DEBUG ceilometer.pipeline [-] Pipeline disk_sink: Transform sample <name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:32:54.093339> from 0 transformer _publish_samples /usr/lib/python2.7/site-packages/ceilometer/pipeline.py:486
agent-notification.log-20171129:2017-11-28 15:32:55.301 82546 DEBUG ceilometer.transformer.conversions [-] handling sample <name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:32:54.093339> handle_sample /usr/lib/python2.7/site-packages/ceilometer/transformer/conversions.py:186
agent-notification.log-20171129:2017-11-28 15:32:55.318 82546 WARNING ceilometer.transformer.conversions [-] dropping sample with no predecessor: (<name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:32:54.093339>,)
agent-notification.log-20171129:2017-11-28 15:42:55.200 82546 DEBUG ceilometer.pipeline [-] Pipeline disk_sink: Transform sample <name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:42:54.125560> from 0 transformer _publish_samples /usr/lib/python2.7/site-packages/ceilometer/pipeline.py:486
agent-notification.log-20171129:2017-11-28 15:42:55.216 82546 DEBUG ceilometer.transformer.conversions [-] handling sample <name: disk.device.allocation, volume: 643072, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:42:54.125560> handle_sample /usr/lib/python2.7/site-packages/ceilometer/transformer/conversions.py:186
agent-notification.log-20171129:2017-11-28 15:42:55.232 82546 DEBUG ceilometer.transformer.conversions [-] converted to: <name: disk.device.allocation, volume: 0.0, resource_id: 033f8e35-64b1-4cb3-8dc7-43803c2ce894-hda, timestamp: 2017-11-28T15:42:54.125560> handle_sample /usr/lib/python2.7/site-packages/ceilometer/transformer/conversions.py:220

Expected results:
Agent Notification should not be dropping samples. It should be able to get the previous value, or at least have access to that information. It looks like it's taking this value from the cache.

Additional info:
Some samples are not plotted in gnocchi.
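
To illustrate the behaviour reported above, here is a minimal Python sketch of a rate-of-change transform that keeps its "previous" sample in a per-process cache. It is not the Ceilometer code: the `Sample` and `RateOfChangeWorker` names, the cache key, and the round-number timestamps are assumptions made for illustration. It replays the pattern seen in the log, where the 15:22 sample is handled by process 273838 and the 15:32 and 15:42 samples by process 82546.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Sample:
    name: str
    resource_id: str
    volume: float
    timestamp: float  # seconds, simplified from the ISO timestamps in the log


class RateOfChangeWorker:
    """Hypothetical stand-in for one agent-notification worker process."""

    def __init__(self) -> None:
        # The cache lives in this process only; other workers cannot read it.
        self.cache: Dict[Tuple[str, str], Sample] = {}

    def handle_sample(self, sample: Sample) -> Optional[float]:
        """Return the per-second rate, or None when the sample is dropped."""
        key = (sample.name, sample.resource_id)
        prev = self.cache.get(key)
        if prev is None:
            # No predecessor in this worker's cache: remember it and drop it.
            self.cache[key] = sample
            return None
        if sample.timestamp <= prev.timestamp:
            # Not newer than the cached sample: drop it, keep the cache as is.
            return None
        self.cache[key] = sample
        return (sample.volume - prev.volume) / (sample.timestamp - prev.timestamp)


def make(ts: float) -> Sample:
    return Sample("disk.device.allocation", "example-resource-hda", 643072, ts)


worker_a = RateOfChangeWorker()  # plays the role of PID 273838
worker_b = RateOfChangeWorker()  # plays the role of PID 82546

results = [
    worker_a.handle_sample(make(0.0)),     # first sample seen by worker A
    worker_b.handle_sample(make(600.0)),   # worker B has no predecessor
    worker_b.handle_sample(make(1200.0)),  # worker B now has one
]
print(results)
```

In this sketch the second sample is dropped even though an earlier sample exists, because that earlier sample sits in the other worker's cache.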
We have two ways to do that:

- The current way, on the Ceilometer side, by setting workload_partitioning=True.
  This creates many new queues on rabbitmq to ensure that all "cpu" samples are routed to the same ceilometer-agent-notification worker. But it increases the CPU usage of ceilometer-agent-notification and the load on rabbitmq, and it adds lag to the processing. This solution is also not perfect because samples can still arrive out of order: if a received sample is older than the previously kept one, it is dropped. The rate-of-change computation will be correct, but some points will still be missing, as they are when workload_partitioning=False. This feature does not have comprehensive testing, and I have reviewed many upstream fixes that are not backported to stable versions. It also decreases the performance of Ceilometer.

- A better way, on the Gnocchi side: create a special archive policy for all rate metrics (cpu_util, network.*rate, disk.*rate, ...) that computes the "rate:last" aggregation.
  The calculation is better: Gnocchi keeps all the points needed to compute it correctly, so no more points are missing from the "rate of change" computation. But it requires Gnocchi 4.X, so it cannot be used before OSP12, and the archive policy needs to be created manually.
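
The difference between the two approaches can be sketched in Python. This is only an illustration under stated assumptions, not the Ceilometer transformer or Gnocchi's "rate:last" implementation: `streaming_rates` and `stored_rates` are hypothetical helpers, the rate is taken as a per-second difference, and the sample values are invented. The first function keeps only the last point and drops anything older, as described for the Ceilometer side; the second keeps every point and sorts before computing, which is the idea behind doing the work on the storage side.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, cumulative value)


def streaming_rates(arrivals: List[Point]) -> List[Point]:
    """Rate computed on arrival, keeping only the last accepted point."""
    out: List[Point] = []
    prev: Optional[Point] = None
    for ts, value in arrivals:
        if prev is None:
            prev = (ts, value)  # no predecessor yet: nothing to emit
            continue
        if ts <= prev[0]:
            continue  # older than the kept point: dropped
        out.append((ts, (value - prev[1]) / (ts - prev[0])))
        prev = (ts, value)
    return out


def stored_rates(arrivals: List[Point]) -> List[Point]:
    """Rate computed after the fact from all stored points, in time order."""
    ordered = sorted(arrivals)
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(ordered, ordered[1:])
    ]


# Invented example: the point at t=600 arrives after the point at t=1200.
arrivals = [(0.0, 0.0), (1200.0, 120.0), (600.0, 60.0), (1800.0, 180.0)]

streamed = streaming_rates(arrivals)
stored = stored_rates(arrivals)
print(len(streamed), len(stored))
```

With the late point, the streaming version emits rates only at t=1200 and t=1800, while the stored version also recovers the rate at t=600.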
vcpus, disk.ephemeral.size, and disk.root.size are sent by nova every hour, so it is normal that you do not see them every 10 minutes. The others are affected by the rate metrics issue I describe in comment 6 and comment 9.
*** Bug 1525977 has been marked as a duplicate of this bug. ***
Verified, automated
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045