Bug 1472359

Summary: Ceilometer CPU utilization over 100%
Product: Red Hat OpenStack
Reporter: Julien Danjou <jdanjou>
Component: openstack-ceilometer
Assignee: Mehdi ABAAKOUK <mabaakou>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: high
Priority: high
Version: 7.0 (Kilo)
CC: ccollett, cfields, eglynn, jdanjou, jruzicka, jzaher, mabaakou, mmethot, mschuppe, pkilambi, srevivo, ssigwald, ssmolyak
Target Milestone: zstream
Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-ceilometer-2015.1.3-5.el7ost
Doc Type: Bug Fix
Clone Of: 1468041
Last Closed: 2017-09-12 17:21:17 UTC
Type: Bug
Bug Depends On: 1468041

Comment 1 Mehdi ABAAKOUK 2017-07-18 14:50:22 UTC
OSP 7 runs on RHEL 7.2, which ships only libvirt 1.2.17.

I can only backport the patch that caps cpu_util at 100.

The other patches require libvirt >= 2.0.0.
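libvirt reports its own version as a single integer encoded as major * 1,000,000 + minor * 1,000 + release (the format returned by virGetVersion, which libvirt-python exposes as libvirt.getVersion()). A minimal sketch of checking a host against the minimums discussed in this bug, assuming that encoding; the helper names are hypothetical, and the thresholds (1.3.2 minimum, 2.0.0 recommended upstream) come from the comments here:

```python
from typing import Tuple

Version = Tuple[int, int, int]


def decode_libvirt_version(encoded: int) -> Version:
    """Split libvirt's integer version (major*1000000 + minor*1000 + release)."""
    major, rest = divmod(encoded, 1_000_000)
    minor, release = divmod(rest, 1_000)
    return (major, minor, release)


def parse_version(text: str) -> Version:
    """Parse a dotted version string such as '1.2.17' (missing parts become 0)."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def meets_minimum(installed: Version, minimum: Version) -> bool:
    # Tuples compare element by element, so (1, 2, 17) < (1, 3, 2).
    return installed >= minimum


if __name__ == "__main__":
    rhel72 = parse_version("1.2.17")  # libvirt on RHEL 7.2, per this bug
    print(meets_minimum(rhel72, (1, 3, 2)))  # False
    print(meets_minimum(rhel72, (2, 0, 0)))  # False
    print(decode_libvirt_version(1002017))   # (1, 2, 17)
```

Comparing tuples rather than strings avoids the classic "1.10" < "1.9" ordering mistake.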

Comment 4 Mehdi ABAAKOUK 2017-07-19 08:52:28 UTC
I have continued investigating the backport of these patches to OSP 7, and an important dependency is missing: python-monotonic.
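python-monotonic packages the `monotonic` module, a Python 2 backport of a clock that never jumps backwards or forwards with wall-clock changes; Python 3.3+ has the same thing as time.monotonic(). A minimal sketch of the usual import fallback, assuming the backport is importable as `monotonic` (as the PyPI package is); `timed_call` is a hypothetical helper for illustration:

```python
try:
    # Python 3.3+: the standard library provides a monotonic clock.
    from time import monotonic
except ImportError:
    # Python 2 (as used by OSP 7 on RHEL 7): fall back to python-monotonic.
    from monotonic import monotonic


def timed_call(func, *args):
    """Call func(*args); return (result, elapsed seconds on a monotonic clock)."""
    start = monotonic()
    result = func(*args)
    return result, monotonic() - start


if __name__ == "__main__":
    result, elapsed = timed_call(sum, [1, 2, 3])
    print(result, elapsed >= 0.0)  # 6 True
```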

Comment 5 Mehdi ABAAKOUK 2017-07-19 12:41:17 UTC
To summarize the issues with the 3 patches to backport:

* capping cpu_util at 100%:
  should be OK to backport, but it does not fix anything; it just hides the root cause.

* using cpu.X.time and cpu.X.wait:
  -> needs libvirt >= 1.3.2 (upstream recommends 2.0.0);
     RHEL 7.2 provides only 1.2.17
     (even if the customer uses a more recent version, we can't require more for OSP 7)
  -> needs a kernel compiled with CONFIG_SCHEDSTATS=y, which is not the case for
     the RHEL 7.2 kernel

* not using 'timestamp' for the rate-of-change computation:
  needs python-monotonic, but it doesn't exist in OSP 7 or RHEL 7.2
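To see why the timestamp matters: cpu_util is derived from the rate of change of the cumulative cpu counter (nanoseconds of CPU time), so any error in the elapsed time divides straight into the result. A minimal sketch, assuming a single vCPU and hypothetical counter values; `CpuUtilTracker` and the injected clock are illustrative, not Ceilometer's code. It reads a monotonic clock at the moment each counter value is read, instead of subtracting the samples' whole-second timestamps:

```python
import time
from typing import Callable, Optional, Tuple

NS_PER_S = 10**9


class CpuUtilTracker:
    """Hypothetical tracker turning cumulative CPU time (ns) into a percentage."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._prev: Optional[Tuple[int, float]] = None  # (cpu_time_ns, clock)

    def update(self, cpu_time_ns: int) -> Optional[float]:
        """Record a reading; return cpu_util (%) once two readings exist."""
        now = self._clock()  # clock read at the same moment as the counter
        prev, self._prev = self._prev, (cpu_time_ns, now)
        if prev is None:
            return None
        elapsed = now - prev[1]
        if elapsed <= 0:
            return None  # no rate over a zero-length interval
        return (cpu_time_ns - prev[0]) / elapsed / NS_PER_S * 100.0


def util_from_whole_second_timestamps(delta_ns: int, t0: int, t1: int) -> float:
    """Same rate, but divided by a difference of whole-second timestamps."""
    return delta_ns / (t1 - t0) / NS_PER_S * 100.0


if __name__ == "__main__":
    # Fake clock: two readings exactly 10.0 s apart.
    ticks = iter([100.0, 110.0])
    tracker = CpuUtilTracker(clock=lambda: next(ticks))
    tracker.update(0)
    print(tracker.update(9_250_000_000))  # about 92.5
    # Same CPU work, but the timestamps claim the samples were 9 s apart.
    print(util_from_whole_second_timestamps(9_250_000_000, 4, 13))  # about 102.8
```

With the 9 s timestamp difference, the same CPU work reports roughly 102.8%, an over-100 value of the kind shown in Comment 9.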


Also, none of these changes can be cherry-picked as-is; significant extra work is needed to adapt them to the old Ceilometer version.

So I do not recommend backporting all of these changes to OSP 7.
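The CONFIG_SCHEDSTATS requirement in the list above can be checked from the kernel build configuration. In kernel config files a built-in option appears as `CONFIG_SCHEDSTATS=y` and a disabled one as `# CONFIG_SCHEDSTATS is not set`. A minimal sketch that parses such text, assuming the running kernel's config is installed at /boot/config-<release> (common on RHEL, but an assumption here); both function names are hypothetical:

```python
import os
import platform
from typing import Optional


def kernel_option(config_text: str, option: str) -> Optional[str]:
    """Return an option's value ('y', 'm', ...) or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_has_schedstats() -> Optional[bool]:
    """Check /boot/config-<release>; None if that file is absent (assumed path)."""
    path = "/boot/config-" + platform.release()
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return kernel_option(f.read(), "CONFIG_SCHEDSTATS") == "y"


if __name__ == "__main__":
    sample = "CONFIG_SCHED_DEBUG=y\n# CONFIG_SCHEDSTATS is not set\n"
    print(kernel_option(sample, "CONFIG_SCHEDSTATS"))  # None
    print(running_kernel_has_schedstats())
```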

Comment 6 Mehdi ABAAKOUK 2017-07-19 17:11:35 UTC
After some discussion, I may be able to backport the 3 patches, but the customer will have to upgrade to RHEL 7.3 to get them.

Is that OK for you?

Comment 9 Mehdi ABAAKOUK 2017-07-24 11:56:11 UTC
I have built the test package here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=575799

The Ceilometer configuration may need to be updated like this:

--- /etc/ceilometer/pipeline.yaml       2017-07-24 11:46:48.102149542 +0000
+++ /etc/ceilometer/pipeline.yaml.rpmnew        2017-07-21 20:21:40.000000000 
@@ -47,8 +47,8 @@
                     name: "cpu_util"
                     unit: "%"
                     type: "gauge"
+                    max: 100
                     scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
-                    max: 100.0
       publishers:
           - notifier://
     - name: disk_sink
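The transformer settings in this diff can be read as plain arithmetic: the rate of change of cpu (nanoseconds of CPU time per second) is multiplied by `scale`, where `resource_metadata.cpu_number or 1` falls back to one vCPU when the count is missing or zero, and `max: 100` clamps the result. A minimal sketch of that arithmetic under those assumptions; `scaled_cpu_util` is hypothetical and is not Ceilometer's transformer code:

```python
from typing import Optional


def scaled_cpu_util(rate_ns_per_s: float, cpu_number: Optional[int],
                    max_value: Optional[float] = 100.0) -> float:
    """Apply the pipeline.yaml scale expression, then the optional max clamp."""
    # Mirrors: scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
    scale = 100.0 / (10**9 * (cpu_number or 1))
    value = rate_ns_per_s * scale
    # Mirrors: max: 100 (a clamp only: it hides a bad rate rather than fixing it)
    if max_value is not None:
        value = min(value, max_value)
    return value


if __name__ == "__main__":
    # 1.8e9 ns of CPU per second on a 2-vCPU guest: about 90%.
    print(scaled_cpu_util(1.8e9, 2))
    # An inflated rate on a guest with no cpu_number metadata gets clamped.
    print(scaled_cpu_util(1.0277e9, None))  # 100.0
```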

For testing, I also changed the pipeline interval from 600 seconds to 10 seconds.


With the old package, cpu_util values are rounded and sometimes go over 100:

# ceilometer sample-list -m cpu_util -l 20
+--------------------------------------+----------+-------+---------------+------+---------------------+
| Resource ID                          | Name     | Type  | Volume        | Unit | Timestamp           |
+--------------------------------------+----------+-------+---------------+------+---------------------+
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.4          | %    | 2017-07-24T11:41:43 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.5          | %    | 2017-07-24T11:41:33 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 85.1          | %    | 2017-07-24T11:41:23 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 102.777777778 | %    | 2017-07-24T11:41:13 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 91.4545454545 | %    | 2017-07-24T11:41:04 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.0          | %    | 2017-07-24T11:40:53 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 96.4          | %    | 2017-07-24T11:40:43 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.6          | %    | 2017-07-24T11:40:33 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 99.3          | %    | 2017-07-24T11:40:23 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.8          | %    | 2017-07-24T11:40:13 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.5          | %    | 2017-07-24T11:40:03 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.4          | %    | 2017-07-24T11:39:53 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.5          | %    | 2017-07-24T11:39:43 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.8          | %    | 2017-07-24T11:39:33 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4          | %    | 2017-07-24T11:39:23 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.4          | %    | 2017-07-24T11:39:13 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.1          | %    | 2017-07-24T11:39:03 |
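| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 100.875       | %    | 2017-07-24T11:38:53 |
+--------------------------------------+----------+-------+---------------+------+---------------------+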


With the new package on RHEL 7.3, cpu_util values are no longer rounded and never go over 100:

$ ceilometer sample-list -m cpu_util -l 20
+--------------------------------------+----------+-------+---------------+------+---------------------+
| Resource ID                          | Name     | Type  | Volume        | Unit | Timestamp           |
+--------------------------------------+----------+-------+---------------+------+---------------------+
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.5413043261 | %    | 2017-07-24T11:52:54 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.1651770972 | %    | 2017-07-24T11:52:44 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 96.5857996951 | %    | 2017-07-24T11:52:34 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.34093511   | %    | 2017-07-24T11:52:24 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.6299238006 | %    | 2017-07-24T11:52:14 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.7236292496 | %    | 2017-07-24T11:52:04 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.3480216046 | %    | 2017-07-24T11:51:54 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.2662685    | %    | 2017-07-24T11:51:44 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.6208533461 | %    | 2017-07-24T11:51:34 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.3754089654 | %    | 2017-07-24T11:51:24 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.5859138236 | %    | 2017-07-24T11:51:14 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.7494590835 | %    | 2017-07-24T11:51:04 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4863709209 | %    | 2017-07-24T11:50:54 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.1475099223 | %    | 2017-07-24T11:50:44 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 96.9730960019 | %    | 2017-07-24T11:50:34 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4502699386 | %    | 2017-07-24T11:50:24 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.8053930767 | %    | 2017-07-24T11:50:14 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.414334798  | %    | 2017-07-24T11:50:04 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4509706794 | %    | 2017-07-24T11:49:54 |
| adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.2290031534 | %    | 2017-07-24T11:49:44 |
+--------------------------------------+----------+-------+---------------+------+---------------------+
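One way to read the two tables: in the old-package output, each value multiplied by the gap between its timestamp and the previous sample's timestamp comes out as a whole number (102.777777778 × 9 s ≈ 925, 91.4545454545 × 11 s ≈ 1006), which is consistent with the rate having been divided by a whole-second timestamp difference rather than the true polling interval. In the new-package output this does not hold (98.5413043261 × 10 s ≈ 985.41). This is an inference from the data shown, not something stated in the comments. A minimal sketch that checks it for rows copied from the first table:

```python
from datetime import datetime
from typing import List, Tuple

# (Timestamp, Volume) pairs copied from the old-package sample-list output,
# newest first, exactly as listed there.
OLD_ROWS: List[Tuple[str, float]] = [
    ("2017-07-24T11:41:43", 97.4),
    ("2017-07-24T11:41:33", 98.5),
    ("2017-07-24T11:41:23", 85.1),
    ("2017-07-24T11:41:13", 102.777777778),
    ("2017-07-24T11:41:04", 91.4545454545),
    ("2017-07-24T11:40:53", 97.0),
    ("2017-07-24T11:40:43", 96.4),
]

FMT = "%Y-%m-%dT%H:%M:%S"


def volume_times_gap(rows: List[Tuple[str, float]]) -> List[Tuple[int, float]]:
    """Pair each sample with the older one after it; return (gap s, volume*gap)."""
    out = []
    for (ts, vol), (prev_ts, _) in zip(rows, rows[1:]):
        delta = datetime.strptime(ts, FMT) - datetime.strptime(prev_ts, FMT)
        gap = delta.total_seconds()
        out.append((int(gap), vol * gap))
    return out


if __name__ == "__main__":
    for gap, product in volume_times_gap(OLD_ROWS):
        # A product very close to an integer fits a rate divided by whole seconds.
        print(f"gap={gap}s volume*gap={product:.6f}")
```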

Comment 16 errata-xmlrpc 2017-09-12 17:21:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2699