Bug 1472359
| Summary: | Ceilometer CPU utilization over 100% | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Julien Danjou <jdanjou> |
| Component: | openstack-ceilometer | Assignee: | Mehdi ABAAKOUK <mabaakou> |
| Status: | CLOSED ERRATA | QA Contact: | Sasha Smolyak <ssmolyak> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.0 (Kilo) | CC: | ccollett, cfields, eglynn, jdanjou, jruzicka, jzaher, mabaakou, mmethot, mschuppe, pkilambi, srevivo, ssigwald, ssmolyak |
| Target Milestone: | zstream | Keywords: | Triaged, ZStream |
| Target Release: | 7.0 (Kilo) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-ceilometer-2015.1.3-5.el7ost | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1468041 | Environment: | |
| Last Closed: | 2017-09-12 17:21:17 UTC | Type: | Bug |
| Bug Depends On: | 1468041 | ||
| Bug Blocks: | |||
|
Comment 1
Mehdi ABAAKOUK
2017-07-18 14:50:22 UTC
I have continued investigating the patches to backport to osp7, and an important dependency, "python-monotonic", is missing. To summarize the issues with the 3 patches to backport:
* capping cpu_util to 100%:
should be OK to backport, but it does not fix anything; it just hides the root cause.
* using cpu.X.time and cpu.X.wait:
-> needs libvirt >= 1.3.2 (upstream recommends 2.0.0)
rhel7.2 provides only 1.2.17
(even if the customer uses a more recent version, we can't require more for osp7)
-> needs a kernel compiled with CONFIG_SCHEDSTATS=y, which is not the case for the rhel7.2 kernel
* not using 'timestamp' for the rate of change computation:
needs python-monotonic, which doesn't exist in osp7 or rhel7.2
Also, none of these changes can be cherry-picked as-is; a lot of extra work is needed to adapt them to the old Ceilometer version.
So I do not recommend backporting all of these things to osp7.
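To illustrate the third point, here is a minimal sketch (not the actual Ceilometer transformer code) of how a rate-of-change cpu_util is derived from two cumulative CPU-time samples. The function name `cpu_util` and its parameters are hypothetical; the scale expression is the one used in pipeline.yaml, and `cap` stands in for the `max` setting. It assumes the sample timestamps are only accurate to the second, so the CPU-time delta can be divided by an elapsed time that is too short. The old output in this report is consistent with that assumption: 102.777777778 is 925/9, and the two neighbouring timestamps are 9 seconds apart.

```python
def cpu_util(prev_cpu_ns, cur_cpu_ns, elapsed_s, cpu_number=1, cap=100.0):
    """Turn two cumulative CPU-time readings (nanoseconds) into a percentage.

    Hypothetical helper for illustration only; it is not Ceilometer's code.
    """
    if elapsed_s <= 0:
        # No usable interval, so no rate can be computed.
        return None
    # Same scale expression as in pipeline.yaml.
    scale = 100.0 / (10**9 * (cpu_number or 1))
    value = (cur_cpu_ns - prev_cpu_ns) / elapsed_s * scale
    # The cap only hides an overshoot; it does not fix the elapsed time.
    return min(value, cap) if cap is not None else value


if __name__ == "__main__":
    used_ns = 9250000000  # 9.25 s of CPU time consumed between two polls

    # Timestamps truncated to seconds say 9 s elapsed: about 102.78
    print(cpu_util(0, used_ns, 9, cap=None))

    # Assumed real elapsed time of 9.5 s: about 97.37
    print(cpu_util(0, used_ns, 9.5, cap=None))

    # With the cap, the first case is reported as 100.0
    print(cpu_util(0, used_ns, 9))
```

With a monotonic clock (python-monotonic on Python 2, `time.monotonic()` on Python 3) the elapsed time would be the measured one, so the value should stay under 100 without relying on the cap.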
After some discussion, I may be able to backport the 3 packages, but the customer has to upgrade to rhel7.3 to get them. Is that OK for you?

I have built the test package here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=575799

The Ceilometer configuration may need to be updated like this:

    --- /etc/ceilometer/pipeline.yaml 2017-07-24 11:46:48.102149542 +0000
    +++ /etc/ceilometer/pipeline.yaml.rpmnew 2017-07-21 20:21:40.000000000
    @@ -47,8 +47,8 @@
     name: "cpu_util"
     unit: "%"
     type: "gauge"
    + max: 100
     scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
    - max: 100.0
     publishers:
     - notifier://
     - name: disk_sink

For testing, I have also changed the pipeline interval from 600 seconds to 10 seconds.

With the old package, I can see rounded cpu_util values, and sometimes they go over 100:

    # ceilometer sample-list -m cpu_util -l 20
    +--------------------------------------+----------+-------+---------------+------+---------------------+
    | Resource ID                          | Name     | Type  | Volume        | Unit | Timestamp           |
    +--------------------------------------+----------+-------+---------------+------+---------------------+
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.4          | %    | 2017-07-24T11:41:43 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.5          | %    | 2017-07-24T11:41:33 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 85.1          | %    | 2017-07-24T11:41:23 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 102.777777778 | %    | 2017-07-24T11:41:13 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 91.4545454545 | %    | 2017-07-24T11:41:04 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.0          | %    | 2017-07-24T11:40:53 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 96.4          | %    | 2017-07-24T11:40:43 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.6          | %    | 2017-07-24T11:40:33 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 99.3          | %    | 2017-07-24T11:40:23 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.8          | %    | 2017-07-24T11:40:13 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.5          | %    | 2017-07-24T11:40:03 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.4          | %    | 2017-07-24T11:39:53 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.5          | %    | 2017-07-24T11:39:43 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.8          | %    | 2017-07-24T11:39:33 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4          | %    | 2017-07-24T11:39:23 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.4          | %    | 2017-07-24T11:39:13 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.1          | %    | 2017-07-24T11:39:03 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 100.875       | %    | 2017-07-24T11:38:53 |

With the new package and rhel 7.3, I don't see rounded cpu_util values anymore, and they never go over 100:

    $ ceilometer sample-list -m cpu_util -l 20
    +--------------------------------------+----------+-------+---------------+------+---------------------+
    | Resource ID                          | Name     | Type  | Volume        | Unit | Timestamp           |
    +--------------------------------------+----------+-------+---------------+------+---------------------+
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.5413043261 | %    | 2017-07-24T11:52:54 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.1651770972 | %    | 2017-07-24T11:52:44 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 96.5857996951 | %    | 2017-07-24T11:52:34 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.34093511   | %    | 2017-07-24T11:52:24 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.6299238006 | %    | 2017-07-24T11:52:14 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.7236292496 | %    | 2017-07-24T11:52:04 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.3480216046 | %    | 2017-07-24T11:51:54 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.2662685    | %    | 2017-07-24T11:51:44 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.6208533461 | %    | 2017-07-24T11:51:34 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.3754089654 | %    | 2017-07-24T11:51:24 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.5859138236 | %    | 2017-07-24T11:51:14 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.7494590835 | %    | 2017-07-24T11:51:04 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4863709209 | %    | 2017-07-24T11:50:54 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.1475099223 | %    | 2017-07-24T11:50:44 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 96.9730960019 | %    | 2017-07-24T11:50:34 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4502699386 | %    | 2017-07-24T11:50:24 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 95.8053930767 | %    | 2017-07-24T11:50:14 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 97.414334798  | %    | 2017-07-24T11:50:04 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.4509706794 | %    | 2017-07-24T11:49:54 |
    | adb6d45d-b710-4b62-aec2-5e801ee2584d | cpu_util | gauge | 98.2290031534 | %    | 2017-07-24T11:49:44 |
    +--------------------------------------+----------+-------+---------------+------+---------------------+

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2699 |