Bug 1209711
| Summary: | Thousands of OSP cinder snapshots cause significant EmsRefresh slowdown | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Thomas Hennessy <thenness> | ||||
| Component: | Providers | Assignee: | Matthew Draper <mdraper> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Thom Carlin <tcarlin> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 5.3.0 | CC: | dclarizi, gblomqui, jdeubel, jfrey, jhardy, jocarter, mfeifer, obarenbo, snansi, tcarlin, thenness | ||||
| Target Milestone: | GA | ||||||
| Target Release: | 5.4.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Known Issue | |||||
| Doc Text: | The previous version of CloudForms Management Engine collected inventory on all snapshots on Red Hat OpenStack Platform providers, causing significant EmsRefresh slowdown. This issue was caused by scalability issues in the refresh process, which could not handle large volumes of OpenStack Platform cinder snapshots. A temporary workaround has been provided until the underlying scalability problem can be fixed. The workaround allows users that have a large number of snapshots to disable the collection of inventory information for snapshots, which avoids the EmsRefresh slowdown. | Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-06-16 12:57:12 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | | ||||||
https://github.com/ManageIQ/manageiq/pull/2723

Doc text could largely be pulled directly from the PR comment.

New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/a54999a048cc40be7af899375ac9e5d969463579

commit a54999a048cc40be7af899375ac9e5d969463579
Author:     Matthew Draper <mdraper>
AuthorDate: Thu Apr 23 02:51:52 2015 +0930
Commit:     Matthew Draper <mdraper>
CommitDate: Thu Apr 23 02:51:52 2015 +0930

    Provide a limited ability for OpenStack refresh to skip item types

    Documenting this in the config file seems likely to make it too
    enticing; as a quick fix, it's not really as universal as it would
    first appear.

    To wit: while it works for our immediate situation (needing to skip
    `:cloud_volumes` and `:cloud_volume_snapshots`), it would have no
    effect on (for example) `:firewall_rules`, and would actively break
    for `:security_groups`. Not to mention the likelihood of someone
    assuming such an option will work equally in other providers, where
    it would in fact be ignored completely.

    With the above limitations in mind, to use, configure:

        ems_refresh:
          openstack:
            :inventory_ignore:
            - :cloud_volumes
            - :cloud_volume_snapshots

    https://bugzilla.redhat.com/show_bug.cgi?id=1209711

 vmdb/app/models/ems_refresh/parsers/openstack.rb   |  1 +
 .../openstack_refresher_rhos_havana_spec.rb        | 24 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

This fix is largely a temporary workaround until we can address the real scalability issue being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1214780

Please provide steps to reproduce.

Sure: create an OpenStack environment with a few thousand images. Take multiple volume snapshots of each from OpenStack. When you have about 16k volume snapshots, do a standard EMS refresh of the OpenStack provider. I suspect this is not likely to be a reasonable ask for the QE department, but that is the environment in which this problem has presented itself.
It seems to me that a better question to ask is: Is there a reasonable number of volume snapshots that we ought to include? Should any be included? What does CFME do with these?

Verified in 5.4.0.5.20150605150206_7daa1a8 by:
1) Running SmartState Analysis on an image and verifying that both cloud_volumes and cloud_volume_snapshots are created
2) Configuring as above and verifying that neither cloud_volumes nor cloud_volume_snapshots are created

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1100.html
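
For readers unfamiliar with the `:inventory_ignore` setting described in the commit message, the following is a minimal Ruby sketch of the general idea: consult an ignore list before running each collector. It is not the actual change to `openstack.rb` (which is a one-line addition not shown in this report). The names `SETTINGS`, `COLLECTORS`, and `collect_inventory`, the `:flavors` item type, and the sample data are all hypothetical and exist only for illustration.

```ruby
# Hypothetical in-memory equivalent of the YAML settings quoted in the
# commit message. The real appliance reads these from its configuration.
SETTINGS = {
  "ems_refresh" => {
    "openstack" => {
      :inventory_ignore => [:cloud_volumes, :cloud_volume_snapshots]
    }
  }
}.freeze

# Hypothetical collectors standing in for the real OpenStack API calls.
COLLECTORS = {
  :flavors                => -> { ["m1.small"] },
  :cloud_volumes          => -> { ["vol-1", "vol-2"] },
  :cloud_volume_snapshots => -> { ["snap-1"] }
}.freeze

# Run every collector whose item type is not listed under :inventory_ignore.
# A missing setting is treated as "ignore nothing".
def collect_inventory(settings, collectors)
  ignored = settings.dig("ems_refresh", "openstack", :inventory_ignore) || []
  collectors.each_with_object({}) do |(type, collector), inventory|
    next if ignored.include?(type)
    inventory[type] = collector.call
  end
end

inventory = collect_inventory(SETTINGS, COLLECTORS)
puts inventory.keys.inspect
```

In this sketch a skipped type is never collected at all, so its cost is avoided entirely. As the commit message warns, the real option is not universal: it has no effect on `:firewall_rules` and would break for `:security_groups`, so a sketch like this should not be read as a general-purpose switch.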
Created attachment 1012010 [details]
zip of current fog.log from customer QA appliance (not the full archive set)

Description of problem:
A customer is reporting that a single instance of OpenStack is being used in multiple CFME environments. Each environment is reporting the same degradation in EMS refresh times over the course of several weeks, growing from about two minutes to 1200 seconds or more.

Version-Release number of selected component (if applicable):
Version: 5.3.0.15
Build: 20140929084440_2192916

How reproducible:
Behavior not reproduced in the testing environment, but it is reproduced in several of the customer environments after CFME has been operating for at least two weeks.

Steps to Reproduce:
1.
2.
3.

Actual results:
EMS refresh times grow on a daily basis. Each worker process lives for the restart_interval time period and is then replaced by another worker process. The growth in EMS refresh times seems to be independent of the worker process.

Expected results:
EMS refresh times should stay roughly the same in an environment where VM instance counts are either roughly the same or are actually reduced by nearly 50%.

Additional info: