Description of problem:
It is not possible to temporarily stop or suspend a provider, including all of its workers, while maintenance is running (for example, network maintenance that causes connectivity issues to the provider). The logs usually fill with error messages from the workers that belong to the provider under maintenance. The only available option is to delete the provider, which is not practical because it deletes all related configuration items stored in the database.
Version-Release number of selected component (if applicable): 5.7.z
How reproducible: All of the time
Steps to Reproduce:
1. Place the provider in maintenance mode.
Actual results: Worker errors
Expected results: A way to stop/pause/suspend a provider that is in maintenance mode.
Additional info:
Hi Brad, Assigning this to you for evaluation. Thanks Bronagh
Ryan, I am told that the approach we recommend is to create an "In Maintenance" zone and place the provider in it, effectively taking away its credentials. I understand you may have started a tech note on this, is that correct? Thanks Bronagh
Hi Dave, I am assigning this to you in the hope that the recommendation in comment 4 above can be tested. Thanks Bronagh
Hello Bronagh, I am wondering how an "in maintenance" zone will help. Try to imagine there are several zones in a region, and each zone has more than one appliance taking care of more than one provider. Say I have two standalone providers, "A" and "B", connected (the type of provider does not matter; they may be the same type but serve different requirements). In the "Maintenance" zone scenario, if I want to put provider "B" into maintenance, I will have to move at least one appliance to this zone and deactivate its workers (or keep a spare appliance there). But this can lead to many issues. For example, removing an appliance from the "working" zone costs some performance within that zone. In the database, the relation of the provider to the zone will change (probably a lot of data must be reindexed). Additionally, the person who is responsible for this operation can make a mistake: activate the appliance in a different zone than required, forget to activate some worker, etc. In my opinion the only feasible solution is an option to stop / start a provider (which stops / starts its workers), similar to how:
- workers are started when the provider is connected (appliance started)
- workers are restarted when the "Check authentication" action is initiated
- workers are stopped when the appliance is stopped or the provider is deleted
Kind regards, Vaclav
Jan, why did the customer not like the workaround that was presented by Dave Johnson: "...the customer should create a "parking" zone with no appliances in it to "park" the environment. Since the zone has no appliances, all management should stop until the provider is moved back to its zone with appliances. Hope that makes sense." Bronagh
Hi Bronagh, We are going to add client-dedicated environments to our implementation (client = provider). In the initial phase (July) there will be more than 10 providers; later in the year additional clients will be enabled (migration from the current "legacy" solution). There are several technical reasons for this feature request:
- On CFME 5.7.3 (released two days ago) we tested that if the provider is already unavailable, its credentials must be validated before it can be saved in the "parking" zone. So this approach does not work (I can imagine it would be possible for planned maintenance, by parking the provider before it is disconnected).
- If there are more providers in the zone (for example, we have 3 vCenters managed by one vCD), it is not possible to just disable workers because one provider is under maintenance.
Additionally, from a business perspective I would expect an enterprise-ready application to support not only provider Registration, Deletion and "some workaround", but Stop and Start as well. Kind regards, Vaclav
https://github.com/ManageIQ/manageiq/pull/17452
I've discussed with colleagues how to label the new buttons properly, and we agreed on "Suspend" / "Resume". While checking the toolbars where the feature should be added, I found that the containers providers screen already has buttons for this feature, introduced in manageiq-ui-classic/2603 [1]. Those buttons use the terminology "Pause" / "Resume". I'm not sure which version is correct for this case. I tried to find the answer in Patternfly Terminology and Wording [2], but the information is missing (an issue was created though; the icons might be an issue as well [3]). The OpenStack documentation [4] explains the difference between "pause" and "suspend" for VMs, but I'm not sure it applies in this case. For now, I'll use the same terminology as the container providers to stay consistent. Later I can update the PR with the correct labels for the new buttons and fix the already merged ones as well. Roman
[1] https://github.com/ManageIQ/manageiq-ui-classic/pull/2603
[2] http://www.patternfly.org/styles/terminology-and-wording/
[3] https://github.com/patternfly/patternfly-design/issues/670
[4] https://wiki.openstack.org/wiki/Kvm-Pause-Suspend
Toolbar buttons for the functionality and notification for summary view added in: * https://github.com/ManageIQ/manageiq/pull/17500 * https://github.com/ManageIQ/manageiq-ui-classic/pull/4012
Issue with PR relations: https://github.com/ManageIQ/manageiq/issues/17489
https://github.com/ManageIQ/manageiq-schema/pull/222
https://github.com/ManageIQ/manageiq/pull/17602
https://github.com/ManageIQ/manageiq/pull/17602 is not related; it was linked by mistake.
UI PR: https://github.com/ManageIQ/manageiq-ui-classic/pull/4269
https://github.com/ManageIQ/manageiq-api/pull/434
Dear customer, The CloudForms team is reviewing the current CloudForms RFE (Request for Enhancement) backlog in order to improve our responsiveness to customers. We are closing any requests for versions no longer within full support (link below to the lifecycle) or that do not have a clear spot on the product roadmap. We are committing to better management of the backlog as we move forward. If you have an RFE that you still have a strong business case for, please open a new BZ against the currently supported version 4.6. Lifecycle page: https://access.redhat.com/support/policy/updates/cloudforms If you have any concerns about this, please let us know. Thanks and regards!
https://github.com/ManageIQ/manageiq/pull/18037
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/352ddb4f1b54186eef5fd9273828f6ee87a3250c
commit 352ddb4f1b54186eef5fd9273828f6ee87a3250c
Author:     Martin Slemr <mslemr>
AuthorDate: Mon May 21 09:51:29 2018 -0400
Commit:     Martin Slemr <mslemr>
CommitDate: Mon May 21 09:51:29 2018 -0400
    Pause/Resume EMS
    Enables/Disables ems with children and puts to maintenance zone
    https://bugzilla.redhat.com/show_bug.cgi?id=1455145
 app/models/ext_management_system.rb       | 61 +-
 app/models/miq_queue.rb                   |  1 +
 app/models/zone.rb                        | 12 +
 spec/models/ext_management_system_spec.rb | 47 +
 spec/models/zone_spec.rb                  |  2 +-
 5 files changed, 111 insertions(+), 12 deletions(-)
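For illustration, a minimal Ruby sketch of the behaviour the commit message describes: pausing disables the provider together with its children and moves it to a maintenance zone, and resuming reverses both. This is a standalone toy model, not the code from app/models/ext_management_system.rb. The `Zone` struct, the `Provider` class, the `zone_before_pause` attribute and the zone names are all assumptions made for the example.

```ruby
# Toy model only: plain Ruby objects standing in for the real models.
Zone = Struct.new(:name)

class Provider
  attr_reader :name, :zone, :enabled, :children, :zone_before_pause

  def initialize(name, zone, children = [])
    @name = name
    @zone = zone
    @enabled = true
    @children = children
    @zone_before_pause = nil
  end

  # Disable this provider and its children, then park it in the maintenance zone.
  def pause!(maintenance_zone)
    return if zone.equal?(maintenance_zone) # already paused

    @zone_before_pause = zone
    @enabled = false
    @zone = maintenance_zone
    children.each { |child| child.pause!(maintenance_zone) }
  end

  # Re-enable the provider and put it back in the zone it came from.
  def resume!
    return if @zone_before_pause.nil? # not paused

    @zone = @zone_before_pause
    @zone_before_pause = nil
    @enabled = true
    children.each(&:resume!)
  end
end

production  = Zone.new("production")
maintenance = Zone.new("maintenance")
network     = Provider.new("vcenter-network", production)
vcenter     = Provider.new("vcenter", production, [network])

vcenter.pause!(maintenance)
puts "#{vcenter.name}: enabled=#{vcenter.enabled} zone=#{vcenter.zone.name}"
# prints "vcenter: enabled=false zone=maintenance"
vcenter.resume!
puts "#{vcenter.name}: enabled=#{vcenter.enabled} zone=#{vcenter.zone.name}"
# prints "vcenter: enabled=true zone=production"
```

Remembering the original zone on the provider itself means resuming needs no input from the operator, which addresses the concern raised earlier in this thread about moving things to the wrong zone by hand.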
New commit detected on ManageIQ/manageiq-api/master:
https://github.com/ManageIQ/manageiq-api/commit/cc4dd1b981d6a6ebeb872dc72a95f847b5054fb5
commit cc4dd1b981d6a6ebeb872dc72a95f847b5054fb5
Author:     Dávid Halász <dhalasz>
AuthorDate: Fri Jul 27 06:09:13 2018 -0400
Commit:     Dávid Halász <dhalasz>
CommitDate: Fri Jul 27 06:09:13 2018 -0400
    Use the new universal methods for suspending/resuming a provider
    https://bugzilla.redhat.com/show_bug.cgi?id=1455145
 app/controllers/api/providers_controller.rb | 4 +-
 1 file changed, 2 insertions(+), 2 deletions(-)
Hey Martin/David can this be moved to POST?
No, Martin is still testing a PR for foreman/ansible: https://github.com/ManageIQ/manageiq-ui-classic/pull/5173
https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/155 added to support frontend on ansible
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/0634007d7c47ff18524a935296237aca5fade759
commit 0634007d7c47ff18524a935296237aca5fade759
Author:     Martin Slemr <mslemr>
AuthorDate: Thu Jan 24 07:15:03 2019 -0500
Commit:     Martin Slemr <mslemr>
CommitDate: Thu Jan 24 07:15:03 2019 -0500
    EMS.enable!/disable! removed from public
    Replaced by pause!/resume! instead
    https://bugzilla.redhat.com/show_bug.cgi?id=1455145
 app/models/ext_management_system.rb | 18 +-
 1 file changed, 6 insertions(+), 12 deletions(-)
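The visibility change can be pictured with a small Ruby sketch. The class below is hypothetical (`ToyEms` and its `enabled` flag are not the real model); only the method names `enable!`, `disable!`, `pause!` and `resume!` come from the commit message, and how the real methods relate to each other is an assumption.

```ruby
# Hypothetical stand-in for the model touched by the commit.
class ToyEms
  attr_reader :enabled

  def initialize
    @enabled = true
  end

  # Public entry points that callers are expected to use.
  def pause!
    disable!
  end

  def resume!
    enable!
  end

  private

  # Low-level toggles, no longer callable from outside the class.
  def enable!
    @enabled = true
  end

  def disable!
    @enabled = false
  end
end

ems = ToyEms.new
ems.pause!
puts ems.enabled # prints "false"

begin
  ems.disable! # private method called with an explicit receiver
rescue NoMethodError => e
  puts "rejected: #{e.class}" # prints "rejected: NoMethodError"
end
```

Keeping the toggles private leaves a single public path for suspending a provider, so any extra bookkeeping done in `pause!` / `resume!` cannot be bypassed by callers.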
New commit detected on ManageIQ/manageiq-providers-ansible_tower/master:
https://github.com/ManageIQ/manageiq-providers-ansible_tower/commit/329999d8130588877c075bb204d6ebf5d428ca8f
commit 329999d8130588877c075bb204d6ebf5d428ca8f
Author:     Martin Slemr <mslemr>
AuthorDate: Mon Feb 4 07:28:04 2019 -0500
Commit:     Martin Slemr <mslemr>
CommitDate: Mon Feb 4 07:28:04 2019 -0500
    Changed provider zone when EMS paused/resumed
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1455145
 app/models/manageiq/providers/ansible_tower/shared/automation_manager.rb | 11 +
 app/models/manageiq/providers/ansible_tower/shared/provider.rb           |  2 +-
 spec/support/ansible_shared/automation_manager.rb                        | 37 +
 3 files changed, 49 insertions(+), 1 deletion(-)
The previous commit isn't the last one; it just wrongly contains the "Fixes" keyword. Setting back to ON_DEV.
List of all related PRs: https://github.com/ManageIQ/manageiq/issues/17489
Verified in 5.11.0.13. Suspended a VMware provider and then resumed it after several hours. While it was suspended, no items were changed and no provider refresh was triggered. After resuming, items were changed and a refresh was triggered automatically.
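The verified behaviour (no refresh while suspended, an automatic refresh on resume) can be sketched as a toy Ruby model. Nothing here is the product's code: `ToyProvider`, `queue_refresh` and the array used as a queue are assumptions chosen only to mirror the observation above.

```ruby
# Toy model: a refresh queue that ignores suspended providers.
class ToyProvider
  attr_reader :name, :enabled

  def initialize(name, queue)
    @name = name
    @queue = queue
    @enabled = true
  end

  def pause!
    @enabled = false
  end

  # Resuming re-enables the provider and immediately asks for a refresh.
  def resume!
    @enabled = true
    queue_refresh
  end

  # Refresh requests are dropped while the provider is suspended.
  def queue_refresh
    return false unless enabled

    @queue << name
    true
  end
end

refresh_queue = []
vmware = ToyProvider.new("vmware", refresh_queue)

vmware.pause!
vmware.queue_refresh      # dropped while suspended
puts refresh_queue.length # prints "0"
vmware.resume!
puts refresh_queue.length # prints "1"
```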
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:4199