Bug 1331803
| Summary: | perf_capture_time method resulting in the failure to schedule C&U capture | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Colin Arnott <carnott> | ||||||
| Component: | Performance | Assignee: | Keenan Brock <kbrock> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Nandini Chandra <nachandr> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 5.5.0 | CC: | benglish, carnott, cpelland, dajohnso, dmetzger, jhardy, kbrock, kmorey, ldomb, mfeifer, nachandr, obarenbo, simaishi | ||||||
| Target Milestone: | GA | Keywords: | ZStream | ||||||
| Target Release: | 5.6.0 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | c&u | ||||||||
| Fixed In Version: | 5.6.0.6 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Previously, the perf_capture_time method responsible for generating and capturing capacity and utilization data in the message queue failed frequently. As a result, capacity and utilization capture was not scheduled for most elements in the environment. This occurred when metrics for virtual machines orphaned from an EMS were handled incorrectly. This patch fixes the issue so that when an EMS is orphaned, the contents of ems_cluster are cleared, and metrics collection is only scheduled for virtual machines with an EMS. | Story Points: | --- | ||||||
| Clone Of: | |||||||||
| : | 1333096 (view as bug list) | Environment: | |||||||
| Last Closed: | 2016-06-29 15:56:14 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1333096 | ||||||||
| Attachments: |
|
||||||||
|
Description
Colin Arnott
2016-04-29 14:44:05 UTC
It looks like some of the VMs did not properly clear the cluster.

A quick fix: please run the following from a rails console on a machine:

# clear ems cluster on orphaned vms
VmOrTemplate.where(:ems_id => nil).where.not(:ems_cluster_id => nil).update_all(:ems_cluster_id => nil)
# remove cap and u for orphaned vms
MiqQueue.where(:class_name => "ManageIQ::Providers::Vmware::InfraManager::Vm", :instance_id => VmOrTemplate.where(:ems_id => nil).select(:id), :method_name => "perf_rollup").destroy_all.count

I will make a code change so this will not happen in the future.

Here is the code I used to diagnose the problem. I ran this in the rails console:

AvailabilityZone.where(:ext_management_system => nil).count
EmsCluster.where(:ext_management_system => nil).count
Host.where(:ext_management_system => nil).count
VmOrTemplate.where(:ext_management_system => nil).count
VmOrTemplate.where(:ext_management_system => nil).group(:ems_cluster_id).count

A count of 0 for everything is ideal, but a VM without an EMS is not that bad: it is a retired or orphaned VM. The issue to be aware of is whether it is still linked to a cluster. The sample database that I had contained records with a cluster and no EMS.

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/15bbd18c6df1234bdeb59ac5602273a6534cb440

commit 15bbd18c6df1234bdeb59ac5602273a6534cb440
Author: Keenan Brock <kbrock>
AuthorDate: Fri Apr 29 17:06:51 2016 -0400
Commit: Keenan Brock <kbrock>
CommitDate: Sat Apr 30 12:42:58 2016 -0400

Clear cluster when ems is cleared

https://bugzilla.redhat.com/show_bug.cgi?id=1331803

When an ems is orphaned, clear out the ems_cluser as well

app/models/ems_cluster.rb | 2 +-
app/models/host.rb | 1 +
app/models/vm_or_template.rb | 1 +
spec/models/host_spec.rb | 20 ++++++++++++++++++++
spec/models/vm_or_template_spec.rb | 21 +++++++++++++++++++++
5 files changed, 44 insertions(+), 1 deletion(-)

Created attachment 1153910 [details]
Patch to fix cloud provider orphans breaking cap and u collection
This can be applied with
patch -p0 < cap_u_ems_c.patch
To see if the patch will work, use the following script.
It should show BAD before the patch and GOOD after it.
vmdb
bundle exec rails c
Zone.all.map { |zone| Metric::Targets.capture_cloud_targets(zone).detect { |vm| vm.ems_id.nil? } ? "#{zone.name}: BAD" : "#{zone.name}: GOOD" }
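The check above flags a zone as BAD when any of its cloud capture targets has no EMS. As a rough illustration of that logic outside of a rails console, here is a minimal plain-Ruby sketch. `SketchVm`, `SketchZone`, `sample_zones`, and both target finders are hypothetical stand-ins invented for this example; they are not the ManageIQ models or the real `Metric::Targets.capture_cloud_targets` implementation.

```ruby
# Hypothetical in-memory stand-ins for the ManageIQ Vm and Zone models.
SketchVm = Struct.new(:name, :ems_id, keyword_init: true)
SketchZone = Struct.new(:name, :vms, keyword_init: true)

# Assumed pre-patch behaviour: every VM in the zone is a capture target,
# including VMs that no longer have an EMS.
def targets_unfiltered(zone)
  zone.vms
end

# Assumed post-patch behaviour: only VMs that still have an EMS are targets.
def targets_with_ems(zone)
  zone.vms.reject { |vm| vm.ems_id.nil? }
end

# Same GOOD/BAD report as the console one-liner. The block supplies the
# target finder so both behaviours can be compared.
def zone_report(zones)
  zones.map do |zone|
    targets = yield(zone)
    status = targets.any? { |vm| vm.ems_id.nil? } ? "BAD" : "GOOD"
    "#{zone.name}: #{status}"
  end
end

# Sample data: one zone contains an orphaned VM (ems_id is nil).
def sample_zones
  [
    SketchZone.new(name: "default", vms: [SketchVm.new(name: "vm1", ems_id: 1),
                                          SketchVm.new(name: "orphan", ems_id: nil)]),
    SketchZone.new(name: "east", vms: [SketchVm.new(name: "vm2", ems_id: 2)])
  ]
end

puts zone_report(sample_zones) { |zone| targets_unfiltered(zone) }.inspect
puts zone_report(sample_zones) { |zone| targets_with_ems(zone) }.inspect
```

With the unfiltered finder, the zone holding the orphaned VM reports BAD; with the filtered finder, every zone reports GOOD, which is the outcome the script is meant to confirm on a patched appliance.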
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/db0a24629e9442b2ba754bb374c6141b9aba7430

commit db0a24629e9442b2ba754bb374c6141b9aba7430
Author: Keenan Brock <kbrock>
AuthorDate: Wed May 4 22:43:34 2016 -0400
Commit: Keenan Brock <kbrock>
CommitDate: Wed May 4 22:53:30 2016 -0400

Metric::Target capture_cloud_targets rewrite

- only bring back vms that are on (when availability_zone.nil?)
- only bring back vms that have an ems

https://bugzilla.redhat.com/show_bug.cgi?id=1332579
https://bugzilla.redhat.com/show_bug.cgi?id=1331803

app/models/metric/targets.rb | 1 +
1 file changed, 1 insertion(+)

Created attachment 1156704 [details]
Patch to fix cloud provider orphans (v2)
Sorry about that.
The previous patch assumed some previous changes to the models.
Here is one that I have tested on 5.5.2.4
vmdb
patch -p1 < cap_u_ems_c.v2.patch
Hi Nandini,

The issue arises when a VM is in a cluster or an availability zone. When the VM is deleted on EC2, we call that orphaning: locally, we remove the association between the VM and the EMS. The bug is that although the VM is no longer linked to the EMS, it is still linked to the cluster or availability zone.

We approached this with 2 similar fixes:
1. Ensure that the VMs won't be included in the submit 'cap&u' job (this BZ)
2. Clear the cluster and availability zone when orphaning a VM.

I suppose with #2, #1 is not necessary. But #1 also got us some performance improvements.

If fix #2 is interfering with your testing, you could always orphan a VM and then go into the rails console and assign a cluster to an orphaned VM:

VmOrTemplate.orphaned.first.update_attributes(:ems_cluster => EmsCluster.first)

Thanks Keenan.

Steps to reproduce:
1) Terminate an instance from the EC2 console.
2) Refresh the EMS.

With the fix, run these commands on the rails console:

1. Vm.where(:ems_id => nil).count
Output: VM count is greater than 0.

2. Vm.where(:ems_id => nil).where.not(:availability_zone_id => nil).count
Output: VM count = 0
Verified that C&U continues to work after this step.

3. Vm.where(:ems_id => nil).update_all(:availability_zone_id => AvailabilityZone.first.id)
Verified that C&U continues to work after this step.

Verified in 5.6.0.10

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348
|
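For readers who want to reason about the orphaning behaviour discussed in this report without a CloudForms appliance, the two fixes can be modelled in plain Ruby. This is only a sketch under stated assumptions: `OrphanSketchVm`, `orphan_without_cleanup!`, `orphan!`, `capture_targets`, and `orphans_still_linked` are hypothetical names made up for illustration and do not correspond to the actual changes in `vm_or_template.rb` or `metric/targets.rb`.

```ruby
# Hypothetical in-memory model of a VM record; not the ManageIQ class.
class OrphanSketchVm
  attr_reader :name
  attr_accessor :ems_id, :ems_cluster_id, :availability_zone_id

  def initialize(name:, ems_id: nil, ems_cluster_id: nil, availability_zone_id: nil)
    @name = name
    @ems_id = ems_id
    @ems_cluster_id = ems_cluster_id
    @availability_zone_id = availability_zone_id
  end

  # A VM with no EMS is considered orphaned.
  def orphaned?
    ems_id.nil?
  end

  # Models the reported bug: only the EMS link is removed, so the VM stays
  # attached to its cluster or availability zone.
  def orphan_without_cleanup!
    self.ems_id = nil
    self
  end

  # Models fix #2: clear the cluster and availability zone together with the EMS.
  def orphan!
    self.ems_id = nil
    self.ems_cluster_id = nil
    self.availability_zone_id = nil
    self
  end
end

# Models fix #1: only VMs that still have an EMS are submitted for capture.
def capture_targets(vms)
  vms.reject(&:orphaned?)
end

# Counts orphaned VMs still linked to a cluster or availability zone,
# similar in spirit to verification step 2 in the comment above.
def orphans_still_linked(vms)
  vms.count { |vm| vm.orphaned? && (vm.ems_cluster_id || vm.availability_zone_id) }
end

buggy = OrphanSketchVm.new(name: "i-buggy", ems_id: 1, availability_zone_id: 7).orphan_without_cleanup!
fixed = OrphanSketchVm.new(name: "i-fixed", ems_id: 1, ems_cluster_id: 3, availability_zone_id: 7).orphan!
live  = OrphanSketchVm.new(name: "i-live", ems_id: 1, availability_zone_id: 7)

puts orphans_still_linked([buggy, fixed, live])
puts capture_targets([buggy, fixed, live]).map(&:name).inspect
```

In this model either fix alone keeps orphaned VMs out of the capture job: fix #2 removes the stale links that caused them to be picked up, and fix #1 filters them out even if a stale link is reintroduced by hand, as in verification step 3.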