Bug 1569090
Summary: | During metrics collection for a VMWare provider, SOAP exception occurs during queryAvailablePerfMetric for non-existent VM | |
---|---|---|---
Product: | Red Hat CloudForms Management Engine | Reporter: | Robb Manes <rmanes>
Component: | Providers | Assignee: | Adam Grare <agrare>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nandini Chandra <nachandr>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 5.8.0 | CC: | cpelland, gblomqui, jfrey, jhardy, nachandr, obarenbo, rmanes, rovalent, simaishi
Target Milestone: | GA | Keywords: | TestOnly, ZStream
Target Release: | 5.10.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | 5.10.0.0 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1595461 1595462 | Environment: |
Last Closed: | 2019-02-11 14:06:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | VMware | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1595461, 1595462 | |
Description
Robb Manes
2018-04-18 15:16:37 UTC
Hey Robb, what most likely happened here is that the VM was deleted while the metrics capture request sat in the queue. The VM will be archived (have its ems_id nulled out), which removes it from the provider's vms collection; that is why `vm = provider.vms.find_by(name: "vm-1849")` returned nil. You can try `VmOrTemplate.where(:ems_ref => 'vm-1849')`, which will search archived VMs as well as current VMs.

Actually, you searched for VMs named 'vm-1849' instead of VMs whose ems_ref is 'vm-1846'. For VMware we use ems_ref to track the ManagedObjectReference.

> Questions are:
> A) Are we intentionally trying to query a VM that is non-existent for C&U data? Is it expected that VMWare PerformanceManager objects can be queried/store data about VMs that no longer exist in the infrastructure?

No.

> B) If this is an error, can we not query VMs that do not exist to prevent errors?

We can add a check to ensure that the VM's target.ext_management_system isn't nil before querying metrics. It will still fail, but at least with a better error message. Note that this will not catch all errors: the same failure will still happen if we try to query metrics after the VM was deleted but before the refresh runs and disconnects the VM.

> C) Since the VM name does not exist in the vms table, where are we getting the VM name when we do C&U to create SOAP calls?

That's actually the VM's ems_ref, not the name. It looks like the name is mdhas_TPS (from the log line you pasted above). We are getting the ems_ref from the database, because the metrics capture args on the queue store the ID of the VM we are capturing metrics for. In fact the ID is in the log line; try `VmOrTemplate.find(4000000001845)`.
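As a rough illustration of the two points above (provider-scoped lookups miss archived VMs, and a nil ext_management_system can be caught before any SOAP call is made), here is a minimal sketch in plain Ruby. The `Vm` struct, `TargetArchivedError`, `provider_vms`, `find_by_ems_ref`, `ensure_capturable!` and the sample data are hypothetical stand-ins, not the ManageIQ models or the actual check in metrics_capture.rb.

```ruby
# Hypothetical stand-in for the VmOrTemplate record; not the ManageIQ class.
Vm = Struct.new(:id, :name, :ems_ref, :ext_management_system)

# Hypothetical error type used to give a clearer message than a SOAP fault.
class TargetArchivedError < StandardError; end

# Made-up sample data: one active VM and one archived VM (provider link is nil).
ALL_VMS = [
  Vm.new(1, "web01", "vm-100", :vcenter),
  Vm.new(2, "old-db", "vm-200", nil)
].freeze

# Provider-scoped lookup: archived VMs are not visible here.
def provider_vms(provider)
  ALL_VMS.select { |vm| vm.ext_management_system == provider }
end

# Unscoped lookup by ems_ref: finds archived VMs as well as current ones.
def find_by_ems_ref(ems_ref)
  ALL_VMS.find { |vm| vm.ems_ref == ems_ref }
end

# Guard to run before querying metrics: returns the VM when it still has a
# provider, otherwise raises with a descriptive message.
def ensure_capturable!(vm)
  return vm unless vm.ext_management_system.nil?

  raise TargetArchivedError,
        "VM #{vm.name.inspect} (id #{vm.id}, ems_ref #{vm.ems_ref}) has no " \
        "provider; it was probably deleted or archived"
end
```

As noted above, a guard like this only improves the error message. It cannot help when the VM has been deleted in vCenter but the refresh has not yet disconnected it.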
Actually we do exactly this kind of check already: https://github.com/ManageIQ/manageiq-providers-vmware/blob/f89f47c322e3dda61fdac38ebc37ae20cfa77997/app/models/manageiq/providers/vmware/infra_manager/metrics_capture.rb#L267-L268

Robb, can you run that query again and check for the ems_ref instead of the name, to see whether there is still a VM with that ref attached to the provider? And you said you already checked the MOB and didn't see that VM?

In this case it looks like the summary.config.uuid property is nil, while the config.uuid property is available. The simple fix is to try to get the uuid from either summary.config.uuid or config.uuid. Longer term I think we should look at skipping disconnection only for the VMs which are invalid, rather than letting a single invalid VM prevent anything from being disconnected across the whole environment, given that this has other side-effects (specifically, we will try to collect metrics from deleted VMs).

New commit detected on ManageIQ/vmware_web_service/master:
https://github.com/ManageIQ/vmware_web_service/commit/ae3e1c45e163efef14fa119580bd80eed594c247

commit ae3e1c45e163efef14fa119580bd80eed594c247
Author: Adam Grare <agrare>
AuthorDate: Wed May 2 13:30:04 2018 -0400
Commit: Adam Grare <agrare>
CommitDate: Wed May 2 13:30:04 2018 -0400

Add VirtualMachine config.uuid to VimPropMaps

Collect config.uuid in addition to summary.config.uuid for cases when the later is null.

https://bugzilla.redhat.com/show_bug.cgi?id=1569090

lib/VMwareWebService/VimPropMaps.rb | 1 +
1 file changed, 1 insertion(+)

New commit detected on ManageIQ/manageiq-providers-vmware/master:
https://github.com/ManageIQ/manageiq-providers-vmware/commit/f340d716369626e66c89ba68c534c8861748012a

commit f340d716369626e66c89ba68c534c8861748012a
Author: Adam Grare <agrare>
AuthorDate: Wed May 2 12:57:20 2018 -0400
Commit: Adam Grare <agrare>
CommitDate: Wed May 2 12:57:20 2018 -0400

Try to get VM UUID from summary.config or config

In some instances a VM will have a nil summary.config.uuid but a valid config.uuid. Attempt to get the UUID from config.uuid as a fallback in the event that summary.config.uuid is nil.

https://bugzilla.redhat.com/show_bug.cgi?id=1569090

.yamllint | 1 +
app/models/manageiq/providers/vmware/infra_manager/refresh_parser.rb | 5 +-
app/models/manageiq/providers/vmware/infra_manager/selector_spec.rb | 1 +
manageiq-providers-vmware.gemspec | 2 +-
spec/models/manageiq/providers/vmware/infra_manager/refresh_parser_spec.rb | 2 +-
spec/models/manageiq/providers/vmware/infra_manager/refresher_spec.rb | 6 +
spec/tools/vim_data/miq_vim_inventory/virtualMachinesByMor.yml | 911 +-
7 files changed, 520 insertions(+), 408 deletions(-)

Verified in 5.10.0.4
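For readers following the fix, the fallback described in the second commit can be sketched roughly as below. This is not the code from refresh_parser.rb: the `vm_uuid` helper name and the nested-hash shape of the inventory data (string keys) are assumptions made for illustration.

```ruby
# Hypothetical helper: prefer summary.config.uuid and fall back to config.uuid
# when the former is nil or missing. Assumes the VM properties arrive as a
# nested Hash with string keys, which may not match the real inventory format.
def vm_uuid(vm_props)
  vm_props.dig("summary", "config", "uuid") || vm_props.dig("config", "uuid")
end

# Made-up sample data: summary.config.uuid is nil, config.uuid is present.
props = {
  "summary" => { "config" => { "uuid" => nil } },
  "config"  => { "uuid" => "example-uuid" }
}
puts vm_uuid(props)
```

When both properties are nil the helper returns nil, so the caller still has to decide how to treat a VM with no usable UUID.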