Bug 1662972
| Summary: | Provision dialog for ec2 with public images is broken after selecting image - first step | ||
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Matouš Mojžíš <mmojzis> |
| Component: | Appliance | Assignee: | Nick LaMuro <nlamuro> |
| Status: | CLOSED DEFERRED | QA Contact: | Sudhir Mallamprabhakara <smallamp> |
| Severity: | high | Docs Contact: | Red Hat CloudForms Documentation <cloudforms-docs> |
| Priority: | high | ||
| Version: | 5.10.0 | CC: | abellott, bmidwood, dmetzger, gtanzill, hkataria, lavenel, mfeifer, mpovolny, nlamuro, obarenbo, simaishi |
| Target Milestone: | GA | Flags: | mfeifer: mirror+ |
| Target Release: | 5.11.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | ui:ec2 | ||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-06-23 18:41:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | CFME Core | Target Upstream Version: | |
| Embargoed: | | | |
Description
Matouš Mojžíš
2019-01-02 15:09:46 UTC
Please retry with the latest build.

I now have a replicated environment to work with, and wanted to give an update on what I have found so far. Two corrections/notes regarding the original description:

* The error in the description above is a red herring and unrelated; the log to look at is `production.log`, not `evm.log`.
* The request being triggered is actually timing out, so the "Error requesting data from server" message in the UI is just the symptom of that timeout.

Having determined that the bug was something else, I tailed the specific request made in the last reproduction step:

```
[root@dhcp-8-198-2 vmdb]# tail -f log/production.log | grep pre_prov
[----] I, [2019-01-03T14:50:53.014387 #20149:486e828]  INFO -- : Started POST "/vm_cloud/pre_prov?button=continue" for 127.0.0.1 at 2019-01-03 14:50:53 -0500
[----] I, [2019-01-03T14:50:53.806026 #20149:486e828]  INFO -- : Processing by VmCloudController#pre_prov as JS
```

The PID stopped reporting after that. Watching the PID in `top`, I saw its memory spike quickly, up to 1.5g, before it finally died or was killed off. I did a quick profile of a different PID making the same request using `rbspy`, and it appears to be running an ActiveRecord query nearly the entire time, so I will be looking into that.

* * *

This seems to be an issue with a bad query, I suspect specifically with the AWS provider, but I will have to dig further into the profile data to know more. I will update again when I find out more.

Update: I have created three patches to address this issue:

- https://github.com/ManageIQ/manageiq/pull/18353
- https://github.com/ManageIQ/manageiq-schema/pull/322
- https://github.com/ManageIQ/manageiq/pull/18354

The first is a collection of isolated fixes that are relatively noninvasive. This patch speeds up the request by about 50% and cuts memory use in half. Unfortunately, that only brings it down to 1.5 GB in my tests, which was still within the range where the worker gets killed on the reference appliance.

The next two patches are more involved: the first is a migration that adds some caching columns, and the second implements the changes needed to get the maximum benefit from those new columns. Together they bring the request under ten seconds, with a memory footprint around 200MB above idle. While this is a much better scenario, the patches require some significant changes to take full effect, and may or may not be desired.

-Nick

https://github.com/ManageIQ/manageiq/pull/18353 has been merged, waiting for backport.

New commit detected on ManageIQ/manageiq/ivanchuk:
https://github.com/ManageIQ/manageiq/commit/1a1a2147ce289159621d158e40ce610625cb6603

commit 1a1a2147ce289159621d158e40ce610625cb6603
Author: Greg McCullough <gmccullo>
AuthorDate: Mon Aug 19 17:55:51 2019 -0400
Commit: Greg McCullough <gmccullo>
CommitDate: Mon Aug 19 17:55:51 2019 -0400

Merge pull request #18353 from NickLaMuro/miq_provision_virt_workflow_better_allowed_templates

Performance improvements to MiqProvisionVirtWorkflow#allowed_templates

(cherry picked from commit e1ed394bf103e5dd921748b2ee5508e6bd1a454e)

https://bugzilla.redhat.com/show_bug.cgi?id=1662972

app/models/miq_provision_virt_workflow.rb | 78 +-
1 file changed, 55 insertions(+), 23 deletions(-)

Nick, I tried to verify this in 5.11.0.22 and got the same issue when selecting images, except that I don't get any error and there is no error in the logs. I was able to select an image once, after not doing anything on the appliance for an hour.

Alright, I will try to take a look at this later today, but a PR was created recently that addresses a bug I caused:

https://github.com/ManageIQ/manageiq/pull/19237

So I am unsure whether that was part of the issue you were seeing.

* * *

That said, the change that was merged for this bug includes only some of the fixes I proposed. As a result, it is still not a complete solution and leaves a decent amount of memory bloat. We will most likely look at addressing this further in the next release.
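The triage steps from the first update (find the `pre_prov` request in `production.log`, read the worker PID out of the log prefix, then watch that PID's memory) can be sketched as a small shell helper. This is a hypothetical sketch, not part of ManageIQ or the appliance tooling: the function names `extract_pid`, `rss_kib`, and `watch_rss` are illustrative, the PID parsing assumes the `#PID:thread` prefix seen in the log excerpt, and memory is read from `/proc`, so it only works on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers (names are illustrative, not ManageIQ tools).

# Print the PID embedded in a Rails log prefix such as
#   [----] I, [2019-01-03T14:50:53.806026 #20149:486e828]  INFO -- : ...
# [^#]* stops at the first '#', so a later "Controller#action" is ignored.
# Lines without that prefix produce no output.
extract_pid() {
  sed -n 's/^[^#]*#\([0-9][0-9]*\):.*/\1/p'
}

# Print the resident set size, in KiB, of the given PID (Linux /proc only).
rss_kib() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print a timestamped RSS sample once per second until the process is gone.
watch_rss() {
  local pid="$1"
  while [ -r "/proc/$pid/status" ]; do
    printf '%s %s KiB\n' "$(date +%T)" "$(rss_kib "$pid")"
    sleep 1
  done
}
```

For example, `grep pre_prov log/production.log | tail -n 1 | extract_pid` prints the worker PID (20149 in the excerpt above), and `watch_rss 20149` then prints one RSS sample per second until the worker exits or is killed. Sampling `/proc/<pid>/status` rather than watching `top` leaves a record that can be compared across builds; the same PID can then be profiled with `rbspy record --pid <PID>`.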