Bug 1733351

Summary: Child tenant users unable to load 'Compute > Infrastructure > Virtual Machines > VMs'
Product: Red Hat CloudForms Management Engine Reporter: Tuan <tuado>
Component: Appliance Assignee: Joe Rafaniello <jrafanie>
Status: CLOSED CURRENTRELEASE QA Contact: Parthvi Vala <pvala>
Severity: high Docs Contact: Red Hat CloudForms Documentation <cloudforms-docs>
Priority: high    
Version: 5.10.6 CC: abellott, akarol, bmidwood, bwoolf, dmetzger, gekis, gtanzill, hkataria, jocarter, kbrock, lavenel, mpovolny, mshriver, obarenbo, pvala, sigbjorn, simaishi
Target Milestone: GA Keywords: TestOnly, ZStream
Target Release: 5.11.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: 5.11.0.18 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1738266 Environment:
Last Closed: 2019-12-13 14:55:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: Bug
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1738266    

Comment 3 Joe Rafaniello 2019-07-26 14:44:15 UTC
Summarizing the previous private comment from last night:

Looking at the logs, we see occurrences where the UI worker servicing "/vm_infra/report_data" requests runs for over a minute and exceeds its memory threshold (5233468000 bytes, about 4.9 GiB, against a 1 GiB limit in the entry below). The server eventually asks the UI worker to exit:

[----] W, [2019-07-25T13:00:41.034781 #31695:71ebfc]  WARN -- : MIQ(MiqUiWorker::Runner#log_long_running_requests) Long running http(s) request: '/vm_infra/report_data' handled by #31695:3cf7580, running for 60.64 seconds
...
[----] W, [2019-07-25T13:00:57.110946 #13751:486f5c]  WARN -- : MIQ(MiqServer#validate_worker) Worker [MiqUiWorker] with ID: [XXX], PID: [XXX], GUID: [XXX] process memory usage [5233468000] exceeded limit [1073741824], requesting worker to exit


Last entry for PID 31695 in the top log, showing 1.9 GB of memory usage:

31695.pid:./evm_current_XXX_20190725_181201/log/top_output.log:31695 13751 root      21   1 2335996   1.9g   4436 S   4.6  3.0   1:33.36 puma 3.7.1 (tcp://127.0.0.1:3008) [MIQ: Web Server Worker]
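A minimal sketch for spotting the same symptoms in other log collections. It assumes the warning messages look exactly like the excerpts above; the function names long_running_requests and memory_overages are hypothetical helpers, not tools shipped on the appliance.

```bash
# Hypothetical triage helpers for evm.log-style files. They assume the exact
# message formats shown in the excerpts above; adjust the patterns if they differ.

# Print "<path> <seconds>" for each long-running UI request at or above a threshold.
long_running_requests() {
  local logfile="$1" threshold="${2:-60}"
  grep -o "Long running http(s) request: '[^']*' handled by [^,]*, running for [0-9.]* seconds" "$logfile" |
    awk -v min="$threshold" -v q="'" '{
      path = $5; gsub(q, "", path)   # field 5 is the quoted request path
      secs = $(NF - 1)               # second-to-last field is the duration
      if (secs + 0 >= min + 0) print path, secs
    }'
}

# Print memory used vs. limit (in MiB) for each "exceeded limit" worker warning.
memory_overages() {
  sed -n 's/.*process memory usage \[\([0-9]*\)\] exceeded limit \[\([0-9]*\)\].*/\1 \2/p' "$1" |
    awk '{ printf "%d MiB used / %d MiB limit (%.1fx)\n", $1 / 1048576, $2 / 1048576, $1 / $2 }'
}
```

For the memory entry above, memory_overages reports 4991 MiB used against a 1024 MiB limit (4.9x). Running long_running_requests log/evm.log 60 from the vmdb directory lists every request that crossed the one-minute mark.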


1) We might want to check whether we have this database, or can get a similar one, for internal testing.

2) Regardless, we should ask them to do the following:

Set up an appliance (production or a new one) and reproduce the issue with Rails debug logging enabled, which will help us isolate the problem.

They would need to:
* ensure the User Interface and Web Services roles are enabled on this appliance
* change "level_rails" from info to debug in the log section of the advanced settings
* once the debug setting is saved, use tail -f log/production.log to confirm it starts logging DEBUG messages (this may take a minute or two)
* once it is logging DEBUG messages, copy off log/production.log and log/evm.log, then truncate both logs to keep their size down
* as a child tenant user, try to load the VMs screen to reproduce the reported issue

Provide the logs from this appliance 5-10 minutes later.

* Then, set level_rails back to info, save, and verify that no more DEBUG messages appear after several seconds.
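The copy-and-truncate steps above could be scripted roughly as follows. This is a sketch, not a supported tool: the helper names are hypothetical, it assumes it is run from the appliance's vmdb directory (typically /var/www/miq/vmdb), and the level_rails change itself is still made in the advanced settings.

```bash
# Hypothetical helpers for the debug-logging procedure; run from the vmdb directory.

# Poll every 5 seconds until the log contains a DEBUG line.
# Returns non-zero if none appears within <timeout> seconds.
wait_for_debug() {
  local log="${1:-log/production.log}" timeout="${2:-180}" waited=0
  until grep -q 'DEBUG' "$log"; do
    if (( waited >= timeout )); then return 1; fi
    sleep 5
    (( waited += 5 ))
  done
}

# Copy each log into <dest>, then empty the original in place (rather than
# deleting it), so a writer that opened it in append mode keeps logging to it.
snapshot_and_truncate() {
  local dest="$1"; shift
  mkdir -p "$dest" || return 1
  local f
  for f in "$@"; do
    cp -p "$f" "$dest/" || return 1
    : > "$f"
  done
}
```

For example: wait_for_debug log/production.log 180 && snapshot_and_truncate ~/pre_repro_logs log/production.log log/evm.log, then reproduce the issue and collect the logs 5-10 minutes later.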

Comment 5 Gregg Tanzillo 2019-07-26 21:38:12 UTC
Tuan,

Based on what we see in the appliance and PG logs, there is clearly a performance issue causing the VM list view to fail to load. We are trying to understand the specific cause; however, with the limited diagnostic information we have, we can only speculate. We would need the customer's database (which you've already requested) to fully troubleshoot this issue.

We do have a pretty strong theory about a contributor to this performance problem. We think that including the last compliance status in the list view causes a huge amount of data to come back from SQL. Based on other things we've seen, we think they have a very large number of compliance rows.

We would expect the number of compliance records to differ significantly between the regions where the view loads and the regions where it doesn't.

If the customer needs a short-term fix while we continue to troubleshoot, they can remove the last compliance status column from that view by editing a file on the appliance. Here are the instructions:

On both UI appliances, edit the file named VmOrTemplate.yaml to remove the offending column:

vmdb
cd `bundle show manageiq-ui-classic`/product/views
# make a backup copy of the file first
cp -p VmOrTemplate.yaml VmOrTemplate.yaml.org
vi VmOrTemplate.yaml
# remove all occurrences of "last_compliance_status" and its header "Compliant".

This is what the diff should look like after the change:

diff VmOrTemplate.yaml{.org,} 
25d24
< - last_compliance_status
58d56
< - last_compliance_status
72d69
< - Compliant
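As a non-interactive alternative to the vi edit, here is a sketch. It assumes the entries to delete are list lines that read exactly "- last_compliance_status" and "- Compliant", as in the diff above. The function name strip_compliance_column is hypothetical, and sed -i is the GNU form available on the appliance.

```bash
# Hypothetical non-interactive version of the VmOrTemplate.yaml edit.
# Keeps the first .org backup (never overwrites it), deletes the two column
# entries, and prints the resulting diff.
strip_compliance_column() {
  local view="${1:-VmOrTemplate.yaml}"
  [ -e "$view.org" ] || cp -p "$view" "$view.org" || return 1
  sed -i -e '/^[[:space:]]*- last_compliance_status[[:space:]]*$/d' \
         -e '/^[[:space:]]*- Compliant[[:space:]]*$/d' "$view" || return 1
  diff "$view.org" "$view" || true   # diff exits 1 when files differ; that is expected
}
```

Run it from the views directory (cd `bundle show manageiq-ui-classic`/product/views) and compare its output with the expected diff above before restarting anything.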

Comment 7 Gregg Tanzillo 2019-07-31 22:10:12 UTC
Hi Tuan,

Here's the latest status on this issue -

We are in the process of creating a fix that will allow the VM views to render in an acceptable amount of time without removing the "Compliant" column. We do not yet have an ETA for a hotfix containing this change.

In the meantime, we have a workaround that removes the "Compliant" column from the affected views to alleviate the issue. We have successfully tested it with the customer's DB. Here are the steps to apply the workaround:

Note: because of the -N flag, the patch step below should skip (rather than re-apply or reverse) a change that has already been applied.

yum install patch
cd ~

wget https://github.com/ManageIQ/manageiq-ui-classic/compare/hammer...jrafanie:remove_compliance_from_vm_template_accordions.patch

cd `bundle show manageiq-ui-classic`

patch -p1 -N < ~/hammer...jrafanie\:remove_compliance_from_vm_template_accordions.patch

systemctl restart evmserverd


To reverse the patch, if necessary:
patch --reverse -p1 < ~/hammer...jrafanie\:remove_compliance_from_vm_template_accordions.patch

systemctl restart evmserverd
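If the patch might already be applied, or might not apply cleanly, a more cautious variant is to dry-run it first. This is a hypothetical wrapper (apply_patch_checked is not an existing tool) built only on GNU patch's --dry-run, -N, -R and -d options.

```bash
# Hypothetical wrapper: apply a -p1 patch in <dir> only if it applies cleanly.
# Reports an already-applied patch instead of touching the files, and fails
# (without modifying anything) if neither direction applies.
apply_patch_checked() {
  local patchfile="$1" dir="$2"
  if patch -d "$dir" -p1 -N --dry-run --silent < "$patchfile" > /dev/null; then
    patch -d "$dir" -p1 -N --silent < "$patchfile"
  elif patch -d "$dir" -p1 -R --dry-run --silent < "$patchfile" > /dev/null; then
    echo "already applied: $patchfile" >&2
  else
    echo "does not apply cleanly: $patchfile" >&2
    return 1
  fi
}
```

On the appliance this would be called as apply_patch_checked ~/hammer...jrafanie\:remove_compliance_from_vm_template_accordions.patch "$(bundle show manageiq-ui-classic)", followed by systemctl restart evmserverd.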

Comment 9 CFME Bot 2019-08-01 08:11:40 UTC
New commit detected on ManageIQ/manageiq-ui-classic/master:

https://github.com/ManageIQ/manageiq-ui-classic/commit/174f9beb37e3a17145312f089c4b79011f149858
commit 174f9beb37e3a17145312f089c4b79011f149858
Author:     Keenan Brock <keenan>
AuthorDate: Wed Jul 31 11:46:42 2019 -0400
Commit:     Keenan Brock <keenan>
CommitDate: Wed Jul 31 11:46:42 2019 -0400

    views: remove unneeded compliances join

    For some databases, this brings back way too many records.

    It is unnecessary. And if it were necessary, it will get automatically anyway

    See also https://github.com/ManageIQ/manageiq-ui-classic/pull/5283

    https://bugzilla.redhat.com/show_bug.cgi?id=1733351
 product/views/ManageIQ_Providers_CloudManager_Template-all_vms_and_templates.yaml | 1 -
 product/views/MiqTemplate.yaml | 1 -
 product/views/ProvisionCloudTemplates.yaml | 1 -
 product/views/Vm-all_vms.yaml | 1 -
 product/views/VmOrTemplate-all_archived.yaml | 1 -
 5 files changed, 5 deletions(-)

Comment 10 Keenan Brock 2019-08-01 19:08:34 UTC
To fix this:

This is merged and needs to be backported:

https://github.com/ManageIQ/manageiq/pull/17475

These need to be merged and backported:

https://github.com/ManageIQ/manageiq-ui-classic/pull/5919
https://github.com/ManageIQ/manageiq-ui-classic/pull/5926
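To confirm whether a backport has reached a branch, here is a sketch using plain git. It first checks whether the master commit is an ancestor of the branch. If it is not, it looks for the "(cherry picked from commit <sha>)" line that git cherry-pick -x records, like the ones in the ivanchuk commits in comments 11 and 12. The helper name is hypothetical.

```bash
# Hypothetical helper: report whether <branch> contains <sha> directly, or as a
# "git cherry-pick -x" copy whose message records "(cherry picked from commit <sha>)".
has_commit_or_cherry_pick() {
  local sha="$1" branch="$2"
  if git merge-base --is-ancestor "$sha" "$branch" 2> /dev/null; then
    echo direct
  elif [ -n "$(git log --format=%H --grep="cherry picked from commit $sha" "$branch")" ]; then
    echo cherry-pick
  else
    echo missing
    return 1
  fi
}
```

For example, in a manageiq-ui-classic clone, has_commit_or_cherry_pick fc8c4911aa5c6a358566078ab26dc19f2d6d253b origin/ivanchuk would be expected to report cherry-pick, given the commit shown in comment 11.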

Comment 11 CFME Bot 2019-08-01 22:51:39 UTC
New commit detected on ManageIQ/manageiq-ui-classic/ivanchuk:

https://github.com/ManageIQ/manageiq-ui-classic/commit/3e8082d63a70e9911b6c75b7551ae22cccd4fb2b
commit 3e8082d63a70e9911b6c75b7551ae22cccd4fb2b
Author:     Milan Zázrivec <mzazrivec>
AuthorDate: Thu Aug  1 04:09:29 2019 -0400
Commit:     Milan Zázrivec <mzazrivec>
CommitDate: Thu Aug  1 04:09:29 2019 -0400

    Merge pull request #5919 from kbrock/fixup_vm_includes

    views: remove unneeded compliances join
    (cherry picked from commit fc8c4911aa5c6a358566078ab26dc19f2d6d253b)

    https://bugzilla.redhat.com/show_bug.cgi?id=1733351

 product/views/ManageIQ_Providers_CloudManager_Template-all_vms_and_templates.yaml | 1 -
 product/views/MiqTemplate.yaml | 1 -
 product/views/ProvisionCloudTemplates.yaml | 1 -
 product/views/Vm-all_vms.yaml | 1 -
 product/views/VmOrTemplate-all_archived.yaml | 1 -
 5 files changed, 5 deletions(-)

Comment 12 CFME Bot 2019-08-02 16:12:12 UTC
New commit detected on ManageIQ/manageiq-ui-classic/ivanchuk:

https://github.com/ManageIQ/manageiq-ui-classic/commit/87ce65b6c7cb64ce576d0b3afaa521c32718fb67
commit 87ce65b6c7cb64ce576d0b3afaa521c32718fb67
Author:     Milan Zázrivec <mzazrivec>
AuthorDate: Fri Aug  2 05:06:09 2019 -0400
Commit:     Milan Zázrivec <mzazrivec>
CommitDate: Fri Aug  2 05:06:09 2019 -0400

    Merge pull request #5926 from kbrock/compliance_statuses

    Circling back and removing compliance join for all

    (cherry picked from commit f7dcdc6fd248f53ceb0780a76c9f8c24f9d99238)

    https://bugzilla.redhat.com/show_bug.cgi?id=1733351

 product/views/InstanceOrImage.yaml | 1 -
 product/views/ManageIQ_Providers_CloudManager_Template.yaml | 1 -
 product/views/ManageIQ_Providers_CloudManager_Vm-all_vms_and_templates.yaml | 1 -
 product/views/ManageIQ_Providers_CloudManager_Vm-vms.yaml | 1 -
 product/views/ManageIQ_Providers_CloudManager_Vm.yaml | 1 -
 product/views/ManageIQ_Providers_InfraManager_Template.yaml | 1 -
 product/views/ManageIQ_Providers_InfraManager_Vm.yaml | 1 -
 product/views/MiqTemplate-all_miq_templates.yaml | 1 -
 product/views/ProvisionInfraTemplates.yaml | 1 -
 product/views/Vm.yaml | 1 -
 product/views/VmOrTemplate-all_orphaned.yaml | 1 -
 product/views/VmOrTemplate-all_vms_and_templates.yaml | 1 -
 12 files changed, 12 deletions(-)

Comment 14 Sigbjorn Lie 2019-08-07 07:45:29 UTC
We see the same "Long running http request" issue when expanding a report tree. Will the last commit also fix the issue for reports? 

[----] W, [2019-08-07T09:36:42.443589 #53073:154b0a4]  WARN -- : MIQ(MiqUiWorker::Runner#log_long_running_requests) Long running http(s) request: '/report/tree_autoload' handled by #53073:537ff78, running for 113.72 seconds

Comment 17 Parthvi Vala 2019-09-03 06:31:32 UTC
The customer DB is huge, and QE doesn't have enough resources to build a reproducer.
Hence, this BZ will be marked as verified.

Comment 18 Parthvi Vala 2019-09-03 06:47:20 UTC
Moving it to verified.

Reference BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1738266