Bug 1566452 - Saved Report "2018-04-09 11:18:31 +03" not found, Schedule may have failed
Summary: Saved Report "2018-04-09 11:18:31 +03" not found, Schedule may have failed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Reporting
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.10.0
Assignee: Libor Pichler
QA Contact: Tasos Papaioannou
URL:
Whiteboard:
Depends On:
Blocks: 1591939
TreeView+ depends on / blocked
 
Reported: 2018-04-12 10:53 UTC by Niladri Roy
Modified: 2021-06-10 15:47 UTC (History)
12 users (show)

Fixed In Version: 5.10.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1591939 (view as bug list)
Environment:
Last Closed: 2019-02-11 13:59:33 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Niladri Roy 2018-04-12 10:53:28 UTC
Description of problem:
Unable to generated chargeback report, takes huge time to run, upon completion no data is shown, the report shows blank name under the "Saved Reports" accordion

Version-Release number of selected component (if applicable):
5.7.3.2

How reproducible:
Everytime at customer's environment

Steps to Reproduce:
1. Try to run the report named "Costs2"
2.
3.

Actual results:
Report generation takes too much time and doesn't succeed

Expected results:
Chargeback report should successfully generate

Additional info:
Saw a alerts for worker exceeding memory limit. Not sure if this is connected with this issue.

Comment 5 Dave Johnson 2018-04-12 13:39:03 UTC
Nandini, please retest this on 5.8 and 5.9.  We are thinking this 5.7 issue is already fixed in later versions but not sure and need to check that.  THX!

Comment 6 Marianne Feifer 2018-04-17 20:33:34 UTC
Hey, Nandini!  Have you had a chance to retest this?

Comment 9 Joe Rafaniello 2018-04-24 14:57:49 UTC
Hi Niladri,

From evm.log, it's exceeding memory limits when generating the chargeback report.


[----] I, [2018-04-09T11:18:50.324664 #30845:e2f130]  INFO -- : MIQ(MiqQueue#deliver) Message id: [11000125738601], Delivering...

[----] I, [2018-04-09T11:18:50.331890 #30845:e2f130]  INFO -- : MIQ(MiqTask#update_status) Task: [11000000264146] [Active] [Ok] [Generating report]

[----] I, [2018-04-09T11:18:50.380152 #30845:e2f130]  INFO -- : MIQ(ChargebackVm.build_results_for_report_chargeback) Calculating chargeback costs...

[----] W, [2018-04-09T11:19:28.439793 #1813:e2f130]  WARN -- : MIQ(MiqServer#validate_worker) Worker [MiqReportingWorker] with ID: [11000003738713], PID: [30845], GUID: [88fa9b44-278b-11e8-831b-001a4af7c509] process memory usage [731592000] exceeded limit [524288000], requesting worker to exit


[----] W, [2018-04-09T11:29:14.538366 #1813:e2f130]  WARN -- : MIQ(MiqServer#worker_not_responding) Worker [MiqReportingWorker] with ID: [11000003738713], PID: [30845], GUID: [88fa9b44-278b-11e8-831b-001a4af7c509] being killed because it is not responding

[----] I, [2018-04-09T11:29:14.581721 #1813:e2f130]  INFO -- : MIQ(MiqReportingWorker#kill) Killing worker: ID [11000003738713], PID [30845], GUID [88fa9b44-278b-11e8-831b-001a4af7c509], status [stopping]


Top output log is showing this reporting process go from 589 MB RSS to 5 GB before it's killed.  I'll look at the recreated appliance but there's something wrong with this report.  Has this report worked previously?  It seems like the report is bringing back way too much data.


30845  1813 root      27   7  948204 588500   3196 S   0.1  2.4  71:08.15 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7  948204 588500   3196 S   0.1  2.4  71:08.21 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7  948204 588500   3196 S   0.1  2.4  71:08.25 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7  948204 588500   3196 S   0.1  2.4  71:08.30 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7  948204 588500   3196 S   0.1  2.4  71:08.36 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7  948204 588500   3196 S   0.1  2.4  71:08.40 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 1879448 1.448g   3232 R  72.9  6.2  71:52.17 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 2493436 2.032g   3232 S  42.7  8.7  72:17.80 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 3089172 2.593g   3232 S   3.6 11.1  72:19.98 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 3725148 3.204g   3232 S   3.6 13.7  72:22.15 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 5110824 4.529g   3232 R  97.3 19.4  73:20.58 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 5110824 4.529g   3232 R  99.7 19.4  74:20.46 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 5110956 4.529g   3232 R  99.8 19.4  75:20.39 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 5419908 4.821g   3232 R  99.8 20.6  76:20.37 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 5560104 4.953g   3232 R  99.7 21.2  77:20.29 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
30845  1813 root      27   7 5632752 5.021g   3232 R  99.4 21.5  78:20.00 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting

Comment 10 Joe Rafaniello 2018-04-24 15:56:37 UTC
Libor, can you take a look?  Clicking preview on the costs2 report on the recreated appliance in comment 8 causes the priority worker to jump to 5 GB and 100% cpu.

Comment 12 Joe Rafaniello 2018-04-30 16:07:13 UTC
Hi Niladri,

I don't think it's a good idea to increase the reporting worker memory threshold to 6 or 8 GB in production.  We are investigating why it takes so long and consumes so much memory.

Do you know how many more VMs have been added since this Infrastructure costs report was working?  From what I remember, I believe there were 1,000 vms and templates now.  This should not be a problem, we have much larger installations.  I'm guessing it's something we're doing in that report that we need to improve.

Thanks,
Joe

Comment 14 Joe Rafaniello 2018-04-30 17:57:02 UTC
Libor, it depends what versions have this problem.  Normally, you'd try to fix it on master first and backport it to gaprindashvili/5.9, fine/5.8, and finally euwe/5.7 but since chargeback was changed significantly from gaprindashvili/master and fine (I believe), we should figure out how to fix it on both situations.

Comment 18 Joe Rafaniello 2018-05-02 18:05:17 UTC
I don't think ntp plays a role in that the costs2 takes 5 GB of memory. Out of sync appliance time (or jumping time) could perhaps time out the worker incorrectly but I can't see how time can matter for the report consuming too much memory.

Note, all servers need to have ntp synchronized to the same ntp servers for many other reasons including our appliance failover of roles... but not for this report.

Comment 19 Joe Rafaniello 2018-05-02 18:08:13 UTC
Libor, please update this bug with any updates you have found for this report being slow and consuming lots of memory.

Thanks!

Comment 36 Tasos Papaioannou 2018-07-09 19:25:29 UTC
Verified on 5.10.0.3.


Note You need to log in before you can comment on or make changes to this bug.