Description of problem: Unable to generated chargeback report, takes huge time to run, upon completion no data is shown, the report shows blank name under the "Saved Reports" accordion Version-Release number of selected component (if applicable): 5.7.3.2 How reproducible: Everytime at customer's environment Steps to Reproduce: 1. Try to run the report named "Costs2" 2. 3. Actual results: Report generation takes too much time and doesn't succeed Expected results: Chargeback report should successfully generate Additional info: Saw a alerts for worker exceeding memory limit. Not sure if this is connected with this issue.
Nandini, please retest this on 5.8 and 5.9. We are thinking this 5.7 issue is already fixed in later versions but not sure and need to check that. THX!
Hey, Nandini! Have you had a chance to retest this?
Hi Niladri, From evm.log, it's exceeding memory limits when generating the chargeback report. [----] I, [2018-04-09T11:18:50.324664 #30845:e2f130] INFO -- : MIQ(MiqQueue#deliver) Message id: [11000125738601], Delivering... [----] I, [2018-04-09T11:18:50.331890 #30845:e2f130] INFO -- : MIQ(MiqTask#update_status) Task: [11000000264146] [Active] [Ok] [Generating report] [----] I, [2018-04-09T11:18:50.380152 #30845:e2f130] INFO -- : MIQ(ChargebackVm.build_results_for_report_chargeback) Calculating chargeback costs... [----] W, [2018-04-09T11:19:28.439793 #1813:e2f130] WARN -- : MIQ(MiqServer#validate_worker) Worker [MiqReportingWorker] with ID: [11000003738713], PID: [30845], GUID: [88fa9b44-278b-11e8-831b-001a4af7c509] process memory usage [731592000] exceeded limit [524288000], requesting worker to exit [----] W, [2018-04-09T11:29:14.538366 #1813:e2f130] WARN -- : MIQ(MiqServer#worker_not_responding) Worker [MiqReportingWorker] with ID: [11000003738713], PID: [30845], GUID: [88fa9b44-278b-11e8-831b-001a4af7c509] being killed because it is not responding [----] I, [2018-04-09T11:29:14.581721 #1813:e2f130] INFO -- : MIQ(MiqReportingWorker#kill) Killing worker: ID [11000003738713], PID [30845], GUID [88fa9b44-278b-11e8-831b-001a4af7c509], status [stopping] Top output log is showing this reporting process go from 589 MB RSS to 5 GB before it's killed. I'll look at the recreated appliance but there's something wrong with this report. Has this report worked previously? It seems like the report is bringing back way too much data. 30845 1813 root 27 7 948204 588500 3196 S 0.1 2.4 71:08.15 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 948204 588500 3196 S 0.1 2.4 71:08.21 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 948204 588500 3196 S 0.1 2.4 71:08.25 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 948204 588500 3196 S 0.1 2.4 71:08.30 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 948204 588500 3196 S 0.1 2.4 71:08.36 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 948204 588500 3196 S 0.1 2.4 71:08.40 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 1879448 1.448g 3232 R 72.9 6.2 71:52.17 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 2493436 2.032g 3232 S 42.7 8.7 72:17.80 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 3089172 2.593g 3232 S 3.6 11.1 72:19.98 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 3725148 3.204g 3232 S 3.6 13.7 72:22.15 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 5110824 4.529g 3232 R 97.3 19.4 73:20.58 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 5110824 4.529g 3232 R 99.7 19.4 74:20.46 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 5110956 4.529g 3232 R 99.8 19.4 75:20.39 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 5419908 4.821g 3232 R 99.8 20.6 76:20.37 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 5560104 4.953g 3232 R 99.7 21.2 77:20.29 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting 30845 1813 root 27 7 5632752 5.021g 3232 R 99.4 21.5 78:20.00 MIQ: MiqReportingWorker id: 11000003738713, queue: reporting
Libor, can you take a look? Clicking preview on the costs2 report on the recreated appliance in comment 8 causes the priority worker to jump to 5 GB and 100% cpu.
Hi Niladri, I don't think it's a good idea to increase the reporting worker memory threshold to 6 or 8 GB in production. We are investigating why it takes so long and consumes so much memory. Do you know how many more VMs have been added since this Infrastructure costs report was working? From what I remember, I believe there were 1,000 vms and templates now. This should not be a problem, we have much larger installations. I'm guessing it's something we're doing in that report that we need to improve. Thanks, Joe
Libor, it depends what versions have this problem. Normally, you'd try to fix it on master first and backport it to gaprindashvili/5.9, fine/5.8, and finally euwe/5.7 but since chargeback was changed significantly from gaprindashvili/master and fine (I believe), we should figure out how to fix it on both situations.
I don't think ntp plays a role in that the costs2 takes 5 GB of memory. Out of sync appliance time (or jumping time) could perhaps time out the worker incorrectly but I can't see how time can matter for the report consuming too much memory. Note, all servers need to have ntp synchronized to the same ntp servers for many other reasons including our appliance failover of roles... but not for this report.
Libor, please update this bug with any updates you have found for this report being slow and consuming lots of memory. Thanks!
Verified on 5.10.0.3.