Bug 2144615 - Report Template output generation can take hours to complete if the template is only about printing different host facts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Reporting
Version: 6.11.4
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: 6.15.0
Assignee: Jeremy Lenz
QA Contact: Pablo Mendez Hernandez
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-11-21 21:19 UTC by Sayan Das
Modified: 2024-04-23 17:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-23 17:12:49 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 36715 | 0 | Normal | Closed | Report Template output generation can take hours to complete if the template is only about printing different host facts... | 2023-10-04 17:13:57 UTC
Red Hat Issue Tracker | SAT-16865 | 0 | None | None | None | 2023-03-30 14:28:03 UTC
Red Hat Product Errata | RHSA-2024:2010 | 0 | None | None | None | 2024-04-23 17:12:51 UTC

Description Sayan Das 2022-11-21 21:19:50 UTC
Description of problem:

For a set of 200 to 300 hosts, reports from templates like "Host - Registered Content Hosts" or "Host - Statuses" can be generated in a couple of minutes. However, a custom template that only reads five to ten host facts can take 30 minutes to 1 hour to generate its output against the same set of hosts.


Version-Release number of selected component (if applicable):

Satellite 6.11.4


How reproducible:

Reproducible in the customer's environment with 2000+ hosts, each carrying a large number of subscription-manager and Ansible facts.


Steps to Reproduce:

NA ( Please check the reproducer details in the private comment )


Actual results:

 Host - Statuses                 -> Has the least amount of data to be collected --> Took 30 seconds 

 Host - Registered Content Hosts -> Has more data to collect for each system including applicable errata names and subscriptions --> Took 3 minutes and then gave the results.

SSS_Access_Control              -> Has only 8 ansible facts to collect from each host --> Took 28 minutes 


Expected results:

The "SSS_Access_Control" report template, or any other "Host Facts" related template, should not take that much time to produce the resulting CSV data. It should complete within 1 - 5 minutes (for a host count of 200 - 2000).


Additional info:

NA

Comment 3 Sayan Das 2022-11-22 13:47:43 UTC
JFYI, I created this Ruby file to collect exactly the same data as the SSS_Access_Control report template for a limited number of hosts, via the rake console.

# cat hostinfo.rb
conf.echo=false
require 'csv'

file = "/tmp/host_data.csv"
hosts = Host.where(:operatingsystem_id => 35).order(:id)

column_headers = ["CTD", "TimeStamp", "Model", "MAC", "IP_address", "Owner", "building_name", "hostname"]

CSV.open(file, 'w', write_headers: true, headers: column_headers) do |writer|
  hosts.each do |h|
    writer << [h.facts['ansible_local::gls_ansible_ctd_posture::_ctd'], h.facts['ansible_local::gls_ansible_timestemp::_timestemp'], h.facts['ansible_local::gls_ansible_model::_model'], h.facts['ansible_local::gls_ansible_mac::_mac_address'], h.facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'], h.facts['ansible_local::gls_ansible_owner::_owner'], h.facts['ansible_local::gls_ansible_lrt::_lab_lrt'], h.facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']]
  end
end



And executed it i.e.


# time cat hostinfo.rb | foreman-rake console 
Loading production environment (Rails 6.0.4.7)
Switch to inspect mode.
conf.echo=false
require 'csv'

file = "/tmp/host_data.csv"
hosts = Host.where(:operatingsystem_id => 35).order(:id)

column_headers = ["CTD", "TimeStamp", "Model", "MAC", "IP_address", "Owner", "building_name", "hostname"]

CSV.open(file, 'w', write_headers: true, headers: column_headers) do |writer|
      hosts.each do |h|
          writer << [h.facts['ansible_local::gls_ansible_ctd_posture::_ctd'], h.facts['ansible_local::gls_ansible_timestemp::_timestemp'], h.facts['ansible_local::gls_ansible_model::_model'], h.facts['ansible_local::gls_ansible_mac::_mac_address'], h.facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'], h.facts['ansible_local::gls_ansible_owner::_owner'], h.facts['ansible_local::gls_ansible_lrt::_lab_lrt'], h.facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']]
      end
  end

real	8m22.559s
user	7m11.535s
sys	0m5.617s


Around 403 entries are there in that file ( some being blanks ) i.e. 

# wc -l /tmp/host_data.csv
403 /tmp/host_data.csv


So yes, it took nearly the same amount of time as the TC report template, i.e. 8 - 10 minutes, on the exact same set of hosts.

Comment 4 Marek Hulan 2022-11-22 15:50:06 UTC
Hello,

I've added a few optimizations to the template; take a look at the reproducer machine. The most important one: whenever host.facts is called, it fetches all facts from the DB. Since the template accesses it several times for each host, I now cache the result in a variable for each host. It can still take quite some time, because it always loads all host facts (~281 per host) instead of just the 8 we're interested in. We can't improve that in the template itself without adding further optimizations to load only specific facts. OTOH, I'd personally discourage using facts directly; where possible, native attributes should be used. E.g. instead of relying on a custom Ansible fact for the IP, customers should use host.ip (which is based on information from all fact sources, such as Puppet and subscription-manager). It's also much faster than reading it from the fact values storage (which means joining 2 SQL tables that tend to be very large).

With the optimization I can render the report in 15 minutes for the entire inventory (nearly 3k hosts); I'm sure it would be much faster on the aforementioned 386 systems.

Please share this update (and primarily the new version of the template) with the customer and let us know whether more optimizations are necessary.
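
To illustrate why caching the result of host.facts per host helps, here is a minimal, self-contained Ruby sketch. It does not use Foreman at all: FakeHost, its fetch_count counter, and the fact names 'ctd', 'model' and 'owner' are hypothetical stand-ins, and the only behaviour assumed is the one described above, namely that every call to facts rebuilds the full hash.

```ruby
# Hypothetical stand-in for a Satellite host; NOT Foreman's actual Host model.
class FakeHost
  attr_reader :fetch_count

  def initialize(fact_values)
    @fact_values = fact_values
    @fetch_count = 0
  end

  # Assumption taken from the comment above: each call loads *all* facts again.
  def facts
    @fetch_count += 1
    @fact_values.dup
  end
end

# Illustrative fact names only; the real template uses ansible_local::... keys.
FACT_NAMES = %w[ctd model owner].freeze

# Roughly 281 facts per host, as mentioned above, plus the ones we want.
all_facts = (1..281).to_h { |i| ["fact_#{i}", "value_#{i}"] }
all_facts.merge!('ctd' => 'yes', 'model' => 'x1', 'owner' => 'lab')

# Uncached: one full load for every single fact lookup.
uncached_host = FakeHost.new(all_facts)
uncached_row = FACT_NAMES.map { |name| uncached_host.facts[name] }

# Cached: one full load per host, reused for every lookup.
cached_host = FakeHost.new(all_facts)
host_facts = cached_host.facts
cached_row = FACT_NAMES.map { |name| host_facts[name] }

puts "uncached loads: #{uncached_host.fetch_count}, cached loads: #{cached_host.fetch_count}"
```

Both variants produce the same row; the difference is only the number of full loads per host, which in the real system corresponds to the number of large SQL fetches.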

Comment 5 Marek Hulan 2022-11-22 15:54:42 UTC
Just found the original filter; it takes 95 seconds to generate for those 386 hosts on the reproducer.

Comment 6 Sayan Das 2022-11-22 16:02:15 UTC
Hello,

That filter was only used by me to test on a small number of hosts, but I do agree it has improved quite a lot.

I believe the customer should be able to fetch these 

        'MAC': host_facts['ansible_local::gls_ansible_mac::_mac_address'],
        'IP_address': host_facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'],
        'hostname': host_facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']

using host.mac , host.ip and host.name instead 


But for the rest, he will still need to fetch the values from facts. I will report back here with the customer's response once I have shared these details with him.


Thanks again for looking into the reproducer. 


-- Sayan

Comment 20 Jeremy Lenz 2023-09-01 21:35:23 UTC
Created redmine issue https://projects.theforeman.org/issues/36715 from this bug

Comment 22 Bryan Kearney 2023-09-02 00:02:27 UTC
Upstream bug assigned to jlenz

Comment 23 Bryan Kearney 2023-09-02 00:02:29 UTC
Upstream bug assigned to jlenz

Comment 30 Bryan Kearney 2023-09-27 16:02:13 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36715 has been resolved.

Comment 31 Jeremy Lenz 2023-09-27 18:03:07 UTC
With this change in, you can now pass in fact names as arguments to host.facts. This will ensure that the SQL query only retrieves (and builds the Ruby hash for) the requested facts, and not all facts.

(please note the syntax has improved since the private https://bugzilla.redhat.com/show_bug.cgi?id=2144615#c26 and you no longer have to use a new loader or method name.)

So your template can be similar to this, and you should see greatly improved report generation time:

<%- load_hosts(search: input('Hosts filter')).each_record do |host| -%>
<%-   fact_names = [
      'ansible_local::gls_ansible_ctd_posture::_ctd',
      'ansible_local::gls_ansible_timestemp::_timestemp',
      'ansible_local::gls_ansible_model::_model',
      'ansible_local::gls_ansible_mac::_mac_address',
      'ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress',
      'ansible_local::gls_ansible_owner::_owner',
      'ansible_local::gls_ansible_lrt::_lab_lrt',
      'ansible_local::gls_ansible_hostname::_gls_ansible_hostname'
    ]
    -%>
<%-   host_facts = host.facts(fact_names) -%>
<%-   report_row(
        'CTD': host_facts['ansible_local::gls_ansible_ctd_posture::_ctd'],
        'TimeStamp': host_facts['ansible_local::gls_ansible_timestemp::_timestemp'],
        'Model': host_facts['ansible_local::gls_ansible_model::_model'],
        'MAC': host_facts['ansible_local::gls_ansible_mac::_mac_address'],
        'IP_address': host_facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'],
        'Owner': host_facts['ansible_local::gls_ansible_owner::_owner'],
        'building_name': host_facts['ansible_local::gls_ansible_lrt::_lab_lrt'],
        'hostname': host_facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']
      ) -%>
<%- end -%>
<%= report_render -%>

In addition, if you forget to pass in names and just continue using host.facts['fact_name'], this is now cached per-host which should help some as well.
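
The two behaviours described in this comment (loading only the requested facts, and caching the full hash per host when no names are given) can be sketched in plain Ruby. This is only an illustration under stated assumptions, not Foreman's implementation: SketchHost, FACT_ROWS, the rows_read counter and the short fact names are all hypothetical, and the array stands in for the SQL tables.

```ruby
# Hypothetical in-memory stand-in for the fact storage; NOT Foreman's schema.
FACT_ROWS = [
  { host_id: 1, name: 'model',  value: 'x1' },
  { host_id: 1, name: 'owner',  value: 'lab-a' },
  { host_id: 1, name: 'kernel', value: '5.14' },
  { host_id: 2, name: 'model',  value: 'x2' },
  { host_id: 2, name: 'owner',  value: 'lab-b' }
].freeze

# Hypothetical host class that counts how many fact rows it had to read.
class SketchHost
  attr_reader :rows_read

  def initialize(id)
    @id = id
    @rows_read = 0
    @all_facts = nil
  end

  # With names: build the hash from the requested facts only.
  # Without names: load everything once and reuse it (per-host cache).
  def facts(names = nil)
    if names.nil?
      @all_facts ||= load_facts
    else
      load_facts(names)
    end
  end

  private

  def load_facts(names = nil)
    rows = FACT_ROWS.select do |row|
      row[:host_id] == @id && (names.nil? || names.include?(row[:name]))
    end
    @rows_read += rows.size
    rows.to_h { |row| [row[:name], row[:value]] }
  end
end

host = SketchHost.new(1)

# Only the two requested facts are read.
selected = host.facts(%w[model owner])
reads_after_selected = host.rows_read

# Old-style access without names: the first call loads all facts, the second reuses the cache.
kernel = host.facts['kernel']
model = host.facts['model']
reads_after_cached = host.rows_read

puts selected.inspect
puts "rows read: #{reads_after_selected} (selected), #{reads_after_cached} (after two cached lookups)"
```

In this sketch the selective call reads 2 rows instead of 3, and the two unnamed lookups together read the full set only once; on a real host with hundreds of facts the gap would be correspondingly larger.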

Comment 33 Brad Buckingham 2023-10-30 11:29:29 UTC
Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Comment 38 errata-xmlrpc 2024-04-23 17:12:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010

