Bug 2144615
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | Report Template output generation can take hours to complete if the template is only about printing different host facts | | |
| Product | Red Hat Satellite | Reporter | Sayan Das <saydas> |
| Component | Reporting | Assignee | Jeremy Lenz <jlenz> |
| Status | CLOSED ERRATA | QA Contact | Pablo Mendez Hernandez <pmendezh> |
| Severity | medium | Docs Contact | |
| Priority | unspecified | | |
| Version | 6.11.4 | CC | ahumbe, jlenz, lvrtelov, mhulan, pmendezh, wpinheir |
| Target Milestone | 6.15.0 | Keywords | Performance, Triaged |
| Target Release | Unused | | |
| Hardware | All | | |
| OS | All | | |
| Doc Type | If docs needed, set a value | | |
| Last Closed | 2024-04-23 17:12:49 UTC | Type | Bug |
Description
Sayan Das
2022-11-21 21:19:50 UTC
JFYI, I created this file in ruby to collect the exact same data as the SSS_Access_Control report template for a limited number of hosts, via rake console.

    # cat hostinfo.rb
    conf.echo=false
    require 'csv'
    file = "/tmp/host_data.csv"
    hosts = Host.where(:operatingsystem_id => 35).order(:id)
    column_headers = ["CTD", "TimeStamp", "Model", "MAC", "IP_address", "Owner", "building_name", "hostname"]
    CSV.open(file, 'w', write_headers: true, headers: column_headers) do |writer|
      hosts.each do |h|
        writer << [h.facts['ansible_local::gls_ansible_ctd_posture::_ctd'],
                   h.facts['ansible_local::gls_ansible_timestemp::_timestemp'],
                   h.facts['ansible_local::gls_ansible_model::_model'],
                   h.facts['ansible_local::gls_ansible_mac::_mac_address'],
                   h.facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'],
                   h.facts['ansible_local::gls_ansible_owner::_owner'],
                   h.facts['ansible_local::gls_ansible_lrt::_lab_lrt'],
                   h.facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']]
      end
    end

And executed it, i.e.:

    # time cat hostinfo.rb | foreman-rake console
    Loading production environment (Rails 6.0.4.7)
    Switch to inspect mode.
    conf.echo=false
    require 'csv'
    file = "/tmp/host_data.csv"
    hosts = Host.where(:operatingsystem_id => 35).order(:id)
    column_headers = ["CTD", "TimeStamp", "Model", "MAC", "IP_address", "Owner", "building_name", "hostname"]
    CSV.open(file, 'w', write_headers: true, headers: column_headers) do |writer|
      hosts.each do |h|
        writer << [h.facts['ansible_local::gls_ansible_ctd_posture::_ctd'],
                   h.facts['ansible_local::gls_ansible_timestemp::_timestemp'],
                   h.facts['ansible_local::gls_ansible_model::_model'],
                   h.facts['ansible_local::gls_ansible_mac::_mac_address'],
                   h.facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'],
                   h.facts['ansible_local::gls_ansible_owner::_owner'],
                   h.facts['ansible_local::gls_ansible_lrt::_lab_lrt'],
                   h.facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']]
      end
    end

    real    8m22.559s
    user    7m11.535s
    sys     0m5.617s

There are around 403 entries in that file (some of them blank), i.e.:

    # wc -l /tmp/host_data.csv
    403 /tmp/host_data.csv

So yes, it took nearly the same amount of time as the TC report template, i.e. 8 - 10 minutes, on the exact same set of hosts.

Hello,

I've added a few optimizations to the template; take a look at the reproducer machine. The most important one: whenever host.facts is called, it fetches all facts from the DB. Given we access this several times for each host, I added a cache to the variable for a particular host. It can still take quite some time, since it always loads all host facts (~281 per host) instead of just the 8 we're interested in. We can't improve that in the template itself without adding more optimizations to load just specific facts.

OTOH, I'd personally discourage using facts directly; if possible, native attributes should be used. E.g. instead of relying on a custom ansible fact for the IP, customers should use host.ip (which is based on information from all fact sources, like puppet and subscription-manager).
It's also much faster than reading it from fact values storage (which means joining 2 SQL tables which tend to be very large). With the optimization I can render the report in 15 minutes for the entire inventory (nearly 3k hosts); I'm sure it would be much faster on the aforementioned 386 systems. Please share this update (and primarily the new version of the template) with the customer and let us know whether more optimizations are necessary.

Just found the original filter; it takes 95 seconds to generate the report for those 386 hosts on the reproducer.

Hello,

That filter was only used by me to test on a small number of hosts, but I do agree it has improved quite a lot. I believe the customer should be able to fetch these

    'MAC': host_facts['ansible_local::gls_ansible_mac::_mac_address'],
    'IP_address': host_facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'],
    'hostname': host_facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']

using host.mac, host.ip and host.name instead. But for the rest, he will still need to fetch the values from facts.

I will report back here with the customer's response once I have shared these details with him. Thanks again for looking into the reproducer.

-- Sayan

Created redmine issue https://projects.theforeman.org/issues/36715 from this bug

Upstream bug assigned to jlenz

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36715 has been resolved.

With this change in, you can now pass in fact names as arguments to host.facts. This will ensure that the SQL query only retrieves (and builds the Ruby hash for) the requested facts, and not all facts. (Please note the syntax has improved since the private https://bugzilla.redhat.com/show_bug.cgi?id=2144615#c26, and you no longer have to use a new loader or method name.)
So your template can be similar to this, and you should see greatly improved report generation time:

    <%- load_hosts(search: input('Hosts filter')).each_record do |host| -%>
    <%-   fact_names = [
            'ansible_local::gls_ansible_ctd_posture::_ctd',
            'ansible_local::gls_ansible_timestemp::_timestemp',
            'ansible_local::gls_ansible_model::_model',
            'ansible_local::gls_ansible_mac::_mac_address',
            'ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress',
            'ansible_local::gls_ansible_owner::_owner',
            'ansible_local::gls_ansible_lrt::_lab_lrt',
            'ansible_local::gls_ansible_hostname::_gls_ansible_hostname'
          ] -%>
    <%-   host_facts = host.facts(fact_names) -%>
    <%-   report_row(
            'CTD': host_facts['ansible_local::gls_ansible_ctd_posture::_ctd'],
            'TimeStemp': host_facts['ansible_local::gls_ansible_timestemp::_timestemp'],
            'Model': host_facts['ansible_local::gls_ansible_model::_model'],
            'MAC': host_facts['ansible_local::gls_ansible_mac::_mac_address'],
            'IP_address': host_facts['ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress'],
            'Owner': host_facts['ansible_local::gls_ansible_owner::_owner'],
            'building_name': host_facts['ansible_local::gls_ansible_lrt::_lab_lrt'],
            'hostname': host_facts['ansible_local::gls_ansible_hostname::_gls_ansible_hostname']
          ) -%>
    <%- end -%>

In addition, if you forget to pass in names and just continue using host.facts['fact_name'], this is now cached per host, which should help some as well.

Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010
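For readers who want to see why the two changes discussed in this bug help (caching host.facts per host, and passing fact names so only those facts are loaded), the following is a minimal, self-contained Ruby sketch. It is not Foreman code: FakeFactStore, FakeHost and simulate are hypothetical stand-ins that only count simulated queries and loaded rows, and the figures (about 281 facts per host, 8 facts wanted) are taken from the comments above.

```ruby
# Illustrative sketch only: FakeFactStore and FakeHost are hypothetical
# stand-ins, not Foreman classes. They model the cost of reading facts.

# Pretends to be the fact storage and counts the work it is asked to do.
class FakeFactStore
  attr_reader :queries, :rows_loaded

  def initialize(facts_by_host)
    @facts_by_host = facts_by_host
    @queries = 0
    @rows_loaded = 0
  end

  # Returns all facts for a host, or only the named ones when names are given.
  def fetch(host_id, names = nil)
    facts = @facts_by_host.fetch(host_id, {})
    facts = facts.slice(*names) if names
    @queries += 1
    @rows_loaded += facts.size
    facts
  end
end

# Pretends to be a host whose facts live in the store above.
class FakeHost
  def initialize(id, store, cache: true)
    @id = id
    @store = store
    @cache = cache
    @facts_cache = {}
  end

  # With caching on, each distinct list of names is fetched once per host.
  def facts(names = nil)
    return @store.fetch(@id, names) unless @cache

    @facts_cache[names] ||= @store.fetch(@id, names)
  end
end

# Assumed numbers from the discussion: ~281 facts per host, 8 of them wanted.
ALL_FACTS = (1..281).to_h { |i| ["fact_#{i}", "value_#{i}"] }.freeze
WANTED = ALL_FACTS.keys.first(8).freeze

# Reads the 8 wanted facts for one host in the given style and reports the cost.
def simulate(style)
  store = FakeFactStore.new(1 => ALL_FACTS)
  host = FakeHost.new(1, store, cache: style != :uncached)
  if style == :selective
    selected = host.facts(WANTED)
    WANTED.each { |name| selected[name] }
  else
    WANTED.each { |name| host.facts[name] }
  end
  { queries: store.queries, rows_loaded: store.rows_loaded }
end

%i[uncached cached selective].each do |style|
  puts "#{style}: #{simulate(style)}"
end
```

In this toy model, reading 8 facts without a cache costs 8 queries and 2248 loaded rows per host, the per-host cache brings that down to 1 query and 281 rows, and passing the names brings it down to 1 query and 8 rows. Real timings on a Satellite server depend on the database and inventory, so treat these numbers only as an illustration of the relative amount of work.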