Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2203077 - Slow generate applicability for Hosts with multiple modulestreams installed
Summary: Slow generate applicability for Hosts with multiple modulestreams installed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Hosts - Content
Version: 6.12.4
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: 6.15.0
Assignee: Jeremy Lenz
QA Contact: visawant
URL:
Whiteboard:
: 2226567 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-11 07:46 UTC by Pavel Moravec
Modified: 2024-04-23 17:11 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2250350 (view as bug list)
Environment:
Last Closed: 2024-04-23 17:11:13 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 36850 0 High New Slow generate applicability for Hosts with multiple modulestreams installed 2023-10-20 15:49:49 UTC
Red Hat Issue Tracker SAT-19395 0 None None None 2023-08-05 11:57:50 UTC
Red Hat Issue Tracker SAT-21377 0 None None None 2023-11-17 19:33:59 UTC
Red Hat Knowledge Base (Solution) 7033387 0 None None None 2023-09-14 14:40:38 UTC
Red Hat Product Errata RHSA-2024:2010 0 None None None 2024-04-23 17:11:14 UTC

Description Pavel Moravec 2023-05-11 07:46:19 UTC
Description of problem:
When having a Content Host with many dnf modulestreams installed, Generate Applicability task can be very slow (esp. on scaled Satellites with many CVs with many big repos, i.e. with huge tables like katello_repository_rpms and namely katello_installed_packages).

Postgres logs slow queries in form:

2023-05-08 15:45:00 CEST LOG:  duration: 796130.755 ms  execute <unnamed>: SELECT "katello_rpms"."id" FROM "katello_rpms" INNER JOIN katello_repository_rpms ON
                    katello_rpms.id = katello_repository_rpms.rpm_id INNER JOIN katello_installed_packages ON
                    katello_rpms.name = katello_installed_packages.name AND
                    katello_rpms.arch = katello_installed_packages.arch AND
                    katello_rpms.evr > katello_installed_packages.evr AND
                    katello_installed_packages.id in (SELECT DISTINCT ON (katello_installed_packages.name) katello_installed_packages.id FROM katello_installed_packages INNER JOIN katello_host_installed_packages ON katello_installed_packages.id = katello_host_installed_packages.installed_package_id WHERE katello_host_installed_packages.host_id = 11603 ORDER BY katello_installed_packages.name, katello_installed_packages.evr DESC) LEFT JOIN katello_module_stream_rpms ON
                    katello_rpms.id = katello_module_stream_rpms.rpm_id INNER JOIN katello_host_installed_packages ON
                    katello_installed_packages.id = katello_host_installed_packages.installed_package_id WHERE (katello_repository_rpms.repository_id in (21041,21040,21472,21076,34244)) AND (katello_host_installed_packages.host_id = 11603) AND ((katello_module_stream_rpms.module_stream_id IS NULL AND
                    katello_installed_packages.id NOT IN (SELECT "katello_installed_packages"."id" FROM "katello_installed_packages" WHERE "katello_installed_packages"."nvra" IN (......
.... here is a huge list of RPMs that *repeat* often...
......) AND "katello_installed_packages"."epoch" IN (.....
..... another huge list of numbers, with huge repetition of '0' or '1'))) OR

(now a similar huge query).

The "nvra IN AND epoch in" query can have 500k characters.

See the duration on a scaled Satellite, which makes timeouts to the calculation itself and also affects QoS of foreman/postgres/tasks, which are busy by that work (from multiple Content Hosts).


One simple improvement:

The query is called from https://github.com/Katello/katello/blob/master/app/services/katello/applicability/applicable_content_helper.rb#L82 :

      def fetch_rpm_content_ids
..
          where("(katello_module_stream_rpms.module_stream_id IS NULL AND
            katello_installed_packages.id NOT IN (:locked_modular_installed_packages)) OR
            (katello_module_stream_rpms.module_stream_id IN (:enabled_module_streams)
            AND katello_installed_packages.id IN (:locked_modular_installed_packages))",
            :enabled_module_streams => enabled_module_stream_ids,
            :locked_modular_installed_packages => locked_modular_installed_packages(enabled_module_stream_ids)).pluck(:id).uniq

The huge list with duplicates is populated from locked_modular_installed_packages method, which is:


      # Installed packages that are locked for the host due to enabled module stream membership
      def locked_modular_installed_packages(enabled_module_streams)
        rpms_in_enabled_module_streams = ::Katello::Rpm.
          joins("INNER JOIN katello_module_stream_rpms ON katello_rpms.id = katello_module_stream_rpms.rpm_id").
          where("katello_module_stream_rpms.module_stream_id IN (:enabled_module_streams)",
                :enabled_module_streams => enabled_module_streams).select(:nvra, :epoch)

        ::Katello::InstalledPackage.where(nvra: rpms_in_enabled_module_streams.map(&:nvra),
                                          epoch: rpms_in_enabled_module_streams.map(&:epoch)).select(:id)
      end

Here, rpms_in_enabled_module_streams (big in our scenario for a Host with many modulestreams installed) is a list of pairs ('foo-1.2.3', '0'), ('bar-3.2.1', '0'), .. where the packages can easily repeat (e.g. netcf-0.2.8-12.module+el8.1.0+4066+0f1aadab.x86_64 repeated 28 times, '0' epoch repeated 4885times).

And the

nvra: rpms_in_enabled_module_streams.map(&:nvra)

clause passes the non-uniq list to psql.

Adding there ".uniq" twice:

        ::Katello::InstalledPackage.where(nvra: rpms_in_enabled_module_streams.map(&:nvra).uniq,
                                          epoch: rpms_in_enabled_module_streams.map(&:epoch).uniq).select(:id)

shrinks the query substantially, while it has no impact to semantics(*) of the query.   (*) see Additional Details, I smell a bug here.


Repeated tests by adding this improvement shows 1/3 improvement in psql query duration (some small time is additionally spent on the "uniq" call but that is max few percents).


Version-Release number of selected component (if applicable):
Sat6.12 (also 6.11 or 6.13 affected)


How reproducible:
100% on a scaled Satellite


Steps to Reproduce:
1. Have many Content Hosts registered to Satellite, with many installed packages each
2. Have more Content Views with bigger repos published to multiple LEs
3. Have a Content Host (el8/9) with installed multiple modulestreams - the more the better
4. Optionally, modify in /var/lib/pgsql/data/postgresql.conf the "log_min_duration_statement 1000" to a lower value (depends how much you scale)
5. Recalculate applicability of the Host (hammer host errata recalculate --host-id 123)
 

Actual results:
Depending how much you scale, the "Bulk generate applicability for host" task can last very long (on unscaled Sat with just many modulestreams installed, 6 seconds instead of 0-1, customer has 30 minutes(!)).

postgres logs show big duration of the above queries


Expected results:
Reasonably lower times - e.g. just applying the ".uniq", I got 1/3 lower psql durations and similarly shorter tasks execution.


Additional info:
(*) I smell a bug in the code that can cause wrong applicability calculation. rpms_in_enabled_module_streams is a list of NEVRAs of packages enabled by a module stream. BUT the query:

        ::Katello::InstalledPackage.where(nvra: rpms_in_enabled_module_streams.map(&:nvra),
                                          epoch: rpms_in_enabled_module_streams.map(&:epoch)).select(:id)

applies the "where the NEVRA matches" wrongly / too vaguely. See example:

rpms_in_enabled_module_streams = (
    ('foo-1.2.3', '0'),
    ('bar-3.2.1', '1')
)

and ::Katello::InstalledPackage.select(:nvra, :epoch) = (
..
    ('foo-1.2.3', '0'),
    ('foo-1.2.3', '1'),
    ('bar-3.2.1', '0'),
    ('bar-3.2.1', '1'),
..
)

Then, the "where" clause will ask for "give me packages with nvra IN ('foo-1.2.3', 'bar-3.2.1') and with epoch IN ('0', '1')", whoch returns all four records above - BUT the intention is to return *just* the two from rpms_in_enabled_module_streams!

I dont know how theoretical or real counterexample this is, though. BUT if it could be real, it would be very tricky to reveal such a bug.

Comment 1 Pavel Moravec 2023-10-06 20:59:45 UTC
Re-reproducing this BZ now, I can achieve "less bad" performance figures (most probably b'cos I have a few Hosts only, so katello_installed_packages is small..?).

Even when I enabled all modules for RHEL8 BaseOS+AppStream via:

for i in $(dnf module list | grep "\[d\]" | awk '{ print $1 }' | sort -u); do dnf module install $i -y; done

the worst I got was:

2023-10-06 22:50:26 CEST LOG:  duration: 1416.644 ms  execute <unnamed>: SELECT "katello_rpms"."id" FROM "katello_rpms" INNER JOIN katello_repository_rpms ON

which is still order(s) of magnitude more than one would expect.

Applying the two ".uniq", times drop to 800-ish seconds.

Comment 2 Pavel Moravec 2023-10-07 19:07:29 UTC
OK, much improved reproducer - one needs to scale katello_installed_packages to 20k-ish records. Assuming you have 10k records already, let generate artificial 10k to simulate bigger Satellite managing many systems with many installed packages).

1) Generate 10k (or more) packages:

1a) Have hello-world.spec file:

Name:       hello-world-1
Version:    1
Release:    1
Summary:    Most simple RPM package
License:    FIXME

%description
This is my first RPM package, which does nothing.

%prep
# we have no source, so nothing here

%build
cat > %{name}.sh <<EOF
#!/usr/bin/bash
echo Hello world
EOF

%install
mkdir -p %{buildroot}/usr/bin/
install -m 755 %{name}.sh %{buildroot}/usr/bin/%{name}.sh

%files
/usr/bin/%{name}.sh

%changelog
# let's skip this for now


1b) Generate 10k packages with name hello-world-* :
for i in $(seq 1 10000); do sed -i "s/hello-world-.*$/hello-world-${i}/1" hello-world.spec; rpmbuild -ba hello-world.spec > /dev/null 2>&1; echo $i; done

1c) Install all of them (assuming you are on a Host registered to a Satellite):
dnf install /root/rpmbuild/RPMS/x86_64/hello-world-*rpm

2) Enable just 10 module streams on a RHEL8 system, like:
dnf module install subversion swig varnish virt squid scala ruby redis php postgresql

3) Assuming you have the postgres logs duration enabled, recalculate applicability of the RHEL8 Host:
hammer host errata recalculate --host <rhel8.fqdn>

4) Check postgres logs to see something like:
2023-10-07 20:53:55 CEST LOG:  duration: 3937.426 ms  execute <unnamed>: SELECT "katello_rpms"."id" FROM "katello_rpms" INNER JOIN katello_repository_rpms ON


Here, when I applied the simple "add .uniq twice" patch, the key SELECT duration dropped from average 4 seconds to 2.2seconds.

Comment 8 Jeremy Lenz 2023-10-20 15:24:29 UTC
*** Bug 2226567 has been marked as a duplicate of this bug. ***

Comment 9 Jeremy Lenz 2023-10-20 15:49:48 UTC
Created redmine issue https://projects.theforeman.org/issues/36850 from this bug

Comment 10 Bryan Kearney 2023-10-20 16:02:25 UTC
Upstream bug assigned to jlenz

Comment 11 Bryan Kearney 2023-10-20 16:02:27 UTC
Upstream bug assigned to jlenz

Comment 12 Brad Buckingham 2023-10-30 11:29:29 UTC
Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Comment 18 errata-xmlrpc 2024-04-23 17:11:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010


Note You need to log in before you can comment on or make changes to this bug.