Bug 2097204

Summary: Migration stucks in calculating total of rpms to be migrated for 5 days
Product: Red Hat Satellite Reporter: Hao Chang Yu <hyu>
Component: PulpAssignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED DUPLICATE QA Contact: Lai <ltran>
Severity: medium Docs Contact:
Priority: high    
Version: 6.9.9CC: ahumbe, ajambhul, dkliban, ggainey, jpasqual, pdwyer, rchan, wpinheir
Target Milestone: UnspecifiedKeywords: Performance, PrioBumpGSS, Triaged, Upgrades
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-18 16:25:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hao Chang Yu 2022-06-15 06:46:49 UTC
Description of problem:
The migration could spent long time or stuck in calculating the total repository rpms to be migrated if the total is extremely large (e.g. 2 millions++).


As we can see below, the migration sutck in "Migrating content to Pulp 3" for 3 days without any progress
=========================================
2022-06-09 19:49:08 +1000: Migrating docker_tag content to Pulp 3 57057/62272
2022-06-09 19:49:18 +1000: Migrating docker_tag content to Pulp 3 60060/62272
2022-06-09 19:49:38 +1000: Initial Migration steps complete.
2022-06-09 19:49:48 +1000: Migrating content to Pulp 3 606012/3485647
...
2022-06-10 10:14:14 +1000: Migrating content to Pulp 3 606012/3485647
2022-06-10 10:14:24 +1000: Migrating content to Pulp 3 606012/3485647
...
...
2022-06-11 14:16:21 +1000: Migrating content to Pulp 3 606012/3485647
=========================================


Checking the postgres activities. We found that the "count()" query has been active since 3 days.
=========================================
# su - postgres -c "psql foreman -c \"select * from pg_stat_activity where datname = 'pulpcore' and state <> 'idle';\""
  datid  | datname  |  pid  | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type | wait_event | state  | backend_xid | backend_xmin |                                                                                                        
                                                                                                                               query                                                                                                                                                                                                                                       |  backend_type   
---------+----------+-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+------------------------------------------------+-----------------
 123456 | pulpcore | XXXXXX |  XXXXXX | pulp    |                  | XX.XX.XX.XX   |                 |       XXXXXX | 2022-06-08 11:57:11.798522+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186243+10 |                 |            | active |             |     XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | client backend

 123456 | pulpcore | XXXXXX |  XXXXXX | pulp    |                  |             |                 |             | 2022-06-09 19:49:35.191817+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196112+10 |                 |            | active |             |     XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker

 123456 | pulpcore | XXXXXX |  XXXXXX | pulp    |                  |             |                 |             | 2022-06-09 19:49:35.192686+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196497+10 |                 |            | active |             |     XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker
(3 rows)
=========================================


We can see 1 RQ worker is busy
=========================================
#ps -aux | grep '/usr/bin/rq
pulp     XXXXX 64.1  0.3 1468496 836400 ?      Sl   Jun08 1854:22 /usr/bin/python3 /usr/bin/rq worker -w pulpcore.tasking.worker.PulpWorker -c pulpcore.rqconfig --disable-job-desc-logging
=========================================



Additional info:
I think the poor query could be caused by the large contents + the sub query (large Pulp2LazyCatalog).


# /usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py

        if is_lazy_type:
            # go through all of the content that haven't been migrated OR have been migrated
            # but have new lazy catalog entries.
            units_with_new_lces = Pulp2LazyCatalog.objects.filter(
                is_migrated=False).values('pulp2_unit_id').distinct()
            already_migrated = ~Q(pulp2content__pulp3_content=None)
            no_new_lces = ~Q(pulp2content__pulp2_id__in=units_with_new_lces)
            pulp_2to3_detail_qs = content_model.objects.exclude(already_migrated & no_new_lces) <=============== this query
        else:
            # go through all of the content that haven't been migrated
            pulp_2to3_detail_qs = content_model.objects.filter(pulp2content__pulp3_content=None)

        # order by pulp2_repo if it's set
        if content_model.set_pulp2_repo:
            pulp_2to3_detail_qs = pulp_2to3_detail_qs.order_by('repo_id')

        with ProgressReport(
            message='Migrating {} content to Pulp 3'.format(content_type),
            code='migrating.{}.content'.format(self.migrator.pulp2_plugin),
            total=pulp_2to3_detail_qs.count()   <=================================== It appears to stuck here.

Comment 3 Hao Chang Yu 2022-07-06 07:27:00 UTC
Created an upstream issue https://github.com/pulp/pulp-2to3-migration/issues/568

Comment 5 Waldirio M Pinheiro 2022-08-18 16:25:19 UTC

*** This bug has been marked as a duplicate of bug 2038995 ***