Description of problem:
The migration could spent long time or stuck in calculating the total repository rpms to be migrated if the total is extremely large (e.g. 2 millions++).
As we can see below, the migration sutck in "Migrating content to Pulp 3" for 3 days without any progress
=========================================
2022-06-09 19:49:08 +1000: Migrating docker_tag content to Pulp 3 57057/62272
2022-06-09 19:49:18 +1000: Migrating docker_tag content to Pulp 3 60060/62272
2022-06-09 19:49:38 +1000: Initial Migration steps complete.
2022-06-09 19:49:48 +1000: Migrating content to Pulp 3 606012/3485647
...
2022-06-10 10:14:14 +1000: Migrating content to Pulp 3 606012/3485647
2022-06-10 10:14:24 +1000: Migrating content to Pulp 3 606012/3485647
...
...
2022-06-11 14:16:21 +1000: Migrating content to Pulp 3 606012/3485647
=========================================
Checking the postgres activities. We found that the "count()" query has been active since 3 days.
=========================================
# su - postgres -c "psql foreman -c \"select * from pg_stat_activity where datname = 'pulpcore' and state <> 'idle';\""
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin |
query | backend_type
---------+----------+-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+------------------------------------------------+-----------------
123456 | pulpcore | XXXXXX | XXXXXX | pulp | | XX.XX.XX.XX | | XXXXXX | 2022-06-08 11:57:11.798522+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186243+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | client backend
123456 | pulpcore | XXXXXX | XXXXXX | pulp | | | | | 2022-06-09 19:49:35.191817+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196112+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker
123456 | pulpcore | XXXXXX | XXXXXX | pulp | | | | | 2022-06-09 19:49:35.192686+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196497+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker
(3 rows)
=========================================
We can see 1 RQ worker is busy
=========================================
#ps -aux | grep '/usr/bin/rq
pulp XXXXX 64.1 0.3 1468496 836400 ? Sl Jun08 1854:22 /usr/bin/python3 /usr/bin/rq worker -w pulpcore.tasking.worker.PulpWorker -c pulpcore.rqconfig --disable-job-desc-logging
=========================================
Additional info:
I think the poor query could be caused by the large contents + the sub query (large Pulp2LazyCatalog).
# /usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py
if is_lazy_type:
# go through all of the content that haven't been migrated OR have been migrated
# but have new lazy catalog entries.
units_with_new_lces = Pulp2LazyCatalog.objects.filter(
is_migrated=False).values('pulp2_unit_id').distinct()
already_migrated = ~Q(pulp2content__pulp3_content=None)
no_new_lces = ~Q(pulp2content__pulp2_id__in=units_with_new_lces)
pulp_2to3_detail_qs = content_model.objects.exclude(already_migrated & no_new_lces) <=============== this query
else:
# go through all of the content that haven't been migrated
pulp_2to3_detail_qs = content_model.objects.filter(pulp2content__pulp3_content=None)
# order by pulp2_repo if it's set
if content_model.set_pulp2_repo:
pulp_2to3_detail_qs = pulp_2to3_detail_qs.order_by('repo_id')
with ProgressReport(
message='Migrating {} content to Pulp 3'.format(content_type),
code='migrating.{}.content'.format(self.migrator.pulp2_plugin),
total=pulp_2to3_detail_qs.count() <=================================== It appears to stuck here.
Description of problem: The migration could spent long time or stuck in calculating the total repository rpms to be migrated if the total is extremely large (e.g. 2 millions++). As we can see below, the migration sutck in "Migrating content to Pulp 3" for 3 days without any progress ========================================= 2022-06-09 19:49:08 +1000: Migrating docker_tag content to Pulp 3 57057/62272 2022-06-09 19:49:18 +1000: Migrating docker_tag content to Pulp 3 60060/62272 2022-06-09 19:49:38 +1000: Initial Migration steps complete. 2022-06-09 19:49:48 +1000: Migrating content to Pulp 3 606012/3485647 ... 2022-06-10 10:14:14 +1000: Migrating content to Pulp 3 606012/3485647 2022-06-10 10:14:24 +1000: Migrating content to Pulp 3 606012/3485647 ... ... 2022-06-11 14:16:21 +1000: Migrating content to Pulp 3 606012/3485647 ========================================= Checking the postgres activities. We found that the "count()" query has been active since 3 days. ========================================= # su - postgres -c "psql foreman -c \"select * from pg_stat_activity where datname = 'pulpcore' and state <> 'idle';\"" datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type ---------+----------+-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+------------------------------------------------+----------------- 123456 | pulpcore | XXXXXX | XXXXXX | pulp | | XX.XX.XX.XX | | XXXXXX | 2022-06-08 11:57:11.798522+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186243+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | client backend 123456 | pulpcore | XXXXXX | XXXXXX | pulp | | | | | 2022-06-09 19:49:35.191817+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196112+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker 123456 | pulpcore | XXXXXX | XXXXXX | pulp | | | | | 2022-06-09 19:49:35.192686+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196497+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker (3 rows) ========================================= We can see 1 RQ worker is busy ========================================= #ps -aux | grep '/usr/bin/rq pulp XXXXX 64.1 0.3 1468496 836400 ? Sl Jun08 1854:22 /usr/bin/python3 /usr/bin/rq worker -w pulpcore.tasking.worker.PulpWorker -c pulpcore.rqconfig --disable-job-desc-logging ========================================= Additional info: I think the poor query could be caused by the large contents + the sub query (large Pulp2LazyCatalog). # /usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py if is_lazy_type: # go through all of the content that haven't been migrated OR have been migrated # but have new lazy catalog entries. units_with_new_lces = Pulp2LazyCatalog.objects.filter( is_migrated=False).values('pulp2_unit_id').distinct() already_migrated = ~Q(pulp2content__pulp3_content=None) no_new_lces = ~Q(pulp2content__pulp2_id__in=units_with_new_lces) pulp_2to3_detail_qs = content_model.objects.exclude(already_migrated & no_new_lces) <=============== this query else: # go through all of the content that haven't been migrated pulp_2to3_detail_qs = content_model.objects.filter(pulp2content__pulp3_content=None) # order by pulp2_repo if it's set if content_model.set_pulp2_repo: pulp_2to3_detail_qs = pulp_2to3_detail_qs.order_by('repo_id') with ProgressReport( message='Migrating {} content to Pulp 3'.format(content_type), code='migrating.{}.content'.format(self.migrator.pulp2_plugin), total=pulp_2to3_detail_qs.count() <=================================== It appears to stuck here.