Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Description of problem:
If the pulp 2 to 3 migration fails with an unexpected error like insufficient disk space. The waiting tasks which assigned to workers are not getting cancel and reserved resources are not getting release after restarting the pulpcore workers and pulpcore resource manager.
Re-run the prepare script gets "Only one migration plan can run or be reset at a time". Run prepare-abort doesn't solve the issue too.
$ satellite-maintain content prepare
Running Prepare content for Pulp 3
...
Prepare content for Pulp 3:
Checking for valid Katello configuraton.
Starting task.
2021-09-13 16:03:43 +1000: Migrating content to Pulp 3 2269/2269WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
#<Thread:0x0000000012f8d560@/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:471 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
15: from /opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
14: from /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/connectors/database.rb:25:in `block in start'
13: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:411:in `listen'
12: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:478:in `check_database_errors'
11: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:412:in `block in listen'
10: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/database/connecting.rb:269:in `synchronize'
9: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/connection_pool/threaded.rb:91:in `hold'
8: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:442:in `block (2 levels) in listen'
7: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:442:in `ensure in block (2 levels) in listen'
6: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:134:in `execute'
5: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:110:in `check_disconnect_errors'
4: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:134:in `block in execute'
3: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:146:in `execute_query'
2: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/database/logging.rb:38:in `log_connection_yield'
1: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:146:in `block in execute_query'
/opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:146:in `async_exec': PG::UnableToSend: no connection to the server (Sequel::DatabaseDisconnectError)
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
rake aborted!
ActiveRecord::StatementInvalid: PG::UnableToSend: server closed the connection unexpectedly
This probably means the server terminated abnormally <================== Postgres server terminated unexpectedly due to out of disk space
before or while processing the request.
# Abort migration
$ satellite-maintain content prepare-abort
Running Abort all running Pulp 2 to Pulp 3 migration tasks
================================================================================
Abort all running Pulp 2 to Pulp 3 migration tasks:
Cancelled 1 tasks. [OK]
--------------------------------------------------------------------------------
# Run again
$ satellite-maintain content prepare
Running Prepare content for Pulp 3
...
Prepare content for Pulp 3:
Checking for valid Katello configuraton.
Starting task.
2021-09-13 16:32:57 +1000: Content migration starting. Migration failed, You will want to investigate: https://satellite.example.com/foreman_tasks/tasks/98c5e895-432d-4dee-806b-6074c71d1a15
rake aborted!
ForemanTasks::TaskError: Task 98c5e895-432d-4dee-806b-6074c71d1a15: Pulp2to3MigrationClient::ApiError: Error message: the server returns an error
HTTP status code: 400
Response headers: {"Date"=>"xxxxxxxxxxxxxx", "Server"=>"gunicorn/20.0.4", "Content-Type"=>"application/json", "Vary"=>"Accept,Cookie", "Allow"=>"POST, OPTIONS", "X-Frame-Options"=>"SAMEORIGIN", "Content-Length"=>"57", "Via"=>"1.1 satellite.example.com", "Connection"=>"close"}
Response body: ["Only one migration plan can run or be reset at a time"] <=============== Failed
# Waiting tasks not getting cancelled after restarting pulpcore workers
$ systemctl restart pulpcore-resource-manager pulpcore-worker@*
$ psql
pulpcore=# select pulp_id, name, state, worker_id from core_task where state = 'waiting' and worker_id is not null;
pulp_id | name | state | worker_id
--------------------------------------+----------------------------------------------------------+---------+--------------------------------------
ce06bbdb-79ff-4599-a654-bd479a9bf9d6 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | e149a672-6b18-46c8-b101-84b68e058929
1db98110-8c05-402e-8ed7-22a150bea1d1 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 496f7cae-0a35-4b13-acce-490d5ae4451e
e0f226c0-65bc-4c2b-84e3-096355bd9201 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | a4919996-c212-4e98-89a1-430325ffabe0
05546a0b-0995-4cab-a492-a67bad1055e9 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 6d4d278a-c761-40d9-8ba4-a4b4413ea739
3796774b-c3e0-4c9b-a5dc-ca67bbe3fc06 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | b86a1dc2-e573-4997-8a65-4030edaa1def
a7a987bf-bc97-409e-85a7-111d7fe5ebc9 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | da250740-f85d-4720-8756-8bdea8a53baa
c0328867-99bd-4008-948a-a193330036df | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 90dd9e6c-4336-4d81-baaf-584521da2ee7
149fc38d-0a9d-436b-b96b-d6b3f84122bc | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | a24fb0ce-bb2a-47d4-8038-422305b66810
985c1b42-26af-4465-b905-04fd8a56846c | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 5e484165-8485-4d01-8f36-835f661b4ad1
04b7db6c-0c73-4814-8565-2b5360385d0d | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 538340c7-3d0b-4a5e-bb37-96508f466b4e
7dd21f01-b9d9-4cce-85bd-ab1f61c9dbe3 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 2f37829d-6205-480f-9bd5-04aa0de587d0
52c4c1df-53f6-42da-8ddf-0b70aa0f509d | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | c565339f-22e5-4185-b8da-58077803805c
...
# Try to cancel the tasks manually
$ foreman-rake console
> api = ::Katello::Pulp3::Api::Core.new(SmartProxy.default_capsule)
> api.tasks_api.list(state: "waiting").results.each { |task| pp api.cancel_task(task.pulp_href) }
#<PulpcoreClient::TaskResponse:0x0000000010137300
@child_tasks=[],
@created_resources=[],
@name="pulp_2to3_migration.app.migration.complex_repo_migration",
@parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
@progress_reports=[],
@pulp_created=Sun, 12 Sep 2021 23:18:20 +0000,
@pulp_href="/pulp/api/v3/tasks/884b97d4-a5d5-4798-8ab8-43ca50f26080/",
@reserved_resources_record=
["/pulp/api/v3/repositories/rpm/rpm/ba5fd31c-315d-4569-bc37-33ef600abc99/"],
@state="canceled",
@task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
#<PulpcoreClient::TaskResponse:0x00000000100d3d50
@child_tasks=[],
@created_resources=[],
@name="pulp_2to3_migration.app.migration.complex_repo_migration",
@parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
@progress_reports=[],
@pulp_created=Sun, 12 Sep 2021 23:18:20 +0000,
@pulp_href="/pulp/api/v3/tasks/f5605606-2ca9-40a0-9d82-77e71cc9bdea/",
@reserved_resources_record=
["/pulp/api/v3/repositories/rpm/rpm/aace68b8-e71b-4ef9-8a65-a5c12cb662b5/"],
@state="canceled",
@task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
#<PulpcoreClient::TaskResponse:0x000000000ffa9bf0
@child_tasks=[],
@created_resources=[],
@name="pulp_2to3_migration.app.migration.complex_repo_migration",
@parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
@progress_reports=[],
@pulp_created=Sun, 12 Sep 2021 23:18:20 +0000,
@pulp_href="/pulp/api/v3/tasks/b6743e03-a308-40f9-8d12-fb541b90ea07/",
@reserved_resources_record=
["/pulp/api/v3/repositories/rpm/rpm/3b584cf2-27b7-4f36-955e-77e0f5ea187d/"],
@state="canceled",
@task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
#<PulpcoreClient::TaskResponse:0x000000000fed45e0
@child_tasks=[],
@created_resources=[],
@name="pulp_2to3_migration.app.migration.complex_repo_migration",
@parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
@progress_reports=[],
@pulp_created=Sun, 12 Sep 2021 23:18:19 +0000,
@pulp_href="/pulp/api/v3/tasks/fe7aa645-6cdb-4b44-9319-f87856aacb7a/",
@reserved_resources_record=
["/pulp/api/v3/repositories/rpm/rpm/0c8dce00-d27c-4687-a9e5-485c59069502/"],
@state="canceled",
@task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
...
> pp api.tasks_api.list(state: "waiting")
#<PulpcoreClient::InlineResponse20011:0x000000000e1dbd08 @count=0, @results=[]>
# This seems like another bug that cancelled tasks are not removed from the worker queue. Maybe this is ok. I guess pulp will check the state of the task and skip?
$ rq info -u redis://localhost:6379/8
18038.com |██ 2
15874.com |██ 2
18036.com |██ 2
18037.com |██ 2
# Needs to manually do this:
$ rq empty 18038.com 15874.com 18036.com 18037.com -u redis://localhost:6379/8
2 jobs removed from 18038.com queue
2 jobs removed from 15874.com queue
2 jobs removed from 18036.com queue
2 jobs removed from 18037.com queue
# Reserve resources are still not getting clean up after cancelling the tasks.
pulpcore=# select * from core_taskreservedresource;
pulp_id | pulp_created | pulp_last_updated | resource_id | task_id
--------------------------------------+-------------------------------+-------------------------------+--------------------------------------+--------------------------------------
8b84ca9c-27a2-4443-8b40-0c1b8f8dbb8b | 2021-09-13 05:57:18.30388+00 | 2021-09-13 05:57:18.30396+00 | 2bd534fb-ac19-4bba-90ae-8c5f80169176 | 6f950eb2-25cf-48e7-a6c4-db2636c3e01a
3a76f7fd-0ec3-4122-8621-9ae4ea199bc4 | 2021-09-13 06:02:08.757547+00 | 2021-09-13 06:02:08.75758+00 | 21e57dc1-e24a-4e2c-b540-260a6e5723f4 | 35b53f2e-24e6-41dc-b85e-093c4ee7f402
63b59527-9007-409a-a896-40766ce0966a | 2021-09-13 06:20:20.983527+00 | 2021-09-13 06:20:20.983553+00 | 8cf60e5d-01a1-403b-a4ce-974c9aa71c2b | 645a06ba-7bfe-45e1-9bad-e10ffb98cecd
c26bab99-1a32-421c-9a64-9a856f70039b | 2021-09-13 06:21:19.257066+00 | 2021-09-13 06:21:19.257095+00 | f7cbfa62-a099-4fa1-bb38-f92c17a31838 | 47316602-a27e-4075-b218-2b968e2ca6d9
e0205f4a-0e2e-4fd2-b73b-9e8ad3bcc23b | 2021-09-13 06:21:19.964208+00 | 2021-09-13 06:21:19.964245+00 | 6434ac91-5584-4b41-a31a-839084469b75 | e1bb63e7-78be-4a2d-bbda-ffe00547f73a
...
pulpcore=# select * from core_reservedresource;
pulp_id | pulp_created | pulp_last_updated | resource | worker_id
--------------------------------------+-------------------------------+-------------------------------+-------------------------------------------------------------------------+--------------------------------------
2bd534fb-ac19-4bba-90ae-8c5f80169176 | 2021-09-13 05:57:18.243676+00 | 2021-09-13 05:57:18.243747+00 | /pulp/api/v3/repositories/rpm/rpm/f503830e-0b76-472d-ade8-8f071ff35529/ | 38c7a2e6-d69c-4fbc-95f7-91d1c22624d1
21e57dc1-e24a-4e2c-b540-260a6e5723f4 | 2021-09-13 06:02:08.754167+00 | 2021-09-13 06:02:08.754223+00 | /pulp/api/v3/repositories/rpm/rpm/28883edf-837d-415e-916b-c58b7727f685/ | c88fe686-d725-4c81-b075-08c7b56ef95f
8cf60e5d-01a1-403b-a4ce-974c9aa71c2b | 2021-09-13 06:20:20.980849+00 | 2021-09-13 06:20:20.980889+00 | /pulp/api/v3/repositories/rpm/rpm/0cb39f5b-bfda-47ca-b5d7-8274db2db21f/ | be9ad06a-adb2-428b-8885-c5312d105354
f7cbfa62-a099-4fa1-bb38-f92c17a31838 | 2021-09-13 06:21:19.253752+00 | 2021-09-13 06:21:19.253804+00 | /pulp/api/v3/repositories/rpm/rpm/1180b4dc-e451-448e-8178-3365c41ef91b/ | 9b80c7f7-5a8e-446e-a863-b3bdddf7bf7a
6434ac91-5584-4b41-a31a-839084469b75 | 2021-09-13 06:21:19.950034+00 | 2021-09-13 06:21:19.950084+00 | /pulp/api/v3/repositories/rpm/rpm/920e944f-ec92-4a33-aede-6347aef6f6ad/ | 1d94152b-11e7-4cd1-b3c6-b80820ec4b83
...
# Need to kill zombie reserve resources manually.
$ PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager shell
pulp: pulp_2to3_migration.app.plugin:INFO: Plugin pulp_deb is not installed in pulp3 therefore it will not be migrated from pulp2
Python 3.6.8 (default, Aug 13 2020, 07:46:32)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from pulpcore.app.models import Worker, ReservedResource
>>>
>>> # Are there missing workers?
>>> missing_workers = Worker.objects.missing_workers()
>>> missing_workers.count()
75
>>>
>>> # Look for resources that are hold by workers not considered online
>>> online_workers = Worker.objects.online_workers()
>>> zombies = ReservedResource.objects.exclude(worker__in=online_workers)
>>> zombies.count()
5
>>>
>>> # Dangerous action: Delete the zombie resources
>>> zombies.delete()
(10, {'core.TaskReservedResource': 5, 'core.ReservedResource': 5})
# The migration can proceed after cancelling the tasks manually.
$ satellite-maintain content prepare
...
Prepare content for Pulp 3:
Checking for valid Katello configuraton.
Starting task.
2021-09-06 13:17:03 +1000: Initial Migration steps complete.
...
There are a couple of related issues, which may possibly solve this one; however, that needs to be determined.
Fixes are available, but not yet in an official Satellite 6.9 release.
Bug 1962462 - fixed and targeted for upcoming 6.9.6
Bug 1976311 - fixed an proposed for future 6.9.z (likely 6.9.7)
Comment 9pulp-infra@redhat.com
2021-10-31 10:08:43 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.
Comment 10pulp-infra@redhat.com
2021-10-31 10:08:44 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
Comment 11pulp-infra@redhat.com
2021-11-30 16:12:07 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.
Comment 12pulp-infra@redhat.com
2021-11-30 17:15:47 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
Comment 13pulp-infra@redhat.com
2021-11-30 18:09:50 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.
Comment 14pulp-infra@redhat.com
2021-11-30 19:07:10 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.
Steps to retest.
1. Sync some large repos
2. Run migration: satellite-maintain content prepare
3. On another terminal, run abort: satellite-maintain content prepare-abort
4. Check the pulp resources and waiting tasks to ensure that it's been cleared (from comment#0)
5. Rerun migration until finish
Expected result:
Migration should complete without issues
Actual:
Migration did complete without issue
Verified on 6.9.8_01
Note: Abort migration or any time the migration gets stopped, the resources should be cleared and that's what occurred when I checked.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Satellite 6.9.8 Async Bug Fix Update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:0320