Description of problem:

If the Pulp 2 to 3 migration fails with an unexpected error, such as running out of disk space, the waiting tasks assigned to workers are not cancelled and the reserved resources are not released, even after restarting the pulpcore workers and the pulpcore resource manager. Re-running the prepare script then fails with "Only one migration plan can run or be reset at a time", and running prepare-abort does not resolve the issue either.

$ satellite-maintain content prepare
Running Prepare content for Pulp 3
...
Prepare content for Pulp 3:
Checking for valid Katello configuraton.
Starting task.
2021-09-13 16:03:43 +1000: Migrating content to Pulp 3 2269/2269
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
#<Thread:0x0000000012f8d560@/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:471 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	15: from /opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
	14: from /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/connectors/database.rb:25:in `block in start'
	13: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:411:in `listen'
	12: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:478:in `check_database_errors'
	11: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:412:in `block in listen'
	10: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/database/connecting.rb:269:in `synchronize'
	 9: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/connection_pool/threaded.rb:91:in `hold'
	 8: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:442:in `block (2 levels) in listen'
	 7: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:442:in `ensure in block (2 levels) in listen'
	 6: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:134:in `execute'
	 5: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:110:in `check_disconnect_errors'
	 4: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:134:in `block in execute'
	 3: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:146:in `execute_query'
	 2: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/database/logging.rb:38:in `log_connection_yield'
	 1: from /opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:146:in `block in execute_query'
/opt/theforeman/tfm/root/usr/share/gems/gems/sequel-5.7.1/lib/sequel/adapters/postgres.rb:146:in `async_exec': PG::UnableToSend: no connection to the server (Sequel::DatabaseDisconnectError)
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
rake aborted!
ActiveRecord::StatementInvalid: PG::UnableToSend: server closed the connection unexpectedly
	This probably means the server terminated abnormally

<================== Postgres server terminated unexpectedly due to out of disk space before or while processing the request.

# Abort migration
$ satellite-maintain content prepare-abort
Running Abort all running Pulp 2 to Pulp 3 migration tasks
================================================================================
Abort all running Pulp 2 to Pulp 3 migration tasks:
Cancelled 1 tasks.                                                    [OK]
--------------------------------------------------------------------------------

# Run again
$ satellite-maintain content prepare
Running Prepare content for Pulp 3
...
Prepare content for Pulp 3:
Checking for valid Katello configuraton.
Starting task.
2021-09-13 16:32:57 +1000: Content migration starting.
Migration failed, You will want to investigate: https://satellite.example.com/foreman_tasks/tasks/98c5e895-432d-4dee-806b-6074c71d1a15
rake aborted!
ForemanTasks::TaskError: Task 98c5e895-432d-4dee-806b-6074c71d1a15: Pulp2to3MigrationClient::ApiError: Error message: the server returns an error
HTTP status code: 400
Response headers: {"Date"=>"xxxxxxxxxxxxxx", "Server"=>"gunicorn/20.0.4", "Content-Type"=>"application/json", "Vary"=>"Accept,Cookie", "Allow"=>"POST, OPTIONS", "X-Frame-Options"=>"SAMEORIGIN", "Content-Length"=>"57", "Via"=>"1.1 satellite.example.com", "Connection"=>"close"}
Response body: ["Only one migration plan can run or be reset at a time"]   <=============== Failed

# Waiting tasks not getting cancelled after restarting pulpcore workers
$ systemctl restart pulpcore-resource-manager pulpcore-worker@*
$ psql
pulpcore=# select pulp_id, name, state, worker_id from core_task where state = 'waiting' and worker_id is not null;
               pulp_id                |                           name                           |  state  |              worker_id
--------------------------------------+----------------------------------------------------------+---------+--------------------------------------
 ce06bbdb-79ff-4599-a654-bd479a9bf9d6 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | e149a672-6b18-46c8-b101-84b68e058929
 1db98110-8c05-402e-8ed7-22a150bea1d1 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 496f7cae-0a35-4b13-acce-490d5ae4451e
 e0f226c0-65bc-4c2b-84e3-096355bd9201 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | a4919996-c212-4e98-89a1-430325ffabe0
 05546a0b-0995-4cab-a492-a67bad1055e9 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 6d4d278a-c761-40d9-8ba4-a4b4413ea739
 3796774b-c3e0-4c9b-a5dc-ca67bbe3fc06 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | b86a1dc2-e573-4997-8a65-4030edaa1def
 a7a987bf-bc97-409e-85a7-111d7fe5ebc9 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | da250740-f85d-4720-8756-8bdea8a53baa
 c0328867-99bd-4008-948a-a193330036df | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 90dd9e6c-4336-4d81-baaf-584521da2ee7
 149fc38d-0a9d-436b-b96b-d6b3f84122bc | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | a24fb0ce-bb2a-47d4-8038-422305b66810
 985c1b42-26af-4465-b905-04fd8a56846c | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 5e484165-8485-4d01-8f36-835f661b4ad1
 04b7db6c-0c73-4814-8565-2b5360385d0d | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 538340c7-3d0b-4a5e-bb37-96508f466b4e
 7dd21f01-b9d9-4cce-85bd-ab1f61c9dbe3 | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | 2f37829d-6205-480f-9bd5-04aa0de587d0
 52c4c1df-53f6-42da-8ddf-0b70aa0f509d | pulp_2to3_migration.app.migration.complex_repo_migration | waiting | c565339f-22e5-4185-b8da-58077803805c
...

# Try to cancel the tasks manually
$ foreman-rake console
> api = ::Katello::Pulp3::Api::Core.new(SmartProxy.default_capsule)
> api.tasks_api.list(state: "waiting").results.each { |task| pp api.cancel_task(task.pulp_href) }
#<PulpcoreClient::TaskResponse:0x0000000010137300
 @child_tasks=[],
 @created_resources=[],
 @name="pulp_2to3_migration.app.migration.complex_repo_migration",
 @parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
 @progress_reports=[],
 @pulp_created=Sun, 12 Sep 2021 23:18:20 +0000,
 @pulp_href="/pulp/api/v3/tasks/884b97d4-a5d5-4798-8ab8-43ca50f26080/",
 @reserved_resources_record=
  ["/pulp/api/v3/repositories/rpm/rpm/ba5fd31c-315d-4569-bc37-33ef600abc99/"],
 @state="canceled",
 @task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
#<PulpcoreClient::TaskResponse:0x00000000100d3d50
 @child_tasks=[],
 @created_resources=[],
 @name="pulp_2to3_migration.app.migration.complex_repo_migration",
 @parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
 @progress_reports=[],
 @pulp_created=Sun, 12 Sep 2021 23:18:20 +0000,
 @pulp_href="/pulp/api/v3/tasks/f5605606-2ca9-40a0-9d82-77e71cc9bdea/",
 @reserved_resources_record=
  ["/pulp/api/v3/repositories/rpm/rpm/aace68b8-e71b-4ef9-8a65-a5c12cb662b5/"],
 @state="canceled",
 @task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
#<PulpcoreClient::TaskResponse:0x000000000ffa9bf0
 @child_tasks=[],
 @created_resources=[],
 @name="pulp_2to3_migration.app.migration.complex_repo_migration",
 @parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
 @progress_reports=[],
 @pulp_created=Sun, 12 Sep 2021 23:18:20 +0000,
 @pulp_href="/pulp/api/v3/tasks/b6743e03-a308-40f9-8d12-fb541b90ea07/",
 @reserved_resources_record=
  ["/pulp/api/v3/repositories/rpm/rpm/3b584cf2-27b7-4f36-955e-77e0f5ea187d/"],
 @state="canceled",
 @task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
#<PulpcoreClient::TaskResponse:0x000000000fed45e0
 @child_tasks=[],
 @created_resources=[],
 @name="pulp_2to3_migration.app.migration.complex_repo_migration",
 @parent_task="/pulp/api/v3/tasks/56152493-2013-431e-8b61-29400ae73bfc/",
 @progress_reports=[],
 @pulp_created=Sun, 12 Sep 2021 23:18:19 +0000,
 @pulp_href="/pulp/api/v3/tasks/fe7aa645-6cdb-4b44-9319-f87856aacb7a/",
 @reserved_resources_record=
  ["/pulp/api/v3/repositories/rpm/rpm/0c8dce00-d27c-4687-a9e5-485c59069502/"],
 @state="canceled",
 @task_group="/pulp/api/v3/task-groups/c999d3f7-7d4e-4865-b651-52cd76a4a421/">
...
> pp api.tasks_api.list(state: "waiting")
#<PulpcoreClient::InlineResponse20011:0x000000000e1dbd08 @count=0, @results=[]>

# This looks like another bug: cancelled tasks are not removed from the worker queues. Maybe this is OK; presumably pulp checks the state of the task and skips it?
$ rq info -u redis://localhost:6379/8
18038.com |██ 2
15874.com |██ 2
18036.com |██ 2
18037.com |██ 2

# Need to empty the queues manually:
$ rq empty 18038.com 15874.com 18036.com 18037.com -u redis://localhost:6379/8
2 jobs removed from 18038.com queue
2 jobs removed from 15874.com queue
2 jobs removed from 18036.com queue
2 jobs removed from 18037.com queue

# Reserved resources are still not getting cleaned up after cancelling the tasks.
pulpcore=# select * from core_taskreservedresource;
               pulp_id                |         pulp_created          |       pulp_last_updated       |             resource_id              |               task_id
--------------------------------------+-------------------------------+-------------------------------+--------------------------------------+--------------------------------------
 8b84ca9c-27a2-4443-8b40-0c1b8f8dbb8b | 2021-09-13 05:57:18.30388+00  | 2021-09-13 05:57:18.30396+00  | 2bd534fb-ac19-4bba-90ae-8c5f80169176 | 6f950eb2-25cf-48e7-a6c4-db2636c3e01a
 3a76f7fd-0ec3-4122-8621-9ae4ea199bc4 | 2021-09-13 06:02:08.757547+00 | 2021-09-13 06:02:08.75758+00  | 21e57dc1-e24a-4e2c-b540-260a6e5723f4 | 35b53f2e-24e6-41dc-b85e-093c4ee7f402
 63b59527-9007-409a-a896-40766ce0966a | 2021-09-13 06:20:20.983527+00 | 2021-09-13 06:20:20.983553+00 | 8cf60e5d-01a1-403b-a4ce-974c9aa71c2b | 645a06ba-7bfe-45e1-9bad-e10ffb98cecd
 c26bab99-1a32-421c-9a64-9a856f70039b | 2021-09-13 06:21:19.257066+00 | 2021-09-13 06:21:19.257095+00 | f7cbfa62-a099-4fa1-bb38-f92c17a31838 | 47316602-a27e-4075-b218-2b968e2ca6d9
 e0205f4a-0e2e-4fd2-b73b-9e8ad3bcc23b | 2021-09-13 06:21:19.964208+00 | 2021-09-13 06:21:19.964245+00 | 6434ac91-5584-4b41-a31a-839084469b75 | e1bb63e7-78be-4a2d-bbda-ffe00547f73a
...
pulpcore=# select * from core_reservedresource;
               pulp_id                |         pulp_created          |       pulp_last_updated       |                                resource                                 |              worker_id
--------------------------------------+-------------------------------+-------------------------------+-------------------------------------------------------------------------+--------------------------------------
 2bd534fb-ac19-4bba-90ae-8c5f80169176 | 2021-09-13 05:57:18.243676+00 | 2021-09-13 05:57:18.243747+00 | /pulp/api/v3/repositories/rpm/rpm/f503830e-0b76-472d-ade8-8f071ff35529/ | 38c7a2e6-d69c-4fbc-95f7-91d1c22624d1
 21e57dc1-e24a-4e2c-b540-260a6e5723f4 | 2021-09-13 06:02:08.754167+00 | 2021-09-13 06:02:08.754223+00 | /pulp/api/v3/repositories/rpm/rpm/28883edf-837d-415e-916b-c58b7727f685/ | c88fe686-d725-4c81-b075-08c7b56ef95f
 8cf60e5d-01a1-403b-a4ce-974c9aa71c2b | 2021-09-13 06:20:20.980849+00 | 2021-09-13 06:20:20.980889+00 | /pulp/api/v3/repositories/rpm/rpm/0cb39f5b-bfda-47ca-b5d7-8274db2db21f/ | be9ad06a-adb2-428b-8885-c5312d105354
 f7cbfa62-a099-4fa1-bb38-f92c17a31838 | 2021-09-13 06:21:19.253752+00 | 2021-09-13 06:21:19.253804+00 | /pulp/api/v3/repositories/rpm/rpm/1180b4dc-e451-448e-8178-3365c41ef91b/ | 9b80c7f7-5a8e-446e-a863-b3bdddf7bf7a
 6434ac91-5584-4b41-a31a-839084469b75 | 2021-09-13 06:21:19.950034+00 | 2021-09-13 06:21:19.950084+00 | /pulp/api/v3/repositories/rpm/rpm/920e944f-ec92-4a33-aede-6347aef6f6ad/ | 1d94152b-11e7-4cd1-b3c6-b80820ec4b83
...

# Need to delete the zombie reserved resources manually.
$ PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager shell
pulp: pulp_2to3_migration.app.plugin:INFO: Plugin pulp_deb is not installed in pulp3 therefore it will not be migrated from pulp2
Python 3.6.8 (default, Aug 13 2020, 07:46:32)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from pulpcore.app.models import Worker, ReservedResource
>>>
>>> # Are there missing workers?
>>> missing_workers = Worker.objects.missing_workers()
>>> missing_workers.count()
75
>>>
>>> # Look for resources that are held by workers not considered online
>>> online_workers = Worker.objects.online_workers()
>>> zombies = ReservedResource.objects.exclude(worker__in=online_workers)
>>> zombies.count()
5
>>>
>>> # Dangerous action: delete the zombie resources
>>> zombies.delete()
(10, {'core.TaskReservedResource': 5, 'core.ReservedResource': 5})

# The migration can proceed after cancelling the tasks manually.
$ satellite-maintain content prepare
...
Prepare content for Pulp 3:
Checking for valid Katello configuraton.
Starting task.
2021-09-06 13:17:03 +1000: Initial Migration steps complete.
...
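The zombie-resource check in the shell session above boils down to a set difference: any reserved resource held by a worker that is no longer online is stale. Here is a minimal, self-contained Python sketch of that filtering logic, using plain dicts instead of the actual pulpcore Django models; the function name and row shapes are illustrative only.

```python
# Sketch of the stale-reservation check, reduced to plain data structures.
# Row shapes loosely mirror core_reservedresource; names are made up.

def find_zombie_reservations(reservations, online_worker_ids):
    """Return reservations held by workers that are no longer online.

    reservations: list of dicts with 'pulp_id' and 'worker_id' keys.
    online_worker_ids: iterable of worker ids considered online.
    """
    online = set(online_worker_ids)
    return [r for r in reservations if r["worker_id"] not in online]

reservations = [
    {"pulp_id": "res-1", "worker_id": "worker-a"},
    {"pulp_id": "res-2", "worker_id": "worker-b"},
    {"pulp_id": "res-3", "worker_id": "worker-a"},
]

# Only worker-a is still online; worker-b's reservation is a zombie.
zombies = find_zombie_reservations(reservations, ["worker-a"])
print([r["pulp_id"] for r in zombies])  # → ['res-2']
```

In the real cleanup, `ReservedResource.objects.exclude(worker__in=online_workers)` expresses the same exclusion in the ORM, and deleting those rows cascades to the linked core_taskreservedresource entries, which is why `zombies.delete()` above reports 10 deletions for 5 reservations.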
There are a couple of related issues which may resolve this one; that still needs to be confirmed. Fixes are available, but not yet in an official Satellite 6.9 release:

Bug 1962462 - fixed and targeted for the upcoming 6.9.6
Bug 1976311 - fixed and proposed for a future 6.9.z (likely 6.9.7)
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.
Steps to retest:
1. Sync some large repos
2. Run the migration: satellite-maintain content prepare
3. On another terminal, run the abort: satellite-maintain content prepare-abort
4. Check the pulp resources and waiting tasks to ensure they have been cleared (from comment#0)
5. Rerun the migration until it finishes

Expected result: Migration should complete without issues
Actual result: Migration did complete without issues

Verified on 6.9.8_01

Note: On aborting the migration, or any time the migration gets stopped, the resources should be cleared, and that is what occurred when I checked.
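Step 4 above can be checked programmatically: after an abort, the core_task table should contain no tasks that are still 'waiting' while pinned to a worker. A small Python sketch of that check over rows already fetched from the database (the row shape mirrors the query in comment#0; the helper name is made up, and in practice the filtering is done directly in SQL):

```python
# Verify that an aborted migration left no waiting tasks pinned to workers.
# Rows mimic the output of:
#   select pulp_id, state, worker_id from core_task;
# This helper is illustrative, not part of any real tooling.

def stuck_waiting_tasks(task_rows):
    """Return tasks that are 'waiting' but still assigned to a worker."""
    return [t for t in task_rows
            if t["state"] == "waiting" and t["worker_id"] is not None]

tasks = [
    {"pulp_id": "t-1", "state": "canceled", "worker_id": "w-1"},
    {"pulp_id": "t-2", "state": "waiting",  "worker_id": None},   # unassigned: fine
    {"pulp_id": "t-3", "state": "waiting",  "worker_id": "w-2"},  # stuck: bad
]

stuck = stuck_waiting_tasks(tasks)
print([t["pulp_id"] for t in stuck])  # → ['t-3']
# An empty list here means the abort cleaned up properly.
```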
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Satellite 6.9.8 Async Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0320