Bug 2203353
| Summary: | 6.13 upgrade fails with deadlock detected when migrating pulpcore database | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | matt jia <mjia> |
| Component: | Upgrades | Assignee: | Evgeni Golov <egolov> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Satellite QE Team <sat-qe-bz-list> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.13.1 | CC: | ahumbe, dalley, egolov, ehelms, ekohlvan, jbhatia, jentrena, pdwyer, risantam, rlavi, saydas |
| Target Milestone: | Unspecified | Keywords: | Triaged, Upgrades, WorkAround |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-27 15:09:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2211502 | | |
| Bug Blocks: | | | |
Description
matt jia
2023-05-12 04:53:50 UTC
An additional traceback:
$ PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager migrate --noinput
Operations to perform:
Apply all migrations: ansible, auth, certguard, container, contenttypes, core, file, rpm, sessions
Running migrations:
Applying core.0090_char_to_text_field...Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 31321 waits for AccessExclusiveLock on relation 1373490 of database 647534; blocked by process 11607.
Process 11607 waits for AccessShareLock on relation 647792 of database 647534; blocked by process 31321.
HINT: See server log for query details.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/bin/pulpcore-manager", line 33, in <module>
sys.exit(load_entry_point('pulpcore==3.21.6', 'console_scripts', 'pulpcore-manager')())
File "/usr/lib/python3.9/site-packages/pulpcore/app/manage.py", line 11, in manage
execute_from_command_line(sys.argv)
File "/usr/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
utility.execute()
File "/usr/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
output = self.handle(*args, **options)
File "/usr/lib/python3.9/site-packages/django/core/management/base.py", line 89, in wrapped
res = handle_func(*args, **kwargs)
File "/usr/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 244, in handle
post_migrate_state = executor.migrate(
File "/usr/lib/python3.9/site-packages/django/db/migrations/executor.py", line 117, in migrate
state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
File "/usr/lib/python3.9/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
File "/usr/lib/python3.9/site-packages/django/db/migrations/executor.py", line 227, in apply_migration
state = migration.apply(state, schema_editor)
File "/usr/lib/python3.9/site-packages/django/db/migrations/migration.py", line 126, in apply
operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
File "/usr/lib/python3.9/site-packages/django/db/migrations/operations/fields.py", line 244, in database_forwards
schema_editor.alter_field(from_model, from_field, to_field)
File "/usr/lib/python3.9/site-packages/django/db/backends/base/schema.py", line 608, in alter_field
self._alter_field(model, old_field, new_field, old_type, new_type,
File "/usr/lib/python3.9/site-packages/django/db/backends/postgresql/schema.py", line 194, in _alter_field
self.execute(self._delete_index_sql(model, index_name))
File "/usr/lib/python3.9/site-packages/django/db/backends/base/schema.py", line 145, in execute
cursor.execute(sql, params)
File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: deadlock detected
DETAIL: Process 31321 waits for AccessExclusiveLock on relation 1373490 of database 647534; blocked by process 11607.
Process 11607 waits for AccessShareLock on relation 647792 of database 647534; blocked by process 31321.
HINT: See server log for query details.
Another traceback in case it helps:
2023-05-08 15:48:21 EDT LOG: duration: 21045.299 ms statement: ALTER TABLE "core_task" ALTER COLUMN "reserved_resources_record" TYPE text[] USING "reserved_resources_record"::text[]
2023-05-08 15:48:22 EDT ERROR: deadlock detected
2023-05-08 15:48:22 EDT DETAIL: Process 17216 waits for AccessExclusiveLock on relation 1373490 of database 647534; blocked by process 11610.
Process 11610 waits for AccessShareLock on relation 647792 of database 647534; blocked by process 17216.
Process 17216: DROP INDEX IF EXISTS "core_taskschedule_name_f9ade8e8_like"
Process 11610: SELECT "core_taskschedule"."pulp_id", "core_taskschedule"."pulp_created", "core_taskschedule"."pulp_last_updated", "core_taskschedule"."name", "core_taskschedule"."next_dispatch", "core_taskschedule"."dispatch_interval", "core_taskschedule"."task_name", "core_taskschedule"."last_task_id" FROM "core_taskschedule" LEFT OUTER JOIN "core_task" ON ("core_taskschedule"."last_task_id" = "core_task"."pulp_id") WHERE ("core_taskschedule"."next_dispatch" <= '2023-05-08T19:48:07.457888+00:00'::timestamptz AND ("core_taskschedule"."last_task_id" IS NULL OR "core_task"."state" IN ('skipped', 'completed', 'failed', 'canceled')))
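Reading the two statements above, the likely cycle is: the migration transaction (process 17216) already holds an AccessExclusiveLock on core_task from the preceding ALTER TABLE and now needs an AccessExclusiveLock to drop the core_taskschedule index, while the task-schedule query (process 11610) holds an AccessShareLock on core_taskschedule and waits for an AccessShareLock on core_task for its LEFT OUTER JOIN. For anyone debugging this live, here is a minimal sketch of how to see who is blocking whom, assuming the Pulp database is named pulpcore and psql access as the postgres user:

# Hypothetical live check: list backends in the Pulp database that are waiting
# on a lock, together with the PIDs blocking them (pg_blocking_pids, PostgreSQL 9.6+).
sudo -u postgres psql pulpcore -c "
  SELECT pid,
         pg_blocking_pids(pid) AS blocked_by,
         state,
         left(query, 100)      AS query
    FROM pg_stat_activity
   WHERE datname = 'pulpcore'
     AND cardinality(pg_blocking_pids(pid)) > 0;"

If the blocking PID maps to a pulpcore-worker or pulpcore-api backend, that matches the analysis further down: the Pulp services were still running while the migration tried to take its exclusive locks.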
This error makes me wonder if we need to / can add something that ensures all services are stopped before performing the database migration [1]? I recall we encountered this problem with Pulp 2 and ended up using transitions to help solve it [2].

[1] https://github.com/theforeman/puppet-pulpcore/blob/master/manifests/database.pp#L29
[2] https://github.com/theforeman/puppet-pulp/blob/d064b340cae3ca8c2ab61d02179020b711b0cdc8/manifests/database.pp#LL11C1-L11C1

Let's look at the sosreport in the attached case and establish a timeline:

% zgrep -h "Running foreman-maintain command" var/log/foreman-maintain/foreman-maintain.log* | sort -h
I, [2023-05-08 18:13:22-0700 #906990] INFO -- : Running foreman-maintain command with arguments [["upgrade", "run", "--target-version", "6.12"]]
I, [2023-05-08 18:14:03-0700 #907509] INFO -- : Running foreman-maintain command with arguments [["packages", "-h"]]
I, [2023-05-08 18:14:10-0700 #907835] INFO -- : Running foreman-maintain command with arguments [["packages", "is-locked"]]
I, [2023-05-08 18:15:37-0700 #910780] INFO -- : Running foreman-maintain command with arguments [["packages", "lock", "--assumeyes"]]
I, [2023-05-10 10:48:43-0700 #44651] INFO -- : Running foreman-maintain command with arguments [["service", "status"]]
I, [2023-05-10 15:22:20-0700 #52575] INFO -- : Running foreman-maintain command with arguments [["upgrade", "list-versions"]]
I, [2023-05-10 15:24:42-0700 #52724] INFO -- : Running foreman-maintain command with arguments [["upgrade", "list-versions"]]
I, [2023-05-10 15:25:02-0700 #53112] INFO -- : Running foreman-maintain command with arguments [["upgrade", "check", "--target-version", "6.13"]]
I, [2023-05-10 15:27:21-0700 #54311] INFO -- : Running foreman-maintain command with arguments [["upgrade", "run", "--target-version", "6.13"]]
I, [2023-05-10 15:59:17-0700 #62626] INFO -- : Running foreman-maintain command with arguments [["service", "status"]]
I, [2023-05-10 16:22:32-0700 #66250] INFO -- : Running foreman-maintain command with arguments [["maintenance-mode", "status"]]
I, [2023-05-10 16:22:46-0700 #66365] INFO -- : Running foreman-maintain command with arguments [["service", "stop"]]
I, [2023-05-10 16:23:02-0700 #66880] INFO -- : Running foreman-maintain command with arguments [["maintenance-mode", "status"]]
I, [2023-05-10 16:23:13-0700 #66995] INFO -- : Running foreman-maintain command with arguments [["service", "start"]]
I, [2023-05-11 10:43:10-0700 #89377] INFO -- : Running foreman-maintain command with arguments [["packages", "unlock"]]
I, [2023-05-11 10:43:36-0700 #89498] INFO -- : Running foreman-maintain command with arguments [["packages", "lock"]]
I, [2023-05-11 12:55:55-0700 #91909] INFO -- : Running foreman-maintain command with arguments [["service", "status"]]
I, [2023-05-11 12:56:04-0700 #92362] INFO -- : Running foreman-maintain command with arguments [["maintenance-mode", "status"]]
I, [2023-05-11 12:58:34-0700 #92648] INFO -- : Running foreman-maintain command with arguments [["upgrade", "run", "--target-version", "6.13", "-y"]]
I, [2023-05-11 15:16:52-0700 #95286] INFO -- : Running foreman-maintain command with arguments [["packages", "-h"]]
I, [2023-05-11 15:16:58-0700 #95520] INFO -- : Running foreman-maintain command with arguments [["packages", "is-locked"]]
I, [2023-05-11 15:22:32-0700 #100330] INFO -- : Running foreman-maintain command with arguments [["packages", "lock", "--assumeyes"]]
I, [2023-05-11 16:18:33-0700 #101236] INFO -- : Running foreman-maintain command with arguments [["packages", "-h"]]
I, [2023-05-11 16:18:38-0700 #101467] INFO -- : Running foreman-maintain command with arguments [["packages", "is-locked"]]
I, [2023-05-11 16:18:39-0700 #101577] INFO -- : Running foreman-maintain command with arguments [["packages", "unlock", "--assumeyes"]]
I, [2023-05-11 16:21:41-0700 #103116] INFO -- : Running foreman-maintain command with arguments [["packages", "lock", "--assumeyes"]]
I, [2023-05-11 16:31:39-0700 #103374] INFO -- : Running foreman-maintain command with arguments [["packages", "-h"]]
I, [2023-05-11 16:31:45-0700 #103607] INFO -- : Running foreman-maintain command with arguments [["packages", "is-locked"]]
I, [2023-05-11 16:31:46-0700 #103717] INFO -- : Running foreman-maintain command with arguments [["packages", "unlock", "--assumeyes"]]
I, [2023-05-11 16:33:19-0700 #104830] INFO -- : Running foreman-maintain command with arguments [["packages", "lock", "--assumeyes"]]
I, [2023-05-11 18:03:34-0700 #106149] INFO -- : Running foreman-maintain command with arguments [["upgrade", "run", "--target-version", "6.13", "-y"]]
I, [2023-05-11 18:04:22-0700 #106594] INFO -- : Running foreman-maintain command with arguments [["packages", "-h"]]
I, [2023-05-11 18:04:27-0700 #106826] INFO -- : Running foreman-maintain command with arguments [["packages", "is-locked"]]
I, [2023-05-11 18:06:02-0700 #107945] INFO -- : Running foreman-maintain command with arguments [["packages", "lock", "--assumeyes"]]

In detail, that means:

- 2023-05-08 18:13: upgrade to 6.12
- 2023-05-10 15:27: first attempt to upgrade to 6.13, which failed (at around 2023-05-10 15:46:17) with

  Running scriptlet: rubygem-smart_proxy_remote_execution_ssh-0.10.1- 235/482
  mv: cannot move '/usr/share/foreman-proxy/.ssh' to '/var/lib/foreman-proxy/ssh/.ssh': File exists
  error: %prein(rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el8sat.noarch) scriptlet failed, exit status 1
  Error in PREIN scriptlet in rpm package rubygem-smart_proxy_remote_execution_ssh
  Upgrading : python39-pulp-container-2.14.3-1.el8pc.noarch 236/482
  error: rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el8sat.noarch: install failed

- 2023-05-11 12:58: another attempt to upgrade to 6.13, still failing:

  mv: cannot move '/usr/share/foreman-proxy/.ssh' to '/var/lib/foreman-proxy/ssh/.ssh': File exists
  error: %prein(rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el8sat.noarch) scriptlet failed, exit status 1
  Error in PREIN scriptlet in rpm package rubygem-smart_proxy_remote_execution_ssh
  Verifying : rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el 1/2
  Verifying : rubygem-smart_proxy_remote_execution_ssh-0.7.3-1.el8 2/2
  Installed products updated.
  Error: Transaction failed

- 2023-05-11 18:03: another attempt to upgrade; this time rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el8sat.noarch was upgraded correctly, but the run then failed on

  /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: change from 'notrun' to ['0'] failed: 'pulpcore-manager migrate --noinput' returned 1 instead of one of [0]

However, if we look at the PostgreSQL and Installer logs, we see AccessExclusiveLock errors *before* that run (but after the last failed attempt):

% grep -r AccessExclusiveLock var/
var/lib/pgsql/data/log/postgresql-Thu.log:2023-05-11 15:20:42 PDT DETAIL: Process 100014 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 67704.
var/lib/pgsql/data/log/postgresql-Thu.log:2023-05-11 15:21:01 PDT DETAIL: Process 100151 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 67682.
var/lib/pgsql/data/log/postgresql-Thu.log:2023-05-11 16:20:11 PDT DETAIL: Process 102858 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102575.
var/lib/pgsql/data/log/postgresql-Thu.log:2023-05-11 16:33:12 PDT DETAIL: Process 104808 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102575.
var/lib/pgsql/data/log/postgresql-Thu.log:2023-05-11 18:05:52 PDT DETAIL: Process 107920 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102571.
var/log/foreman-installer/satellite.20230511-161833.log:2023-05-11 15:20:42 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 100014 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 67704.
var/log/foreman-installer/satellite.20230511-161833.log:2023-05-11 15:20:42 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 100014 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 67704.
var/log/foreman-installer/satellite.20230511-161833.log:2023-05-11 15:21:02 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 100151 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 67682.
var/log/foreman-installer/satellite.20230511-161833.log:2023-05-11 15:21:02 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 100151 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 67682.
var/log/foreman-installer/satellite.20230511-163139.log:2023-05-11 16:20:11 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 102858 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102575.
var/log/foreman-installer/satellite.20230511-163139.log:2023-05-11 16:20:11 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 102858 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102575.
var/log/foreman-installer/satellite.20230511-180422.log:2023-05-11 16:33:13 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 104808 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102575.
var/log/foreman-installer/satellite.20230511-180422.log:2023-05-11 16:33:13 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 104808 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102575.
var/log/foreman-installer/satellite.log:2023-05-11 18:05:53 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 107920 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102571.
var/log/foreman-installer/satellite.log:2023-05-11 18:05:53 [INFO ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL: Process 107920 waits for AccessExclusiveLock on relation 3049568 of database 2369498; blocked by process 102571.

This is interesting, but if we look again at the maintain logs, the "service stop" step was not executed for the "2023-05-11 18:03" run:

% grep "Stop applicable services" var/log/foreman-maintain/foreman-maintain.log*
var/log/foreman-maintain/foreman-maintain.log-20230511:I, [2023-05-10 15:42:28-0700 #54311] INFO -- : --- Execution step 'Stop applicable services' [service-stop] started ---
var/log/foreman-maintain/foreman-maintain.log-20230511:I, [2023-05-10 15:42:35-0700 #54311] INFO -- : --- Execution step 'Stop applicable services' finished ---
var/log/foreman-maintain/foreman-maintain.log-20230511:I, [2023-05-10 16:22:46-0700 #66365] INFO -- : --- Execution step 'Stop applicable services' [service-stop] started ---
var/log/foreman-maintain/foreman-maintain.log-20230511:I, [2023-05-10 16:22:47-0700 #66365] INFO -- : --- Execution step 'Stop applicable services' finished ---

% grep "Procedures before migrating to Satellite 6.13" var/log/foreman-maintain/foreman-maintain.log*
var/log/foreman-maintain/foreman-maintain.log:I, [2023-05-11 12:58:46-0700 #92648] INFO -- : === Scenario 'Procedures before migrating to Satellite 6.13' started ===
var/log/foreman-maintain/foreman-maintain.log:I, [2023-05-11 12:58:46-0700 #92648] INFO -- : === Scenario 'Procedures before migrating to Satellite 6.13' finished ===
var/log/foreman-maintain/foreman-maintain.log:I, [2023-05-11 18:03:46-0700 #106149] INFO -- : === Scenario 'Procedures before migrating to Satellite 6.13' started ===
var/log/foreman-maintain/foreman-maintain.log:I, [2023-05-11 18:03:46-0700 #106149] INFO -- : === Scenario 'Procedures before migrating to Satellite 6.13' finished ===
var/log/foreman-maintain/foreman-maintain.log-20230511:I, [2023-05-10 15:40:28-0700 #54311] INFO -- : === Scenario 'Procedures before migrating to Satellite 6.13' started ===
var/log/foreman-maintain/foreman-maintain.log-20230511:I, [2023-05-10 15:42:35-0700 #54311] INFO -- : === Scenario 'Procedures before migrating to Satellite 6.13' finished ===

Now, I still don't completely understand *why* the services weren't stopped in the "2023-05-11 18:03" run, but it explains why the deadlock happened: Pulpcore was running at that point (but shouldn't have been). The Installer runs without foreman-maintain also didn't stop the services and thus ran into the issue.

So looking further at the logs and at the f-m data in var/lib/foreman-maintain/data.yml, I think what happened is: f-m thought it got interrupted in the middle of an upgrade (which, strictly speaking, is correct, as yum failing to upgrade rubygem-smart_proxy_remote_execution_ssh aborts the rest of the scenario), but when resuming the operation it thought the other steps were already done and thus did not stop the services again. I'd argue this is a bug -- certain steps of the scenario should happen regardless of whether they already happened.
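Given that analysis, it may be worth confirming whether Pulp services are still connected to the database before retrying, and then applying the workaround described in the next comment. A rough sketch, not an official procedure, assuming the Pulp database is named pulpcore and the 6.13 target from this case:

# Hypothetical pre-flight check: any pulpcore-api / pulpcore-worker connections
# listed here would explain the deadlock during the migration.
sudo -u postgres psql -c "
  SELECT pid, usename, application_name, state
    FROM pg_stat_activity
   WHERE datname = 'pulpcore'
     AND pid <> pg_backend_pid();"

# Workaround (see below): stop the services, then re-run the upgrade, which
# performs the pulpcore migration again with nothing else connected.
satellite-maintain service stop
satellite-maintain upgrade run --target-version 6.13 -y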
The workaround for users running into this would be to manually stop the services (satellite-maintain service stop) and then perform the upgrade once more.

*** Bug 2219589 has been marked as a duplicate of this bug. ***

This is addressed via https://bugzilla.redhat.com/show_bug.cgi?id=2211502, since that change moves the service stop out of the pre-migration phase and into the main migration phase, which will ensure services are stopped every time. Therefore I am going to recommend that https://bugzilla.redhat.com/show_bug.cgi?id=2211502 be considered for 6.13.z.

Closing given https://bugzilla.redhat.com/show_bug.cgi?id=2211502 addresses the concern.
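For completeness, one way to confirm afterwards that the pulpcore migrations applied cleanly is Django's standard showmigrations command; a sketch assuming the same PULP_SETTINGS path used above:

# Hypothetical post-check: all core migrations, including 0090_char_to_text_field,
# should show as applied ([X]) once the upgrade has gone through.
PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager showmigrations core | tail -n 20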