Description of problem:

This is not https://bugzilla.redhat.com/show_bug.cgi?id=2062526 ; this bug is reproducible on 6.12.2 with that BZ fixed. There is yet another Capsule sync deadlock similar to those in bz2062526.

When:
- a Capsule has already synchronized some bigger repo, in particular several "Content View clones" of one root repo,
- re-syncing the root repo on Satellite updates some packages wrt. changelogs OR some modularity(*), OR the packages' changelogs are updated on Satellite some other way(**)
- (alternatively, some modularity data are updated(***) for the root repo)
- CVs are re-published to propagate the change
- Capsule sync is triggered (re-syncing multiple copies of the same root repo concurrently)

then this second Capsule sync can hit a deadlock like the one below.

Some details/comments:

(*) in particular, either
https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/synchronizing.py#L1609-L1611
or
https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/synchronizing.py#L1612-L1614
evaluates to True

(**) *maybe* via rpm-trim-changelogs per https://access.redhat.com/solutions/6977656 (expected but not verified)

(***) such that https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/synchronizing.py#L1595 is True and we get a *different* backtrace, seen once at a customer

The deadlock:

A) postgres logs:

2023-02-16 16:02:02 CET ERROR:  deadlock detected
2023-02-16 16:02:02 CET DETAIL:  Process 162993 waits for ShareLock on transaction 1339691; blocked by process 163038.
	Process 163038 waits for ShareLock on transaction 1339695; blocked by process 162993.
	Process 162993: UPDATE "rpm_package" SET "name" = 'abrt-tui', "epoch" = '0', "version" = '2.10.9', "release" = '11.el8', "arch" = 'x86_64', "pkgId" = 'a426056fa72116383a0ba578cce54973bebbb7b260b9a81ca688bbfd0510f12e', "checksum_type" = 'sha256', "summary" = 'abrt''s command line interface', "description" = 'This package contains a simple command line client for processing abrt reports in command line environment.', "url" = 'https://abrt.readthedocs.org/', "changelogs" = '[["Matej Habrnal <mhabrnal> 2.10.9-2", 1524830400, "- fix python requires in spec file"], ["Matej Habrnal <mhabrnal> 2.10.9-3", 1527163200, "- Remove dependency on deprecated nss-pem\n- Resovles: #1578427"], ["Matej Habrnal <mhabrnal> 2.10.9-4", 1529064000, "- ccpp: add %h and %e parameter into abrt-hook-ccpp\n- Resovles: #1587891"], ["Matej Habrnal <mhabrnal> 2.10.9-5", 1529064001, "- Set PYTHON to python3 during install"], ["Matej Habrnal <mhabrnal> 2.10.9-6", 1529064002, "- Switch hardcoded python3 s
	Process 163038: UPDATE "core_content" SET "timestamp_of_interest" = '2023-02-16T15:02:01.284462+00:00'::timestamptz WHERE "core_content"."pulp_id" IN ('23732995-37ca-43fd-b308-d5299c88efdd'::uuid, '55eff18d-a5a2-4ea1-ab2c-f3344012aad3'::uuid, 'cfb6d256-c5a2-4709-a95b-b0b3f05dbe19'::uuid, 'b9ee99a5-1f5f-4127-856d-eecbd31dbfdc'::uuid, 'ecf46842-64d5-4058-8ea1-8daedd79aa88'::uuid, 'edb59bbf-d818-4a8a-b53c-81d8bd4fb097'::uuid, '9dd4e490-10f7-4edb-9299-9078e279fe8f'::uuid, '58863e1f-e9af-4e37-b612-6de6bfd86017'::uuid, 'ad1fd971-c4e9-4a46-b65a-73820bc4f7cc'::uuid, '7837d364-9311-4d27-9094-4c2597ca0a93'::uuid, '42d41049-1fd9-4f46-ad3e-48912e79b71f'::uuid, '60ea16b2-2e23-4c72-84b6-b398ff2fe10b'::uuid, 'de9124db-96b2-43d4-83c6-f4e3b2f4c388'::uuid, '75fff501-e806-4ad9-8c4c-f326446353b0'::uuid, 'ef1dd7da-4b6f-4986-ab87-731f2e125f2c'::uuid, '9c3e34bd-9bac-4d5c-9e35-532d17b66e02'::uuid, 'da2be453-a607-4965-b498-e7a8301a7e1c'::uuid, '7d903cb6-7687-43fd-ac9e-78407ce41309'::uuid, 
'7f93d2b6-28a2-48da-bce1-9b737e14ec1b'::uuid, '3b685bed-4c2e
2023-02-16 16:02:02 CET HINT:  See server log for query details.
2023-02-16 16:02:02 CET CONTEXT:  while locking tuple (14,2) in relation "rpm_package"

B) pulp backtrace:

pulp [232a8148-a40f-40f2-bc10-da02d5a95454]: pulpcore.tasking.pulpcore_worker:INFO: Task a38ba600-e1dc-4527-b68e-4b15f6f98ff5 failed (deadlock detected
DETAIL:  Process 162993 waits for ShareLock on transaction 1339691; blocked by process 163038.
Process 163038 waits for ShareLock on transaction 1339695; blocked by process 162993.
HINT:  See server log for query details.
CONTEXT:  while locking tuple (14,2) in relation "rpm_package"
)
pulp [232a8148-a40f-40f2-bc10-da02d5a95454]: pulpcore.tasking.pulpcore_worker:INFO:
  File "/usr/lib/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py", line 410, in _perform_task
    result = func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 568, in synchronize
    repo_version = dv.create() or repo.latest_version()
  File "/usr/lib/python3.9/site-packages/pulpcore/plugin/stages/declarative_version.py", line 161, in create
    loop.run_until_complete(pipeline)
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/usr/lib/python3.9/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
    await asyncio.gather(*futures)
  File "/usr/lib/python3.9/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
    await self.run()
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 1623, in run
    await sync_to_async(result.save)()
  File "/usr/lib/python3.9/site-packages/asgiref/sync.py", line 414, in __call__
    ret = await asyncio.wait_for(future, timeout=None)
  File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.9/site-packages/asgiref/sync.py", line 455, in thread_handler
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pulpcore/app/models/base.py", line 173, in save
    return super().save(*args, **kwargs)
  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3.9/site-packages/django_lifecycle/mixins.py", line 134, in save
    save(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/django/db/models/base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/lib/python3.9/site-packages/django/db/models/base.py", line 776, in save_base
    updated = self._save_table(
  File "/usr/lib/python3.9/site-packages/django/db/models/base.py", line 858, in _save_table
    updated = self._do_update(base_qs, using, pk_val, values, update_fields,
  File "/usr/lib/python3.9/site-packages/django/db/models/base.py", line 912, in _do_update
    return filtered._update(values) > 0
  File "/usr/lib/python3.9/site-packages/django/db/models/query.py", line 802, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1559, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/usr/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
    cursor.execute(sql, params)
  File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

Version-Release number of selected component (if applicable):
6.12.1 and newer

How reproducible:
"it depends on the reproducer" (some dirty ones have 100% and beyond)

Steps to Reproduce:
There are a few independent options, some unconfirmed (for now):

Reproducer #1: dirty 100% reproducer based on a code hack (thanks @dalley!):
1.1: have a CV with a big repo (RHEL8 AppStream used, any similar works fine) promoted to 5+ LEs, sync it to the Capsule
1.2: republish metadata of the CV (hammer content-view version republish-repositories)
1.3: in /usr/lib/python3.9/site-packages/pulp_rpm/app/tasks/synchronizing.py, edit line 1606 to:

    incorrect_changelogs = (
        normalized_result_changelogs != d_content.content.changelogs or True  # ADD THE "or True"!!!
    )

1.4: restart pulp services on the Capsule
1.5: initiate a Capsule sync with skip metadata check

Reproducer #2 (unconfirmed): Satellite-unrelated, pure pulp:
2.1: have 5+ repos with the same feed URL from some source (of a bigger repo like RHEL8 AppStream)
2.2: sync the repos concurrently
2.3: bulk-update package changelogs on the remote repo, i.e. (quite speculating) something like this in pulpcore-manager:

    from pulp_rpm.app.models import Package
    for p in Package.objects.all():
        p.changelogs = p.changelogs[:5]
        p.save()

(lots of changelogs have 10 entries, so truncating to 5 will update many packages; optionally filter the loop to a repo of interest)
2.4: republish the remote repo where the "manual rpm changelog trim" happened
2.5: concurrently sync the 5+ repos from that remote repo

Reproducer #3 (also unconfirmed): rpm-trim-changelogs!
3.1: Have a bigger repo in more CVs/LEs, sync it to the Capsule
3.2: apply the workaround from https://access.redhat.com/solutions/6977656 wrt rpm-trim-changelogs
3.3: republish CVs
3.4: sync the Capsule

Reproducer #4: anything that gets a backtrace with https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/synchronizing.py#L1597 / updating the modulemd content type (some variant of Reproducer #2, I guess).

Actual results:
Reproducer #1 led to 10+ deadlocks for a CV published to 11 LEs

Expected results:
No deadlock.

Additional info:
> Reproducer #3 (also unconfirmed): rpm-trim-changelogs !
> 3.1: Have a bigger repo in more CVs/LEs, sync it to Capsule
> 3.2: apply workaround from https://access.redhat.com/solutions/6977656
> wrt rpm-trim-changelog
> 3.3: republish CVs
> 3.4: sync the Capsule

This is a valid reproducer; particular steps:

3.1: Have a CV with the RHEL8 AppStream repo, published to 13 non-Library LEs (all associated with the Capsule)
3.2: Sync the Capsule (*without* any code hack)
3.3: trim changelogs on the Satellite to less than the default 10:

    sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' /usr/bin/pulpcore-manager rpm-trim-changelogs --changelog-limit 5

3.4: Sync the Capsule (*without* any code hack) .. 12 deadlocks hit in my case

And this makes a huge problem for customers upgrading to 6.12, since:

- on 6.11, changelogs are unlimited (both on Sat and Caps)
- upgrade to 6.12 does call "pulpcore-manager rpm-trim-changelogs" within /usr/share/gems/gems/foreman_maintain-1.1.9/definitions/procedures/pulpcore/trim_rpm_changelogs.rb - BUT ONLY on the Satellite
- Capsule upgrade to 6.12 does *not* contain this step; while foreman-maintain.log on the Satellite does contain

    D, [2023-02-17 11:53:54+0100 #1516672] DEBUG -- : Running command sudo PULP_SETTINGS=/etc/pulp/settings.py DJANGO_SETTINGS_MODULE=pulpcore.app.settings pulpcore-manager rpm-trim-changelogs with stdin nil

  the Capsule does not contain this step
- so the first Capsule syncs after an upgrade to 6.12 will hit this BZ (whenever a bigger repo is promoted to a CV multiple times)

Partha, could you please confirm whether I am right about the above "huge problem", and whether it makes sense to raise a doc BZ to add the step "run rpm-trim-changelogs manually" to the Capsule upgrade manual?
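To illustrate why the Satellite-only trim matters, here is a hedged sketch (names and ordering are my assumptions, *not* the real pulp_rpm implementation) of what `rpm-trim-changelogs --changelog-limit N` effectively does to each package's changelog list, using the `[author, timestamp, text]` entry layout visible in the deadlocked UPDATE statements:

```python
# Hypothetical sketch of the effect of rpm-trim-changelogs: keep only the
# N newest changelog entries of each package. trim_changelogs() is a toy
# stand-in, not the actual pulp_rpm code.

def trim_changelogs(changelogs, limit=10):
    # keep the `limit` newest entries by timestamp, back in chronological order
    newest = sorted(changelogs, key=lambda entry: entry[1], reverse=True)[:limit]
    return sorted(newest, key=lambda entry: entry[1])

# Satellite package with 12 entries; the Capsule copy is never trimmed,
# because the upgrade runs rpm-trim-changelogs only on the Satellite.
sat_pkg = [["author %d" % i, 1500000000 + i, "entry %d" % i] for i in range(12)]
caps_pkg = [list(e) for e in sat_pkg]

sat_trimmed = trim_changelogs(sat_pkg, limit=10)

# The two sides now disagree, so the next Capsule sync rewrites every such
# package row - the mass of concurrent UPDATEs that feeds this deadlock.
print(len(sat_trimmed), len(caps_pkg), sat_trimmed != caps_pkg)  # 10 12 True
```

After a Satellite-only trim, essentially every package differs between Satellite and Capsule, which is why the very first post-upgrade Capsule sync is the dangerous one.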
> Reproducer #4: anything that gets backtrace with
> https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/synchronizing.
> py#L1597 / updating modulemd content type (some variant of Reproducer #2, I
> guess).

Grant: in customers' data, I noticed both backtraces (for updating the rpm package record and also the modulemd record) - please let me know if you can come up with a (simpler) reproducer here, or if you need some reproducer from me, to have this variant of the deadlock fixed as well.
As chatted with Grant already, the PR https://github.com/pulp/pulp_rpm/pull/3002 did bring some improvement, but sadly it is still easy to reproduce the deadlock. Particular reproducer:

1) Have a CV with the RHEL8 AppStream repo promoted to 10+ LEs (for pulp: have 10+ identical clones of the RHEL8 AppStream repo)
2) Sync those 10+ LEs to the Capsule
3) trim changelogs (just to emulate a change in synced content, which sometimes happens for RH content and definitely for custom content):

    sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' /usr/bin/pulpcore-manager rpm-trim-changelogs --changelog-limit 9
    # since 6.12.1, changelogs are trimmed to max 10

4) optionally, check the current changelog sizes of all packages:

    echo -e "from pulp_rpm.app.models import Package\nmax = 0\nfor p in Package.objects.all().only('changelogs').iterator():\n print(len(p.changelogs))\n" | sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' pulpcore-manager shell | sort -n | uniq -c

(output like:

      2 0
     59 1
    126 2
    153 3
    201 4
  28309 5

means 28k packages with 5 changelog entries, 201 packages with 4 changelogs, etc.) - here you can verify how changelog sizes are updated (on Sat or Caps) over time

5) Republish the repositories in the CV:

    hammer content-view version republish-repositories --id CVVersionID --force true

6) Sync the Capsule - it will get most of its packages updated due to the changelog trim
7) optionally, repeat from step 3 with changelog-limit 8 etc., for another round of Capsule sync

The same backtrace as in #c0 is seen, followed by a ShareLock deadlock in postgres:

2023-03-09 18:22:32 CET ERROR:  deadlock detected
2023-03-09 18:22:32 CET DETAIL:  Process 12427 waits for ShareLock on transaction 947829; blocked by process 12415.
	Process 12415 waits for ShareLock on transaction 947832; blocked by process 12427.
	Process 12427: UPDATE "rpm_package" SET "name" = 'tuned-utils', "epoch" = '0', "version" = '2.14.0', "release" = '3.el8', "arch" = 'noarch', "pkgId" = '614f0748b4a4d05192bb48a6026387207a62543d22279ca27955d4da64451ce8', "checksum_type" = 'sha256', "summary" = 'Various tuned utilities', "description" = 'This package contains utilities that can help you to fine tune and debug your system and manage tuned profiles.', "url" = 'http://www.tuned-project.org/', "changelogs" = '[["Jaroslav \u0160karvada <jskarvad> - 2.13.0-6", 1584964800, "- realtime: added conditional support for managed_irq\n Resolves: rhbz#1797025"], ["Jaroslav \u0160karvada <jskarvad> - 2.14.0-0.1.rc1", 1591617600, "- new release\n - rebased tuned to latest upstream\n resolves: rhbz#1792264\n - oracle: turned off NUMA balancing\n resolves: rhbz#1782233\n - man: documented the possibility to apply multiple profiles\n resolves: rhbz#1794337\n - cpu-partitioning: disabled kernel.timer_migration\n resolves: rhbz#1797629\n - p
	Process 12415: UPDATE "core_content" SET "timestamp_of_interest" = '2023-03-09T17:22:31.127840+00:00'::timestamptz WHERE "core_content"."pulp_id" IN ('6997658d-2278-4ece-ade2-44d5441292fc'::uuid, '4564232f-3245-4e62-8dd2-d16b2047e050'::uuid, 'a2cea5a8-2dc9-4f20-bbd0-177d4996ca8b'::uuid, 'b15e64f8-7f8e-4431-b49f-af81e9173082'::uuid, '335b4eb2-afc9-4fd4-b36a-8f9bb34135a4'::uuid, '87726ca2-5c06-40e3-bce2-3a5760f8f22d'::uuid, '2bafc3f1-d4d1-4de1-973b-62b32108012f'::uuid, '9ab66627-4b3c-435f-881a-bfdf8a3b0fcb'::uuid, '33b85b02-6bd1-4f02-93cb-714913318602'::uuid, '289f9b4b-1df4-4d5f-b55a-8bdece6169a9'::uuid, 'a11098b0-7c6d-4a58-9010-3bec40472b59'::uuid, '69d3b657-139d-4c91-a3c7-7f2a845de70e'::uuid, 'e87a2676-db0b-441b-b3b8-5f4e43d14f39'::uuid, '660e2968-acd5-499f-8e82-efd245e94e36'::uuid, 'ae3f2820-cf82-4f9a-b592-f5730e9bd933'::uuid, '553180ef-2516-4c31-99e2-c6272720b69c'::uuid, '2756c578-1360-410b-b48f-8b4d7b4e6333'::uuid, '43de33b8-7fb1-41ce-a1dd-d0842c69b508'::uuid, 
'ad4c5393-33e4-4d85-b370-7e810b341d9e'::uuid, '823179b1-083d
2023-03-09 18:22:32 CET HINT:  See server log for query details.
2023-03-09 18:22:32 CET CONTEXT:  while locking tuple (6492,5) in relation "rpm_package"
2023-03-09 18:22:32 CET STATEMENT:  UPDATE "rpm_package" SET "name" = 'tuned-utils', "epoch" = '0', "version" = '2.14.0', "release" = '3.el8', "arch" = 'noarch', "pkgId" = '614f0748b4a4d05192bb48a6026387207a62543d22279ca27955d4da64451ce8', "checksum_type" = 'sha256', "summary" = 'Various tuned utilities', "description" = 'This package contains utilities that can help you to fine tune and ..
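Note the shape of this failure: one process is stuck on a per-row `rpm_package` UPDATE, the other on a bulk `core_content` "touch" UPDATE. Saving modified packages in a guaranteed primary-key order (the approach visible in the patch later in this bug) removes AB/BA cycles *between package writers*, but does not by itself order those saves against the bulk touch query. A small, purely illustrative sketch of the ordering idea:

```python
import uuid

# Illustrative sketch of "save in guaranteed pk order": if every sync task
# sorts the rows it is about to UPDATE by primary key, all tasks acquire the
# row locks in one global order and an AB/BA cycle between two rpm_package
# writers cannot form.
pks = [uuid.UUID(int=n) for n in (3, 1, 2)]

# Two tasks modified the same packages but encountered them in different orders.
task_a_modified = {pk: "result-a" for pk in pks}
task_b_modified = {pk: "result-b" for pk in reversed(pks)}

save_order_a = sorted(task_a_modified)  # uuid.UUID instances are orderable
save_order_b = sorted(task_b_modified)

# Both tasks now save (and hence lock) rows in the identical order.
print(save_order_a == save_order_b, [pk.int for pk in save_order_a])
```

This is consistent with PR 3002 improving things without fully fixing the reproducer above.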
*** Bug 2176584 has been marked as a duplicate of this bug. ***
Just FYI, it has been spotted on Sat 6.13/Caps 6.13 also (very slightly different backtrace, otherwise the same).
Grant++ for providing a patch that survived all my testing :) No deadlock in any of 8 Capsule synchronizations of a scaled test.

This diff was applied over python39-pulp-rpm-3.18.9-1.el8pc.noarch (but I discourage using it as is; this is rather for tracking purposes):

--- pulp_rpm/app/tasks/synchronizing.py.orig	2023-06-09 14:39:50.229365570 +0200
+++ pulp_rpm/app/tasks/synchronizing.py.grant-patch-2	2023-06-09 18:09:39.419209068 +0200
@@ -1572,10 +1572,12 @@ class RpmQueryExistingContents(Stage):
                 content_q_by_type[model_type] = content_q_by_type[model_type] | unit_q
                 d_content_by_nat_key[d_content.content.natural_key()].append(d_content)
 
-        modified_rslts_by_pid = {}  # hold the to-be-saved results by-pid
-        for model_type, content_q in content_q_by_type.items():
+        modified_rslts_by_pk = {}  # hold the to-be-saved results by-pk
+        modified_fields_by_pk = defaultdict(list)
+        query_types = content_q_by_type.keys()
+        for model_type in query_types:
             async for result in sync_to_async_iterable(
-                model_type.objects.filter(content_q).iterator()
+                model_type.objects.filter(content_q_by_type[model_type]).iterator()
             ):
                 for d_content in d_content_by_nat_key[result.natural_key()]:
                     # ============ The below lines are added vs. pulpcore ============
@@ -1583,13 +1585,15 @@ class RpmQueryExistingContents(Stage):
                         # Fix saved snippet if malformed in DB, covers #2735
                         if result.snippet != d_content.content.snippet:
                             result.snippet = d_content.content.snippet
-                            modified_rslts_by_pid[result.pulp_id] = result
+                            modified_rslts_by_pk[result.pk] = result
+                            modified_fields_by_pk[result.pk].append("snippet")
                     if model_type == ModulemdDefaults:
                         # Fix saved snippet if malformed in DB, covers #2735
                         if result.snippet != d_content.content.snippet:
                             result.snippet = d_content.content.snippet
-                            modified_rslts_by_pid[result.pulp_id] = result
+                            modified_rslts_by_pk[result.pk] = result
+                            modified_fields_by_pk[result.pk].append("snippet")
                     if model_type == Package:
                         # changelogs coming out of the database are list[list],
@@ -1605,25 +1609,29 @@ class RpmQueryExistingContents(Stage):
                         if incorrect_changelogs:
                             # Covers a class of issues with changelog data on the CDN
                             result.changelogs = d_content.content.changelogs
+                            modified_fields_by_pk[result.pk].append("changelogs")
+
                         if incorrect_modular_relation:
                             # Covers #2643
                             result.is_modular = True
+                            modified_fields_by_pk[result.pk].append("is_modular")
+
                         duplicated_files = len(result.files) != len(d_content.content.files)
                         if duplicated_files:
                             d_content.content.files = result.files
                         if incorrect_changelogs or incorrect_modular_relation:
                             log.debug("Updated data for package {}".format(result.nevra))
-                            modified_rslts_by_pid[result.pulp_id] = result
+                            modified_rslts_by_pk[result.pk] = result
+
                     # ==================================================================
                     d_content.content = result
 
         # Save results in guaranteed-pid-order
-        modified_rslts_pids = sorted(modified_rslts_by_pid.keys())
-        for pid in modified_rslts_pids:
-            await sync_to_async(modified_rslts_by_pid[pid].save)()
+        modified_rslts_pks = sorted(modified_rslts_by_pk.keys())
+        for pk in modified_rslts_pks:
+            await sync_to_async(modified_rslts_by_pk[pk].save)(update_fields=modified_fields_by_pk[pk])
 
         # touch all affected content, **excluding** anything we just saved
-        query_types = sorted(content_q_by_type.keys(), key=lambda t: t.__name__)
         for query_type in query_types:
             try:
                 # touch() in order to mark "last seen time" for the batch
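The `update_fields` part of the diff is worth spelling out: Django's `Model.save(update_fields=[...])` emits an UPDATE that sets only the named columns instead of rewriting every column of the row. A toy stand-in (FakeModel is not Django; it just shows the SQL shape, with my own hypothetical helper names):

```python
# Toy stand-in showing the SQL shape save(update_fields=[...]) produces:
# only the named columns are SET, instead of rewriting the whole rpm_package
# row as in the deadlocked UPDATE statements quoted earlier in this bug.
class FakeModel:
    table = "rpm_package"

    def __init__(self, pk, **fields):
        self.pk = pk
        self.fields = fields

    def save(self, update_fields=None):
        # no update_fields -> emulate a full-row UPDATE over all columns
        cols = list(update_fields) if update_fields else sorted(self.fields)
        assignments = ", ".join('"%s" = %%s' % c for c in cols)
        return 'UPDATE "%s" SET %s WHERE "pulp_id" = %%s' % (self.table, assignments)

pkg = FakeModel(pk=1, name="tuned-utils", changelogs=[], is_modular=False)
print(pkg.save(update_fields=["changelogs"]))  # only "changelogs" is SET
print(pkg.save())                              # full-row rewrite, every column
```

Combined with saving in sorted-pk order, the narrower UPDATEs mean each concurrent sync writes less data per row and holds its row locks for shorter, which matches the "no deadlock in 8 scaled runs" result above.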
Verified.

Tested on Satellite 6.14.0 Snap 9.0 with version python39-pulp-rpm-3.19.9-1.el8pc.noarch.

Followed the steps mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2170535#c7

Tested 10 runs of the scenario; all capsule syncs succeeded without deadlock. Also checked postgres logs and pulp/syslogs to verify no 'deadlock detected'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Satellite 6.14 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6818