Description of problem:

BZ to track work on the deadlock that occurs on the activation key and server snapshot path during registration:

2017-08-09 15:35:30.511 EDT ERROR: deadlock detected
2017-08-09 15:35:30.511 EDT DETAIL: Process 12475 waits for ShareLock on transaction 949432; blocked by process 12488.
    Process 12488 waits for ShareLock on transaction 949435; blocked by process 12475.
    Process 12475: SELECT rhn_server.snapshot_server(1000012238, 'Package profile changed')
    Process 12488: SELECT rhn_entitlements.entitle_server(1000012239, 'enterprise_entitled')
2017-08-09 15:35:30.511 EDT HINT: See server log for query details.
2017-08-09 15:35:30.511 EDT CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"
    SQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) ( select snapshot_id_v, sgm.server_group_id from rhnServerGroupMembers sgm where sgm.server_id = server_id_in )"
    PL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement
2017-08-09 15:35:30.511 EDT STATEMENT: SELECT rhn_server.snapshot_server(1000012238, 'Package profile changed')

Turning off snapshots removes the deadlock, but current work is looking at rhnSnapshotServerGroup within lock_counts().

Currently seen on Satellite 5.7, but expected to affect 5.8 too.
Created attachment 1311903
Replacement for the postgresql lock_counts() proc

After a lot of digging, it appears that storing a server snapshot locks rows in rhnservergroup as a side effect. This caused DB locks to be acquired out of order, and hence the deadlock.

The attached file adds rhnSnapshotServerGroup to the list of locks managed by lock_counts() and enforces lock acquisition in the right order. It should address the deadlocks between snapshot_server() and entitle_server().

Apply by downloading lock_counts.sql onto the Sat5 instance and running the following command:

# spacewalk-sql --select-mode lock_counts.sql
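
For readers unfamiliar with why acquisition order matters, here is a minimal Python sketch of the general lock-ordering idea. It is not the contents of lock_counts.sql: the in-process locks, the LOCK_ORDER list, and the helper names run_with_locks and simulate are all hypothetical stand-ins chosen for illustration. Two workers request the same pair of locks in opposite orders, as snapshot_server() and entitle_server() effectively did, but the helper always acquires them in one fixed global order, so neither can hold one lock while waiting on the other.

```python
import threading

# Hypothetical stand-ins for the row locks on the two tables involved.
locks = {
    "rhnServerGroup": threading.Lock(),
    "rhnSnapshotServerGroup": threading.Lock(),
}

# One fixed global acquisition order (an assumption for illustration,
# not necessarily the order used by the real lock_counts()).
LOCK_ORDER = ["rhnServerGroup", "rhnSnapshotServerGroup"]


def run_with_locks(needed, work):
    """Acquire the needed locks in the global order, run work, then release."""
    ordered = sorted(needed, key=LOCK_ORDER.index)
    for name in ordered:
        locks[name].acquire()
    try:
        return work()
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(ordered):
            locks[name].release()


def simulate(rounds=1000):
    """Run two workers that ask for the locks in opposite orders."""
    counter = {"n": 0}

    def bump():
        counter["n"] += 1  # executed while holding both locks

    def snapshot_like():
        for _ in range(rounds):
            run_with_locks(["rhnSnapshotServerGroup", "rhnServerGroup"], bump)

    def entitle_like():
        for _ in range(rounds):
            run_with_locks(["rhnServerGroup", "rhnSnapshotServerGroup"], bump)

    threads = [
        threading.Thread(target=snapshot_like),
        threading.Thread(target=entitle_like),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["n"]
```

If each worker instead took the locks in the order it listed them, the two threads could each hold one lock and wait forever on the other, which is the same cycle PostgreSQL reports in the DETAIL lines of the deadlock error.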
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3443