1480367 – [5.7] rhn_server.snapshot_server deadlock on Activation Key path

Bug 1480367 - [5.7] rhn_server.snapshot_server deadlock on Activation Key path

Summary: [5.7] rhn_server.snapshot_server deadlock on Activation Key path

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite 5
Classification:	Red Hat
Component:	Server
Sub Component:
Version:	570
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Jan Dobes
QA Contact:	Jan Hutař
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1481848 sat5-deadlock
TreeView+	depends on / blocked

Reported:	2017-08-10 20:11 UTC by Shannon Hughes
Modified:	2021-09-09 12:30 UTC (History)
CC List:	6 users (show)
Fixed In Version:	spacewalk-schema-2.3.2-35 satellite-schema-5.7.0.37-1 spacewalk-backend-2.3.3-55
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1481848 (view as bug list)
Environment:
Last Closed:	2017-12-13 07:56:42 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Replacement for the postgresql lock_counts() proc (2.15 KB, text/x-vhdl) 2017-08-10 20:31 UTC, Grant Gainey	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:3443	0	normal	SHIPPED_LIVE	Satellite 5.7 bug fix update	2017-12-12 18:53:00 UTC

Description Shannon Hughes 2017-08-10 20:11:55 UTC

Description of problem:

BZ to track the deadlock work that is occurring on the activation key and server snapshot path during registration: 

2017-08-09 15:35:30.511 EDT ERROR:  deadlock detected
2017-08-09 15:35:30.511 EDT DETAIL:  Process 12475 waits for ShareLock on transaction 949432; blocked by process 12488.
        Process 12488 waits for ShareLock on transaction 949435; blocked by process 12475.
        Process 12475: SELECT rhn_server.snapshot_server(1000012238, 'Package profile changed')
        Process 12488: SELECT rhn_entitlements.entitle_server(1000012239, 'enterprise_entitled')
2017-08-09 15:35:30.511 EDT HINT:  See server log for query details.
2017-08-09 15:35:30.511 EDT CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"
        SQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (
                        select        snapshot_id_v, sgm.server_group_id
                        from        rhnServerGroupMembers sgm
                        where        sgm.server_id = server_id_in
                )"
        PL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement
2017-08-09 15:35:30.511 EDT STATEMENT:  SELECT rhn_server.snapshot_server(1000012238, 'Package profile changed')


Turning off snapshots removes the deadlock but current work is looking at rhnServerSnapshopGroup within lock_counts()

Currently seeing on Satellite 5.7 but would expect this to affect 5.8 too.

Comment 1 Grant Gainey 2017-08-10 20:31:01 UTC

Created attachment 1311903 [details]
Replacement for the postgresql lock_counts() proc

After a lot of digging, it appears that the act of storing a server-snapshot is locking rows in rhnservergroup as a side-effect. This caused DB locks to be acquired out-of-order, and hence the deadlock.

This file adds the rhnSnapshotServerGroup to the list of locks managed by lock_counts(), and enforces lock-acquisition in the right order. It should address the deadlocks between snapshot_server() and entitle_server()

Apply by downloading lock_counts.sql onto the Sat5 instance, and running the foolowing command:

# spacewalk-sql --select-mode lock_counts.sql

Comment 17 errata-xmlrpc 2017-12-13 07:56:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3443

Note You need to log in before you can comment on or make changes to this bug.