Bug 1406929 - backend python server snapshots are causing deadlock during registration
Summary: backend python server snapshots are causing deadlock during registration
Keywords:
Status: CLOSED DUPLICATE of bug 1459963
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 570
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Tomáš Kašpárek
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: 465198
TreeView+ depends on / blocked
 
Reported: 2016-12-21 20:49 UTC by Shannon Hughes
Modified: 2021-09-09 12:03 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-19 12:24:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Shannon Hughes 2016-12-21 20:49:39 UTC
Description of problem:

Calls to the snapshot_server python method during the registration process is causing deadlocks to occur. there are two different stack traces (package profile updates and base channel changes): 

[Wed Dec 14 20:57:34 2016] [error] SQL Error generated: (99999, 'ERROR:  deadlock detected', 'DETAIL:  Process 113435 waits for ShareLock on transaction 514987968; blocked by process 114077.\\nProcess 114077 waits for ShareLock on transaction 514987959; blocked by process 113435.\\nHINT:  See server log for query details.\\nCONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"\\nSQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (\\n\\t\\tselect\\tsnapshot_id_v, sgm.server_group_id\\n\\t\\tfrom\\trhnServerGroupMembers sgm\\n\\t\\twhere\\tsgm.server_id = server_id_in\\n\\t)"\\nPL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement\\n', TransactionRollbackError('deadlock detected\\nDETAIL:  Process 113435 waits for ShareLock on transaction 514987968; blocked by process 114077.\\nProcess 114077 waits for ShareLock on transaction 514987959; blocked by process 113435.\\nHINT:  See server log for query details.\\nCONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"\\nSQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (\\n\\t\\tselect\\tsnapshot_id_v, sgm.server_group_id\\n\\t\\tfrom\\trhnServerGroupMembers sgm\\n\\t\\twhere\\tsgm.server_id = server_id_in\\n\\t)"\\nPL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement\\n',))
[Wed Dec 14 20:57:34 2016] [error]
[Wed Dec 14 20:57:34 2016] [error] Exception Handler Information
[Wed Dec 14 20:57:34 2016] [error] Traceback (most recent call last):
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/apacheRequest.py", line 122, in call_function
[Wed Dec 14 20:57:34 2016] [error]     response = func(*params)
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/share/rhn/server/handlers/xmlrpc/registration.py", line 934, in update_packages
[Wed Dec 14 20:57:34 2016] [error]     server.save_packages()
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnServer/server_wrapper.py", line 75, in save_packages
[Wed Dec 14 20:57:34 2016] [error]     ret = self.save_packages_byid(self.server["id"], schedule=schedule)
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnServer/server_packages.py", line 246, in save_packages_byid
[Wed Dec 14 20:57:34 2016] [error]     snapshot_server(sysid, "Package profile changed")
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnServer/server_lib.py", line 235, in snapshot_server
[Wed Dec 14 20:57:34 2016] [error]     return rhnSQL.Procedure("rhn_server.snapshot_server")(server_id, reason)
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 112, in __call__
[Wed Dec 14 20:57:34 2016] [error]     result = Function.__call__(self, *args)
[Wed Dec 14 20:57:34 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 89, in __call__
[Wed Dec 14 20:57:34 2016] [error]     raise sql_base.SQLSchemaError(error_code, e.pgerror, e)


[Thu Dec 15 23:11:36 2016] [error] SQL Error generated: (99999, 'ERROR:  deadlock detected', 'DETAIL:  Process 96762 waits for ShareLock on transaction 516714149; blocked by process 96798.\\nProcess 96798 waits for ShareLock on transaction 516714170; blocked by process 96762.\\nHINT:  See server log for query details.\\nCONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"\\nSQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (\\n\\t\\tselect\\tsnapshot_id_v, sgm.server_group_id\\n\\t\\tfrom\\trhnServerGroupMembers sgm\\n\\t\\twhere\\tsgm.server_id = server_id_in\\n\\t)"\\nPL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement\\n', TransactionRollbackError('deadlock detected\\nDETAIL:  Process 96762 waits for ShareLock on transaction 516714149; blocked by process 96798.\\nProcess 96798 waits for ShareLock on transaction 516714170; blocked by process 96762.\\nHINT:  See server log for query details.\\nCONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"\\nSQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (\\n\\t\\tselect\\tsnapshot_id_v, sgm.server_group_id\\n\\t\\tfrom\\trhnServerGroupMembers sgm\\n\\t\\twhere\\tsgm.server_id = server_id_in\\n\\t)"\\nPL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement\\n',))
[Thu Dec 15 23:11:36 2016] [error]
[Thu Dec 15 23:11:36 2016] [error] Exception Handler Information
[Thu Dec 15 23:11:36 2016] [error] Traceback (most recent call last):
[Thu Dec 15 23:11:36 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/apacheRequest.py", line 122, in call_function
[Thu Dec 15 23:11:36 2016] [error]     response = func(*params)
[Thu Dec 15 23:11:36 2016] [error]   File "/usr/share/rhn/server/handlers/xmlrpc/up2date.py", line 145, in subscribeChannels
[Thu Dec 15 23:11:36 2016] [error]     server_lib.snapshot_server(self.server_id, 'Base Channel Updated')
[Thu Dec 15 23:11:36 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnServer/server_lib.py", line 235, in snapshot_server
[Thu Dec 15 23:11:36 2016] [error]     return rhnSQL.Procedure("rhn_server.snapshot_server")(server_id, reason)
[Thu Dec 15 23:11:36 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 112, in __call__
[Thu Dec 15 23:11:36 2016] [error]     result = Function.__call__(self, *args)
[Thu Dec 15 23:11:36 2016] [error]   File "/usr/lib/python2.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 89, in __call__
[Thu Dec 15 23:11:36 2016] [error]     raise sql_base.SQLSchemaError(error_code, e.pgerror, e)

The corresponding blocked processes in postgresql logs, 

2016-12-15 23:11:51.134 UTC DETAIL:  Process 97571 waits for ShareLock on transaction 516714497; blocked by process 97581.
        Process 97581 waits for ShareLock on transaction 516714504; blocked by process 97571.
        Process 97571: SELECT rhn_server.snapshot_server(1000019666, 'Base Channel Updated')
        Process 97581: SELECT rhn_entitlements.entitle_server(1000024715, 'enterprise_entitled')
2016-12-15 23:11:51.134 UTC HINT:  See server log for query details.
2016-12-15 23:11:51.134 UTC CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"
        SQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (
                        select  snapshot_id_v, sgm.server_group_id
                        from    rhnServerGroupMembers sgm
                        where   sgm.server_id = server_id_in
                )"
        PL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement
2016-12-15 23:11:51.134 UTC STATEMENT:  SELECT rhn_server.snapshot_server(1000019666, 'Base Channel Updated')
2016-12-15 23:11:51.141 UTC FATAL:  sorry, too many clients already
2016-12-15 23:11:51.143 UTC FATAL:  sorry, too many clients already
2016-12-15 23:11:51.316 UTC ERROR:  deadlock detected
2016-12-15 23:11:51.316 UTC DETAIL:  Process 96696 waits for ShareLock on transaction 516714497; blocked by process 97581.
        Process 97581 waits for ShareLock on transaction 516714509; blocked by process 96696.
        Process 96696: SELECT rhn_server.snapshot_server(1000022266, 'Base Channel Updated')
        Process 97581: SELECT rhn_entitlements.entitle_server(1000024715, 'enterprise_entitled')
2016-12-15 23:11:51.316 UTC HINT:  See server log for query details.
2016-12-15 23:11:51.316 UTC CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."rhnservergroup" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"
        SQL statement "insert into rhnSnapshotServerGroup (snapshot_id, server_group_id) (
                        select  snapshot_id_v, sgm.server_group_id
                        from    rhnServerGroupMembers sgm
                        where   sgm.server_id = server_id_in
                )"
        PL/pgSQL function rhn_server.snapshot_server(numeric,character varying) line 37 at SQL statement
2016-12-15 23:11:51.316 UTC STATEMENT:  SELECT rhn_server.snapshot_server(1000022266, 'Base Channel Updated')


It appears we are locking by making the call to lock_counts in the rhn_entitlements.entitle_server: 

      perform lock_counts(server_id_in);

but when we call the rhn_server.snapshot_server I do not see us locking down the incoming sid with a call to lock_counts and then fails on insert into rhnSnapshotServerGroup

For now going to try setting enable_snapshots = 0 so the snapshot proc will be skipped until I can get a reproducer to test the rhn_server.shapshot_server change

Comment 2 Grant Gainey 2017-06-19 12:24:07 UTC
This appears to be another path to one of the issues being addressed by 1459963 (q.v.). Since that BZ is already ON_QA, closing this as a dup.

*** This bug has been marked as a duplicate of bug 1459963 ***


Note You need to log in before you can comment on or make changes to this bug.