Bug 1458880 - [5.8] DEADLOCK: reactivation-key, snapshot_server(), register/delete
Summary: [5.8] DEADLOCK: reactivation-key, snapshot_server(), register/delete
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 580
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Grant Gainey
QA Contact: Jan Hutař
URL:
Whiteboard:
Depends On:
Blocks: sat58-errata 1459963 1459964 sat5-deadlock
TreeView+ depends on / blocked
 
Reported: 2017-06-05 17:56 UTC by Grant Gainey
Modified: 2017-12-08 17:04 UTC (History)
9 users (show)

Fixed In Version: spacewalk-schema-2.5.1-50-sat, satellite-schema-5.8.0.33-1-sat, spacewalk-backend-2.5.3-147-sat
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1459963 (view as bug list)
Environment:
Last Closed: 2017-09-06 12:27:53 UTC
Target Upstream Version:


Attachments (Terms of Use)
Proposed patch (17.54 KB, patch)
2017-06-05 18:10 UTC, Grant Gainey
no flags Details | Diff


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1402935 None CLOSED [5.8] lock_counts fix has several issues 2019-08-06 17:14:33 UTC
Red Hat Product Errata RHSA-2017:2645 normal SHIPPED_LIVE Moderate: satellite and spacewalk security and bug fix update 2017-09-06 16:26:43 UTC

Internal Links: 1402935

Description Grant Gainey 2017-06-05 17:56:34 UTC
This BZ covers a new potential-deadlock path found during the testing of bug 1402935 . The deadlock can be reproduced in 5.7, so is not a 5.8 regression. Summary from the comments in 1402935:

* Running the reproducer-scripts from bug 1388160 on 4 clients
* Running rhnreg_ks --user --pass in a loop on a 5th machine

Expected: reactivations and registrations without deadlocks.
Actual:

===
Jan Hutař 2017-05-30 16:40:31 EDT
Grant, might this be related please?

==> /var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_log/postgresql-Tue.log <==
2017-05-30 10:39:07.320 EDT ERROR:  deadlock detected
2017-05-30 10:39:07.320 EDT DETAIL:  Process 21183 waits for ShareLock on transaction 450090; blocked by process 21476.
	Process 21476 waits for ShareLock on transaction 450097; blocked by process 21183.
	Process 21183: SELECT rhn_channel.subscribe_server(1000015983, '123', 0)
	Process 21476: select * from delete_server($1)  as result
2017-05-30 10:39:07.320 EDT HINT:  See server log for query details.
2017-05-30 10:39:07.320 EDT CONTEXT:  while locking tuple (0,2) in relation "rhnservergroup"
	SQL statement "select sg.created                     from rhnservergroup sg
	           inner join rhnserver s on s.org_id = sg.org_id
	     where s.id = server_id_in
	       for update"
	PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
	SQL statement "SELECT lock_counts(server_id_in)"
	PL/pgSQL function rhn_channel.subscribe_server(numeric,numeric,numeric,numeric,numeric) line 16 at PERFORM
2017-05-30 10:39:07.320 EDT STATEMENT:  SELECT rhn_channel.subscribe_server(1000015983, '123', 0)
2017-05-30 10:39:23.471 EDT ERROR:  deadlock detected
2017-05-30 10:39:23.471 EDT DETAIL:  Process 21337 waits for ShareLock on transaction 450880; blocked by process 21599.
	Process 21599 waits for ShareLock on transaction 450883; blocked by process 21337.
	Process 21337: select * from delete_server($1)  as result
	Process 21599: SELECT rhn_channel.unsubscribe_server(1000016009, 116, 0)
2017-05-30 10:39:23.471 EDT HINT:  See server log for query details.
2017-05-30 10:39:23.471 EDT CONTEXT:  while locking tuple (0,2) in relation "rhnservergroup"
	SQL statement "select sg.created                     from rhnservergroup sg
	           inner join rhnserver s on s.org_id = sg.org_id
	     where s.id = server_id_in
	       for update"
	PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
	SQL statement "SELECT lock_counts(server_id_in)"
	PL/pgSQL function delete_server(numeric) line 27 at PERFORM
2017-05-30 10:39:23.471 EDT STATEMENT:  select * from delete_server($1)  as result
2017-05-30 10:39:23.471 EDT ERROR:  current transaction is aborted, commands ignored until end of transaction block
2017-05-30 10:39:23.471 EDT STATEMENT:  select 'c3p0 ping' from dual


I have been registering 4 systems in a loop using activation key (into tiny custom channel) and deleting big bunch of previously registered profiles (mix of username/password and activation key registrations).

And forgot to mention that activation key included system group.
===

Comments from the ensuing investigation:

===
Jan Hutař 2017-06-01 06:52:37 EDT

DB log:


2017-05-31 05:29:12.282 EDT ERROR:  deadlock detected
2017-05-31 05:29:12.282 EDT DETAIL:  Process 6997 waits for ShareLock on transaction 651256; blocked by process 6658.
        Process 6658 waits for ShareLock on transaction 651261; blocked by process 6997.
        Process 6997: SELECT rhn_channel.unsubscribe_server(1000017821, 169, 0)
        Process 6658: select * from rhn_channel.unsubscribe_server($1, $2, 0) as result
2017-05-31 05:29:12.282 EDT HINT:  See server log for query details.
2017-05-31 05:29:12.282 EDT CONTEXT:  while locking tuple (0,2) in relation "rhnservergroup"
        SQL statement "select sg.created                     from rhnservergroup sg
                   inner join rhnserver s on s.org_id = sg.org_id
             where s.id = server_id_in
               for update"
        PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
        SQL statement "SELECT lock_counts(server_id_in)"
        PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
2017-05-31 05:29:12.282 EDT STATEMENT:  SELECT rhn_channel.unsubscribe_server(1000017821, 169, 0)


and tomcat log is (not sure if all of this is relevant, but this is what timestamps say):


2017-05-31 05:29:11,281 [TP-Processor10] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Error calling method: 
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor789.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:508)
        at com.redhat.rhn.frontend.xmlrpc.BaseHandler.invoke(BaseHandler.java:174)
        at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
        at com.redhat.rhn.frontend.xmlrpc.RhnXmlRpcServer.execute(RhnXmlRpcServer.java:54)
        at com.redhat.rhn.frontend.xmlrpc.XmlRpcServlet.doPost(XmlRpcServlet.java:162)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:643)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.LocalizedEnvironmentFilter.doFilter(LocalizedEnvironmentFilter.java:67)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.EnvironmentFilter.doFilter(EnvironmentFilter.java:101)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SessionFilter.doFilter(SessionFilter.java:57)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:97)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:299)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
        at java.lang.Thread.run(Thread.java:785)
Caused by: 
com.redhat.rhn.common.db.WrappedSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.postgreSqlException(SqlExceptionTranslator.java:54)
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:44)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:487)
        at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:39)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1557)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1531)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1488)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1449)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.subscribeToNewChannels(UpdateChildChannelsCommand.java:156)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.store(UpdateChildChannelsCommand.java:103)
        at com.redhat.rhn.frontend.xmlrpc.system.SystemHandler.setChildChannels(SystemHandler.java:381)
        ... 40 more
Caused by: 
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1827)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:508)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:384)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:377)
        at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:2706)
        at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:117)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:474)
        ... 48 more
2017-05-31 05:29:11,282 [TP-Processor10] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Caused by: 
com.redhat.rhn.common.db.WrappedSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.postgreSqlException(SqlExceptionTranslator.java:54)
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:44)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:487)
        at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:39)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1557)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1531)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1488)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1449)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.subscribeToNewChannels(UpdateChildChannelsCommand.java:156)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.store(UpdateChildChannelsCommand.java:103)
        at com.redhat.rhn.frontend.xmlrpc.system.SystemHandler.setChildChannels(SystemHandler.java:381)
        at sun.reflect.GeneratedMethodAccessor789.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:508)
        at com.redhat.rhn.frontend.xmlrpc.BaseHandler.invoke(BaseHandler.java:174)
        at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
        at com.redhat.rhn.frontend.xmlrpc.RhnXmlRpcServer.execute(RhnXmlRpcServer.java:54)
        at com.redhat.rhn.frontend.xmlrpc.XmlRpcServlet.doPost(XmlRpcServlet.java:162)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:643)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.LocalizedEnvironmentFilter.doFilter(LocalizedEnvironmentFilter.java:67)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.EnvironmentFilter.doFilter(EnvironmentFilter.java:101)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SessionFilter.doFilter(SessionFilter.java:57)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:97)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:299)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
        at java.lang.Thread.run(Thread.java:785)
Caused by:
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1827)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:508)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:384)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:377)
        at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:2706)
        at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:117)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:474)
        ... 48 more
2017-05-31 05:29:11,282 [TP-Processor10] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Rolling back transaction
May 31, 2017 5:29:11 AM redstone.xmlrpc.XmlRpcDispatcher writeError
WARNING: redstone.xmlrpc.XmlRpcFault: unhandled internal exception: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
===

===
Grant Gainey 2017-06-01 16:24:23 EDT

Using the scripts from bug 1388160, everything is fine - until I start using SSM to delete profiles at the same time.  Investigating SSM-delete-servers codepath, to see if there's a place where Java is locking out-of-order.

Jan - ignore the "can we get 5.8 to look at 9.2?" from #c17 - instead, can you point your clients at a *5.7* satellite, and re-run your reproducer? If this happens against 5.7, then it is not this BZ, and not a 5.8 regression - would still need its own BZ, tho. In the meantime, I will attempt same, assuming I can get an env to work before I have to leave tonight.

[reply] [−] Private Comment 19 Grant Gainey 2017-06-01 16:32:28 EDT
I take it back - it wasn't SSM-delete that caused deadlocks, it was registering new systems (which is going down the python-path, and which I could very easily believe is not locking everything in the right order. Investigation Continues.
===

===
Grant Gainey 2017-06-02 11:52:30 EDT
Current status:

* can reproduce frequent deadlocks on 5.7 using the scripts from bug 1388160 alone, on 4 clients.
* explicitly locking reactivation-path in python appears to address these
* adding vanilla-rhnreg_ks to the 4-clients-deadlocks.sh causes ~5-7% of the registrations to cause a deadlock
* currently appears to be caused by something on the rhn_server.snapshot_server() path (all systems have Provisioning and are snapshotting at registration time)

Current thoughts:

* This is not a 5.8 regression
* In fact, things appear to be *better* on 5.8 than 5.7. Somehow.
* spacewalk-backend/python appears to be the problem path
* explicit locking of the reactivation-key path helps
* we're not quite done yet

Investigation continues. Suggest that the current state is that we do *not* hold up 5.8's schedule for this issue, since it is reproduceable in 5.7 (and possibly in 5.6, based on a parallel investigation)
===


Current Results:

* Locking is needed, in python, on the reactivation-key path
* Locking is needed, in stored-procedure, in rhn_server.snapshot_server()

Doing both of these things resolves the deadlocks observed during this investigation, on both 5.8 and 5.7.

Comment 1 Grant Gainey 2017-06-05 18:10:06 UTC
Created attachment 1285092 [details]
Proposed patch

Proposed patch teaches python reactivation-key path and rhn_server.snapshot_server() to explicitly call lock_counts()

Comment 21 Radovan Drazny 2017-08-29 08:52:46 UTC
Re-verified on spacewalk-backend-2.5.3-151.el6sat and satellite-schema-5.8.0.33-1.el6sat, both Oracle and PG servers.

Comment 22 errata-xmlrpc 2017-09-06 12:27:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2645


Note You need to log in before you can comment on or make changes to this bug.