Bug 1459963 - [5.7] DEADLOCK: reactivation-key, snapshot_server(), register/delete
Summary: [5.7] DEADLOCK: reactivation-key, snapshot_server(), register/delete
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 570
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Grant Gainey
QA Contact: Ales Dujicek
URL:
Whiteboard:
: 1406929 (view as bug list)
Depends On: 1458880
Blocks: 465198 1459964
TreeView+ depends on / blocked
 
Reported: 2017-06-08 16:58 UTC by Grant Gainey
Modified: 2020-09-10 10:41 UTC (History)
8 users (show)

Fixed In Version: satellite-schema-5.7.0.31-1-sat, spacewalk-backend-2.3.3-52-sat
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1458880
: 1459964 (view as bug list)
Environment:
Last Closed: 2017-07-19 14:50:36 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1500612 None CLOSED [5.7] DEADLOCK: reactivation-key, snapshot_server(), register/delete POSTGRESQL fix was not shipped complete 2019-04-30 22:09:36 UTC
Red Hat Product Errata RHBA-2017:1771 normal SHIPPED_LIVE Red Hat Satellite 5.7 bug fix update 2017-07-19 18:50:09 UTC

Internal Links: 1500612

Description Grant Gainey 2017-06-08 16:58:27 UTC
+++ This bug was initially created as a clone of Bug #1458880 +++

This BZ covers a new potential-deadlock path found during the testing of bug 1402935 . The deadlock can be reproduced in 5.7, so is not a 5.8 regression. Summary from the comments in 1402935:

* Running the reproducer-scripts from bug 1388160 on 4 clients
* Running rhnreg_ks --user --pass in a loop on a 5th machine

Expected: reactivations and registrations without deadlocks.
Actual:

===
Jan Hutař 2017-05-30 16:40:31 EDT
Grant, might this be related please?

==> /var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_log/postgresql-Tue.log <==
2017-05-30 10:39:07.320 EDT ERROR:  deadlock detected
2017-05-30 10:39:07.320 EDT DETAIL:  Process 21183 waits for ShareLock on transaction 450090; blocked by process 21476.
	Process 21476 waits for ShareLock on transaction 450097; blocked by process 21183.
	Process 21183: SELECT rhn_channel.subscribe_server(1000015983, '123', 0)
	Process 21476: select * from delete_server($1)  as result
2017-05-30 10:39:07.320 EDT HINT:  See server log for query details.
2017-05-30 10:39:07.320 EDT CONTEXT:  while locking tuple (0,2) in relation "rhnservergroup"
	SQL statement "select sg.created                     from rhnservergroup sg
	           inner join rhnserver s on s.org_id = sg.org_id
	     where s.id = server_id_in
	       for update"
	PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
	SQL statement "SELECT lock_counts(server_id_in)"
	PL/pgSQL function rhn_channel.subscribe_server(numeric,numeric,numeric,numeric,numeric) line 16 at PERFORM
2017-05-30 10:39:07.320 EDT STATEMENT:  SELECT rhn_channel.subscribe_server(1000015983, '123', 0)
2017-05-30 10:39:23.471 EDT ERROR:  deadlock detected
2017-05-30 10:39:23.471 EDT DETAIL:  Process 21337 waits for ShareLock on transaction 450880; blocked by process 21599.
	Process 21599 waits for ShareLock on transaction 450883; blocked by process 21337.
	Process 21337: select * from delete_server($1)  as result
	Process 21599: SELECT rhn_channel.unsubscribe_server(1000016009, 116, 0)
2017-05-30 10:39:23.471 EDT HINT:  See server log for query details.
2017-05-30 10:39:23.471 EDT CONTEXT:  while locking tuple (0,2) in relation "rhnservergroup"
	SQL statement "select sg.created                     from rhnservergroup sg
	           inner join rhnserver s on s.org_id = sg.org_id
	     where s.id = server_id_in
	       for update"
	PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
	SQL statement "SELECT lock_counts(server_id_in)"
	PL/pgSQL function delete_server(numeric) line 27 at PERFORM
2017-05-30 10:39:23.471 EDT STATEMENT:  select * from delete_server($1)  as result
2017-05-30 10:39:23.471 EDT ERROR:  current transaction is aborted, commands ignored until end of transaction block
2017-05-30 10:39:23.471 EDT STATEMENT:  select 'c3p0 ping' from dual


I have been registering 4 systems in a loop using activation key (into tiny custom channel) and deleting big bunch of previously registered profiles (mix of username/password and activation key registrations).

And forgot to mention that activation key included system group.
===

Comments from the ensuing investigation:

===
Jan Hutař 2017-06-01 06:52:37 EDT

DB log:


2017-05-31 05:29:12.282 EDT ERROR:  deadlock detected
2017-05-31 05:29:12.282 EDT DETAIL:  Process 6997 waits for ShareLock on transaction 651256; blocked by process 6658.
        Process 6658 waits for ShareLock on transaction 651261; blocked by process 6997.
        Process 6997: SELECT rhn_channel.unsubscribe_server(1000017821, 169, 0)
        Process 6658: select * from rhn_channel.unsubscribe_server($1, $2, 0) as result
2017-05-31 05:29:12.282 EDT HINT:  See server log for query details.
2017-05-31 05:29:12.282 EDT CONTEXT:  while locking tuple (0,2) in relation "rhnservergroup"
        SQL statement "select sg.created                     from rhnservergroup sg
                   inner join rhnserver s on s.org_id = sg.org_id
             where s.id = server_id_in
               for update"
        PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
        SQL statement "SELECT lock_counts(server_id_in)"
        PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
2017-05-31 05:29:12.282 EDT STATEMENT:  SELECT rhn_channel.unsubscribe_server(1000017821, 169, 0)


and tomcat log is (not sure if all of this is relevant, but this is what timestamps say):


2017-05-31 05:29:11,281 [TP-Processor10] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Error calling method: 
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor789.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:508)
        at com.redhat.rhn.frontend.xmlrpc.BaseHandler.invoke(BaseHandler.java:174)
        at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
        at com.redhat.rhn.frontend.xmlrpc.RhnXmlRpcServer.execute(RhnXmlRpcServer.java:54)
        at com.redhat.rhn.frontend.xmlrpc.XmlRpcServlet.doPost(XmlRpcServlet.java:162)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:643)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.LocalizedEnvironmentFilter.doFilter(LocalizedEnvironmentFilter.java:67)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.EnvironmentFilter.doFilter(EnvironmentFilter.java:101)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SessionFilter.doFilter(SessionFilter.java:57)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:97)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:299)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
        at java.lang.Thread.run(Thread.java:785)
Caused by: 
com.redhat.rhn.common.db.WrappedSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.postgreSqlException(SqlExceptionTranslator.java:54)
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:44)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:487)
        at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:39)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1557)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1531)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1488)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1449)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.subscribeToNewChannels(UpdateChildChannelsCommand.java:156)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.store(UpdateChildChannelsCommand.java:103)
        at com.redhat.rhn.frontend.xmlrpc.system.SystemHandler.setChildChannels(SystemHandler.java:381)
        ... 40 more
Caused by: 
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1827)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:508)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:384)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:377)
        at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:2706)
        at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:117)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:474)
        ... 48 more
2017-05-31 05:29:11,282 [TP-Processor10] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Caused by: 
com.redhat.rhn.common.db.WrappedSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.postgreSqlException(SqlExceptionTranslator.java:54)
        at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:44)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:487)
        at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:39)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1557)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1531)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1488)
        at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1449)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.subscribeToNewChannels(UpdateChildChannelsCommand.java:156)
        at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.store(UpdateChildChannelsCommand.java:103)
        at com.redhat.rhn.frontend.xmlrpc.system.SystemHandler.setChildChannels(SystemHandler.java:381)
        at sun.reflect.GeneratedMethodAccessor789.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:508)
        at com.redhat.rhn.frontend.xmlrpc.BaseHandler.invoke(BaseHandler.java:174)
        at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
        at com.redhat.rhn.frontend.xmlrpc.RhnXmlRpcServer.execute(RhnXmlRpcServer.java:54)
        at com.redhat.rhn.frontend.xmlrpc.XmlRpcServlet.doPost(XmlRpcServlet.java:162)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:643)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129)
        at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.LocalizedEnvironmentFilter.doFilter(LocalizedEnvironmentFilter.java:67)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.EnvironmentFilter.doFilter(EnvironmentFilter.java:101)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SessionFilter.doFilter(SessionFilter.java:57)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.redhat.rhn.frontend.servlets.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:97)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:299)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
        at java.lang.Thread.run(Thread.java:785)
Caused by:
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1827)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:508)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:384)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:377)
        at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:2706)
        at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:117)
        at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:474)
        ... 48 more
2017-05-31 05:29:11,282 [TP-Processor10] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Rolling back transaction
May 31, 2017 5:29:11 AM redstone.xmlrpc.XmlRpcDispatcher writeError
WARNING: redstone.xmlrpc.XmlRpcFault: unhandled internal exception: ERROR: deadlock detected
  Detail: Process 6611 waits for ShareLock on transaction 651256; blocked by process 6658.
Process 6658 waits for AccessExclusiveLock on tuple (1,10) of relation 58289 of database 16384; blocked by process 6997.
Process 6997 waits for ShareLock on transaction 651259; blocked by process 6611.
  Hint: See server log for query details.
  Where: while locking tuple (0,2) in relation "rhnservergroup"
SQL statement "select sg.created                     from rhnservergroup sg
           inner join rhnserver s on s.org_id = sg.org_id
     where s.id = server_id_in
       for update"
PL/pgSQL function lock_counts(numeric) line 16 at SQL statement
SQL statement "SELECT lock_counts(server_id_in)"
PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM
===

===
Grant Gainey 2017-06-01 16:24:23 EDT

Using the scripts from bug 1388160, everything is fine - until I start using SSM to delete profiles at the same time.  Investigating SSM-delete-servers codepath, to see if there's a place where Java is locking out-of-order.

Jan - ignore the "can we get 5.8 to look at 9.2?" from #c17 - instead, can you point your clients at a *5.7* satellite, and re-run your reproducer? If this happens against 5.7, then it is not this BZ, and not a 5.8 regression - would still need its own BZ, tho. In the meantime, I will attempt same, assuming I can get an env to work before I have to leave tonight.

[reply] [−] Private Comment 19 Grant Gainey 2017-06-01 16:32:28 EDT
I take it back - it wasn't SSM-delete that caused deadlocks, it was registering new systems (which is going down the python-path, and which I could very easily believe is not locking everything in the right order. Investigation Continues.
===

===
Grant Gainey 2017-06-02 11:52:30 EDT
Current status:

* can reproduce frequent deadlocks on 5.7 using the scripts from bug 1388160 alone, on 4 clients.
* explicitly locking reactivation-path in python appears to address these
* adding vanilla-rhnreg_ks to the 4-clients-deadlocks.sh causes ~5-7% of the registrations to cause a deadlock
* currently appears to be caused by something on the rhn_server.snapshot_server() path (all systems have Provisioning and are snapshotting at registration time)

Current thoughts:

* This is not a 5.8 regression
* In fact, things appear to be *better* on 5.8 than 5.7. Somehow.
* spacewalk-backend/python appears to be the problem path
* explicit locking of the reactivation-key path helps
* we're not quite done yet

Investigation continues. Suggest that the current state is that we do *not* hold up 5.8's schedule for this issue, since it is reproduceable in 5.7 (and possibly in 5.6, based on a parallel investigation)
===


Current Results:

* Locking is needed, in python, on the reactivation-key path
* Locking is needed, in stored-procedure, in rhn_server.snapshot_server()

Doing both of these things resolves the deadlocks observed during this investigation, on both 5.8 and 5.7.

--- Additional comment from Grant Gainey on 2017-06-05 14:10 EDT ---

Proposed patch teaches python reactivation-key path and rhn_server.snapshot_server() to explicitly call lock_counts()

--- Additional comment from Jan Hutař on 2017-06-05 16:55 EDT ---

I was able to get this deadlock on PostgreSQL only and basically these are the steps:

1. register ~1000 profiles to your satellite (I had a mix of profiles registered
   via username/password and activation key, but I believe that is not important)
2. save deadlock.sh/reactivationkey.py/updatechannels.py combo to all your
   clients (I had 4 client VMs with 1GB RAM and 1 Satellite VM with 4GB RAM all
   on my laptop)
3. create activation key "1-shell"
     channel set to some custom channel
     join to some system group
4. clone rhel-x86_64-server-6 to shell-rhel-x86_64-server-6 (I believe content
   does not matter)
5. clone rhn-tools-rhel-x86_64-server-6 twice as clone-1-rhel-x86_64-server-6
   and clone-2-rhel-x86_64-server-6 as a childs of previously cloned base
   channel
6. update satellite hostname in both reactivationkey.py/updatechannels.py
7. register all your client systems and give them Provisioning add-on entitlement

--- Additional comment from Jan Hutař on 2017-06-05 17:02:17 EDT ---

And of course I have forgot about most important step:

8. start deadlock.sh on all your clients and once it is running - it should be
   printing something like:

# ./deadlock.sh 
ACT KEY: 1-shell,re-1-d4ddf783dc2182e5bdf4ed21476831d2.
ACT KEY: 1-shell,re-1-25d94990a0887dfe976ea4332b1133bd.
[...]

So, once that is running, select all previously registered profiles, add them to SSM and use SSM to delete these profiles.

Now watch PostgreSQL and Tomcat log. In my case, deleting only 25 profiles caused the deadlock once.

Comment 3 Grant Gainey 2017-06-19 12:24:07 UTC
*** Bug 1406929 has been marked as a duplicate of this bug. ***

Comment 4 Ales Dujicek 2017-06-28 06:47:55 UTC
if client calls
# rhnreg_ks --force --activationkey re-1-08e2cce03ff8e4cf232b037d2306ce30

then I get tracback email on Satellite:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/spacewalk/server/apacheRequest.py", lin
e 122, in call_function
    response = func(*params)
  File "/usr/share/rhn/server/handlers/xmlrpc/registration.py", line 527, in new
_system
    architecture, data)
  File "/usr/share/rhn/server/handlers/xmlrpc/registration.py", line 322, in cre
ate_system
    do_lock_counts(self, tokens_obj.get_server_id())
NameError: global name 'do_lock_counts' is not defined


and I cannot generate reactivation keys from webui (API works), no idea if it is also caused by this or there is a different problem


satellite-schema-5.7.0.31-1.el6sat.noarch
spacewalk-backend-2.3.3-51.el6sat.noarch

Comment 11 errata-xmlrpc 2017-07-19 14:50:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1771


Note You need to log in before you can comment on or make changes to this bug.