Bug 1481848
Summary: | [5.8] rhn_server.snapshot_server deadlock on Activation Key path | ||
---|---|---|---|
Product: | Red Hat Satellite 5 | Reporter: | Grant Gainey <ggainey> |
Component: | Server | Assignee: | Grant Gainey <ggainey> |
Status: | CLOSED ERRATA | QA Contact: | Martin Korbel <mkorbel> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 580 | CC: | adujicek, cshereme, ggainey, jdostal, jhutar, lhellebr, mkorbel, satqe-list, shughes, tkasparek, tlestach, xdmoon |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | spacewalk-schema-2.5.1-54-sat satellite-schema-5.8.0.38-1-sat spacewalk-backend-2.5.3-154-sat | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1480367 | Environment: | |
Last Closed: | 2017-12-13 07:59:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1480367 | ||
Bug Blocks: | 1450111, 1482563 |
Description
Grant Gainey
2017-08-15 21:13:45 UTC
Testing against 5.8 and postgresql95, using the updated lock_count.sql from https://bugzilla.redhat.com/show_bug.cgi?id=1480367#c1 , shows a greatly-reduced frequency of deadlocks - but they still happen. Investigation for the 5.8 fix continues. After updating the test 5.8 instance with the fixes from the soon-to-be-released https://bugzilla.redhat.com/show_bug.cgi?id=1458880 *and* the lock_counts change from 1480367#c1 , situation is improved dramatically - no deadlocks after two hours of testing with previous recreators. A separate stress-script that forces setBaseChannel API calls from multiple sources as fast as we can call it, was able to produce 2 deadlocks in 30 minutes of stress. The tracebacks generated involve snapshots, but as (separately) invoked from SystemHandler.setBaseChannel(). This *may* be the https://bugzilla.redhat.com/show_bug.cgi?id=1469744 (which has not yet had its own reproducer). NOTE: I can only create this, very occasionally, when stressing a 5.8 system that has had both the lock_counts fix for 1480367 *and* the changes for 1458880. The stress-script I was running was 4 instances of the script from https://bugzilla.redhat.com/show_bug.cgi?id=1469744#c6 The postgresql tracebacks in the setBaseChannel case are === 2017-08-30 15:06:18.906 EDT ERROR: deadlock detected 2017-08-30 15:06:18.906 EDT DETAIL: Process 29117 waits for ShareLock on transaction 246381; blocked by process 29115. Process 29115 waits for ShareLock on transaction 246377; blocked by process 29117. Process 29117: select * from rhn_channel.unsubscribe_server($1, $2, 0) as result Process 29115: select * from rhn_channel.unsubscribe_server($1, $2, 0) as result 2017-08-30 15:06:18.906 EDT HINT: See server log for query details. 2017-08-30 15:06:18.906 EDT CONTEXT: while locking tuple (0,1) in relation "rhnsnapshotservergroup" SQL statement "select ssg.snapshot_id from rhnSnapshotServerGroup ssg inner join rhnservergroup sg on sg.id = ssg.server_group_id inner join rhnserver s on s.org_id = sg.org_id where s.id = server_id_in for update" PL/pgSQL function lock_counts(numeric) line 17 at SQL statement SQL statement "SELECT lock_counts(server_id_in)" PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM 2017-08-30 15:06:18.906 EDT STATEMENT: select * from rhn_channel.unsubscribe_server($1, $2, 0) as result 2017-08-30 15:06:18.917 EDT ERROR: current transaction is aborted, commands ignored until end of transaction block 2017-08-30 15:06:18.917 EDT STATEMENT: select 'c3p0 ping' from dual === and the matching tomcat tracebacks from catalina.out are here: === 2017-08-30 15:06:18,926 [TP-Processor5] ERROR com.redhat.rhn.frontend.xmlrpc.BaseHandler - Error calling method: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor676.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:508) at com.redhat.rhn.frontend.xmlrpc.BaseHandler.invoke(BaseHandler.java:174) at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123) at com.redhat.rhn.frontend.xmlrpc.RhnXmlRpcServer.execute(RhnXmlRpcServer.java:54) at com.redhat.rhn.frontend.xmlrpc.XmlRpcServlet.doPost(XmlRpcServlet.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:643) at javax.servlet.http.HttpServlet.service(HttpServlet.java:723) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.redhat.rhn.frontend.servlets.LocalizedEnvironmentFilter.doFilter(LocalizedEnvironmentFilter.java:67) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.redhat.rhn.frontend.servlets.EnvironmentFilter.doFilter(EnvironmentFilter.java:101) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.redhat.rhn.frontend.servlets.SessionFilter.doFilter(SessionFilter.java:57) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.redhat.rhn.frontend.servlets.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:97) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:299) at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190) at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291) at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769) at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698) at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690) at java.lang.Thread.run(Thread.java:785) Caused by: com.redhat.rhn.common.db.WrappedSQLException: ERROR: deadlock detected Detail: Process 29117 waits for ShareLock on transaction 246381; blocked by process 29115. Process 29115 waits for ShareLock on transaction 246377; blocked by process 29117. Hint: See server log for query details. Where: while locking tuple (0,1) in relation "rhnsnapshotservergroup" SQL statement "select ssg.snapshot_id from rhnSnapshotServerGroup ssg inner join rhnservergroup sg on sg.id = ssg.server_group_id inner join rhnserver s on s.org_id = sg.org_id where s.id = server_id_in for update" PL/pgSQL function lock_counts(numeric) line 17 at SQL statement SQL statement "SELECT lock_counts(server_id_in)" PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM at com.redhat.rhn.common.translation.SqlExceptionTranslator.postgreSqlException(SqlExceptionTranslator.java:54) at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:44) at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:487) at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:39) at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1557) at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1531) at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1488) at com.redhat.rhn.manager.system.SystemManager.unsubscribeServerFromChannel(SystemManager.java:1449) at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.unsubscribeFromOldChannels(UpdateChildChannelsCommand.java:194) at com.redhat.rhn.manager.system.UpdateChildChannelsCommand.store(UpdateChildChannelsCommand.java:107) at com.redhat.rhn.manager.system.UpdateBaseChannelCommand.store(UpdateBaseChannelCommand.java:101) at com.redhat.rhn.frontend.xmlrpc.system.SystemHandler.setBaseChannel(SystemHandler.java:417) ... 40 more Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected Detail: Process 29117 waits for ShareLock on transaction 246381; blocked by process 29115. Process 29115 waits for ShareLock on transaction 246377; blocked by process 29117. Hint: See server log for query details. Where: while locking tuple (0,1) in relation "rhnsnapshotservergroup" SQL statement "select ssg.snapshot_id from rhnSnapshotServerGroup ssg inner join rhnservergroup sg on sg.id = ssg.server_group_id inner join rhnserver s on s.org_id = sg.org_id where s.id = server_id_in for update" PL/pgSQL function lock_counts(numeric) line 17 at SQL statement SQL statement "SELECT lock_counts(server_id_in)" PL/pgSQL function rhn_channel.unsubscribe_server(numeric,numeric,numeric,numeric,numeric,numeric) line 19 at PERFORM at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1827) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:508) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:384) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:377) at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:2706) at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:117) at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:474) ... 49 more === Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3445 |