Bug 1165251
Summary: | deadlock when multiple system registrations update entitlement counts in rhnPrivateChannelFamily | | |
---|---|---|---|
Product: | Red Hat Satellite 5 | Reporter: | Stephen Herr <sherr> |
Component: | Server | Assignee: | Tomáš Kašpárek <tkasparek> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Pavel Studeník <pstudeni> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | | |
Version: | 570 | CC: | ahumbe, byodlows, cperry, dyordano, jnikolak, kshravag, miguel, pstudeni, satqe-list, tlestach, tpapaioa, xdmoon |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | spacewalk-backend-2.3.3-20 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 1122625 | Environment: | |
Last Closed: | 2016-04-08 07:26:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1122625 | | |
Bug Blocks: | 1127215 | | |
Description
Stephen Herr
2014-11-18 15:51:38 UTC
I got the following traceback when 4 systems were registered simultaneously to the Satellite and, during registration, I removed 500 systems via the web UI.

Reproducer with spacewalk-backend-2.0.3-30.el6sat.noarch

>> rhnreg_ks ..
Error Message:
    ERROR: deadlock detected
Error Class Code: 90
Error Class Info: Unable to entitle system
Explanation:
    An error has occurred while processing your request. If this problem persists please enter a bug report at bugzilla.redhat.com. If you choose to submit the bug report, please be sure to include details of what you were trying to do when this error occurred and details on how to reproduce this problem.

>> tail -f /var/lib/pgsql/data/pg_log/postgresql-*
...
2014-11-25 08:36:18.372 EST DETAIL: Process 5480 waits for ShareLock on transaction 22287017; blocked by process 3823.
    Process 3823 waits for ShareLock on transaction 22287030; blocked by process 5480.
    Process 5480: SELECT rhn_entitlements.entitle_server(1000029179, E'enterprise_entitled')
    Process 3823: select * from delete_server($1) as result
2014-11-25 08:36:18.372 EST HINT: See server log for query details.
2014-11-25 08:36:18.372 EST CONTEXT: SQL statement "select sg.group_type, sg.org_id, sg.current_members, sg.max_members from rhnServerGroup sg where sg.id = $1 for update of sg"
    PL/pgSQL function "insert_into_servergroup" line 19 at SQL statement
    SQL statement "SELECT rhn_server.insert_into_servergroup ( $1 , $2 )"
    PL/pgSQL function "entitle_server" line 39 at PERFORM

>> tail -f /var/lib/pgsql/data/pg_log/postgresql-*
2014-11-25 08:41:50.373 EST ERROR: deadlock detected
2014-11-25 08:41:50.373 EST DETAIL: Process 7775 waits for ExclusiveLock on tuple (3,16) of relation 20231 of database 16384; blocked by process 9542.
    Process 9542 waits for ShareLock on transaction 22298652; blocked by process 9540.
    Process 9540 waits for ShareLock on transaction 22298642; blocked by process 7775.
    Process 7775: select * from delete_server($1) as result
    Process 9542: SELECT rhn_channel.subscribe_server(1000029782, E'101', 0)
    Process 9540: SELECT rhn_entitlements.entitle_server(1000029781, E'enterprise_entitled')
2014-11-25 08:41:50.373 EST HINT: See server log for query details.
2014-11-25 08:41:50.373 EST CONTEXT: SQL statement "update rhnPrivateChannelFamily
        set current_members = current_members -1
        where org_id in (
            select org_id from rhnServer where id = $1
        )
        and channel_family_id in (
            select rcfm.channel_family_id
            from rhnChannelFamilyMembers rcfm, rhnServerChannel rsc
            where rsc.server_id = $1
            and rsc.channel_id = rcfm.channel_id
            and not exists (
                select 1
                from rhnChannelFamilyVirtSubLevel cfvsl, rhnSGTypeVirtSubLevel sgtvsl, rhnServerEntitlementView sev, rhnVirtualInstance vi
                where vi.virtual_system_id = $1
                and vi.host_system_id = sev.server_id
                and sev.label in ('virtualization_host', 'virtualization_host_platform')
                and sev.server_group_type_id = sgtvsl.server_group_type_id
                and sgtvsl.virt_sub_level_id = cfvsl.virt_sub_level_id
                and cfvsl.channel_family_id = rcfm.channel_family_id
            )
        )"
    PL/pgSQL function "delete_server_channels" line 2 at SQL statement
    SQL statement "SELECT rhn_channel.delete_server_channels( $1 )"
    PL/pgSQL function "delete_server" line 24 at PERFORM
2014-11-25 08:41:50.373 EST STATEMENT: select * from delete_server($1) as result
2014-11-25 08:41:50.558 EST ERROR: current transaction is aborted, commands ignored until end of transaction block
2014-11-25 08:41:50.558 EST STATEMENT: select 'c3p0 ping' from dual
2014-11-25 08:41:50.632 EST ERROR: current transaction is aborted, commands ignored until end of transaction block
2014-11-25 08:41:50.632 EST STATEMENT: UPDATE rhnSsmOperation SET status = $1, modified = current_timestamp WHERE id = $2 AND user_id = $3
2014-11-25 08:41:50.633 EST ERROR: current transaction is aborted, commands ignored until end of transaction block
2014-11-25 08:41:50.633 EST STATEMENT: select 'c3p0 ping' from dual

So
this bug was for the deadlock that could happen when you were registering with activation keys / no activation keys. However it's very timely that you were able to reproduce the deadlock with this fix while deleting systems, since we have another bug open right now for that issue and I just passed this patch off to them as a hotfix. I'll comment in bug 1159914 that the hotfix probably doesn't work and more investigation is needed.

Verified with packages
spacewalk-java-2.3.8-81.el6sat.noarch
spacewalk-schema-2.3.2-10.el6sat.noarch
with postgresql 9.2

With the release of Red Hat Satellite 5.7 on January 12th 2015 this bug is being moved to a Closed Current Release state.

The Satellite 5.7 GA Errata:
- https://rhn.redhat.com/errata/RHSA-2015-0033.html

Satellite 5.7 Release Notes:
- https://access.redhat.com/documentation/en-US/Red_Hat_Satellite/5.7/html-single/Release_Notes/index.html

Satellite Customer Portal Blog announcement for release:
- https://access.redhat.com/blogs/1169563/posts/1315743

Cliff

NOTE: This bug has not been re-verified (moved to RELEASE_PENDING) prior to release. We assume that the bug has indeed been fixed and not regressed since we initially verified it. Please re-open in the future if needed.

Hi,

Determining the root cause of the deadlock and a workaround or hotfix is urgently needed.

We did the following to reproduce:
1) Ran the latest Satellite 5.7 with all packages updated
2) Ran an external postgres database
3) Upon restarting postgres and satellite, the first deadlock is seen.
Wed Mar 16 09:56:12 CDT 2016
blocked_pid | blocked_user | blocking_statement | blocking_duration | blocking_pid | blocking_user | blocked_statement | blocked_duration
-------------+--------------+------------------------------------------------------------------------------+-------------------+--------------+---------------
29915 | satadmin | SELECT rhn_channel.update_family_counts(1020, 2) | 00:00:00.542909 | 30147 | satadmin | SELECT rhn_channel.subscribe_server(1000330104,

This appears to match this bug, and the customer is still getting this error. Thanks.

There's some misunderstanding. This bug has been fixed, properly QAed and shipped as part of Satellite 5.7 GA. We're not reopening bugs that have been shipped. Please open a separate bug if needed. I'm closing this bug back with CURRENTRELEASE.
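
For anyone triaging similar reports: the "Process N waits for ...; blocked by process M." clauses in the deadlock DETAIL output quoted in the description can be read as a wait-for graph, and the deadlock is a cycle in that graph. The following is only a rough sketch for reading such log excerpts, not part of the fix and not a Satellite or PostgreSQL tool. The helper names `wait_for_edges` and `find_cycle` are made up for illustration, and the regular expression assumes the DETAIL wording shown in this bug.

```python
import re

# Matches one "Process A waits for ...; blocked by process B." clause,
# assuming the DETAIL wording quoted in this bug report.
WAIT_RE = re.compile(r"Process (\d+) waits for .*?; blocked by process (\d+)\.")


def wait_for_edges(detail):
    """Return {waiting_pid: blocking_pid} parsed from a DETAIL message."""
    return {int(a): int(b) for a, b in WAIT_RE.findall(detail)}


def find_cycle(edges):
    """Follow the blocked-by chain from each pid; return the first cycle found."""
    for start in edges:
        path = [start]
        while path[-1] in edges:
            nxt = edges[path[-1]]
            if nxt in path:
                # Everything from the first visit of nxt onward forms the cycle.
                return path[path.index(nxt):]
            path.append(nxt)
    return []


# DETAIL text copied from the second log excerpt in the description.
detail = (
    "Process 7775 waits for ExclusiveLock on tuple (3,16) of relation 20231 "
    "of database 16384; blocked by process 9542. "
    "Process 9542 waits for ShareLock on transaction 22298652; "
    "blocked by process 9540. "
    "Process 9540 waits for ShareLock on transaction 22298642; "
    "blocked by process 7775."
)

edges = wait_for_edges(detail)
cycle = find_cycle(edges)
# prints: 7775 -> 9542 -> 9540 -> 7775
print(" -> ".join(str(pid) for pid in cycle + cycle[:1]))
```

Matching the pids in the cycle against the "Process N: ..." statement lines in the same excerpt gives delete_server (7775), subscribe_server (9542) and entitle_server (9540), i.e. the three calls that block each other in that log.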