Bug 1093265

Summary: Database deadlock errors when attempting to apply alert definitions updates
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Database
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: unspecified
Version: JON 3.2
CC: jshaughn, lkrejci, myarboro
Target Milestone: ER02
Keywords: Triaged
Target Release: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If a user made a change to an alert template, saved it, realized that something was missed, made a second change to the same alert template and then resaved it, a database deadlock error could occur if the updates followed in quick succession. The fix adds a longer button unavailability timeout to prevent concurrent updates.
Last Closed: 2014-12-11 14:05:17 UTC
Type: Bug
Attachments:
Description Flags
Excerpt from server log showing deadlock messages/exceptions/stacks none

Description Larry O'Leary 2014-05-01 05:10:51 UTC
Created attachment 891392
Excerpt from server log showing deadlock messages/exceptions/stacks

Description of problem:
Deadlock errors are reported, resulting in database failures.

From the stack trace (in the log excerpt) it appears these happen when one or more alert definitions are being updated by an alert template.

Caused by: java.sql.BatchUpdateException: ORA-00060: deadlock detected while waiting for resource
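
For anyone triaging similar logs, the check below is a minimal sketch of how calling code could recognize this failure programmatically. It assumes an Oracle connection, where the driver exposes the ORA number through SQLException.getErrorCode() (60 for ORA-00060). The class and method names (DeadlockErrors, isOracleDeadlock) are hypothetical and are not part of RHQ or JBoss ON.

```java
import java.sql.SQLException;

// Hypothetical helper, not RHQ code: detects ORA-00060 anywhere in a cause chain.
final class DeadlockErrors {
    // Assumption: the connection is to Oracle, where ORA-00060 surfaces as vendor code 60.
    static final int ORA_DEADLOCK = 60;

    static boolean isOracleDeadlock(Throwable error) {
        int depth = 0;
        // Walk the "Caused by" chain; the depth bound guards against cyclic causes.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof SQLException) {
                SQLException sql = (SQLException) t;
                String msg = sql.getMessage();
                // Check the vendor code first, then fall back to the message text
                // in case a wrapper dropped the code.
                if (sql.getErrorCode() == ORA_DEADLOCK
                        || (msg != null && msg.contains("ORA-00060"))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

BatchUpdateException extends SQLException, so the exception shown above would match. Whether a retry is safe after such a match depends on the operation and is not addressed here.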


The DBA analysis provided the following report:


Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning and Real Application Testing options
System name:    Linux
Node name:      nummela
Release:        2.6.32-358.6.2.el6.x86_64
Version:        #1 SMP Tue May 14 15:48:21 EDT 2013
Machine:        x86_64
Redo thread mounted by this instance: 1
Oracle process number: 232
Unix process pid: 49078, image: oracle@nummela


*** 2014-04-24 13:29:48.469
*** SESSION ID:(426.30254) 2014-04-24 13:29:48.469
*** CLIENT ID:() 2014-04-24 13:29:48.469
*** SERVICE NAME:(SYS$USERS) 2014-04-24 13:29:48.469
*** MODULE NAME:(JDBC Thin Client) 2014-04-24 13:29:48.469
*** ACTION NAME:() 2014-04-24 13:29:48.469



*** 2014-04-24 13:29:48.469
DEADLOCK DETECTED ( ORA-00060 )

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-0019b065-00000000       232     537    SX             98      73    SX   SSX
TM-0019b0cb-00000000        98      73    SX            232     537    SX   SSX

session 426: DID 0001-00E8-01508E45     session 62: DID 0001-0062-015559F7
session 62: DID 0001-0062-015559F7      session 426: DID 0001-00E8-01508E45

Rows waited on:
  Session 426: no row
  Session 62: no row

----- Information for the OTHER waiting sessions -----
Session 62:
  sid: 62 ser: 1049 audsid: 2329692 user: 109/JON
    flags: (0x1100041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 87 O/S info: user: orac, term: UNKNOWN, ospid: 38349
    image: oracle@nummela
  client details:
    O/S info: user: jbossadm, term: unknown, ospid: 1234
    machine: jboss-on-01.example.com program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
  current SQL:
  delete from RHQ_CONFIG where id=:1

----- End of information for the OTHER waiting sessions -----

Information for THIS session:

----- Current SQL Statement for this session (sql_id=gc2srj09d9kss) -----
delete from RHQ_CONFIG where id=:1

*** 2014-04-24 13:29:48.567
Attempting to break deadlock by signaling ORA-00060

Version-Release number of selected component (if applicable):
3.2.0.GA

Additional info:
This JBoss ON system is made up of two servers. At the time of the deadlock, server 02 did not report any errors and did not appear to be attempting any database purge or other jobs.

Comment 1 Jay Shaughnessy 2014-05-07 18:30:14 UTC
Although probably not the issue given the error we're seeing, all Oracle installs for 3.2.0 should apply:

  <no-tx-separate-pools>true</no-tx-separate-pools>

to their RHQDS declaration in the standalone-full.conf. This is fixed with the application of the 3.2.1 CP.

I'd recommend making this change and re-evaluating.

Comment 2 Jay Shaughnessy 2014-05-07 19:03:00 UTC
One other thought: I don't think we prevent the GUI from updating a template in quick succession. Maybe not even that quick, just while the prior update is still in progress. The log almost looks like updateTemplate was invoked three times in very quick succession...

Interestingly, the failure does not seem to be while applying the template changes to the potentially large number of affected resources, but rather while updating the template itself.

Also, if this is easily repeated in the errant environment, try turning on debug level server logging to gather some extra information.

Comment 3 Larry O'Leary 2014-05-08 16:38:18 UTC
This issue occurs when the same alert template is updated more than once within a short period of time. In other words:
 - make a change to an alert template
 - save it
 - realize you missed something and make a second change to the same alert template
 - save it

It isn't clear how updates are persisted to the database but it is clear that the same template can be updated while the first update is not yet saved. This results in a database deadlock that will prevent the update action from completing.

Comment 5 Jay Shaughnessy 2014-05-16 16:41:23 UTC
I think the solution here is to disable the UI button until the prior request is completed.
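
The idea of blocking a second save until the first completes can be sketched roughly as below. This is an illustration only, not the change that was committed: SaveGuard and trySave are hypothetical names, and the asynchronous save is modeled as a callback that receives a completion hook.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch, not RHQ code: allows at most one save request in flight.
final class SaveGuard {
    private final AtomicBoolean inFlight = new AtomicBoolean(false);

    /**
     * Starts the save only if none is running. The save receives a completion
     * hook that it must call when the request finishes (success or failure).
     * Returns false, without starting anything, if a save is already running.
     */
    boolean trySave(Consumer<Runnable> asyncSave) {
        if (!inFlight.compareAndSet(false, true)) {
            return false; // prior request still in progress; the button stays disabled
        }
        try {
            asyncSave.accept(() -> inFlight.set(false));
        } catch (RuntimeException e) {
            inFlight.set(false); // starting the request failed, so re-enable
            throw e;
        }
        return true;
    }

    boolean isSaving() {
        return inFlight.get();
    }
}
```

A UI handler would call trySave from the button's click handler and bind the button's disabled state to isSaving(). This protects only a single client session; two users editing the same template would not be covered.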

Comment 8 Jay Shaughnessy 2014-08-01 19:22:32 UTC
master commit f5bf43c71e26bd98d88a55686bf32487580d806a
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Aug 1 15:20:42 2014 -0400

    Added some longer button disablement to avoid concurrent updates.
    If this "cheap" fix isn't sufficient we'd have to look at something
    server-side, and more substantial.
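
    For reference, one possible shape of a "server-side" guard is sketched below.
    It is an assumption about what such a fix could look like, not something
    taken from this commit: TemplateUpdateSerializer and runExclusively are
    hypothetical names, and the lock is local to one JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch, not RHQ code: serializes updates per alert template id.
final class TemplateUpdateSerializer {
    // One lock per template id. Entries are never removed in this sketch.
    private final ConcurrentMap<Integer, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the update while holding the lock for the given template id. */
    <T> T runExclusively(int templateId, Supplier<T> update) {
        ReentrantLock lock = locks.computeIfAbsent(templateId, id -> new ReentrantLock());
        lock.lock();
        try {
            return update.get();
        } finally {
            lock.unlock(); // released even if the update throws
        }
    }
}
```

    Because the lock lives in a single JVM, it would not serialize updates that
    arrive at different servers in a multi-server installation such as the one
    described in this report; that case would need a database-level or
    cluster-wide mechanism.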

Comment 9 Lukas Krejci 2014-08-29 16:52:49 UTC
release/jon3.3.x:
commit c015d106cc97ab3e95775cf61b0f220a8ff63c43
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Aug 1 15:20:42 2014 -0400

    [1093265] Added some longer button disablement to avoid concurrent updates.
    If this "cheap" fix isn't sufficient we'd have to look at something
    server-side, and more substantial.
    
    (cherry picked from commit f5bf43c71e26bd98d88a55686bf32487580d806a)
    Signed-off-by: Lukas Krejci <lkrejci>

Comment 10 Simeon Pinder 2014-09-03 20:32:01 UTC
Moving to ON_QA as available for test with the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=381194