Bug 815964

Summary: Monitoring probes give a deadlock error message (ORA-00060) at RHNSAT.RHN_SYNCH_PROBE_STATE
Product: [Community] Spacewalk Reporter: Stephen Herr <sherr>
Component: Server Assignee: Stephen Herr <sherr>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 1.8 CC: ahumbe, cperry, ezivanov, mmello, plambri, pmutha, xdmoon
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: spacewalk-web-1.8.34-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 674071 Environment:
Last Closed: 2012-11-01 16:21:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 653464, 674071    
Bug Blocks: 871344    

Description Stephen Herr 2012-04-24 22:04:38 UTC
+++ This bug was initially created as a clone of Bug #674071 +++

Description of problem:

When a large number of monitoring probes run against large tables (a lot of data), the RHN probes report a deadlock error.

Version-Release number of selected component (if applicable):
Red Hat Network Satellite 5.3 

How reproducible:
100%

Steps to Reproduce:
1. Create a lot of probes
2. Run the probes
  
Actual results:

RHN probes show deadlock messages

Expected results:

RHN Probes run as expected

Additional info:

====== ISSUE 
The customer continues to see "deadlock detected" messages from Taskomatic:

***
INFO   | jvm 1    | 2011/01/14 08:17:03 | Caused by: 
INFO   | jvm 1    | 2011/01/14 08:17:03 | java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2011/01/14 08:17:03 | ORA-06512: at "RHNSAT.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2011/01/14 08:17:03 | ORA-06512: at line 1
INFO   | jvm 1    | 2011/01/14 08:17:03 | 
INFO   | jvm 1    | 2011/01/14 08:17:03 |       ... 18 more
INFO   | jvm 1    | 2011/01/14 08:17:03 | 2011-01-14 08:17:03,145 [DefaultQuartzScheduler_Worker-0] ERROR com.redhat.rhn.taskomatic.core.SchedulerKernel - org.quartz.JobExecutionException: com.redhat.rhn.common.db.WrappedSQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2011/01/14 08:17:03 | ORA-06512: at "RHNSAT.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2011/01/14 08:17:03 | ORA-06512: at line 1
INFO   | jvm 1    | 2011/01/14 08:17:03 |  [See nested exception: com.redhat.rhn.common.db.WrappedSQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2011/01/14 08:17:03 | ORA-06512: at "RHNSAT.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2011/01/14 08:17:03 | ORA-06512: at line 1


===== Analysis

Looking at the Oracle trace logs, we identified the SQL statements that are causing the ORA-00060 deadlock.

DEADLOCK DETECTED ( ORA-00060 )
Current SQL Statement:

    UPDATE PROBE_STATE
    SET    LAST_CHECK = TO_DATE(:p1, 'YYYY-MM-DD HH24:MI:SS'),
           STATE = :p2,
           OUTPUT = :p3
    WHERE  SCOUT_ID = :p4
    AND    PROBE_ID = :p5

End of information on OTHER waiting sessions.
Current SQL statement for this session:
UPDATE RHN_PROBE_STATE SET STATE = 'PENDING', OUTPUT = 'Awaiting update' WHERE LAST_CHECK < ( SELECT ( SYSDATE - GREATEST(15 / 60 / 24, ((3 * RHN_DEPLOYED_PROBE.CHECK_INTERVAL_MINUTES) / 60 / 24))) FROM RHN_DEPLOYED_PROBE WHERE RHN_DEPLOYED_PROBE.RECID = RHN_PROBE_STATE.PROBE_ID )
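
For readers trying to work out which rows the bulk "PENDING" update touches: Oracle date arithmetic is in days, so 15 / 60 / 24 is 15 minutes and (3 * CHECK_INTERVAL_MINUTES) / 60 / 24 is three check intervals. A probe state row is therefore treated as stale when its LAST_CHECK is older than the larger of those two windows. The following is only a rough Python sketch of that cutoff arithmetic; the function names and the MIN_STALE_WINDOW constant are made up for illustration and are not part of Spacewalk.

```python
from datetime import datetime, timedelta

# Hypothetical name: the "15 / 60 / 24" term in the SQL is 15 minutes expressed in days.
MIN_STALE_WINDOW = timedelta(minutes=15)


def stale_cutoff(now: datetime, check_interval_minutes: int) -> datetime:
    """Mirror SYSDATE - GREATEST(15/60/24, (3 * CHECK_INTERVAL_MINUTES)/60/24)."""
    window = max(MIN_STALE_WINDOW, timedelta(minutes=3 * check_interval_minutes))
    return now - window


def is_stale(last_check: datetime, now: datetime, check_interval_minutes: int) -> bool:
    """True when the row would match the WHERE LAST_CHECK < (...) condition."""
    return last_check < stale_cutoff(now, check_interval_minutes)


if __name__ == "__main__":
    now = datetime(2011, 1, 14, 8, 17, 3)
    # A 1-minute probe still gets the 15-minute minimum window.
    print(stale_cutoff(now, 1))
    # A 10-minute probe gets 3 * 10 = 30 minutes.
    print(stale_cutoff(now, 10))
```

With several thousand probes, as in the counts reported for this customer, many rows can fall past this cutoff at once, so the bulk statement may touch a large part of the table while the scouts are writing per-probe updates to it.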

========================================

Querying the customer DB internally:

SQL> SELECT probe_type as "Probe Type", count(*) as "Total Probes" from rhn_probe GROUP BY probe_type;

Probe Type      Total Probes
--------------- ------------
check                   3237
suite                    124

SQL>

Comment 1 Stephen Herr 2012-04-24 22:07:45 UTC
fixed in Spacewalk master: 6f79cf500064b4bf0d5055fef4756351da912ad3

Comment 2 Stephen Herr 2012-09-05 15:09:30 UTC
and b7b676e05053795e7c8202bd7484a51cc49781c6

Comment 3 Stephen Herr 2012-09-05 15:33:10 UTC
and 6da40f3e97b78d807546dca6b315e2c05b80ab04

Comment 4 Jan Pazdziora 2012-10-30 19:25:22 UTC
Moving ON_QA. Packages that address this bugzilla should now be available in yum repos at http://yum.spacewalkproject.org/nightly/

Comment 5 Jan Pazdziora 2012-11-01 16:21:10 UTC
Spacewalk 1.8 has been released: https://fedorahosted.org/spacewalk/wiki/ReleaseNotes18