Bug 846353

Summary: ORA-02049 during upgrade from JON-3.1.0.GA to JON-3.1.1.ER1 with Oracle
Product: [Other] RHQ Project
Reporter: Filip Brychta <fbrychta>
Component: Core Server
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: urgent
Docs Contact:
Priority: high
Version: JON 3.1.1
CC: hrupp, jsanda, jshaughn, mazz
Target Milestone: ---   
Target Release: JON 3.1.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 848384
Environment:
Last Closed: 2013-09-03 15:02:32 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 848384    
Attachments:
Description Flags
complete logs none

Description Filip Brychta 2012-08-07 14:35:39 UTC
Created attachment 602762
complete logs

Description of problem:
The following exception appears in the server log during the upgrade from JON-3.1.0.GA to JON-3.1.1.ER1:

'java.sql.BatchUpdateException: ORA-02049: timeout: distributed transaction waiting for lock'

The installation also takes much longer than usual.
 

Version-Release number of selected component (if applicable):
JON-3.1.1.ER1

How reproducible:
3 of 3

Steps to Reproduce:
1. Set up JON-3.1.0.GA with the EAP plugin installed and running, EAP-5.1.2 running (profile ALL), and the RHQ agent and EAP imported
2. Follow the upgrade procedure at http://docs.redhat.com/docs/en-US/JBoss_Operations_Network/3.1/html/Installation_Guide/upgrading.html
   Note: the plugins were copied before JON-3.1.1.ER1 was started:
   cp jon-plugin-pack-eap-3.1.1.ER1/* jon-server-3.1.1.ER1/plugins/
3. Check the logs
  
Actual results:
'java.sql.BatchUpdateException: ORA-02049: timeout: distributed transaction waiting for lock' exception in server log

Expected results:
no exceptions

Additional info:
complete logs attached
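
The "check logs" step can be scripted. The following is only a minimal sketch, assuming the server log is plain text with one entry per line; the helper name find_lock_timeouts is hypothetical and not part of JON or RHQ.

```python
import re

# The Oracle error code reported in this bug.
ORA_02049 = re.compile(r"ORA-02049")


def find_lock_timeouts(log_lines):
    """Return (line_number, text) pairs for lines that mention ORA-02049.

    log_lines may be any iterable of strings, for example an open file.
    Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if ORA_02049.search(line)
    ]
```

To use it, open the server log of the upgraded installation (its path depends on the setup and is not assumed here) and pass the file object to find_lock_timeouts; an empty result means the error was not logged.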

Comment 2 Heiko W. Rupp 2012-08-10 13:14:54 UTC
So to recap, you had a JON 3.1.0.GA server set up and then tried to upgrade to a 3.1.1.ER1.

Which plugins were installed on 3.1.0? Did you on upgrade load the plugin packs before starting the 3.1.1 server?

Was an as7/eap6 in inventory of the 3.1.0 server?

Comment 3 Filip Brychta 2012-08-10 14:36:56 UTC
(In reply to comment #2)
> So to recap, you had a JON 3.1.0.GA server set up and then tried to upgrade
> to a 3.1.1.ER1.
> 
Yes

> Which plugins were installed on 3.1.0? 
jon-plugin-pack-eap-3.1.0.GA

>Did you on upgrade load the plugin
> packs before starting the 3.1.1 server?
yes

> Was an as7/eap6 in inventory of the 3.1.0 server?
EAP-5.1.2 was running and imported to inventory

More accurate:
1- JON-3.1.0.GA with jon-plugin-pack-eap-3.1.0.GA set up, EAP-5.1.2 running with 'all' profile (./run.sh -c all) and imported to inventory, the rhq agent resource was imported as well
2- the rhq agent was prepared for upgrade (running in the background according to upgrade manual)
3- follow upgrade manual:
  - stop 3.1.0.GA server
  - cp plugins to $RHQ_SERVER_HOME/plugins (cp jon-plugin-pack-eap-3.1.1.ER1/* jon-server-3.1.1.ER1/plugins/)
  - start 3.1.1.ER1 server and finish upgrade

Comment 4 John Mazzitelli 2012-08-13 15:28:45 UTC
can I assume this setting in your rhq-server.properties is "1" ???

# The number of concurrent threads used to deploy plugins.
# Currently, it is not recommended to increase this value.
rhq.server.plugin-deployer-threads=1

This needs to be 1. Nothing larger.
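
A quick way to confirm the setting without reading the file by eye is sketched below. It is an illustration only: it handles plain key=value lines and # or ! comments, not the full Java properties syntax (no ":" separators or line continuations), and the function names are hypothetical. A missing property is reported as not confirmed rather than assumed to default to 1.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def deployer_threads_ok(text, key="rhq.server.plugin-deployer-threads"):
    """Return True only when the property is present and set to exactly 1."""
    return read_properties(text).get(key) == "1"
```

Pass the contents of rhq-server.properties as a string; the function returns False for any value other than 1 and for a file that does not set the property at all.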

Comment 6 Filip Brychta 2012-08-14 07:17:43 UTC
(In reply to comment #4)
> can I assume this setting in your rhq-server.properties is "1" ???
> 
> # The number of concurrent threads used to deploy plugins.
> # Currently, it is not recommended to increase this value.
> rhq.server.plugin-deployer-threads=1
> 
> This needs to be 1. Nothing larger.

Yes, I did not touch this property.

Comment 7 Jay Shaughnessy 2012-08-15 13:22:19 UTC
commit d36c9fbdf488fbc6d7e8e6b31bad506e1d0d150d
Author: Jay Shaughnessy <jshaughn>
Date:   Wed Aug 15 09:20:44 2012 -0400

[Bug 846353 - ORA-02049 during upgrade from JON-3.1.0.GA to JON-3.1.1.ER1 wit
After a lot of investigation it seems that the problem occurs occasionally
when we remove obsolete properties from resource configuration.  It does
not happen every time, in fact it's fairly rare, although for the same DB,
and the same plugin update, it is repeatable.  This makes it seem like the
locking issue is due mainly to unpredictable locking at the db level, and
likely the fact that we occasionally hit a page lock due to some other
prior update to the config table.  Since the config table stores so many
different types of data it's not obvious how we would identify the conflict.

The approach taken was to try and reduce possible conflict by increasing the
granularity of metadata update transactions.  Prior to this change we used a
single encompassing transaction for a plugin update, meaning all types
were updated under one umbrella transaction.  It was not strictly a single
transaction, because we already use nested transactions in several places,
but one umbrella transaction increases the chance of that transaction
holding a lock that could affect a nested transaction.

We still maintain the umbrella transaction but this commit breaks it up
such that a nested transaction is used for the update of each resource
type in the plugin.

That means each type update will not hold any locks when it has completed.
This change seems to be working as the AS7 plugin now updates successfully.

Additionally:
- added some more INFO level logging to give some basic progress during a
  plugin update.
- added some more debug logging as well
- removed a bunch of unnecessary em.flush calls
- used the return value of some em.merge calls to ensure using the up to
  date entity.

Cherry pick of master c9eb53bb8dab7393d9b5383fbe5bab99088487ed.
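
The effect of the finer transaction granularity described in the commit message can be illustrated with a toy model. This is not the RHQ code and does not touch a database: LockTracker, update_under_umbrella and update_per_type are hypothetical names, and the model assumes only that a transaction keeps every lock it takes until it ends. It leaves out the outer umbrella transaction that the commit still keeps, since in this model that transaction takes no row locks of its own.

```python
from contextlib import contextmanager


class LockTracker:
    """Toy stand-in for a database that records which rows are locked."""

    def __init__(self):
        self.held = set()  # rows locked right now
        self.peak = 0      # largest number of rows locked at any one time

    @contextmanager
    def transaction(self):
        acquired = set()
        try:
            yield acquired
        finally:
            # Commit or rollback: locks taken by this transaction are released.
            self.held -= acquired

    def lock(self, acquired, row):
        acquired.add(row)
        self.held.add(row)
        self.peak = max(self.peak, len(self.held))


def update_under_umbrella(db, plugin_types):
    """Old behaviour: every type is updated inside one transaction."""
    with db.transaction() as tx:
        for rows in plugin_types.values():
            for row in rows:
                db.lock(tx, row)


def update_per_type(db, plugin_types):
    """New behaviour: one transaction per resource type."""
    for rows in plugin_types.values():
        with db.transaction() as tx:
            for row in rows:
                db.lock(tx, row)
```

With two types that touch two and three rows, the umbrella variant peaks at five locked rows while the per-type variant never holds more than three, and holds none once a type has finished.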


Test Notes:
Upgrade as many plugins as possible from older versions to the latest (4.5 versions) on Oracle and Postgres.  If possible, having data in inventory will make the test even more robust, although it was not necessary to reproduce the original issue.

Comment 8 John Sanda 2012-08-22 05:50:27 UTC
Moving to ON_QA. The JON 3.1.1 ER3 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=230321.

Comment 9 Filip Brychta 2012-08-27 09:14:23 UTC
Verified on JON 3.1.1 ER3 for the described scenario. More complex scenarios remain to be tested.

Comment 10 Heiko W. Rupp 2013-09-03 15:02:32 UTC
Bulk closing of old issues in VERIFIED state.