Red Hat Bugzilla – Bug 846353
ORA-02049 during upgrade from JON-3.1.0.GA to JON-3.1.1.ER1 with Oracle
Last modified: 2013-09-03 11:02:32 EDT
Created attachment 602762
Description of problem:
The following exception appears in the server log during the upgrade from JON-3.1.0.GA to JON-3.1.1.ER1:
'java.sql.BatchUpdateException: ORA-02049: timeout: distributed transaction waiting for lock'
Installation takes much longer than usual.
Version-Release number of selected component (if applicable):
3 of 3
Steps to Reproduce:
1. JON-3.1.0.GA with eap plugin installed and running, EAP-5.1.2 running (profile ALL), RHQ agent and eap imported
2. follow upgrade procedure http://docs.redhat.com/docs/en-US/JBoss_Operations_Network/3.1/html/Installation_Guide/upgrading.html
note: plugins were copied before JON-3.1.1.ER1 was started -
cp jon-plugin-pack-eap-3.1.1.ER1/* jon-server-3.1.1.ER1/plugins/
'java.sql.BatchUpdateException: ORA-02049: timeout: distributed transaction waiting for lock' exception in server log
complete logs attached
So to recap, you had a JON 3.1.0.GA server set up and then tried to upgrade to a 3.1.1.ER1.
Which plugins were installed on 3.1.0? Did you on upgrade load the plugin packs before starting the 3.1.1 server?
Was an as7/eap6 in inventory of the 3.1.0 server?
(In reply to comment #2)
> So to recap, you had a JON 3.1.0.GA server set up and then tried to upgrade
> to a 3.1.1.ER1.
> Which plugins were installed on 3.1.0?
> Did you on upgrade load the plugin
> packs before starting the 3.1.1 server?
> Was an as7/eap6 in inventory of the 3.1.0 server?
EAP-5.1.2 was running and imported to inventory
1- JON-3.1.0.GA with jon-plugin-pack-eap-3.1.0.GA set up, EAP-5.1.2 running with 'all' profile (./run.sh -c all) and imported to inventory, the rhq agent resource was imported as well
2- the rhq agent was prepared for upgrade (running in the background according to upgrade manual)
3- follow upgrade manual:
- stop 3.1.0.GA server
- cp plugins to $RHQ_SERVER_HOME/plugins (cp jon-plugin-pack-eap-3.1.1.ER1/* jon-server-3.1.1.ER1/plugins/)
- start 3.1.1.ER1 server and finish upgrade
can I assume this setting in your rhq-server.properties is "1" ???
# The number of concurrent threads used to deploy plugins.
# Currently, it is not recommended to increase this value.
This needs to be 1. Nothing larger.
(In reply to comment #4)
> can I assume this setting in your rhq-server.properties is "1" ???
> # The number of concurrent threads used to deploy plugins.
> # Currently, it is not recommended to increase this value.
> This needs to be 1. Nothing larger.
Yes, I did not touch this property.
Author: Jay Shaughnessy <email@example.com>
Date: Wed Aug 15 09:20:44 2012 -0400
[Bug 846353 - ORA-02049 during upgrade from JON-3.1.0.GA to JON-3.1.1.ER1 wit
After a lot of investigation it seems that the problem occurs occasionally
when we remove obsolete properties from resource configuration. It does
not happen every time, in fact it's fairly rare, although for the same DB,
and the same plugin update, it is repeatable. This makes it seem like the
locking issue is due mainly to unpredictable locking at the db level, and
likely the fact that we occasionally hit a page lock due to some other
prior update to the config table. Since the config table stores so many
different types of data it's not obvious how we would identify the conflict.
The approach taken was to try and reduce possible conflict by increasing the
granularity of metadata update transactions. Prior to this change we used a
single encompassing transaction for a plugin update, which means all types
were updated under one umbrella transaction. It was not literally a single
transaction, because we already use nested transactions in several places,
but using one umbrella transaction increases the chance of that transaction
holding a lock that could affect a nested transaction.
We still maintain the umbrella transaction but this commit breaks it up
such that a nested transaction is used for the update of each resource
type in the plugin.
That means each type update will not hold any locks when it has completed.
This change seems to be working as the AS7 plugin now updates successfully.
- added some more INFO level logging to give some basic progress during a
- added some more debug logging as well
- removed a bunch of unnecessary em.flush calls
- used the return value of some em.merge calls to ensure using the up to
Cherry pick of master c9eb53bb8dab7393d9b5383fbe5bab99088487ed.
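The locking argument in the commit message above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the RHQ code: `SimulatedTx`, the `config:` row names, and the resource type names are all hypothetical, and a real server would rely on container-managed transactions rather than a hand-written lock set. It compares how many locks are held at once when every resource type is updated inside one long-lived transaction versus when each type is updated in its own transaction that commits before the next one starts.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PerTypeTransactionSketch {

    /** Hypothetical stand-in for a DB transaction; it only tracks the rows it has locked. */
    static final class SimulatedTx {
        private final Set<String> heldLocks = new LinkedHashSet<>();

        void lock(String row) {
            heldLocks.add(row);
        }

        int heldLockCount() {
            return heldLocks.size();
        }

        /** Committing releases every lock the transaction holds. */
        void commit() {
            heldLocks.clear();
        }
    }

    /** One transaction updates all types, so locks accumulate until the final commit. */
    static int maxLocksWithUmbrellaOnly(List<String> resourceTypes) {
        SimulatedTx umbrella = new SimulatedTx();
        int max = 0;
        for (String type : resourceTypes) {
            umbrella.lock("config:" + type); // hypothetical row name
            max = Math.max(max, umbrella.heldLockCount());
        }
        umbrella.commit();
        return max;
    }

    /** Each type is updated in its own transaction, committed before the next type starts. */
    static int maxLocksWithPerTypeTx(List<String> resourceTypes) {
        int max = 0;
        for (String type : resourceTypes) {
            SimulatedTx perType = new SimulatedTx();
            perType.lock("config:" + type); // hypothetical row name
            max = Math.max(max, perType.heldLockCount());
            perType.commit(); // no locks survive past this type's update
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical resource type names, for illustration only.
        List<String> types = Arrays.asList("Server", "Datasource", "Deployment");
        System.out.println("umbrella only: " + maxLocksWithUmbrellaOnly(types));
        System.out.println("per-type tx:   " + maxLocksWithPerTypeTx(types));
    }
}
```

With three types, the umbrella-only variant peaks at three held locks while the per-type variant never holds more than one, which is the reduction in lock overlap the commit describes. The source does not say which mechanism the commit uses to start the per-type transactions.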
Upgrades of as many plugins as possible from older versions to the latest (4.5 versions) for Oracle and Postgres. If possible, having data in inventory will make the test even more robust, although it was not necessary for the original issue.
Moving to ON_QA. The JON 3.1.1 ER3 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=230321.
Verified on JON 3.1.1 ER3 for described scenario. More complex scenarios to do.
Bulk closing of old issues in VERIFIED state.