Bug 1141764 - RHQ is broken on oracle - ORA-02049: timeout
Summary: RHQ is broken on oracle - ORA-02049: timeout
Keywords:
Status: VERIFIED
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server, Database
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1141969
Reported: 2014-09-15 12:12 UTC by Filip Brychta
Modified: 2022-03-31 04:28 UTC
4 users

Fixed In Version:
Clone Of:
: 1141969
Environment:
Last Closed:
Embargoed:


Attachments
server log (996.10 KB, application/octet-stream)
2014-09-16 08:23 UTC, Filip Brychta

Description Filip Brychta 2014-09-15 12:12:29 UTC
Description of problem:
This issue was discovered by nightly automation, where bundle, drift and some operation tests failed with ORA-02049: timeout. The same automation runs without problems on PostgreSQL.

The last successful automation run on Oracle was on RHQ build dcd5159, so this issue is probably caused by commit 2d48aa4.

I'm not sure which exact step causes this problem, but here is what I did to reproduce it manually:
1 - deploy a few bundles to a group of platforms
2 - wait ~30 minutes

Result:
Step 1 worked without errors, but after ~30 minutes ORA-02049: timeout exceptions appeared in server.log even without any user action.

Searching for better reproduction steps...

Version-Release number of selected component (if applicable):
Version: 4.13.0-SNAPSHOT
Build Number: 58023b7

How reproducible:
2/2

Comment 4 Filip Brychta 2014-09-16 08:23:02 UTC
This issue is not related to bundles directly. The failed bundle tests are a result of an already "broken" RHQ server. Here is a scenario that breaks RHQ:
1 - have an RHQ server with 2 agents, where the remote agent is monitoring EAP6
2 - import all resources
3 - wait until all resources are UP
4 - uninventory the EAP6 standalone resource
5 - inventory it again
6 - wait for a while

Result:
The first 'ORA-02049: timeout' messages appear in server.log.
From this point on, RHQ is broken and subsequent operations (scheduling an operation, deploying a bundle, ...) hit the ORA-02049: timeout exception as well, even though those operations worked earlier.


The first timeout exceptions are caused by:
03:54:55,221 ERROR [org.rhq.enterprise.server.resource.ResourceManagerBean] (RHQScheduler_Worker-5) Bulk named query delete error for 'ResourceOperationHistory.deleteByResources' for [10101]:

The important point here is that the resource with resourceId '10101' doesn't exist.

See the complete server.log (DEBUG level) and search for 'ORA-02049: timeout'.
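To find where the first 'ORA-02049: timeout' shows up in a large server.log, a small scan like the one below may help. This is a hypothetical helper, not part of RHQ. It assumes each log record starts with an HH:MM:SS,mmm timestamp, as in the excerpt above. Because the ORA message can sit on a stack-trace line without its own timestamp, each hit is paired with the timestamp of the nearest preceding record.

```python
import re
from typing import Iterable, List, Tuple

# Marker text searched for in the log (taken from this bug report).
ORA_TIMEOUT = "ORA-02049: timeout"

# Assumption: log records begin with "HH:MM:SS,mmm " as in the excerpt above.
TIMESTAMP_RE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s")


def find_ora_timeouts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, timestamp) for every line containing ORA-02049.

    The timestamp is that of the most recent timestamped record at or before
    the hit, or "" if no timestamped record has been seen yet.
    """
    hits: List[Tuple[int, str]] = []
    last_ts = ""
    for lineno, line in enumerate(lines, start=1):
        match = TIMESTAMP_RE.match(line)
        if match:
            last_ts = match.group(1)  # remember the record this line belongs to
        if ORA_TIMEOUT in line:
            hits.append((lineno, last_ts))
    return hits
```

Usage: open the log with `with open("server.log", errors="replace") as f:` and call `hits = find_ora_timeouts(f)`. If the list is not empty, `hits[0]` points at the first occurrence, which is the one to inspect.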

Comment 5 Filip Brychta 2014-09-16 08:23:46 UTC
Created attachment 937907
server log

Comment 6 Filip Brychta 2014-09-16 11:57:43 UTC
I reproduced this issue even with DISTRIBUTED_LOCK_TIMEOUT=300 (the default is 60).

Comment 7 Jay Shaughnessy 2014-09-16 20:04:47 UTC
Filip, this should be fixed with master commit ed8b9aa1192066945fdd843e33ca3d7a059a23cf. If so, please close; otherwise let me know. Thanks, Jay

Comment 8 Filip Brychta 2014-09-17 07:05:54 UTC
Manual test is OK. Waiting for automation results...

Comment 9 Filip Brychta 2014-09-17 13:21:07 UTC
Verified on
Version: 4.13.0-SNAPSHOT
Build Number: afb91cb

