Bug 895743 - Back-port commit 3605ce3 for better logging and recovery attempt of stale resource situation similar to what was identified in bug 884338
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Inventory
Version: JON 3.1.1
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: JON 3.2.0
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: 884338
Blocks:
Reported: 2013-01-15 22:12 UTC by Larry O'Leary
Modified: 2013-03-01 00:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-01 00:38:29 UTC
Type: Bug
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 884338 | 0 | medium | CLOSED | Agent's availability report is ignored due to bogus stale resource error | 2021-02-22 00:41:40 UTC

Internal Links: 884338

Description Larry O'Leary 2013-01-15 22:12:09 UTC
Bug 884338 identified an issue where resources were being treated as stale due to some kind of inventory sync issue or bug. However, there was not enough information to determine a root cause, and after many hours of effort the issue could not be reproduced. During the investigation, logging was improved to help identify similar situations, and code was added to attempt to handle them better. Additionally, unit tests were added to cover some stale resource situations and to exercise this code.

We need to back-port commit 3605ce3398557277d8ddf4deb3ffaa83337b7c58 from master to the release branch so that the product can take advantage of these log, test, and recovery improvements. 



--- Additional comment from bug 884338 Jay Shaughnessy on 2012-12-18 10:09:57 EST ---


...

I have updated the code such that the error handling can tell whether we really have a stale resource problem, which is a rare but expected situation and can be resolved with an agent sync, or if this is an instance of the reported corruption.  In the latter case I have added code that logs differently and will attempt to repair the situation.  Unfortunately it will not help with determining the root cause.

I've committed the changes to master: 

commit 3605ce3398557277d8ddf4deb3ffaa83337b7c58
Author: Jay Shaughnessy <jshaughn>
Date:   Tue Dec 18 10:06:59 2012 -0500

I failed to reproduce this issue but:
- Add some better logging and some attempted repair code for this situation
- Add a couple more tests

Comment 1 Larry O'Leary 2013-02-08 15:57:42 UTC
We have had another case where this issue occurred in an upgraded environment. The issue was caused by resources in inventory that either had no row in rhq_availability or had no "latest" availability row in rhq_availability (a row where end_time is null).

I suggest that we also implement a database upgrade fix to handle situations where this can occur. The query which has worked to resolve this issue can be seen in knowledge solution 268103[1].


[1]: https://access.redhat.com/knowledge/solutions/268103
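
For illustration only, the condition described above can be sketched as a detection query. This is not the query from the knowledge solution, which is not reproduced in this bug. Only the table name rhq_availability and the column end_time come from the comment above; the rhq_resource table, the resource_id and start_time columns, and both function names are assumptions made for this sketch, and an in-memory SQLite database stands in for the real product database.

```python
import sqlite3


def find_resources_missing_latest_availability(conn):
    """Return IDs of resources that have no availability row with end_time NULL.

    This covers both cases from the comment: resources with no availability
    rows at all, and resources whose rows are all closed (end_time set).
    Table rhq_resource and column resource_id are assumed names.
    """
    query = """
        SELECT r.id
        FROM rhq_resource r
        WHERE NOT EXISTS (
            SELECT 1
            FROM rhq_availability a
            WHERE a.resource_id = r.id
              AND a.end_time IS NULL
        )
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(query)]


def build_demo_db():
    """Create a tiny in-memory schema with one healthy and two broken resources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rhq_resource (id INTEGER PRIMARY KEY);
        CREATE TABLE rhq_availability (
            id INTEGER PRIMARY KEY,
            resource_id INTEGER NOT NULL,
            start_time INTEGER NOT NULL,
            end_time INTEGER
        );
        INSERT INTO rhq_resource (id) VALUES (1), (2), (3);
        -- Resource 1 is healthy: it has a closed row and a latest (open) row.
        INSERT INTO rhq_availability (resource_id, start_time, end_time)
            VALUES (1, 100, 200), (1, 200, NULL);
        -- Resource 2 has only a closed row, so it lacks a latest row.
        INSERT INTO rhq_availability (resource_id, start_time, end_time)
            VALUES (2, 100, 200);
        -- Resource 3 has no availability rows at all.
    """)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_resources_missing_latest_availability(demo))
```

A schema upgrade step along the lines requested here would presumably insert a latest row for each resource this kind of query reports, but the exact repair is defined by the knowledge solution and is not shown in this bug.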

Comment 2 Jay Shaughnessy 2013-02-28 19:47:28 UTC
I don't think we need to backport, given that 3.2 will be generated from a to-be-created branch off master and all of the changes are in master. We may be able to close this. Asking Larry.

Comment 3 Larry O'Leary 2013-03-01 00:38:29 UTC
The reason it is here in a NEW state is due to comment 1. As the issue reported in the original bug has occurred several more times, the request was to get the query added to the db schema upgrade for 3.2. However, considering all of the confusion, I went ahead and re-opened the original bug 884338 and added the same comment there as well as the duplicate bug 916373.

Considering those are covered there and the issue that introduced this commit is also being tested there, I will CLOSE this as NEXTRELEASE.

