Bug 887411
Summary: committing/ignoring/unignoring resource causes agent sync to purge all resources
Product: [JBoss] JBoss Operations Network
Component: Inventory
Reporter: Larry O'Leary <loleary>
Assignee: Larry O'Leary <loleary>
QA Contact: Mike Foley <mfoley>
CC: fbrychta, mazz
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: urgent
Version: JON 3.1.1
Target Milestone: ER01
Target Release: JON 3.2.0
Hardware: All
OS: All
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-01-02 20:34:40 UTC
Bug Depends On: 891951
Bug Blocks: 892780

Description
Larry O'Leary
2012-12-14 23:53:57 UTC
Created attachment 663870: Excerpt from agent log showing freshly imported resource with bad config

Has this behaviour changed in any recent releases? Targeting 3.1.2 for triage.

I see this on JON 3.1.2 ER5. Note that I'm not sure how bad this is - it seems to happen only if the plugin config is bad on a newly imported server. So, yes, a full sync is requested, but I assume once the resource is in inventory, it won't keep happening. I'll double check that. However, even if it DOES keep happening, why keep a resource in inventory with bad plugin config anyway? You'll want to fix that - and once you get a good connection and the resource can be managed successfully, things should be fine.

Note, I restarted the agent, put in breakpoints at appropriate locations and looked at the logs - I do not see this happening again. So it looks like this might only happen when you first import the resource. Even if I keep the plugin config invalid (the resource still shows red), I don't see this resync happen on restart.

The problem with this is that resources are imported into inventory on a continuous basis, and each import results in a complete re-sync of the agent inventory. During the re-sync, other operations will fail. Additionally, the re-sync causes templates to be lost as indicated in Bug 884593.

Don't think of this as the happy path of importing a single resource. Think of this in the sense of a production environment where things are automated and handled by remote API calls on a batch of resources or agents. Configuration gets applied to newly imported resources after they have been imported. Additionally, one cannot control whether the EAP instance is up or down when the resource is picked up from a discovery queue via such automation.

This looks like bad behavior that has been in the code for a long time. And it is not related to the resource not being able to be started - I see this happen if you just import the platform (to get the initial inventory) and then import any single server resource (I tried with the RHQ Agent resource itself and see it happen).

The entire agent-side inventory is cleared out no matter what is committed. Starting from the UI code (ResourceGWTServiceImpl.importResources), we can trace the call chain pretty easily down into the remote agent call into InventoryManager.synchronizeInventory and finally into InventoryManager.purgeObsoleteResources:

    DiscoveryBossBean.importResources
    DBB.checkStatus
    DBB.updateInventoryStatus
    DBB.scheduleAgentInventoryOperationJob
    ...quartz job triggers...
    DBB.updateAgentInventoryStatus(String,String)
    DBB.updateAgentInventoryStatus(List,List)
    ...remote call into agent...
    InventoryManager.synchronizeInventory
    ...
    InventoryManager.purgeObsoleteResources

You will notice that from the very top of that call chain (that is, from the UI on down), only a single resource is passed around (the resource being committed). But the last method listed above (IM.purgeObsoleteResources) apparently assumes its argument "Set<String> allUuids" contains all UUIDs from a full sync report (that is, all the UUIDs that the server actually has in inventory). When you manually commit a single resource, that's not the case - allUuids does contain all the UUIDs from the sync report, but that report only has a single resource in it! So in the end, the purge method removes all resources from inventory (because it ends up removing the platform resource, too, since it isn't in allUuids). After this, it corrects itself when the agent re-syncs with the server.

I think I have a patch for this. I'll check in after some more brief testing, but an initial test shows it working correctly.

Created attachment 665806: proposed fix for the problem

Attaching proposed patch to fix the issue.

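The purge behaviour described above can be illustrated with a small, self-contained sketch. This is not the RHQ code and not the attached patch: the class name PurgeSketch, both method signatures, the fullReport flag, and the UUID strings are all hypothetical, and the agent inventory is flattened to a plain set of UUIDs. It only shows why a purge that treats a single-resource report as the full inventory drops the platform, and one possible guard against that.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Illustrative model only: none of these names or signatures come from RHQ.
 * The agent inventory is flattened to a set of UUID strings.
 */
public class PurgeSketch {

    /** Unguarded purge: keeps only the UUIDs listed in the report. */
    static Set<String> purgeObsolete(Set<String> agentInventory, Set<String> reportUuids) {
        Set<String> remaining = new LinkedHashSet<>(agentInventory);
        remaining.retainAll(reportUuids); // everything the report omits is dropped
        return remaining;
    }

    /** Hypothetical guard: purge only when the report covers the whole inventory. */
    static Set<String> purgeObsoleteGuarded(Set<String> agentInventory,
                                            Set<String> reportUuids,
                                            boolean fullReport) {
        if (!fullReport) {
            return new LinkedHashSet<>(agentInventory); // partial report: purge nothing
        }
        return purgeObsolete(agentInventory, reportUuids);
    }

    public static void main(String[] args) {
        Set<String> inventory = new LinkedHashSet<>(
                Arrays.asList("platform-uuid", "agent-uuid", "server-uuid"));
        // Report produced by committing one server from the discovery queue.
        Set<String> partialReport = new LinkedHashSet<>(Arrays.asList("server-uuid"));

        System.out.println("unguarded: " + purgeObsolete(inventory, partialReport));
        System.out.println("guarded:   " + purgeObsoleteGuarded(inventory, partialReport, false));
    }
}
```

In this flat model the unguarded purge leaves only the committed server behind. In a real resource hierarchy, removing the platform would presumably take its children with it, which is consistent with the report that the whole agent-side inventory is cleared.
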
git commit to master: d5564f3562ee960115cc533f029521000c870f45

Note that the bug would have also occurred whenever you ignore or unignore a resource from the discovery queue (in addition to committing a resource). I adjusted the title of this bugzilla issue to reflect that.

This is only in master, not in any other branch.

Not sure what status this issue should be in, but the issue is thought to be fixed with the commit to master and should be QA'ed.

This missed the JBoss ON 3.1.2 code-freeze/cut-off so is being moved to 3.2.

Committed to master: http://git.fedorahosted.org/cgit/rhq/rhq.git/diff/?id=d5564f3562ee960115cc533f029521000c870f45

    commit d5564f3562ee960115cc533f029521000c870f45
    Author: John Mazzitelli <mazz>
    Date: Wed Dec 19 12:39:24 2012 -0500

        [BZ 887411] don't uninventory everything just because we commited some top level server

As this is MODIFIED or ON_QA, setting milestone to ER1.

Verified on Version: 3.2.0.ER4, Build Number: e413566:057b211