Bug 601792
| Summary: | Down agent not syncing inventory properly on startup when uninventorying platform | | |
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | Jay Shaughnessy <jshaughn> |
| Component: | Agent | Assignee: | RHQ Project Maintainer <rhq-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
| Severity: | medium | Priority: | low |
| Version: | 1.4 | CC: | mazz |
| Hardware: | All | OS: | All |
| Doc Type: | Bug Fix | Last Closed: | 2014-04-04 15:02:39 UTC |
Description
Jay Shaughnessy
2010-06-08 16:02:17 UTC
For the record, I remember that many moons ago I had code that specifically checked whether the platform resource was uninventoried and, if so, did some special things. I can't remember where that was, but I do remember this working at one point (that is, the agent ends up being aware that the platform was uninventoried and that it needs to consider everything NEW again).
Created attachment 422282: command-trace.log
I uninventoried the platform, confirmed that all resources were out of the DB, and then restarted the agent. I turned on comm tracing and uploaded the trace log (command-trace.log). Notice that the messages that go to the server are as follows (ignoring the identify/ping polling messages):
1) connectAgent
2) registerAgent
3) getLatestPlugins
4) getFailoverList
5) mergeAvailabilityReport
6) mergeInventoryReport
The merge of the avail report was reported as a success, even though I saw an NPE in the server log:
12:11:48,221 INFO [DiscoveryServerServiceImpl] Error processing availability report from [localhost]: javax.ejb.EJBException:java.lang.NullPointerException -> java.lang.NullPointerException:null
I think that is to be expected - as I recall, avail report processing rarely ever shows up as an overt failure to the agent.
But clearly, the inventory isn't synced at that point. The inventory report does show a failure in the comm trace log as well as the server log - again, the inventory looks to not be synced.
I thought one of these initial messages got an inventory-sync object as a response so the agent can sync up as soon as possible. Probably need to talk to ips or joseph about this.
It seems that child resources can successfully be uninventoried and that the agent syncs correctly in that case, on startup. So, the problem case is unlikely. Basically an agent would have to be down, the entire platform uninventoried, and then the agent would have to be brought up again (without --clean). Typically a platform uninventory is performed on a dead platform. The chances of the agent being brought up again are probably small. Given the unlikely scenario and the fact that there is a workaround, I'm going to drop the priority/severity.
Here are my thoughts and findings. The issue here is a combination of two things:
1) the agent has been shut down during the uninventory
2) the PLATFORM itself is being uninventoried
I've seen things work (the agent can sync properly) if the agent is up OR if you uninventory something other than the platform. BUT if the agent is down and you uninventory the platform, then restart the agent unclean (that is, without --cleanconfig or --purgedata), the sync fails and the avail/inventory reports coming into the server from the agent cause errors.
I do not think this is a major problem because of the following: if you are uninventorying the platform, you are probably doing it for one of two reasons:
1) you really don't want to manage that platform anymore, in which case it's moot that the agent won't be able to start up unclean because you don't want to run the agent anyway!
2) you are cleaning out the inventory so you can "refresh" it by starting anew. In that case, you will probably want to (or at least won't feel it's a hardship to) start the agent clean as well (thus you start with fresh inventory on both server and agent). So in this case you will be starting the agent with --cleanconfig or --purgedata, and this is OK and works.
I have a feeling this never worked, but Jay seems to think it did work at one point. Just because I'm curious, I will try on our previous release to see what happens.
But regardless, I do not think this is a major issue because, as I mention above, uninventorying the platform is a major deal and if you are doing that, you probably will want to restart the agent clean anyway (or you don't want to run the agent ever again anyway).
I just tried on a previous release (jon 2.3.1 to be exact) and this problem still occurs. I think this never really worked before. Here are the server logs (note that nothing showed up in the auto-discovery queue, so I can't import the platform again, same as what happens with the latest code):
13:09:38,125 INFO [DiscoveryServerServiceImpl] Processed AV:[localhost][33][full] - need full=[false] in (32)ms
...
13:09:44,663 ERROR [DiscoveryServerServiceImpl] Fatal error occurred during merging of inventory report from agent [Agent[id=0,name=localhost,address=null,port=0,remote-endpoint=null,last-availability-report=null]]. javax.ejb.EJBTransactionRolledbackException: No entity found for query
Changing the subject line - I do not believe this is a regression, and it is specifically only occurring when uninventorying the platform itself.
Triaged 21-Sept
Closing, there has been a ton of work in this area and this is likely obsolete.
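For anyone skimming this report, the conditions described in the comments above can be summarized as a small truth table. This is only a sketch of the observed behavior, not RHQ code: the function name `startup_sync_expected` and its parameters are hypothetical, and "clean start" here means starting the agent with --cleanconfig or --purgedata as mentioned above.

```python
def startup_sync_expected(agent_was_up: bool,
                          platform_uninventoried: bool,
                          clean_start: bool) -> bool:
    """Summarize the behavior reported in this bug (hypothetical helper).

    agent_was_up: the agent was running while the uninventory happened.
    platform_uninventoried: the platform itself (not just a child) was uninventoried.
    clean_start: the agent is restarted with --cleanconfig or --purgedata.
    """
    if clean_start:
        # A clean restart was reported to work in every case.
        return True
    # The failure was only observed when both conditions hold together:
    # the agent was down AND the platform itself was uninventoried.
    return not (platform_uninventoried and not agent_was_up)


if __name__ == "__main__":
    # The problem case from this report: agent down, platform uninventoried,
    # unclean restart.
    print(startup_sync_expected(False, True, False))
```

Only the single combination of a down agent, a platform-level uninventory, and an unclean restart is expected to fail, which matches the reasoning for lowering the priority.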