Bug 601792

Summary:

Down agent not syncing inventory properly on startup when uninventorying platform

Product:

[Other] RHQ Project

Reporter:

Jay Shaughnessy <jshaughn>

Component:

Agent

Assignee:

RHQ Project Maintainer <rhq-maint>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Mike Foley <mfoley>

Severity:

medium

Docs Contact:

Priority:

low

Version:

1.4

CC:

mazz

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

All

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2014-04-04 15:02:39 UTC

Attachments:

Description	Flags
command-trace.log	none

Description Jay Shaughnessy 2010-06-08 16:02:17 UTC

If an agent is down, it should sync its inventory on startup so that it picks up any uninventory the server performed while it was down.

This does not seem to be working anymore.

To reproduce:

Inventory a platform and some child resources.
Shut down the agent
Uninventory the platform
Start up the agent

The agent does not seem to realize that the resources should be deleted from its inventory. It sends bogus reports for the phantom resources, generating server-side errors, and discovery -f does not produce anything for import in the discovery queue.

Restarting the agent with --clean wipes out the known inventory and should work around the issue.

There does not appear to be a sync issue when the agent is up.

Comment 1 John Mazzitelli 2010-06-08 16:48:50 UTC

For the record, I remember that many moons ago I had code that specifically checked whether the platform resource was uninventoried and, if so, did some special things. I can't remember where that was, but I do remember this working at one point (that is, the agent ends up being aware that the platform was uninventoried and that it needs to consider everything NEW again).
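A minimal sketch of the kind of check described above, not the actual RHQ code. The class LocalInventory, its Status enum, and the reconcile method are hypothetical names made up for illustration. The assumption is that the agent can learn which resource ids the server still knows. If the platform id is missing, everything is reset to NEW. Otherwise only the resources the server has dropped are removed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the RHQ code base. */
class LocalInventory {
    enum Status { NEW, COMMITTED }

    private final int platformId;
    // resource id -> status as the agent last persisted it
    private final Map<Integer, Status> resources = new LinkedHashMap<>();

    LocalInventory(int platformId) {
        this.platformId = platformId;
    }

    void put(int resourceId, Status status) {
        resources.put(resourceId, status);
    }

    Status statusOf(int resourceId) {
        return resources.get(resourceId);
    }

    /** Reconcile the persisted inventory with the ids the server still knows. */
    void reconcile(Set<Integer> serverKnownIds) {
        if (!serverKnownIds.contains(platformId)) {
            // The platform itself was uninventoried: treat every
            // resource as newly discovered so it can be imported again.
            resources.replaceAll((id, status) -> Status.NEW);
            return;
        }
        // Only some children were uninventoried: forget just those.
        resources.keySet().removeIf(id -> !serverKnownIds.contains(id));
    }
}
```

With the platform id absent from the server's set, every locally known resource falls back to NEW, which is the state in which it could show up in the discovery queue again.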

Comment 2 John Mazzitelli 2010-06-08 17:18:04 UTC

Created attachment 422282 [details]
command-trace.log

I uninventoried the platform, confirmed all resources were out of the DB, and then restarted the agent. I turned on comm tracing and uploaded the trace log (command-trace.log). Notice that the messages that go to the server are as follows (ignoring the identify/ping polling messages):

1) connectAgent
2) registerAgent
3) getLatestPlugins
4) getFailoverList
5) mergeAvailabilityReport
6) mergeInventoryReport

The merge of the avail report was reported as a success, even though I saw an NPE in the server log:

12:11:48,221 INFO  [DiscoveryServerServiceImpl] Error processing availability report from [localhost]: javax.ejb.EJBException:java.lang.NullPointerException -> java.lang.NullPointerException:null

I think that is to be expected - I seem to recall that avail report processing rarely, if ever, shows up as an overt failure to the agent.

But clearly, the inventory isn't synced at that point. The inventory report does show a failure in the comm trace log as well as the server log - again, the inventory looks to not be synced.

I thought one of these initial messages got an inventory-sync object as a response so the agent can sync up as soon as possible. Probably need to talk to ips or joseph about this.

Comment 3 Jay Shaughnessy 2010-06-08 17:59:34 UTC

It seems that child resources can successfully be uninventoried and that the agent syncs correctly in that case, on startup.

So, the problem case is unlikely. Basically, an agent would have to be down, the entire platform uninventoried, and then the agent would have to be brought up again (without --clean). Typically a platform uninventory is performed on a dead platform. The chances of the agent being brought up again are probably small.

Given the unlikely scenario and the fact that there is a workaround, I'm going to drop the priority/severity.

Comment 4 John Mazzitelli 2010-06-08 18:01:23 UTC

Here are my thoughts and findings.

The issue here is a combination of two things:

1) the agent was shut down during the uninventory
2) the PLATFORM itself is being uninventoried

I've seen things work (the agent can sync properly) if the agent is up OR
if you uninventory something other than the platform. BUT if the agent is down
and you uninventory the platform, then restarting the agent unclean (that is,
without --cleanconfig or --purgedata) fails to sync, and the avail/inventory
reports the agent sends to the server cause errors.

I do not think this is a major problem because of the following:

If you are uninventorying the platform, you are probably doing it for one of
two reasons:

1) you really don't want to manage that platform anymore, in which case it's
moot that the agent won't be able to start up unclean because you don't want to
run the agent anyway!

2) you are cleaning out the inventory so you can "refresh" it by starting anew.
In which case, you probably will want to (or at least won't feel it's a
hardship to) start the agent clean as well (thus you start with fresh inventory
on both server and agent). Thus, in this case you will be starting the agent
with --cleanconfig or --purgedata and this is OK and works.

I have a feeling this never worked, but Jay seems to think it did work at one
point. Just because I'm curious, I will try on our previous release to see what
happens. But regardless, I do not think this is a major issue because, as I
mentioned above, uninventorying the platform is a major deal and if you are doing
that, you probably will want to restart the agent clean anyway (or you don't
want to run the agent ever again anyway).

Comment 5 John Mazzitelli 2010-06-08 18:12:16 UTC

I just tried on a previous release (jon 2.3.1 to be exact) and this problem still occurs. I think this never really worked before. Here's the server logs (and note nothing showed up in the auto-discovery queue so I can't import the platform again, same as what happens with the latest code):

13:09:38,125 INFO  [DiscoveryServerServiceImpl] Processed AV:[localhost][33][full] - need full=[false] in (32)ms
...
13:09:44,663 ERROR [DiscoveryServerServiceImpl] Fatal error occurred during merging of inventory report from agent [Agent[id=0,name=localhost,address=null,port=0,remote-endpoint=null,last-availability-report=null]].
javax.ejb.EJBTransactionRolledbackException: No entity found for query
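To illustrate the failure mode in the log above: the inventory merge appears to look up a resource that no longer exists and fails with an exception, while the avail report path already answers with a "need full" flag. A minimal sketch, under the assumption that the server could answer an inventory report for an unknown platform with a "full sync required" outcome instead of throwing. InventoryReportMerger, Outcome, and both method names are hypothetical and not taken from the RHQ code base.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; class and method names are not taken from RHQ. */
class InventoryReportMerger {
    enum Outcome { MERGED, FULL_SYNC_REQUIRED }

    // Stand-in for the server-side database of inventoried resource ids.
    private final Set<Integer> serverResourceIds = new HashSet<>();

    void addServerResource(int resourceId) {
        serverResourceIds.add(resourceId);
    }

    /**
     * Merge a report from an agent. If the platform the report refers to
     * is no longer in server inventory, tell the agent to resync rather
     * than failing the lookup with an exception.
     */
    Outcome merge(int platformId, List<Integer> reportedIds) {
        if (!serverResourceIds.contains(platformId)) {
            return Outcome.FULL_SYNC_REQUIRED;
        }
        // Real merge logic would compare reportedIds with the database here.
        return Outcome.MERGED;
    }
}
```

An agent receiving FULL_SYNC_REQUIRED could then reset its local inventory, which is the step that restarting with --cleanconfig or --purgedata performs manually today.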

Comment 6 John Mazzitelli 2010-06-08 18:15:36 UTC

Changing the subject line - I do not believe this is a regression, and it's specifically only occurring when uninventorying the platform itself.

Comment 7 Corey Welton 2010-09-24 12:58:33 UTC

Triaged 21-Sept

Comment 8 Jay Shaughnessy 2014-04-04 15:02:39 UTC

Closing. There has been a ton of work in this area and this is likely obsolete.