When a new version of an application is deployed to EAP through JON bundle deployments, it is deployed as myapp-1.0.1.war, where the previous version was myapp-1.0.0.war. The problem is that JON still has the previous version in inventory, but shows it as down or unavailable because it has been removed. The expectation is that the older version is removed from JON too, so that only the active, most recent version of the deployed WAR is shown.
See also Bug 970784, which deals with the need to explicitly run a discovery after bundle deployment, and Bug 1050014 (bundle deploy in domain mode).
While those are specifically targeted at EAP6, we should try to find a more general mechanism.
The bundle system itself has no idea that it's replacing an inventoried resource. So, I don't really see any ability for the bundle deploy to perform this action.
Moreover, agent-side code does not have authority to uninventory resources. So even if we detected the dead resource I'm not sure what we would do about it.
We've had this same situation in different manifestations. It's basically the same as when the RHQ server itself gets upgraded. There is not a way to automatically delete the old RHQ Server resource (assuming it had been imported).
In general, we let users decide when to uninventory resources. Uninventorying a resource is a decision to release all of the collected data for that resource. That's not a decision we would typically want to make automatically.
Although I can understand the annoyance in this particular use case, I don't think automatic removal is a good idea in general. More significantly, I don't immediately see a way of doing it anyway.
We're talking about being able to identify a DOWN resource as a resource that should be automatically uninventoried. This seems more like a job for some sort of custom reaper script, run specifically for the issue at hand.
Actually I would rather go the way that the new resource, even with a different key, gets merged into the existing one, so that new = old based on the context.
We should put some marker in that timeline thingy so that the user knows about it (and if we have the audit subsystem in the future, also add it to audit).
And if we do that for EAP6 only, then we could unify the bundle deployment for standalone to follow the one via the API for domain, and this way perhaps identify what gets deployed and react accordingly.
A solution could also be a server side plugin, that takes a "pattern" (resource of type and with other matching pattern like context root trait) and goes periodically through inventory to check if there are matching resource pairs and acts accordingly.
If we had the back-channel, we could hook into those "resource added" events and perhaps even intercept them, but that is more a long term effort.
Larry will also investigate the more exact semantics here. On the other hand, we may even offer both semantics if we already have a list of mappings (type, match expression, decision), with decision being none (default), delete_old, or merge.
Lately I feel like I'm against everything, but I don't really like this merge resource idea. The "dead" resource would need to somehow be recursively applied to the "live" resource. This feels like a very heavy, difficult solution to a problem that is fairly rare, and mainly a nuisance. It would possibly need to be applied manually, which would not make it any easier for the user than doing an uninventory.
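To make the (type, match expression, decision) list a bit more concrete, here is a minimal sketch of how such a rule table might be evaluated. Everything in it is an assumption for illustration: the `Rule` and `Decision` names, the use of regular expressions as the match expression, and the example patterns are hypothetical and not part of RHQ.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    NONE = "none"            # default: leave both resources alone
    DELETE_OLD = "delete_old"
    MERGE = "merge"


@dataclass(frozen=True)
class Rule:
    resource_type: str       # hypothetical: plain type name
    match_expression: str    # hypothetical: regex applied to the resource name
    decision: Decision


def decide(rules, resource_type, resource_name):
    """Return the decision of the first rule matching type and name."""
    for rule in rules:
        if rule.resource_type == resource_type and re.fullmatch(
            rule.match_expression, resource_name
        ):
            return rule.decision
    return Decision.NONE


# Example rule table (illustrative values only).
RULES = [
    Rule("Deployment", r"myapp-.*\.war", Decision.DELETE_OLD),
    Rule("Deployment", r"reports-.*\.war", Decision.MERGE),
]
```

With this table, `decide(RULES, "Deployment", "myapp-1.0.1.war")` yields `Decision.DELETE_OLD`, and anything that matches no rule falls back to `Decision.NONE`.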
The best way to keep the legacy data is to ensure that the resource is discovered the same way each time, and therefore is the same resource for different versions of the same app.
I think the real problem here is that the user does not like the way we do discovery for web apps in AS7. They would like to be able to deploy different versions of the same logical app, with a version identifier in the name, and have discovery realize it's the same resource. That means we need to come up with the same resource key, independent of version. That makes sense to me, and it's why we have a version field on resources. But looking at the AS7 representation of the apps, or maybe it's our representation of AS7 deployments, I can see why this may be an issue. The Deployment or Subdeployment already has the versioned name, I think, before we actually create the Web Runtime resource that actually represents the app.
I'll research more and try to find out if a different reskey/discovery mechanism is a viable approach here, but I'm pessimistic.
Otherwise, I'd recommend some sort of server-side reaper plugin for users that find it too tedious to uninventory the resources manually. If they care about metrics gathered for older versions, then they should just keep those resources around.
I feel there may always be combinations where the plugin's discovery mechanism paired with the user's environment will result in issues like the one described. In this instance it's the fact that EAP represents web deployments via name in its DMR. So, if the user insists on incorporating the version into the WAR name, the plugin's discovery mechanism has no real ability to "know" that they are logically the same app, because the names are different, as opposed to the name being the same and the version being different. (Note: a separate issue, I think, is that version is currently set to null in the plugin discovery.)
We could try and enhance the discovery mechanism to strip out versions and so forth, such that the resulting resource keys remained the same. But it could be brittle, we'd always be chasing the problem after it occurs, and a change in the user's approach or WildFly's representation could again cause a problem. Furthermore, knowing they were logically the same doesn't immediately solve the problem; we'd still need to add new features to handle discovery changing resource names etc. Moreover, it may end up not even being a good thing to do: the app is in fact different from the prior version and probably should be tracked as a separate/new resource.
So, the problem seems to boil down to an annoyance: for some resource types, some users may not want to see "dead" resources, despite the history they offer, and would instead like them to be uninventoried automatically.
There are maybe a couple of ways to do this.
1) Via some sort of Availability trigger
Today it's not possible to distinguish a DOWN resource from one that is physically missing, although perhaps for certain resource types support like this could be useful. For example, for an EAP deployment, it may be possible to determine the difference between "missing" and "down" (This is conjecture, I'm not sure if it is or not, but it seems maybe possible). One idea on how this could work:
- Add a new AvailabilityType like DEAD. DEAD would be reported instead of DOWN if the resource did not exist agent-side. Server-side, we'd support a system property like "rhq.server.resource-types.purge-dead-resources" that set the types for which dead resources would be automatically uninventoried. If enabled for the type we'd uninventory it, otherwise we'd convert to DOWN and store the availability. Or, we could do the reverse and have a property which prevented the uninventory. In this case uninventory would happen by default and no action would be needed by the user unless they wanted to keep dead resources (for the history, perhaps for comparison).
For example, when we upgrade an RHQ Server the old one is logically DEAD. If we found that the install directory was gone we could report it as DEAD as opposed to DOWN, and wipe it from inventory.
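A minimal sketch of the server-side decision for option 1, assuming the property holds a comma-separated list of type names (the property format is my assumption, and `handle_report` and `parse_purge_types` are hypothetical names, not existing RHQ code):

```python
from enum import Enum


class AvailabilityType(Enum):
    UP = "UP"
    DOWN = "DOWN"
    DEAD = "DEAD"  # proposed: resource no longer exists agent-side


def parse_purge_types(property_value):
    """Parse an assumed comma-separated value of
    rhq.server.resource-types.purge-dead-resources into a set of type names."""
    return {t.strip() for t in property_value.split(",") if t.strip()}


def handle_report(resource_type, reported, purge_types):
    """Return (action, availability_to_store) for one availability report.

    DEAD is never stored: it either triggers an uninventory (type is
    enabled for purging) or is converted to DOWN.
    """
    if reported is not AvailabilityType.DEAD:
        return ("store", reported)
    if resource_type in purge_types:
        return ("uninventory", None)
    return ("store", AvailabilityType.DOWN)
```

The reverse variant (uninventory by default, with a property listing types to keep) would only flip the membership test in `handle_report`.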
2) Via some sort of server-side reaper
We'd still need to support an environment variable, or some other way of defining what needs to be purged. In this case "rhq.server.resource-types.purge-dead-resources" would have to match a type name with an expression, for example "JBossAS7:Deployment:my-app.*". For those types we'd have to query resources of the same parent whose names matched the regex pattern, then uninventory the DOWN resources, assuming there was one in the set that was UP.
This seems horribly complex and needs a lot from the user. But basically, if you knew you were deploying my-app-1.0 and then my-app-1.1 *and* removing my-app-1.0 agent-side, then this would detect that my-app-1.0 should go away. It would probably happen as part of data purge.
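Roughly, the reaper's selection logic could look like the sketch below. It assumes the expression has the form plugin:type:name-regex, as in the example above, and works on a plain in-memory list; the `Resource` class and `find_reapable` function are hypothetical stand-ins, not the RHQ domain model or query API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    id: int
    parent_id: int
    plugin: str
    type_name: str
    name: str
    availability: str  # "UP" or "DOWN"


def find_reapable(resources, spec):
    """Return DOWN resources that have an UP sibling matching the same spec.

    spec is assumed to look like "JBossAS7:Deployment:my-app.*".
    """
    plugin, type_name, name_regex = spec.split(":", 2)
    pattern = re.compile(name_regex)

    # Group matching resources by parent so only siblings are compared.
    by_parent = defaultdict(list)
    for r in resources:
        if (r.plugin == plugin and r.type_name == type_name
                and pattern.fullmatch(r.name)):
            by_parent[r.parent_id].append(r)

    reapable = []
    for siblings in by_parent.values():
        # Only reap when a live replacement exists under the same parent.
        if any(s.availability == "UP" for s in siblings):
            reapable.extend(s for s in siblings if s.availability == "DOWN")
    return reapable
```

Note that a DOWN resource with no UP sibling is left alone, which keeps a server that is merely stopped from losing its deployments.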
Although I think this feature is pretty silly, and users should just manually uninventory what they don't want, if we have to do it I'd certainly lean towards option 1, or something better if someone can come up with it.
Having a new avail type of DEAD sounds interesting. I would pursue that idea a bit more and see if you can flesh out more details, how hard it would be to implement, and whether it really solves the problem.
In Comment 4 I wrote, "We could try and enhance the discovery mechanism to strip out versions and so forth, such that the resulting resource keys remained the same. But it could be brittle, we'd always be chasing the problem after it occurs, and a change in the user's approach or Wildfly's representation could again cause a problem". But maybe this shouldn't be dismissed so quickly.
Although this could be considered a general issue, the main complaint is about EAP Deployments, particularly those being [bundle] provisioned automatically to new versions, and generated with standard Maven version extensions. It may be that we should provide more of a point solution in our AS7 (maybe also AS5?) plugin discovery code that:
1) trims versions from the resource key and the resource name
2) uses that version to set the resource version
So, the resource name and, more importantly, the resource key, would not include the version. There are some questions/complications:
1) We'd still need to be able to assemble the proper address to make DMR requests.
2) We'd likely need to handle resource upgrade to change existing names/keys for existing Deployments.
I'm going to look into this further before looking at the more complicated feature described in Comment 4. That type of enhancement may not be necessary if this "point" change is sufficient.
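As a rough illustration of the point change, the sketch below trims a Maven-style version out of a deployment name, uses it as the resource version, and keeps the original name around for building DMR addresses. The regex and the `DiscoveredDeployment` and `normalize` names are my own assumptions for illustration; this is not the code in the plugin.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: <base>-<numeric version>[qualifiers]<.ext>,
# e.g. myapp-1.0.1.war or myapp-1.0.1-SNAPSHOT.war
VERSION_PATTERN = re.compile(
    r"^(?P<base>.+?)-(?P<version>\d+(?:\.\d+)*(?:[-.][A-Za-z0-9]+)*)(?P<ext>\.\w+)$"
)


class DiscoveredDeployment(NamedTuple):
    resource_key: str        # version-independent
    resource_name: str       # version-independent
    version: Optional[str]   # trimmed version, or None if none was found
    dmr_name: str            # original name, still needed for DMR requests


def normalize(deployment_name):
    """Split a deployment name into a stable key/name and a version."""
    m = VERSION_PATTERN.match(deployment_name)
    if m is None:
        # No recognizable version: leave the name untouched.
        return DiscoveredDeployment(
            deployment_name, deployment_name, None, deployment_name
        )
    stripped = m.group("base") + m.group("ext")
    return DiscoveredDeployment(
        stripped, stripped, m.group("version"), deployment_name
    )
```

Under these assumptions myapp-1.0.0.war and myapp-1.0.1.war both resolve to the key myapp.war, so the second deployment would be seen as the same resource with a new version rather than as a new resource.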
I've created Pull Request 26 for review:
On further thought, I'm not sure any resource upgrade work is warranted here. The next versioned deployment would get the normalized resource key and after that it would work as desired.
A new RFE for option 1 in Comment 4 has been created as Bug 1093822.
The PR (https://github.com/rhq-project/rhq/pull/26) has been merged into master. Not yet setting MODIFIED pending demo and review...
The work here has been completed. Further issues should likely be brought up as new BZs.
Heiko Rupp <firstname.lastname@example.org> updated the status of jira JON3-42 to Resolved
master commit dec8bae46446d4cde46fe13ed76585c2cfc164b8
Author: Jay Shaughnessy <email@example.com>
Date: Fri Jun 27 16:18:35 2014 -0400
Adding one more thing to this feature, prevent discovery of siblings
resolving to the same resource key. In the somewhat unlikely case that
two distinct sibling deployments resolve down to the same logical
deployment, don't let it get past discovery. For example, if the user
has app-1.0.war and app-2.0.war and these are *really* different apps (and
they would probably have to be since EAP would stop deployment if they
had the same context). In this case both would be seen as app.war, and
that is a problem on the RHQ side. In this situation generate an
agent log warning that hopefully helps a user resolve the issue.
Note that resource upgrade already prevents an upgrade of siblings with
the same key, so this is an analogous change.
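The sibling check described in this commit message can be pictured with the sketch below. It is not the committed code: the version-stripping regex, the logger name, and the `normalized_key` and `filter_colliding_siblings` helpers are assumptions made for illustration.

```python
import logging
import re
from collections import defaultdict

log = logging.getLogger("discovery-sketch")  # hypothetical logger name

# Assumed: strip "-<numeric version>" that sits right before the extension.
_VERSION = re.compile(r"-\d+(?:\.\d+)*(?=\.\w+$)")


def normalized_key(name):
    """Return the deployment name with its version suffix removed."""
    return _VERSION.sub("", name, count=1)


def filter_colliding_siblings(names):
    """Map normalized key -> deployment name, skipping keys that collide.

    Siblings that resolve to the same key are not discovered at all;
    a warning is logged so the user can resolve the naming conflict.
    """
    by_key = defaultdict(list)
    for name in names:
        by_key[normalized_key(name)].append(name)

    discovered = {}
    for key, members in by_key.items():
        if len(members) > 1:
            log.warning(
                "Deployments %s all resolve to resource key %s; skipping discovery",
                members, key,
            )
            continue
        discovered[key] = members[0]
    return discovered
```

For siblings app-1.0.war, app-2.0.war and other-1.0.war, only other-1.0.war is discovered (as other.war), and the two app deployments produce a warning.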
Actually, the commit in Comment 14 should have been for Bug 1112744.
master commit 697a1313eaed37b8a4c16f192680c3461795306b
Author: Jay Shaughnessy <firstname.lastname@example.org>
Date: Fri Jul 11 10:07:43 2014 -0400
I'm an idiot. Fixing stupid regression.
*** Bug 1117738 has been marked as a duplicate of this bug. ***
Moving to ON_QA as available to test with brew build of DR01: https://brewweb.devel.redhat.com//buildinfo?buildID=373993
Moving back to ASSIGNED for ER03 due to Bug 1136488.
QE work is documented here.
There are no dependent BZs ... this feature is "TEST COMPLETE"