Bug 534279 (RHQ-1090)
Summary: | have plugins deployed in database, not filesystem | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
Component: | Core Server | Assignee: | John Mazzitelli <mazz> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Pavel Kralik <pkralik> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | unspecified | CC: | jweiss, mvecera |
Target Milestone: | --- | Keywords: | Improvement |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://jira.rhq-project.org/browse/RHQ-1090 | ||
Whiteboard: | |||
Fixed In Version: | 1.2 | Doc Type: | Enhancement |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | Type: | --- | |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 534577 | ||
Bug Blocks: |
Description
John Mazzitelli
2008-11-08 19:31:00 UTC
I found another reason to do this, albeit not your typical use case. I had the perftest plugin. I put it on a server cloud. I rebuilt a new one and deployed it to another server, but the filename was different (same plugin, just a different file name). The filename got updated in the database, but the other servers don't have that plugin with that filename, so the agents failed to download it.

The new agent update download servlet is working nicely; I think we could do something similar. Instead of the file content being on the filesystem, it could be in the DB. Or, at startup (and in a periodic job thereafter), the server could read the database, slurp the contents out of it, and reconstitute the plugin files on the local filesystem. If the server's job notices a plugin changed, it can double-check its list of plugins: if any are gone from the database, it deletes them from the filesystem, and it writes the plugin jars that changed to the filesystem.

The big question here is: do we really want to store megabytes of content in the database (we will have the same issue in the future with AS cluster stuff)? Or would it make more sense to have, e.g., one dedicated RHQ server in the cloud that is the designated server for such blob content, so that the agents connect to this dedicated one? Of course, one can argue about this dedicated server failing, but if that server is down, I can imagine that customers have more important issues than just deploying a new plugin.

I say no, for a couple of reasons:

1) We built HA for the explicit purpose of NOT requiring a single specific server to be up. Our HA system is a true cloud: servers can come up and down and the agents will be fine. As soon as we require a "special" server to always be running, we break that entire design and we are back to a single point of failure on the RHQ Server again. Not good.
If a server is down, the customer should be concerned about that server, yes, but should not be concerned that the entire cloud has a problem.

2) That would require server-to-server communications - something we are avoiding at all costs. Servers do not talk to each other directly.

Additionally: we are ALREADY storing megabytes of data in the database - in fact, I'll argue we are already designed to store GIGAbytes of data in the DB (the entire content subsystem and server-side content plugin system does this). The plugins are relatively much smaller than that. We have at most (right now) ~20 plugins with an average size in the kilobyte range each - we'll have more data in the rhq_config tables :)

svn rev 2668: the server will now put the jar file contents in a blob column ("content") on the rhq_plugin table. The server will read that binary data and store the plugin file on the file system when appropriate (this is so I didn't have to do any major refactoring in the server-side plugin deployment code - it all works the same, because we ensure the latest plugins in the database are loaded onto the file system at start time).

One caveat - we might need to add more code to handle this hot-deployment edge case:

1) server A starts up
2) someone deploys a new plugin P to another server in the cloud
3) an agent asks server A for updated plugins
4) server A will see plugin P in the database, but server A does not yet have plugin P on the filesystem

To fix this, either:

a) change the server code that handles agent plugin update requests to always stream the plugin content from the DB, not the file system
b) document that users must restart all servers if new plugins are deployed (i.e. hot deployment only works on the server you copied the plugin to)
c) change the server code to stream plugin content from the DB to the filesystem if an agent asks for a plugin that is not yet on the file system.
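The core of options a) and c) is streaming the blob's bytes out in fixed-size chunks rather than holding the whole jar in memory. A minimal sketch of that copy step, with the JDBC call abstracted away as an InputStream parameter (class and method names here are illustrative, not actual RHQ code):

```java
import java.io.*;

// Hypothetical sketch: reconstitute a plugin jar from the rhq_plugin
// "content" blob onto the local filesystem. The database read is
// represented by an InputStream so the buffered copy (the part that
// matters for memory safety) can be shown without a live database.
public class PluginBlobWriter {

    // Stream blob content to a file in fixed-size chunks so the whole
    // plugin jar is never held in memory at once.
    public static void writeBlobToFile(InputStream blobStream, File pluginFile)
            throws IOException {
        try (InputStream in = blobStream;
             OutputStream out = new BufferedOutputStream(
                     new FileOutputStream(pluginFile))) {
            byte[] buffer = new byte[32768];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a blob column's stream with in-memory bytes.
        byte[] fakeJar = "fake plugin jar content".getBytes("UTF-8");
        File target = File.createTempFile("perftest-plugin", ".jar");
        target.deleteOnExit();
        writeBlobToFile(new ByteArrayInputStream(fakeJar), target);
        System.out.println("wrote " + target.length() + " bytes"); // wrote 23 bytes
    }
}
```

Note that, as observed below for Postgres, the driver may already have materialized the blob in memory before handing back the stream, in which case the chunked copy only protects the server's own buffers.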
I'm leaning towards c) because it allows filesystem streaming 99% of the time (which I think is faster, and it reduces the load on the database), but it still allows hot deployment of plugins across a cloud without the need to restart servers. However, if I do c), it means that if a server goes down during a plugin download, an agent will get a failure and will need to retry the download (which, btw, it now does, so this really isn't a problem anymore - the agent will simply retry the download from another server).

BTW: a) sounds great in theory, but I just noticed (while debugging using Postgres) that the JDBC driver's InputStream I get back when reading the blob column is a java.io.ByteArrayInputStream with the entire blob data stored in that byte array in memory!!! Which totally defeats the purpose of JDBC BLOB streaming! So doing a) has the very bad potential of causing an OutOfMemoryError if a lot of agents simultaneously attempt to download very large plugins (like the JBossAS plugin).

This is major priority - it affects RHQ more when deploying an N-server cloud (where N > 1).

Turns out the problem I mention above (with the a) / b) / c) solutions) is already JIRA'ed as RHQ-1360.

Found out about another edge case that I need to figure out a solution to:

1) server A starts up
2) someone deploys an updated plugin P to another server in the cloud (the plugin already existed; P is just a modified version of that plugin)
3) an agent asks server A for updated plugins
4) server A will see plugin P in the database, and server A has P on its filesystem, but it's an old version, and thus the agent will get that old version

We need a way for server A to do an MD5 check before serving up a plugin from the filesystem.

svn rev 2697 introduces more refactorings to get this to work better. There is now our own extension to the JBoss URL deployment scanner - prior to scanning the file system, we scan the database for new/updated plugins.
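The MD5 guard described above - compare the digest of the jar on disk against the MD5 recorded for that plugin row in the database before serving it - could be sketched as follows (names are illustrative, not the actual RHQ API):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the MD5 check: if the filesystem copy of a
// plugin does not match the MD5 stored in the database, the server
// should re-pull the blob before serving the plugin to an agent.
public class PluginMd5Check {

    // Hex-encoded MD5 of arbitrary content.
    public static String md5Hex(byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True if the jar on disk matches the database row's recorded MD5.
    public static boolean fileMatchesDb(byte[] fileBytes, String dbMd5)
            throws NoSuchAlgorithmException {
        return md5Hex(fileBytes).equalsIgnoreCase(dbMd5);
    }

    public static void main(String[] args) throws Exception {
        byte[] staleJar = "plugin v1".getBytes("UTF-8");
        byte[] freshJar = "plugin v2".getBytes("UTF-8");
        String dbMd5 = md5Hex(freshJar); // DB holds the updated plugin's MD5
        System.out.println("stale copy matches: " + fileMatchesDb(staleJar, dbMd5)); // false
        System.out.println("fresh copy matches: " + fileMatchesDb(freshJar, dbMd5)); // true
    }
}
```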
If we find one, we write it to the file system, then let the original file scanner do its thing (at that point, it works just like it did before). This code performs the necessary date/time checking and MD5 checking to determine whether we accept the DB plugin or assume the existing file-based plugin should be kept.

I documented some use cases we should test: http://support.rhq-project.org/display/RHQ/Design-DeployAgentPluginsInDatabase

The final major refactoring of ProductPluginDeployer should fix all remaining issues related to the DB storage of plugin content.

Tested all 11 test cases in a 2-server/2-agent HA environment as specified above. RHEL 5.3, x86_64, PostgreSQL 8.2.4, Java 1.6.0_11, JON RHQ SVN rev# 2894

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1090
This bug is related to RHQ-1069
This bug is related to RHQ-1557
This bug incorporates RHQ-1237