Bug 1004050
| Summary: | Provide better error handling for multi-storage node deployment prior to server install | | |
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> |
| Component: | Core Server | Assignee: | John Sanda <jsanda> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | ahovsepy, hrupp |
| Target Milestone: | --- | | |
| Target Release: | RHQ 4.10 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1021530 | Environment: | |
| Last Closed: | 2014-04-23 12:31:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 951619 | | |
| Attachments: | | | |

Description
John Sanda
2013-09-03 19:15:16 UTC
I have added error handling along with detailed logging to deal with this situation. The IndexOutOfBoundsException is now caught and the following is logged:

15:19:27,686 ERROR [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (http-/0.0.0.0:7080-3) If this error occurred with a storage node that was deployed prior to installing the server, then this may indicate that the rhq.storage.nodes property in rhq-server.properties was not set correctly. All nodes deployed prior to server installation should be listed in the rhq.storage.nodes property. Please review the deployment documentation for additional details.

The user can simply redeploy the storage node from the UI (or CLI).

master commit hash: b30d3fe

Reopening. Steps:

1. In jonHome/bin/rhq-storage.properties, set rhq.storage.seeds=IP1,IP2 on both machines; nothing else was changed there.
2. Run jonHome/bin/rhqctl install --storage on IP1.
3. Run jonHome/bin/rhqctl install --storage --agent-preference="rhq.agent.server.bind-address=IP1" on IP2.
4. Run jonHome/bin/rhqctl start on both.
5. As soon as the nodes are connected, run jonHome/bin/rhqctl install --server --start on IP1.

Actual result: It is impossible to log in to the server GUI.

Exception in server log:

10:21:23,753 ERROR [org.rhq.server.metrics.MetricsServer] (New I/O worker #5) An error occurred while inserting raw data MeasurementDataNumeric[name=Calculated.FreeDiskToDataSizeRatio, value=13706.69, scheduleId=10361, timestamp=1382019681146]: com.datastax.driver.core.exceptions.UnauthorizedException: User dwntmhcd has no MODIFY permission on <table rhq.raw_metrics> or any of its parents

The error reported in comment 2 stems from a different issue. Two storage nodes were properly configured and deployed prior to installing the server, but only one storage node was specified in the rhq.storage.nodes property. The second node subsequently went through the deployment process upon being imported into inventory.
During the deployment process the node is bootstrapped into the cluster, and its data directories are purged to ensure that we can bootstrap it. This explains the UnauthorizedException: because the node went through the deployment process, we were sending writes to it with credentials that it no longer knew about. The errors could have been prevented if the user had specified both nodes in rhq.storage.nodes.

We can do better here and make things more robust by lifting the requirement to specify all already-installed nodes in rhq.storage.nodes. The driver already knows about the nodes, so we can go ahead and create the additional storage node entities that we discover from the driver. I think it is perfectly reasonable to do this. Even though the node was not listed in rhq.storage.nodes, the user has to go through a number of manual steps to get those nodes clustered, which tells me that she knows what she is doing and likely just forgot to update rhq.storage.properties.

Changes have been committed to master. The relevant commit hashes are 93e856e1, 4012733, and de5d069.

Created attachment 815371
storageConnection
Verified. Installed two storage nodes prior to server installation and connected them to each other, then installed the server without providing rhq.storage.nodes, so only one node was specified there. After server installation I saw both storage nodes. I stopped the storage node on the server box; the server kept working, with no exceptions in the server or storage logs. I then restarted the full RHQ on the server box, and it correctly reconnected to the separate storage node.

Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
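
The approach described in the comments above, creating storage node entities for cluster members that the driver reports but that are missing from rhq.storage.nodes, can be sketched roughly as follows. This is only an illustration and not the code from the referenced commits: the class and method names are hypothetical, the property value is assumed to be a comma-separated list of addresses, and the driver's view of the cluster is stood in for by a plain list of address strings. Addresses are compared as plain strings, so a host name and its IP address would not match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the RHQ code base. */
final class StorageNodeDiscoverySketch {

    /**
     * Parses a comma-separated property value (assumed format of
     * rhq.storage.nodes) into trimmed, non-empty addresses.
     */
    static Set<String> parseConfiguredNodes(String propertyValue) {
        Set<String> nodes = new LinkedHashSet<>();
        if (propertyValue == null) {
            return nodes;
        }
        for (String token : propertyValue.split(",")) {
            String address = token.trim();
            if (!address.isEmpty()) {
                nodes.add(address);
            }
        }
        return nodes;
    }

    /**
     * Returns the cluster members that are not listed in the configuration,
     * without duplicates and in the order in which they were reported.
     */
    static List<String> findUnlistedNodes(Set<String> configured, Iterable<String> clusterHosts) {
        List<String> unlisted = new ArrayList<>();
        for (String host : clusterHosts) {
            if (!configured.contains(host) && !unlisted.contains(host)) {
                unlisted.add(host);
            }
        }
        return unlisted;
    }

    public static void main(String[] args) {
        // Only one node is configured, but the cluster actually has two members.
        Set<String> configured = parseConfiguredNodes("10.0.0.1");
        List<String> clusterHosts = Arrays.asList("10.0.0.1", "10.0.0.2");

        // A real server would persist a storage node entity here instead of printing.
        for (String address : findUnlistedNodes(configured, clusterHosts)) {
            System.out.println("Would create storage node entity for " + address);
        }
    }
}
```

In a real server the list of cluster hosts would come from the driver's cluster metadata, and each unlisted address would be turned into a storage node entity so that it is treated as already deployed rather than being sent through deployment again.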