Description of problem:
A storage node stays in bootstrap mode while joining the cluster if it had previously been deployed and then undeployed.

Version-Release number of selected component (if applicable):
e2a1811

How reproducible:
Always

Steps to Reproduce:
1. Install and start the RHQ server, storage and agent on ip1
2. Install and start storage and agent on ip2
3. Undeploy storage on ip2
4. Deploy storage on ip2 again

Actual results:
Storage stays in bootstrap mode "forever".

Expected results:
Storage goes through INSTALL -> ANNOUNCE -> BOOTSTRAP -> ADD_MAINTENANCE modes and gets normal cluster status.

Additional info:
Investigation results from John Sanda: once the C* bootstrap finishes, we wait for an event notification from the driver that the node is up, then we change its mode from bootstrap to add_maintenance. That event is not firing. It happens with a node that was previously deployed.
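For readers unfamiliar with the deployment flow, the mode progression and the event-gated step described above can be sketched roughly as follows. This is only an illustration: the class names, the method names, and the NORMAL constant are assumptions made for this sketch and are not taken from the RHQ code base.

```java
// Hypothetical sketch of the deployment modes named in this report.
// DeployMode, StorageNodeDeployment and NORMAL are illustrative names only.
enum DeployMode {
    INSTALL, ANNOUNCE, BOOTSTRAP, ADD_MAINTENANCE, NORMAL;

    /** Next mode in the sequence; NORMAL is treated as terminal. */
    DeployMode next() {
        return this == NORMAL ? NORMAL : values()[ordinal() + 1];
    }
}

class StorageNodeDeployment {
    private DeployMode mode = DeployMode.INSTALL;

    DeployMode getMode() {
        return mode;
    }

    /** Advances after any step except BOOTSTRAP, which waits for the "node up" event. */
    void stepCompleted() {
        if (mode != DeployMode.BOOTSTRAP) {
            mode = mode.next();
        }
    }

    /** Invoked when the driver reports that the node is up. */
    void onNodeUp() {
        if (mode == DeployMode.BOOTSTRAP) {
            mode = DeployMode.ADD_MAINTENANCE;
        }
    }
}
```

In this sketch, if onNodeUp() is never called the node remains in BOOTSTRAP indefinitely, which matches the behavior reported here.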
We are running into https://issues.apache.org/jira/browse/CASSANDRA-5769. The event is not reported over the native CQL protocol. We use this event notification to determine that the node has joined the cluster, at which point we can initiate the necessary cluster maintenance. Since the event has not fired, we are in a perpetual holding pattern. Even though we are close to releasing 4.9, I think upgrading C* makes sense for a couple of reasons. First, it resolves this issue. Second, it gives us an opportunity to test upgrading our C* bits in the community before a JON release.
I have upgraded RHQ to use Cassandra 1.2.9 which includes the fix for CASSANDRA-5769. Even with that fix there are still scenarios in which the server could miss the event notification that advances the deployment beyond the bootstrap phase; consequently, I put some additional logic in place to continue the deployment if the bootstrap is successful and if the new node is part of the cluster.
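The additional check described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual RHQ change: the class name, the method name, and the parameters are hypothetical, and how the server learns the bootstrap result or the current cluster membership is not covered in this report, so both are passed in as plain values.

```java
import java.util.Set;

// Hypothetical helper; names and parameters are assumptions for illustration.
class BootstrapFallback {

    /**
     * Returns true if the deployment may move past the bootstrap phase.
     * The "node up" event is sufficient on its own. If the event was missed,
     * the deployment still continues when the bootstrap succeeded and the
     * new node's address appears among the cluster members.
     */
    static boolean shouldAdvance(boolean eventReceived,
                                 boolean bootstrapSucceeded,
                                 Set<String> clusterMembers,
                                 String newNodeAddress) {
        if (eventReceived) {
            return true;
        }
        return bootstrapSucceeded && clusterMembers.contains(newNodeAddress);
    }
}
```

The point of the design is that a missed notification no longer blocks the deployment, while a failed bootstrap or a node that is absent from the cluster still does.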
Created attachment 792791 [details] storage-uninstall.png
Created attachment 792792 [details] storage-reinstalled.png
Verified. Please see the attached screenshots. A new bug, #1003545, has been filed for the exception shown in the re-installation screenshot.
Bulk closing of RHQ 4.9 verified items