Description of problem:

[rhq@vp25q03ad-hadoop097 bin]$ ./nodetool -p 7299 status
Datacenter: 176
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns (effective)  Host ID                               Rack
UN  17.176.208.117  15.93 GB  256     69.6%             cad149ed-d5e1-4633-8e5a-6d6cb8a3da6b  208
UN  17.176.208.118  56.93 GB  256     64.9%             c421b915-9bc5-46bd-b26f-e88c89f114bf  208
UN  17.176.208.119  53.17 GB  256     65.5%             7367d69c-8fa6-4162-8b18-963c0ae1a229  208

Logs:

21:07:37,792 WARN [org.rhq.server.metrics.StorageSession] (http-/0.0.0.0:7080-585) Encountered NoHostAvailableException due to following error(s): {}
21:07:37,792 INFO [org.rhq.enterprise.server.storage.StorageClusterMonitor] (http-/0.0.0.0:7080-513) Storage cluster is down
21:07:37,793 INFO [org.rhq.enterprise.server.storage.StorageClusterMonitor] (http-/0.0.0.0:7080-585) Storage cluster is down
21:07:37,793 INFO [org.rhq.enterprise.server.storage.StorageClusterMonitor] (http-/0.0.0.0:7080-459) Storage cluster is down

Version-Release number of selected component (if applicable):
4.12

How reproducible:
Unclear. Seems to have happened when I got some timeouts at startup. Startup took a long time, so I wonder if there is some sort of conflict. The error message looks really suspicious, though.
I had trouble running repair. It seems there is an installation issue with Cassandra. After enough repair attempts, things seemed to work okay once I ran repair over the weekend. I don't know the root cause, though. The Cassandra logs don't show much detail about whether there were any I/O errors. My suspicion is that there is either a capacity or a load issue, but since this also happened with 4.9, I'm guessing it is not an RHQ issue.