+++ This bug was initially created as a clone of Bug #1029095 +++

Description of problem:
The Take Snapshot operation, scheduled by default to run every Sunday at 12:30 AM (https://docs.jboss.org/author/display/RHQ/Backup+and+Restore), failed.

Version-Release number of selected component (if applicable):
Version: 3.2.0.ER5
Build Number: 2cb2bc9:225c796

How reproducible:
1/1

Steps to Reproduce:
No exact reproduction steps are available. I had the following setup:
machine1: JON server, JON agent and storage node
machine2: JON agent and storage node

Actual results:
Found the following problems with the storage node on machine2: a StorageNodeSnapshotFailure alert was shown on the StorageService resource, and the following exception appeared in the JON server log:

00:30:01,251 ERROR [org.rhq.enterprise.server.operation.ResourceOperationJob] (RHQScheduler_Worker-3) Failed to execute scheduled operation [ResourceOperationSchedule: resource=[Resource[id=10263, uuid=16c9d18a-bbae-431a-8445-85ac34d1801d, type={RHQStorage}StorageService, key=org.apache.cassandra.db:type=StorageService, name=Storage Service]],job-name=[rhq-resource-10263--1783761045-1384061400594], job-group=[rhq-resource-10263], operation-name=[takeSnapshot], subject=[Subject[id=1,name=admin]], description=[Run by StorageNodeManagerBean]]: org.rhq.enterprise.server.authz.PermissionException: The session ID for user [admin] is invalid!: invocation: method=public org.rhq.core.domain.util.PageList<org.rhq.core.domain.operation.ResourceOperationHistory> org.rhq.enterprise.server.operation.OperationManagerBean.findResourceOperationHistoriesByCriteria(org.rhq.core.domain.auth.Subject,org.rhq.core.domain.criteria.ResourceOperationHistoryCriteria),context-data={}
	at org.rhq.enterprise.server.authz.RequiredPermissionsInterceptor.buildPermissionException(RequiredPermissionsInterceptor.java:164) [rhq-server.jar:4.9.0.JON320ER5]
.
.

Additional info:
The password for the storage nodes was updated, and the storage nodes were restarted several times on Friday.
Manually executing the Take Snapshot operation as user rhqadmin succeeded. Logs attached.
The password change for the storage cluster is not related to the error reported above. The storage nodes are managed over their JMX interface, which does not require CQL credentials. Although the error was reported on the weekly maintenance job, it could occur on any operation response related to storage nodes. The code fix uses a tested API designed for re-attaching user sessions.

To trigger this issue manually, invoke StorageNodeManager.runClusterMaintenance(); from the CLI. This is equivalent to the weekly job.

For regression testing, any test case that uses storage node operations (adding, removing, maintenance) would exercise the code change.

master branch commit: https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=d184f7ea4b7238032a6ed04d26ca0ac6776c5f25
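The following is a minimal sketch of the session re-attachment idea, not the actual RHQ fix. It assumes the failure came from the scheduled job reusing a Subject whose session ID was no longer registered on the server by the time the job fired (for example, after server restarts or session expiry), so the permission-checked lookup rejected it. Every class and method below (Subject, SessionRegistry, reattach, findHistories) is hypothetical; the real API used by the fix is in the commit referenced above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class SessionReattachSketch {

    // Hypothetical stand-in for a user subject that carries a session ID.
    static final class Subject {
        final int id;
        final String name;
        String sessionId;

        Subject(int id, String name, String sessionId) {
            this.id = id;
            this.name = name;
            this.sessionId = sessionId;
        }
    }

    // Hypothetical server-side session registry: session ID -> subject ID.
    static final class SessionRegistry {
        private final Map<String, Integer> sessions = new HashMap<>();

        // Opens a new session for the subject and stores its ID on the subject.
        String open(Subject s) {
            String sid = UUID.randomUUID().toString();
            sessions.put(sid, s.id);
            s.sessionId = sid;
            return sid;
        }

        // Simulates sessions being lost, e.g. across restarts or timeouts.
        void invalidateAll() {
            sessions.clear();
        }

        // A session is valid only if it is registered and owned by this subject.
        boolean isValid(Subject s) {
            Integer owner = s.sessionId == null ? null : sessions.get(s.sessionId);
            return owner != null && owner == s.id;
        }

        // Re-attach: if the subject's session is gone, open a fresh one for it.
        Subject reattach(Subject s) {
            if (!isValid(s)) {
                open(s);
            }
            return s;
        }
    }

    // Hypothetical permission-checked call, analogous in spirit to the
    // findResourceOperationHistoriesByCriteria call in the log above.
    static String findHistories(SessionRegistry reg, Subject s) {
        if (!reg.isValid(s)) {
            throw new SecurityException("The session ID for user [" + s.name + "] is invalid!");
        }
        return "ok";
    }

    public static void main(String[] args) {
        SessionRegistry reg = new SessionRegistry();
        Subject admin = new Subject(1, "admin", null);
        reg.open(admin);       // session is valid when the schedule is created
        reg.invalidateAll();   // session is lost before the weekly job fires

        boolean failedWithoutReattach = false;
        try {
            findHistories(reg, admin);
        } catch (SecurityException e) {
            failedWithoutReattach = true;
        }
        String result = findHistories(reg, reg.reattach(admin));
        System.out.println(failedWithoutReattach + " " + result); // prints "true ok"
    }
}
```

The design point the sketch illustrates is that the job validates (and if necessary renews) the subject's session immediately before the permission-checked call, instead of trusting a session ID captured when the schedule was created.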