Description of problem: The server installer generates a storage node username/password. There is currently no support for changing the password from either the UI or from the CLI. Users would have to fall back to using cqlsh or cassandra-cli to make the change. We need to provide support for allowing the user to change the password through JON.
Since it is stored in the system config table, I assume it is a cluster-wide setting. I'll add three input items to the storage cluster settings page (#Administration/Topology/StorageNodes/Settings):

1. username
2. password (characters will be obfuscated by "••••")
3. password again (the value has to be the same as in 2)

This is basically the same approach as on #Administration/Security/Users/2. The user will need the MANAGE_SETTINGS right.

All of the above covers only the UI <--> system settings part. The "system settings <--> storage nodes" synchronization is still not resolved. I think I'll need some help here.
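The two password fields described above could be checked along the following lines. This is only a minimal sketch, not the RHQ UI code: the class name, the method name and the length bounds are assumptions made for illustration, and the real form uses SmartGWT validators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RHQ: checks the password and its repeated value.
final class StorageCredentialsForm {
    // Assumed bounds for illustration only; the real limits are not stated in this report.
    static final int MIN_LENGTH = 1;
    static final int MAX_LENGTH = 100;

    /** Returns a list of validation errors; an empty list means the input is acceptable. */
    static List<String> validate(String password, String passwordVerify) {
        List<String> errors = new ArrayList<>();
        if (password == null || password.length() < MIN_LENGTH || password.length() > MAX_LENGTH) {
            errors.add("password length out of range");
        }
        // Field 3 must repeat the value of field 2 exactly.
        if (password == null || !password.equals(passwordVerify)) {
            errors.add("passwords do not match");
        }
        return errors;
    }
}
```

Calling StorageCredentialsForm.validate with two identical, non-empty values yields an empty list; any mismatch yields one error.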
Originally we were planning on adding support for allowing users to authenticate against Cassandra using their RHQ credentials. This work was proposed in its place. It has yet to be determined what if anything should be done.
49e14ceaf 14990dd8d 2bd19caf2 8d066cc46 71a7ee31c

details:

branch: release/jon3.2.x
link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=49e14ceaf
time: 2013-10-29 17:30:06 -0500
commit: 49e14ceaf3a6d7a83251750d162d5c6d48085a3c
author: Jiri Kremser - jkremser
message: [BZ 1016175] i18n, wrapping long line
(cherry picked from commit 3e2934575a121ab292a6bf120ede3287b953959e)

branch: release/jon3.2.x
link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=14990dd8d
time: 2013-10-29 17:29:41 -0500
commit: 14990dd8d09b4a8abc765eefe099c1d617f37bf6
author: Stefan Negrea - snegrea
message: [BZ 1016175] Update and simplifiy some verbiage based on feedback.
(cherry picked from commit d5cd83d5008374b87e590197aad459cf09dbc737)

branch: release/jon3.2.x
link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=2bd19caf2
time: 2013-10-29 17:29:13 -0500
commit: 2bd19caf280703079919edaeb1102e8be713ab93
author: Jiri Kremser - jkremser
message: [BZ 1016175] Workaround to SmartGWT bug: password validators were changing the focus after each user input making it unusable.
(cherry picked from commit 9f460f107d825ac3df5b5c485affc9761935b559)

branch: release/jon3.2.x
link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=8d066cc46
time: 2013-10-29 17:28:40 -0500
commit: 8d066cc46596c819f3218ff9ed7bd74a72385b03
author: Stefan Negrea - snegrea
message: [BZ 1016175] Add support for changing the storage password

Add code to update the storage session when credentials get update in the system settings table. Created a quartz job to trigger the refresh every 2 minutes.

The process to refresh the session is as follows:
1) Get new credentials from the database
2) Create a new session with new credentials
3) Replace existing session with newly created one
4) Allow few minutes for existing session drainage
5) Shutdown old session

Other changes:
- Add code to update Cassandra password via a direct CQL query using the existing open session.
- Update log level and text for the storage cluster credentials update job.
- Make the username read-only; adding more intuitive description to the other properties.
(cherry picked from commit c7650fa9381a4d3bd30ae69d1cc87ae54db10ec0)

branch: release/jon3.2.x
link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=71a7ee31c
time: 2013-10-29 17:28:11 -0500
commit: 71a7ee31c7374abbfd1fa66867458216b47ca006
author: Jiri Kremser - jkremser
message: [BZ 1016175] Add support for changing the storage password
- adding length validators for username and password; moving the storage node credentials to its own config section
- renaming "password" field to "passwordHash"; setting the password to null if there was no change at all to it
- adding support for storing the C* credentials to system settings db table
- adding length validators for username and password; moving the storage node credentials to its own config section
(cherry picked from commit 1e6aee931a92ff658c2def4d5714e81e0a24b4a5)
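The five-step session refresh listed in commit 8d066cc46 can be sketched roughly as follows. This is not the RHQ implementation. StorageSession, Credentials, SessionRefresher and the drainScheduler hook are hypothetical stand-ins introduced for illustration, and the Cassandra driver is deliberately not used.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative sketch only; every type here is an assumption, not an RHQ or driver class.
final class SessionRefresher {
    /** Hypothetical stand-in for a storage session. */
    interface StorageSession {
        void shutdown();
    }

    /** Hypothetical credentials value as read from the system settings table. */
    static final class Credentials {
        final String username;
        final String password;

        Credentials(String username, String password) {
            this.username = username;
            this.password = password;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Credentials)) return false;
            Credentials other = (Credentials) o;
            return Objects.equals(username, other.username) && Objects.equals(password, other.password);
        }

        @Override
        public int hashCode() {
            return Objects.hash(username, password);
        }
    }

    private final AtomicReference<StorageSession> current = new AtomicReference<>();
    private final Supplier<Credentials> credentialsSource;
    private final Function<Credentials, StorageSession> sessionFactory;
    // Runs the given task once the drain period has elapsed (assumed hook).
    private final Consumer<Runnable> drainScheduler;
    private Credentials inUse;

    SessionRefresher(Supplier<Credentials> credentialsSource,
                     Function<Credentials, StorageSession> sessionFactory,
                     Consumer<Runnable> drainScheduler) {
        this.credentialsSource = credentialsSource;
        this.sessionFactory = sessionFactory;
        this.drainScheduler = drainScheduler;
    }

    /** Intended to be called by a periodic job; returns true if the session was replaced. */
    synchronized boolean refreshIfChanged() {
        Credentials latest = credentialsSource.get();          // 1) get credentials from the database
        if (latest.equals(inUse)) {
            return false;                                      // nothing changed, keep the session
        }
        StorageSession fresh = sessionFactory.apply(latest);   // 2) create a session with them
        StorageSession old = current.getAndSet(fresh);         // 3) replace the existing session
        inUse = latest;
        if (old != null) {
            drainScheduler.accept(old::shutdown);              // 4) + 5) drain, then shut down the old one
        }
        return true;
    }

    StorageSession session() {
        return current.get();
    }
}
```

In a real server the drainScheduler would presumably delay the task, for example with a ScheduledExecutorService, while a test can pass Runnable::run to shut the old session down immediately.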
Implemented this feature. It is now possible to change the password for the storage cluster from the Storage Node Admin UI (Cluster Settings tab). Only the password can be updated; the username is displayed as a read-only field for reference purposes. Any update on the Storage Node Admin page will be propagated to the Storage Cluster and to the other HA servers. There should be no data loss and no error messages after changing the password; all the servers will automatically refresh their storage cluster session.

release/jon3.2.x branch commits:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=71a7ee31c7374abbfd1fa66867458216b47ca006
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=8d066cc46596c819f3218ff9ed7bd74a72385b03
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=2bd19caf280703079919edaeb1102e8be713ab93
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=14990dd8d09b4a8abc765eefe099c1d617f37bf6
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=49e14ceaf3a6d7a83251750d162d5c6d48085a3c
Moving to ON_QA for test with new brew build.
Verified on
Version : 3.2.0.ER5
Build Number : 2cb2bc9:225c796

Password was successfully updated on all (2) storage nodes.
I have to move it back to ASSIGNED. The password on all storage nodes was updated successfully (verified via ./cqlsh -u yhzsrcvt -p mujtest 10.16.23.69), but after a restart of one storage node, the JON server started throwing exceptions to the log.

I had the following setup:
machine1: JON server, JON agent and storage node
machine2: JON agent and storage node

The following scenario failed:
1- change password for storage nodes
2- restart storage node on machine2

Result:
The #Administration/Topology/StorageNodes page shows Globally Uncaught Exception and the JON server throws the following exceptions:

03:35:33,368 ERROR [com.datastax.driver.core.Session] (Cassandra Java Driver worker-3) Error creating pool to /10.16.23.69 (Authentication error on host /10.16.23.69: Username and/or password are incorrect)
03:35:42,587 ERROR [org.rhq.server.metrics.MetricsServer] (New I/O worker #29) An error occurred while inserting raw data MeasurementDataNumeric[name=Calculated.DataDiskUsedPercentage, value=0.0, scheduleId=10378, timestamp=1384245335351]: com.datastax.driver.core.exceptions.DriverInternalError: Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d

Additional info:
Restarting the JON server solves this issue. Complete JON server log attached.
Created attachment 822787
jon server log
The job that updates the storage node session with new credentials on the RHQ server runs every 2 minutes, with a 1 minute drain period on the existing session. Unless you restart the storage nodes (like you did), the old session with the old credentials can still function for a short time (until it gets invalidated by C*). The RHQ server updates (to create a new storage session) cannot be done right away because of HA environments, where the change needs to be applied on all RHQ servers. So the code that changes the credentials needs to run on a timer and execute on every HA server. When you restart the storage nodes right after a password change, the servers will eventually pick up the password change and refresh the bad session.

Filip, in your case you did not wait long enough for the credentials job to refresh the session. Can you please change the test case a little? Either of these two tweaks to the test case should confirm that the functionality works as expected:

1) Do not restart the storage nodes for 4 minutes. Restart afterwards and there should be no impact.
2) Restart the storage nodes immediately but then wait about 3 minutes. The RHQ server should use the new credentials once the job that checks for credential updates runs.
I added another step to the test case as you suggested in comment 12. So I tried the following:

1. I had the same setup as described in comment 10:
   machine1: JON server, JON agent and storage node (10.16.23.61)
   machine2: JON agent and storage node (10.16.23.69)
2. change storage password via JON server's UI (Administration->Storage Nodes->Cluster settings)
3. wait ~10 minutes
4. stop storage node on machine2 using rhqctl
5. start storage node on machine2 using rhqctl

Result:
The following errors appear in the JON server log:

04:07:08,505 ERROR [com.datastax.driver.core.Session] (Cassandra Java Driver worker-2) Error creating pool to /10.16.23.69 (Authentication error on host /10.16.23.69: Username and/or password are incorrect)
04:07:08,760 ERROR [com.datastax.driver.core.Session] (Cassandra Java Driver worker-2) Error creating pool to /10.16.23.69 (Authentication error on host /10.16.23.69: Username and/or password are incorrect)
04:07:09,649 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Authentication error on host /10.16.23.69: Username and/or password are incorrect
04:07:09,650 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Retry against /10.16.23.69 have been suspended. It won't be retried unless the node is restarted.
04:07:09,993 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Authentication error on host /10.16.23.69: Username and/or password are incorrect
04:07:09,995 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Retry against /10.16.23.69 have been suspended. It won't be retried unless the node is restarted.
04:07:11,449 ERROR [com.datastax.driver.core.RequestHandler] (New I/O worker #27) Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d

Errors are still being thrown even after another ~15 minutes. The issue disappeared when the JON server was restarted. I'm keeping both machines up for possible further investigation.
Because it's not visible enough in comment 10, I'm adding it here again. The result of this issue is the following:

1- The #Administration/Topology/StorageNodes page shows Globally Uncaught Exception.
2- Errors similar to the following are being thrown to the RHQ server log periodically:

04:16:11,451 ERROR [com.datastax.driver.core.RequestHandler] (New I/O worker #28) Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d
04:16:11,451 ERROR [org.rhq.server.metrics.MetricsServer] (New I/O worker #28) An error occurred while inserting raw data MeasurementDataNumeric[name=Calculated.TotalDiskUsedPercentage, value=0.53, scheduleId=11785, timestamp=1384420564914]: com.datastax.driver.core.exceptions.DriverInternalError: Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d

The exception that is thrown to the RHQ server log when accessing the #Administration/Topology/StorageNodes page:

04:15:20,030 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:7080-5) JBAS014134: EJB Invocation failed on component MeasurementDataManagerBean for method public abstract org.rhq.core.domain.measurement.MeasurementAggregate org.rhq.enterprise.server.measurement.MeasurementDataManagerRemote.getMeasurementAggregate(org.rhq.core.domain.auth.Subject,int,long,long) throws org.rhq.enterprise.server.measurement.MeasurementException: javax.ejb.EJBTransactionRolledbackException: Tried to execute unknown prepared query 15f2e28144a6700f02c39fcee365b36d

The complete exceptions are visible in the attached log (time stamps could be different, so search for the strings without time stamps).
Identified one more instance where the session was not refreshed. The MetricsDAO makes use of prepared statements that were prepared with the previous session (old credentials). Added code that refreshes all the prepared statements when the new session (new credentials) is created. This will fix the UI errors (comment #14) as well as the errors from the data purge job (comment #13).

release/jon3.2.x branch commit:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=90f096f11ebd1122e01ed39fe5b210cd94881e01
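The idea behind this fix, re-preparing every statement whenever the session is replaced, can be sketched as follows. This is only an illustration under stated assumptions: StatementCache, Session and Prepared are hypothetical stand-ins and do not reflect the actual MetricsDAO code or the driver's types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; these types are assumptions, not RHQ or driver classes.
final class StatementCache {
    /** Hypothetical stand-in for a statement prepared against one particular session. */
    interface Prepared {
        String describe();
    }

    /** Hypothetical stand-in for a storage session that can prepare statements. */
    interface Session {
        Prepared prepare(String cql);
    }

    private final Map<String, String> queries = new LinkedHashMap<>();    // name -> CQL text
    private final Map<String, Prepared> prepared = new LinkedHashMap<>(); // name -> statement
    private Session session;

    StatementCache(Session session) {
        this.session = session;
    }

    /** Remembers the CQL text so the statement can be prepared again later. */
    synchronized void register(String name, String cql) {
        queries.put(name, cql);
        prepared.put(name, session.prepare(cql));
    }

    synchronized Prepared get(String name) {
        return prepared.get(name);
    }

    /** Re-prepares every registered query so no statement stays tied to the old session. */
    synchronized void onSessionReplaced(Session newSession) {
        this.session = newSession;
        for (Map.Entry<String, String> entry : queries.entrySet()) {
            prepared.put(entry.getKey(), newSession.prepare(entry.getValue()));
        }
    }
}
```

Keeping the CQL text next to each prepared statement is what makes the refresh possible. A cache that holds only the prepared objects would have nothing to re-prepare from after the session changes.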
Moving to ON_QA as available for testing with new brew build.
Commit [BZ 1016175] MetricsDAO was still holding references to st... didn't make it to rc/jon3.2.0.ER6 so the problem is still there on ER6.
Moving to the ER07 target milestone and setting to MODIFIED, as this was initially rejected by QE because of the bad ER6 build but was put into varied states (MODIFIED & ASSIGNED).
Moving to ON_QA as available to test in ER7 and later brew builds.
Verified on Version : 3.2.0.ER7 Build Number : e8e6401:ff0061d