1016175 – Add support for changing the storage password

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Security
Sub Component:
Version:	JON 3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	ER07
Target Release:	JON 3.2.0
Assignee:	Jirka Kremser
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1012435

Reported:	2013-10-07 16:19 UTC by John Sanda
Modified:	2014-01-02 20:37 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type:	Bug
Embargoed:

Attachments
jon server log (1.59 MB, text/x-log) 2013-11-12 09:02 UTC, Filip Brychta

Description John Sanda 2013-10-07 16:19:31 UTC

Description of problem:
The server installer generates a storage node username/password. There is currently no support for changing the password from either the UI or from the CLI. Users would have to fall back to using cqlsh or cassandra-cli to make the change. We need to provide support for allowing the user to change the password through JON.



Comment 2 Jirka Kremser 2013-10-09 14:19:48 UTC

Since it is stored in the system config table, I assume it is a cluster-wide setting.

I'll add three input items to the storage cluster settings page (#Administration/Topology/StorageNodes/Settings):
 1 for username
 2 for password (characters will be obfuscated by "••••")
 3 again for password (the value has to be the same as 2)

basically, the same approach as on #Administration/Security/Users/2 
The user will need the MANAGE_SETTINGS right
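
A minimal sketch of the checks such a form implies, assuming a plain validation function rather than the actual SmartGWT validators. The names `CredentialsForm` and `validate` and the length bounds are hypothetical; only the confirm-password rule and the MANAGE_SETTINGS requirement come from this comment.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical bounds; the bug does not state the real validator limits.
MIN_LEN = 6
MAX_LEN = 100


@dataclass
class CredentialsForm:
    username: str
    password: str          # input 2
    password_confirm: str  # input 3, must equal input 2


def validate(form: CredentialsForm, permissions: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the form is valid."""
    errors = []
    if "MANAGE_SETTINGS" not in permissions:
        errors.append("user lacks the MANAGE_SETTINGS permission")
    if not form.username:
        errors.append("username is required")
    if not (MIN_LEN <= len(form.password) <= MAX_LEN):
        errors.append(f"password length must be between {MIN_LEN} and {MAX_LEN}")
    if form.password != form.password_confirm:
        errors.append("password and confirmation do not match")
    return errors
```

Returning all problems at once, instead of failing on the first, lets a UI mark every offending field in one pass.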

All of the above covers only the UI <--> system settings part. The "system settings <--> storage nodes" synchronization is still not resolved. I think I'll need some help here.

Comment 4 John Sanda 2013-10-09 14:46:41 UTC

Originally we were planning on adding support for allowing users to authenticate against Cassandra using their RHQ credentials. This work was proposed in its place. It has yet to be determined what if anything should be done.

Comment 6 Jirka Kremser 2013-10-30 11:18:38 UTC

49e14ceaf
14990dd8d
2bd19caf2
8d066cc46
71a7ee31c

details:

branch:  release/jon3.2.x
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=49e14ceaf
time:    2013-10-29 17:30:06 -0500
commit:  49e14ceaf3a6d7a83251750d162d5c6d48085a3c
author:  Jiri Kremser - jkremser
message: [BZ 1016175] i18n, wrapping long line
    (cherry picked from commit 3e2934575a121ab292a6bf120ede3287b953959e)


branch:  release/jon3.2.x
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=14990dd8d
time:    2013-10-29 17:29:41 -0500
commit:  14990dd8d09b4a8abc765eefe099c1d617f37bf6
author:  Stefan Negrea - snegrea
message: [BZ 1016175] Update and simplifiy some verbiage based on feedback.
    (cherry picked from commit d5cd83d5008374b87e590197aad459cf09dbc737)


branch:  release/jon3.2.x
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=2bd19caf2
time:    2013-10-29 17:29:13 -0500
commit:  2bd19caf280703079919edaeb1102e8be713ab93
author:  Jiri Kremser - jkremser
message: [BZ 1016175] Workaround to SmartGWT bug: password validators were
         changing the focus after each user input making it unusable.
    (cherry picked from commit 9f460f107d825ac3df5b5c485affc9761935b559)


branch:  release/jon3.2.x
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=8d066cc46
time:    2013-10-29 17:28:40 -0500
commit:  8d066cc46596c819f3218ff9ed7bd74a72385b03
author:  Stefan Negrea - snegrea
message: [BZ 1016175] Add support for changing the storage password
    
    Add code to update the storage session when credentials get update in the system settings table. Created a quartz job to trigger the refresh every 2 minutes.
    
    The process to refresh the session is as follows:
    1) Get new credentials from the database
    2) Create a new session with new credentials
    3) Replace existing session with newly created one
    4) Allow few minutes for existing session drainage
    5) Shutdown old session
    
    Other changes:
    - Add code to update Cassandra password via a direct CQL query using the existing open session.
    - Update log level and text for the storage cluster credentials update job.
    - Make the username read-only; adding more intuitive description to the other properties.
    
    (cherry picked from commit c7650fa9381a4d3bd30ae69d1cc87ae54db10ec0)
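
The five refresh steps in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the RHQ code: `Session` is a stand-in class (not the DataStax driver API), `load_credentials` is a hypothetical callable that reads the system settings table, and a `threading.Timer` stands in for the drain handling.

```python
import threading
from typing import Callable, Optional, Tuple


class Session:
    """Stand-in for a storage cluster session (hypothetical, not the driver's class)."""

    def __init__(self, username: str, password: str):
        self.username = username
        self.password = password
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


class SessionManager:
    def __init__(self, load_credentials: Callable[[], Tuple[str, str]],
                 drain_seconds: float = 60.0):
        self._load_credentials = load_credentials
        self._drain_seconds = drain_seconds
        self._lock = threading.Lock()
        self._session = Session(*load_credentials())

    @property
    def session(self) -> Session:
        with self._lock:
            return self._session

    def refresh_if_changed(self) -> Optional[threading.Timer]:
        """Meant to be called periodically (every 2 minutes in the commit message)."""
        username, password = self._load_credentials()        # 1) get new credentials
        with self._lock:
            old = self._session
            if (old.username, old.password) == (username, password):
                return None                                   # nothing changed
            self._session = Session(username, password)       # 2) + 3) create and swap
        # 4) let in-flight work on the old session drain, then 5) shut it down
        timer = threading.Timer(self._drain_seconds, old.shutdown)
        timer.daemon = True
        timer.start()
        return timer
```

Swapping the reference first and closing the old session later means callers that already hold the old session can finish their work, while new callers immediately get the session built with the new credentials.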


branch:  release/jon3.2.x
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=71a7ee31c
time:    2013-10-29 17:28:11 -0500
commit:  71a7ee31c7374abbfd1fa66867458216b47ca006
author:  Jiri Kremser - jkremser
message: [BZ 1016175] Add support for changing the storage password
    
    - adding length validators for username and password; moving the storage node credentials to its own config section
    - renaming "password" field to "passwordHash"; setting the password to null if there was no change at all to it
    - adding support for storing the C* credentials to system settings db table
    - adding length validators for username and password; moving the storage node credentials to its own config section
    
    (cherry picked from commit 1e6aee931a92ff658c2def4d5714e81e0a24b4a5)

Comment 7 Stefan Negrea 2013-10-31 15:10:41 UTC

Implemented this feature. It is now possible to change the password for the storage cluster from the Storage Node Admin UI (Cluster Settings tab). Only the password can be updated; the username is displayed as a read-only field for reference purposes.

Any update on the Storage Node Admin page will be propagated to the Storage Cluster and to the other HA servers. There should be no data loss and no error messages after changing the password; all the servers will automatically refresh their storage cluster session.


release/jon3.2.x branch commits:

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=71a7ee31c7374abbfd1fa66867458216b47ca006

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=8d066cc46596c819f3218ff9ed7bd74a72385b03

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=2bd19caf280703079919edaeb1102e8be713ab93

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=14990dd8d09b4a8abc765eefe099c1d617f37bf6

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=49e14ceaf3a6d7a83251750d162d5c6d48085a3c

Comment 8 Simeon Pinder 2013-11-07 02:17:25 UTC

Moving to ON_QA for test with new brew build.

Comment 9 Filip Brychta 2013-11-08 13:54:15 UTC

Verified on
Version: 3.2.0.ER5
Build Number: 2cb2bc9:225c796

The password was successfully updated on all (2) storage nodes.

Comment 10 Filip Brychta 2013-11-12 09:01:04 UTC

I have to move it back to assigned.

The password was updated successfully on all storage nodes (verified via ./cqlsh -u yhzsrcvt -p mujtest 10.16.23.69), but after restarting one storage node, the jon server started logging exceptions.

I had the following setup:
machine1: Jon server, jon agent and storage node 
machine2: Jon agent and storage node

The following scenario failed:
1- change password for storage nodes
2- restart storage node on machine2

Result:
The #Administration/Topology/StorageNodes page shows Globally Uncaught Exception and the jon server throws the following exceptions:
03:35:33,368 ERROR [com.datastax.driver.core.Session] (Cassandra Java Driver worker-3) Error creating pool to /10.16.23.69 (Authentication error on host /10.16.23.69: Username and/or password are incorrect)
03:35:42,587 ERROR [org.rhq.server.metrics.MetricsServer] (New I/O worker #29) An error occurred while inserting raw data MeasurementDataNumeric[name=Calculated.DataDiskUsedPercentage, value=0.0, scheduleId=10378, timestamp=1384245335351]: com.datastax.driver.core.exceptions.DriverInternalError: Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d

Additional info:
Restarting the jon server resolves this issue.
Complete jon server log attached.

Comment 11 Filip Brychta 2013-11-12 09:02:09 UTC

Created attachment 822787 [details]
jon server log

Comment 12 Stefan Negrea 2013-11-13 16:49:19 UTC

The job that updates the storage node session with new credentials on the RHQ server runs every 2 minutes, with a 1 minute drain period on the existing session. Unless you restart the storage nodes (like you did), the old session with the old credentials can still function for a short time (until it gets invalidated by C*). The RHQ server update (creating a new storage session) cannot be done right away because in HA environments the change needs to be applied to all RHQ servers. So the code that changes the credentials needs to run on a timer and execute on every HA server.

When you restart the storage nodes right away after a password change, the servers will eventually pick up the password change and refresh the bad session.


Filip, in your case you did not wait long enough for the credentials job to refresh the session. Can you please change the test case a little? You can do two tweaks to the test case to ensure the functionality works as expected:
1) Do not restart the storage nodes for 4 minutes. Restart afterwards and there should be no impact.
2) Restart the storage nodes immediately but then wait about 3 minutes. The RHQ server should use the new credentials when the job that checks for credential updates runs.
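
The waiting times suggested above follow from the two intervals mentioned at the start of this comment. A small sketch of that arithmetic, assuming the job fires at a fixed 2 minute interval and the drain period starts when the new session is created (the function names are hypothetical):

```python
JOB_INTERVAL_SECONDS = 2 * 60  # credentials job period, from this comment
DRAIN_SECONDS = 1 * 60         # drain period for the old session, from this comment


def worst_case_switch_delay(job_interval: int = JOB_INTERVAL_SECONDS) -> int:
    """Longest time until a server starts using the new credentials.

    Worst case: the password is changed right after a job run, so the
    server waits one full interval for the next run.
    """
    return job_interval


def worst_case_old_session_shutdown(job_interval: int = JOB_INTERVAL_SECONDS,
                                    drain: int = DRAIN_SECONDS) -> int:
    """Longest time until the old session is shut down after the change."""
    return job_interval + drain


if __name__ == "__main__":
    print(worst_case_switch_delay() // 60, "min until new credentials are used")
    print(worst_case_old_session_shutdown() // 60, "min until the old session is closed")
```

Under these assumptions the 3 and 4 minute waits in the two tweaks leave a margin over the 2 minute worst case for picking up the new credentials.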

Comment 13 Filip Brychta 2013-11-14 09:28:42 UTC

I added another step to the test case as you suggested in comment 12. So I tried the following:
1. I had the same setup as described in comment 10:
machine1: Jon server, jon agent and storage node (10.16.23.61)
machine2: Jon agent and storage node (10.16.23.69)
2. change storage password via jon server's UI (Administration->Storage Nodes->Cluster settings)
3. wait ~10 minutes
4. stop storage node on machine2 using rhqctl
5. start storage node on machine2 using rhqctl


Result:
The following errors appear in the jon server log:
04:07:08,505 ERROR [com.datastax.driver.core.Session] (Cassandra Java Driver worker-2) Error creating pool to /10.16.23.69 (Authentication error on host /10.16.23.69: Username and/or password are incorrect)
04:07:08,760 ERROR [com.datastax.driver.core.Session] (Cassandra Java Driver worker-2) Error creating pool to /10.16.23.69 (Authentication error on host /10.16.23.69: Username and/or password are incorrect)
04:07:09,649 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Authentication error on host /10.16.23.69: Username and/or password are incorrect
04:07:09,650 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Retry against /10.16.23.69 have been suspended. It won't be retried unless the node is restarted.
04:07:09,993 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Authentication error on host /10.16.23.69: Username and/or password are incorrect
04:07:09,995 ERROR [com.datastax.driver.core.AbstractReconnectionHandler] (Reconnection-1) Retry against /10.16.23.69 have been suspended. It won't be retried unless the node is restarted.
04:07:11,449 ERROR [com.datastax.driver.core.RequestHandler] (New I/O worker #27) Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d



Errors were still being thrown after another ~15 minutes.
The issue disappeared when the jon server was restarted.
I'm keeping both machines up for possible further investigation.

Comment 14 Filip Brychta 2013-11-14 15:58:16 UTC

Because it's not visible enough in comment 10, I'm adding it here again. The result of this issue is the following:

1- #Administration/Topology/StorageNodes page shows Globally Uncaught Exception 
2- errors similar to 
04:16:11,451 ERROR [com.datastax.driver.core.RequestHandler] (New I/O worker #28) Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d
04:16:11,451 ERROR [org.rhq.server.metrics.MetricsServer] (New I/O worker #28) An error occurred while inserting raw data MeasurementDataNumeric[name=Calculated.TotalDiskUsedPercentage, value=0.53, scheduleId=11785, timestamp=1384420564914]: com.datastax.driver.core.exceptions.DriverInternalError: Tried to execute unknown prepared query 3de31b7502a532efbecfb4397f549d7d

are being written to the rhq server log periodically.



Exception which is thrown to the rhq server log when accessing the #Administration/Topology/StorageNodes page:
04:15:20,030 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:7080-5) JBAS014134: EJB Invocation failed on component MeasurementDataManagerBean for method public abstract org.rhq.core.domain.measurement.MeasurementAggregate org.rhq.enterprise.server.measurement.MeasurementDataManagerRemote.getMeasurementAggregate(org.rhq.core.domain.auth.Subject,int,long,long) throws org.rhq.enterprise.server.measurement.MeasurementException: javax.ejb.EJBTransactionRolledbackException: Tried to execute unknown prepared query 15f2e28144a6700f02c39fcee365b36d


Complete exceptions are visible in the attached log (timestamps could differ, so search for the strings without timestamps).

Comment 15 Stefan Negrea 2013-11-14 21:28:46 UTC

Identified one more instance where the session was not refreshed. The MetricsDAO makes use of prepared statements that were prepared with the previous session (old credentials). Added code that refreshes all the prepared statements when the new session (new credentials) is created. This will fix the UI errors (comment #14) as well as the errors from the data purge job (comment #13).
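
A rough sketch of the idea, not the actual MetricsDAO change: the DAO keeps its statements next to the session they were prepared on and re-prepares all of them when the session is replaced. `FakeSession`, `MetricsDao`, `on_session_replaced`, and the CQL text are hypothetical stand-ins; the fake only models that a statement prepared on one session is unknown to another.

```python
class FakeSession:
    """Stand-in session: a prepared statement is only valid on the session that prepared it."""

    def __init__(self, name: str):
        self.name = name

    def prepare(self, cql: str):
        # Bind the statement to this session so misuse can be detected.
        return (self, cql)

    def execute(self, statement) -> str:
        owner, cql = statement
        if owner is not self:
            raise RuntimeError("Tried to execute unknown prepared query")
        return f"{self.name}: {cql}"


class MetricsDao:
    # Hypothetical query text, for illustration only.
    INSERT_RAW = "INSERT INTO raw_metrics (schedule_id, time, value) VALUES (?, ?, ?)"

    def __init__(self, session: FakeSession):
        self._init_statements(session)

    def _init_statements(self, session: FakeSession) -> None:
        # Every prepared statement is created here, so one call refreshes them all.
        self._session = session
        self._insert_raw = session.prepare(self.INSERT_RAW)

    def on_session_replaced(self, new_session: FakeSession) -> None:
        """Called when the credentials job swaps in a new session."""
        self._init_statements(new_session)

    def insert_raw(self) -> str:
        return self._session.execute(self._insert_raw)
```

Keeping all `prepare` calls in one method makes it hard to forget a statement when the session changes, which is the kind of stale reference described in this comment.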

release/jon3.2.x branch commit:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=90f096f11ebd1122e01ed39fe5b210cd94881e01

Comment 16 Simeon Pinder 2013-11-19 15:48:00 UTC

Moving to ON_QA as available for testing with new brew build.

Comment 17 Filip Brychta 2013-11-20 10:42:05 UTC

Commit [BZ 1016175] MetricsDAO was still holding references to st... didn't make it to rc/jon3.2.0.ER6 so the problem is still there on ER6.

Comment 18 Simeon Pinder 2013-11-22 04:44:51 UTC

Moving to the ER07 target milestone and to MODIFIED: this was initially rejected by QE because of the bad ER6 build, but was put into varied states (MODIFIED & ASSIGNED).

Comment 19 Simeon Pinder 2013-11-22 05:01:39 UTC

Moving to ON_QA as available to test in ER7 and later brew builds.

Comment 20 Filip Brychta 2013-11-22 13:53:22 UTC

Verified on
Version: 3.2.0.ER7
Build Number: e8e6401:ff0061d
