Bug 1234912
| Summary: | Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong |
|---|---|
| Product: | [JBoss] JBoss Operations Network |
| Component: | Core Server, Storage Node |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Version: | JON 3.3.0 |
| Target Milestone: | ER02 |
| Target Release: | JON 3.3.4 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | John Sanda <jsanda> |
| Assignee: | Libor Zoubek <lzoubek> |
| QA Contact: | Filip Brychta <fbrychta> |
| CC: | fbrychta, loleary, lzoubek, spinder, theute |
| Keywords: | Triaged |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2015-10-28 14:36:50 UTC |
| Bug Blocks: | 1200594 |
Description (John Sanda, 2015-06-23 13:30:43 UTC)
At start up, and maybe periodically as a scheduled job, we should check that the replication_factor is what we expect it to be for the system_auth and rhq keyspaces. Of course, in the original scenario described this won't be possible, since we cannot authenticate against the new node. We store and track the state of cluster maintenance in the rhq_storage_node table in the RDBMS. I think we need an explicit state stored somewhere in the RDBMS that allows us to easily and immediately (at startup) identify the problem. State is tracked using the StorageNode.OperationMode enum. Maybe we could add two additional values like UPDATE_SYSTEM_AUTH_SCHEMA and UPDATE_RHQ_SCHEMA. The one problem with storing state in this way is that if another deploy or undeploy process is started, we essentially lose this state information. This problem is not specific to this situation; it is a problem in general with how we store and track state with respect to cluster maintenance.

branch: master
link: https://github.com/rhq-project/rhq/commit/278fc3a2a
time: 2015-09-30 15:41:46 +0200
commit: 278fc3a2a95c7eb1ce0af7b0ff80f73d0f309b8d
author: Libor Zoubek - lzoubek
message: Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong

For the system_auth keyspace, set replication_factor=clusterSize so that each node keeps its own copy of the auth data. Created a recurring job which checks the replication_factor of the rhq and system_auth keyspaces; when an invalid replication_factor is detected, the job tries to fix it and then recommends running clusterMaintenance. This commit also changes the "expected" replication factor of the system_auth keyspace to be equal to the number of nodes.
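The commit's approach can be sketched as follows. This is a hypothetical illustration, not the actual RHQ code: the function names are invented, and the cap of 3 on the rhq keyspace's replication factor is an assumption made for the sketch. Only the system_auth rule (replication_factor equal to cluster size, so every node holds its own copy of the auth data) comes from the commit message.

```python
# Hypothetical sketch of the recurring check-and-repair job described in
# the commit -- NOT the actual RHQ implementation. The cap of 3 for the
# rhq keyspace is an illustrative assumption.

def expected_replication_factor(keyspace: str, cluster_size: int) -> int:
    """Return the replication factor the job should enforce."""
    if keyspace == "system_auth":
        # Every node keeps its own copy of the auth data, so a node can
        # authenticate clients even when other replicas are unreachable.
        return cluster_size
    # Data keyspaces do not need full replication (cap is an assumption).
    return min(cluster_size, 3)

def repair_cql(keyspace: str, cluster_size: int) -> str:
    """CQL statement the job would issue when the factor is wrong."""
    rf = expected_replication_factor(keyspace, cluster_size)
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {rf}}}"
    )
```

Note that altering a keyspace's replication factor only updates schema metadata; existing data still has to be streamed to the new replicas, which is why the job recommends running clusterMaintenance (an anti-entropy repair) after fixing the factor.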
branch: release/jon3.3.x
link: https://github.com/rhq-project/rhq/commit/ee4afd78d
time: 2015-09-30 19:33:16 +0200
commit: ee4afd78df30af016539b925de06179827c40773
author: Libor Zoubek - lzoubek
message: Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong

For the system_auth keyspace, set replication_factor=clusterSize so that each node keeps its own copy of the auth data. Created a recurring job which checks the replication_factor of the rhq and system_auth keyspaces; when an invalid replication_factor is detected, the job tries to fix it and then recommends running clusterMaintenance. This commit also changes the "expected" replication factor of the system_auth keyspace to be equal to the number of nodes.

(cherry picked from commit 278fc3a2a95c7eb1ce0af7b0ff80f73d0f309b8d)
Signed-off-by: Libor Zoubek <lzoubek>

branch: master
link: https://github.com/rhq-project/rhq/commit/7fb9222c8
time: 2015-10-05 15:41:16 +0200
commit: 7fb9222c80981fb876d8a7eea472304761f42555
author: Libor Zoubek - lzoubek
message: Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong

Correctly close the storage cluster session and fix the scheduling interval of the job.

branch: release/jon3.3.x
link: https://github.com/rhq-project/rhq/commit/3ef061530
time: 2015-10-05 15:42:13 +0200
commit: 3ef06153042b4105a1da6dd678944e3240a25f4f
author: Libor Zoubek - lzoubek
message: Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong

Correctly close the storage cluster session and fix the scheduling interval of the job.

(cherry picked from commit 7fb9222c80981fb876d8a7eea472304761f42555)
Signed-off-by: Libor Zoubek <lzoubek>

branch: master
link: https://github.com/rhq-project/rhq/commit/e1fa9edbe
time: 2015-10-08 16:34:35 +0200
commit: e1fa9edbe0a53bf39c86312cf7a8848e934ac57b
author: Libor Zoubek - lzoubek
message: Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong

Fix the "healthy" replication factor definition.

branch: release/jon3.3.x
link: https://github.com/rhq-project/rhq/commit/fa7b1a1f8
time: 2015-10-08 17:00:58 +0200
commit: fa7b1a1f8dc55140e8b9fc900db044bde3892f98
author: Libor Zoubek - lzoubek
message: Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong

Fix the "healthy" replication factor definition.

(cherry picked from commit e1fa9edbe0a53bf39c86312cf7a8848e934ac57b)
Signed-off-by: Libor Zoubek <lzoubek>

Moving to ON_QA as available to test with the following build: https://brewweb.devel.redhat.com/buildinfo?buildID=460382
*Note: jon-server-patch-3.3.0.GA.zip maps to the ER01 build of jon-server-3.3.0.GA-update-04.zip.

Moving target milestone to ER02 to retest after the latest Cassandra changes.

Moving to ON_QA as available to test with the following build: https://brewweb.devel.redhat.com//buildinfo?buildID=461043
*Note: jon-server-patch-3.3.0.GA.zip maps to the ER02 build of jon-server-3.3.0.GA-update-04.zip.

Verified on:
Version: 3.3.0.GA Update 04
Build Number: e9ed05b:aa79ebd

Verification steps: deploying and removing up to 4 storage nodes, manually changing the replication factor for the rhq and system_auth keyspaces, and checking that those are automatically reset to the correct values.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1947.html