Description of problem:

For multi-broker deployments (i.e. pretty much any "real" deployment), the broker authentication keys and salt must be synchronized across all brokers. This is mentioned obscurely in the docs:
https://access.redhat.com/site/documentation/en-US/OpenShift_Enterprise/2/html-single/Deployment_Guide/index.html#Configuring_the_Required_Services

And now, in the install script ("MANUAL TASKS" #4):
https://github.com/openshift/openshift-extras/blob/enterprise-2.0/enterprise/install-scripts/openshift.ks#L129

If these points are not heeded, the results are painful. When a broker creates a gear, it creates credentials that the gear may use to make requests to the broker in order to perform certain operations (autoscaling, Jenkins builds, recording deployments). These requests will fail (401 Unauthorized) when made to a broker that has different auth keys or salt. About the only sign of the problem (aside from the user experience) is that the broker app log (production.log) records a stack trace with an obscure error, which varies depending on whether it's the auth salt or the keys that don't match.

oo-diagnostics should look for this error and FAIL with a recommended fix. The fix is to synchronize the keys, then use oo-admin-broker-auth to resync all credentials on existing gears.

Steps to Reproduce:
1. Install two broker hosts and one node.
2. Create a scaled app using the first broker.
3. Set the node's node.conf:BROKER_HOST to the second broker.
4. Push a commit to the app's git repo, or scale it up from the head gear (ssh in and haproxy_ctld -u)

The request is rejected and one of the following appears in the production.log:
[ERROR] Broker key authentication failed. bad decrypt
[ERROR] Broker key authentication failed. padding check failed
... with a long stack trace.

To be flexible, oo-diagnostics should check development.log if appropriate.
Also, if possible oo-diagnostics should only report this error when it appears after the most recent restart (don't want to keep reporting it after fixed).
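
A rough sketch of the "only after the most recent restart" idea, in Python for illustration only (oo-diagnostics itself is not written this way). It assumes each log line begins with a timestamp such as "2015-11-23 19:07:33.250", and that the caller already knows the restart time; how to obtain that time is left open. The function name auth_failures_since is hypothetical.

```python
from datetime import datetime

# Text the broker log is reported to contain when key or salt do not match.
MARKER = "Broker key authentication failed."

def auth_failures_since(lines, restarted_at):
    """Return (timestamp, detail) for each auth failure logged at or after restarted_at."""
    hits = []
    for line in lines:
        if MARKER not in line:
            continue
        try:
            # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS.mmm" (23 characters).
            stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # no parseable timestamp, so the line cannot be dated
        if stamp >= restarted_at:
            detail = line.split(MARKER, 1)[1].strip()
            hits.append((stamp, detail))
    return hits
```

Failures logged before the restart are ignored, so a stale entry would stop being reported once the broker has been restarted with synchronized keys.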
Addendum: After changing node.conf at step 3, be sure to restart ruby193-mcollective to pick up the change. Or just install it that way to begin with.
PR <https://github.com/openshift/origin-server/pull/6283> has been submitted to fix this bug.
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/4ef7653a96b4abe74616a588adfe3af466422c1f
Bug 1064039: Add oo-diagnostic report 401 Unauthorized error

The 'oo-diagnostics' currently won't fail if there are 401 Unauthorized errors in the production/development.log. After this change, 'oo-diagnostics' will check either production.log or development.log, depending on which mode the broker is in. If the 401 errors exist, an error message will be displayed with the suggested solution to fix the issue. The error message may continue to be displayed in subsequent 'oo-diagnostics' runs until the 401 error message is cleared by the next log rotation.

Bug <1064039>
Link <https://bugzilla.redhat.com/show_bug.cgi?id=1064039>
Signed-off-by: Vu Dinh <vdinh>
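
For readers who want the shape of that check without reading the commit, here is a minimal Python sketch. It is not the code from the commit. The log directory /var/log/openshift/broker, the matched string, and the names broker_log_path and diagnose are all assumptions made for illustration.

```python
import posixpath

# Assumed marker; the actual string matched by the commit may differ.
AUTH_FAILURE = "Broker key authentication failed."

def broker_log_path(mode, log_dir="/var/log/openshift/broker"):
    """Pick the log file matching the broker's mode (log_dir is an assumed location)."""
    name = "development.log" if mode == "development" else "production.log"
    return posixpath.join(log_dir, name)

def diagnose(mode, log_lines):
    """Return a warning string if the log holds an auth failure, else None."""
    if any(AUTH_FAILURE in line for line in log_lines):
        return ("Authentication failures found in %s; synchronize the broker "
                "auth keys and salt, then resync gear credentials with "
                "oo-admin-broker-auth." % broker_log_path(mode))
    return None
```

Because the check only looks at what is in the log, it keeps firing until the offending lines rotate out, which matches the caveat in the commit message.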
Verified this bug with rubygem-openshift-origin-common-1.29.4.1-1.el6op.noarch, and PASS.

1. Install two separate broker servers; each install generates different authentication keys and salt in the broker's configuration.
2. Run oo-diagnostics to make sure both servers are running well.
3. Against broker1, deploy a scalable app.
4. On the node, change /etc/openshift/env/OPENSHIFT_BROKER_HOST and /etc/openshift/node.conf to point the broker host to broker2, then restart the ruby193-mcollective service.
5. Log into the app and run the following command:

[scaphp53app-jialiu.ose-20151123.example.com 5652f2791a5f2141ab000009]> haproxy_ctld -u
I, [2015-11-23T19:07:31.487712 #28481] INFO -- : Starting haproxy_ctld
I, [2015-11-23T19:07:31.488385 #28481] INFO -- : add-gear - capacity: 0.0% gear_count: 1 sessions: 0 up_thresh: 90.0%
I, [2015-11-23T19:07:32.239480 #28481] INFO -- : add-gear - exit_code: 1 output: Failed to get application info from the broker: 401 Unauthorized
Failed to get application info from the broker: 401 Unauthorized

In broker2's log, get the following error:

2015-11-23 19:07:33.250 [ERROR] Broker key authentication failed. padding check failed

Run oo-diagnostics again, get the following warning:

INFO: running: test_auth_keys_salt_failure
WARN: test_auth_keys_salt_failure
      Detected an '401 Unauthorized' error on the production log.
      This error may indicate inconsistent authentication keys or salts among brokers.
      Please follow below steps to fix the issue:
      1. Manually synchronize the authentication keys and salt in broker's configuration
      2. Then use 'oo-admin-broker-auth' to resync all credentials on existing gears
      Note: This error message may continue to be displayed even if the issue is already resolved as the error message is still in the production log. The error will be cleared after the next log rotation.

With the broker switched to development mode, the warning is also reproduced.
Following the warning message, run "oo-admin-broker-auth --rekey-all" on the broker, then scale up the app again; this time it succeeds.
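
Before re-keying, it can be worth confirming that the brokers really do share the same salt and key. The Python sketch below is one possible way to compare them and is not part of oo-diagnostics. It assumes the salt is stored as an AUTH_SALT="..." line in each broker's configuration file and that the private key file contents have already been read from each host; the function names are hypothetical.

```python
import hashlib
import re

def auth_salt(conf_text):
    """Extract the AUTH_SALT value from broker.conf-style text, or None if absent."""
    m = re.search(r'^\s*AUTH_SALT\s*=\s*"?([^"\n]*)"?\s*$', conf_text, re.MULTILINE)
    return m.group(1).strip() if m else None

def fingerprint(salt, key_bytes):
    """Hash salt and key together so hosts can be compared without printing secrets."""
    h = hashlib.sha256()
    h.update((salt or "").encode("utf-8"))
    h.update(b"\0")  # separator so salt and key cannot run together
    h.update(key_bytes)
    return h.hexdigest()

def brokers_in_sync(brokers):
    """brokers maps host name -> (conf_text, key_bytes); True if all fingerprints match."""
    prints = {fingerprint(auth_salt(conf), key) for conf, key in brokers.values()}
    return len(prints) <= 1
```

If brokers_in_sync returns False, synchronize the keys and salt first; re-keying gears against brokers that still disagree would leave some requests failing.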
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-2666.html