Bug 1064039 - RFE oo-diagnostics should report when node auth is failing (401 Unauthorized)
Summary: RFE oo-diagnostics should report when node auth is failing (401 Unauthorized)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Vu Dinh
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-11 21:13 UTC by Luke Meyer
Modified: 2015-12-17 18:11 UTC (History)

Fixed In Version: openshift-origin-broker-util-1.37.4.1-1.el6op
Doc Type: Bug Fix
Doc Text:
The `oo-diagnostics` tool did not warn users if there were "Broker key authentication failed" errors in log files, which indicated potential key/salt inconsistencies. This is problematic because if gears are created with a mismatched key/salt, future gears become inaccessible due to "401 Unauthorized" errors. This bug fix adds a proper check to the `oo-diagnostics` tool to issue warnings if these errors are listed in log files. Also, a suggested fix is included in the warning message to help users rectify the issue.
Clone Of:
Environment:
Last Closed: 2015-12-17 17:08:59 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2015:2666
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat OpenShift Enterprise 2.2.8 security, bug fix, and enhancement update
Last Updated: 2015-12-17 22:07:54 UTC

Description Luke Meyer 2014-02-11 21:13:11 UTC
Description of problem:
For multi-broker deployments (i.e. pretty much any "real" deployment), the broker authentication keys and salt must be synchronized across all brokers. This is mentioned obscurely in the docs:
https://access.redhat.com/site/documentation/en-US/OpenShift_Enterprise/2/html-single/Deployment_Guide/index.html#Configuring_the_Required_Services
And now, in the install script:
https://github.com/openshift/openshift-extras/blob/enterprise-2.0/enterprise/install-scripts/openshift.ks#L129 "MANUAL TASKS" #4

If these points are not heeded, the results are painful. When a broker creates a gear, it creates credentials that may be used by the gear to make requests to the broker in order to perform certain operations (autoscaling, Jenkins builds, recording deployments). These requests will fail (401 Unauthorized) when made to a broker that has different auth keys or salt.

About the only sign of the problem (aside from the user experience) is that the broker app log (production.log) records a stack trace with an obscure error (which varies depending on whether it's the auth salt or the keys that don't match). oo-diagnostics should look for this error and FAIL with a recommended fix.

The fix is to synchronize the keys, then use oo-admin-broker-auth to resync all credentials on existing gears.
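
A mismatch of this kind can in principle be spotted before any gear is affected by comparing fingerprints of each broker's key material. The following is an illustrative Python sketch, not part of oo-diagnostics or oo-admin-broker-auth. The item names (`auth_salt`, `priv_key`) and the idea of gathering each broker's material into a dictionary are assumptions made for the example; how the material is collected from each host is left out.

```python
import hashlib


def fingerprint(material):
    """Return a SHA-256 hex digest so secrets are compared without printing them."""
    return hashlib.sha256(material).hexdigest()


def find_mismatches(brokers):
    """Report items whose content differs between brokers.

    `brokers` maps a broker name to a dict of {item name: raw bytes}.
    The result maps each mismatched item to {broker name: fingerprint}.
    An item missing on one broker is hashed as empty bytes, so it is
    reported as a mismatch too.
    """
    items = set()
    for material in brokers.values():
        items.update(material)

    mismatches = {}
    for item in sorted(items):
        prints = {
            name: fingerprint(material.get(item, b""))
            for name, material in brokers.items()
        }
        if len(set(prints.values())) > 1:
            mismatches[item] = prints
    return mismatches


if __name__ == "__main__":
    # Hypothetical data: same salt, different private key.
    example = {
        "broker1": {"auth_salt": b"salt", "priv_key": b"key-one"},
        "broker2": {"auth_salt": b"salt", "priv_key": b"key-two"},
    }
    for item, prints in find_mismatches(example).items():
        print("MISMATCH:", item, prints)
```

Comparing digests rather than raw values keeps the secrets themselves out of any report or log the check produces.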

Steps to Reproduce:
1. Install two broker hosts and one node.
2. Create a scaled app using the first broker.
3. Set the node's node.conf:BROKER_HOST to the second broker.
4. Push a commit to the app's git repo, or scale it up from the head gear (ssh in and run haproxy_ctld -u).

The request is rejected and one of the following appears in the production.log:
[ERROR] Broker key authentication failed. bad decrypt
[ERROR] Broker key authentication failed. padding check failed
... with a long stack trace.

To be flexible, oo-diagnostics should check development.log if appropriate. Also, if possible, oo-diagnostics should only report this error when it appears after the most recent restart, so that it does not keep being reported after the problem is fixed.
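
The requested check can be sketched as a scan for the error text with an optional cutoff time. This is an illustrative Python sketch, not the oo-diagnostics implementation. It assumes each log line starts with a timestamp in the form `2015-11-23 19:07:33.250`, as in the broker log excerpt quoted later in this report, and that the caller supplies the time of the most recent restart (how to obtain that time is not covered here).

```python
import re
from datetime import datetime

# Matches e.g.
# "2015-11-23 19:07:33.250 [ERROR] Broker key authentication failed. bad decrypt"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[ERROR\] "
    r"Broker key authentication failed\. (?P<reason>.+)$"
)


def find_auth_failures(lines, since=None):
    """Return (timestamp, reason) pairs for broker key auth failures.

    If `since` is given, failures logged before it are ignored, so errors
    from before the most recent restart are not reported again.
    """
    failures = []
    for line in lines:
        match = LINE_RE.match(line.rstrip())
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        if since is not None and ts < since:
            continue
        failures.append((ts, match.group("reason")))
    return failures
```

The reason text ("bad decrypt" or "padding check failed") is kept in the result because, as noted above, it varies depending on whether the salt or the keys differ.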

Comment 2 Luke Meyer 2014-02-11 21:26:29 UTC
Addendum:

After changing node.conf at step 3, be sure to restart ruby193-mcollective to pick up the change. Or just install it that way to begin with.

Comment 3 Vu Dinh 2015-10-21 19:21:06 UTC
PR <https://github.com/openshift/origin-server/pull/6283> has been submitted to fix this bug.

Comment 4 openshift-github-bot 2015-10-22 22:50:45 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/4ef7653a96b4abe74616a588adfe3af466422c1f
Bug 1064039: Add oo-diagnostic report 401 Unauthorized error

The 'oo-diagnostics' tool currently does not fail if there are 401 Unauthorized
errors in production.log or development.log. After this change, 'oo-diagnostics'
checks either production.log or development.log, depending on which mode the
broker is in. If 401 errors exist, an error message is displayed
with a suggested solution to fix the issue. The error message may
continue to be displayed in subsequent 'oo-diagnostics' runs
until the 401 error messages are cleared by the next log rotation.

Bug <1064039>
Link <https://bugzilla.redhat.com/show_bug.cgi?id=1064039>

Signed-off-by: Vu Dinh <vdinh>
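
The behavior described in the commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the pull request. The default log directory and the `rails_env` parameter name are assumptions made for the example; only the log file names and the error text come from this report.

```python
import os

ERROR_TEXT = "Broker key authentication failed"


def broker_log_path(rails_env, log_dir="/var/log/openshift/broker"):
    """Pick development.log in development mode, production.log otherwise.

    The default log_dir is an assumed location, not taken from the report.
    """
    name = "development.log" if rails_env == "development" else "production.log"
    return os.path.join(log_dir, name)


def log_has_auth_failure(path):
    """Return True if any line in the log mentions a broker key auth failure.

    A missing log file is treated as "no failure found".
    """
    try:
        with open(path, errors="replace") as log:
            return any(ERROR_TEXT in line for line in log)
    except FileNotFoundError:
        return False
```

Because the check only looks for the error text anywhere in the current log file, it keeps matching until the file is rotated, which is consistent with the caveat in the commit message.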

Comment 9 Johnny Liu 2015-11-23 11:22:30 UTC
Verified this bug with rubygem-openshift-origin-common-1.29.4.1-1.el6op.noarch; result: PASS.


1. Install two separate broker servers; each installation generates different authentication keys and salt in its broker configuration.
2. Run oo-diagnostics to make sure both servers are running well.
3. Deploy a scalable app against broker1.
4. On the node, change /etc/openshift/env/OPENSHIFT_BROKER_HOST and /etc/openshift/node.conf to point the broker host to broker2, then restart the ruby193-mcollective service.
5. Log into the app and run the following command:
[scaphp53app-jialiu.ose-20151123.example.com 5652f2791a5f2141ab000009]\> haproxy_ctld -u
I, [2015-11-23T19:07:31.487712 #28481]  INFO -- : Starting haproxy_ctld
I, [2015-11-23T19:07:31.488385 #28481]  INFO -- : add-gear - capacity: 0.0% gear_count: 1 sessions: 0 up_thresh: 90.0%
I, [2015-11-23T19:07:32.239480 #28481]  INFO -- : add-gear - exit_code: 1  output: Failed to get application info from the broker: 401 Unauthorized

Failed to get application info from the broker: 401 Unauthorized

broker2's log shows the following error:
2015-11-23 19:07:33.250 [ERROR] Broker key authentication failed. padding check failed

Running oo-diagnostics again produces the following warning:
INFO: running: test_auth_keys_salt_failure
WARN: test_auth_keys_salt_failure
      Detected an '401 Unauthorized' error on the production log. This error
      may indicate inconsistent authentication keys or salts among brokers.
      Please follow below steps to fix the issue:
      1. Manually synchronize the authentication keys and salt in broker's
      configuration
      2. Then use 'oo-admin-broker-auth' to resync all credentials on
      existing gears

      Note: This error message may continue to be displayed even if the issue
      is already resolved as the error message is still in the production log.
      The error will be cleared after the next log rotation.


After switching the broker to development mode, the warning is reproduced as well.

Following the warning message, run "oo-admin-broker-auth --rekey-all" on the broker and scale up the app again; this time it succeeds.

Comment 11 errata-xmlrpc 2015-12-17 17:08:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2666.html

