Bug 1410899 - Metrics - Could not acquire a Kubernetes client connection
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.1
Hardware: Unspecified OS: Unspecified
Priority: medium Severity: high
Target Milestone: ---
Target Release: 3.4.z
Assigned To: Matt Wringe
QA Contact: Mike Fiedler
:
Depends On:
Blocks: 1448999
Reported: 2017-01-06 14:04 EST by Eric Jones
Modified: 2018-07-22 22:35 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When authenticating users, Hawkular Metrics did not properly handle an error response from the OpenShift master for a subjectaccessreview.
Consequence: If the authentication token passed was not valid, the connection to Hawkular Metrics stayed open until it timed out.
Fix: Hawkular Metrics now properly handles an error response from the OpenShift server and closes the connection.
Result: If a user passes an invalid token, the connection closes properly and does not remain open until a timeout.
Story Points: ---
Clone Of:
: 1448999 (view as bug list)
Environment:
Last Closed: 2017-01-31 15:19:37 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:0218 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.4.1.2 bug fix update | 2017-01-31 20:18:20 EST

Description Eric Jones 2017-01-06 14:04:28 EST
Description of problem:
After scaling the Cassandra and hawkular-metrics pods from 1 to 3 replicas, hawkular-metrics periodically becomes unreachable.

The web UI shows a gateway timeout, and curling hawkular-metrics directly returns "Could not acquire a Kubernetes client connection".

They appear to be able to restore normal usage temporarily by restarting the pods.

After this was discussed on an internal mailing list [0], it was recommended that a Bugzilla be opened.

[0] http://post-office.corp.redhat.com/archives/openshift-sme/2017-January/msg00178.html

Version-Release number of selected component (if applicable):

Steps to Reproduce:
Deploy metrics on OCP 3.3 via Ansible. After installation, scale Cassandra (via the cassandra-node template) to 3 pods and hawkular-metrics to 3 pods.

Once the pods have been deployed, curl the 3 pods in a loop until a request fails.
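
The polling step can be sketched in Python as follows. This is only an illustration of the loop described above, not the script the customer ran. The function names, the 10-second timeout, the disabled certificate verification, and the use of `Authorization: Bearer` and `Hawkular-Tenant` headers are assumptions; the pod URLs, token, and tenant must be supplied by the caller.

```python
import ssl
import time
import urllib.error
import urllib.request


def fetch_status(url, token, tenant, timeout=10.0):
    """Issue one GET against a pod URL and return (ok, detail)."""
    request = urllib.request.Request(
        url,
        headers={
            # Assumption: the endpoint expects a bearer token and a tenant header.
            "Authorization": f"Bearer {token}",
            "Hawkular-Tenant": tenant,
        },
    )
    # Assumption: the pods present a self-signed certificate, so verification is off.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status.
        return False, f"HTTP {error.code}"
    except OSError as error:
        # Covers URLError, connection failures, and socket timeouts.
        return False, f"{type(error).__name__}: {error}"


def poll_until_failure(urls, fetch, max_rounds=1000, delay=1.0):
    """Cycle through urls; return (round, url, detail) for the first failure, else None."""
    for round_number in range(1, max_rounds + 1):
        for url in urls:
            ok, detail = fetch(url)
            if not ok:
                return round_number, url, detail
        time.sleep(delay)
    return None
```

A caller would pass the three pod URLs and bind the credentials, for example `poll_until_failure(urls, lambda url: fetch_status(url, token, tenant))`. Keeping `fetch` injectable lets the loop be exercised without a cluster.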

The customer has provided tcpdumps, thread dumps, and some additional information, which will be added in a private update shortly.
Comment 11 Peng Li 2017-01-25 08:32:26 EST
Set up a cluster with 1 master and 3 nodes (VM type: m3.large), installed 3.4.1 metrics, and observed the UI for 2 hours; the issue was not reproduced.

# openshift version
openshift v3.4.1.2
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

#image
metrics-hawkular-metrics   3.4.1               ea4c68d376ca        18 hours ago        1.5 GB


# oc get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-48ofe   1/1       Running   0          3h
hawkular-cassandra-2-rlzbk   1/1       Running   1          3h
hawkular-cassandra-3-3peho   1/1       Running   0          3h
hawkular-metrics-dka4f       1/1       Running   0          2h
hawkular-metrics-n5mgv       1/1       Running   0          3h
hawkular-metrics-tvy9w       1/1       Running   0          2h
Comment 19 Peter Ruan 2017-01-30 18:01:25 EST
marking bug as verified per comment #18
Comment 20 Mike Fiedler 2017-01-30 19:48:59 EST
Verified on 3.4.1.2. Requests with invalid tokens no longer hang indefinitely. Missing tokens and invalid endpoints were also tested, and both worked well.
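
The behaviour verified here follows from the fix described in the Doc Text: every outcome of the access review, including an error, must end the client request. A minimal Python sketch of that idea is below. It is not the Hawkular Metrics implementation (which is not shown in this report); `ClientConnection`, `finish_access_review`, and the status-code mapping are hypothetical names and choices used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientConnection:
    """Hypothetical stand-in for a client connection awaiting an auth decision."""
    status: Optional[int] = None
    closed: bool = False

    def respond_and_close(self, status: int) -> None:
        self.status = status
        self.closed = True


def finish_access_review(connection: ClientConnection, review_status: int, allowed: bool) -> bool:
    """Return True if the request may proceed; otherwise answer the client and close."""
    succeeded = 200 <= review_status < 300
    if succeeded and allowed:
        return True
    if succeeded:
        # The review completed but denied access.
        connection.respond_and_close(403)
    elif review_status in (401, 403):
        # The master rejected the token; pass that status on (illustrative mapping).
        connection.respond_and_close(review_status)
    else:
        # Any other error from the master still ends the request.
        connection.respond_and_close(500)
    return False
```

The point of the sketch is that no branch returns without either letting the request continue or closing the connection, so an invalid token cannot leave a connection open until a timeout.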
Comment 22 errata-xmlrpc 2017-01-31 15:19:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218
