Bug 1454858 - [paid][free][online-int][starter-us-east-1] Registry liveness probe failures for http2: no cached connection was available
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Seth Jennings
QA Contact: Mike Fiedler
URL:
Whiteboard:
Duplicates: 1466035
Depends On:
Blocks: 1608360 1633769
Reported: 2017-05-23 15:09 UTC by Mike Fiedler
Modified: 2020-08-13 09:14 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1608360 1633769
Environment:
Last Closed: 2017-11-28 21:56:17 UTC
Target Upstream Version:


Links
System                  ID              Priority  Status        Summary                                                                                   Last Updated
Red Hat Product Errata  RHSA-2017:3188  normal    SHIPPED_LIVE  Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update  2017-11-29 02:34:54 UTC

Description Mike Fiedler 2017-05-23 15:09:45 UTC
Description of problem:

The registry pods in online-int (paid integration env) are reporting semi-frequent liveness and readiness health probe failures with the message "http2: no cached connection was available"

The pods are not restarting, so the problem may be harmless.

https://github.com/golang/go/issues/16582

Registry pod 1
Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath                   Type            Reason          Message
  ---------     --------        -----   ----                                    -------------                   --------        ------          -------
  12d           1h              158     {kubelet ip-172-31-55-13.ec2.internal}  spec.containers{registry}       Warning         Unhealthy       Liveness probe failed: Get https://10.1.5.120:5000/healthz: http2: no cached connection was available
  12d           50m             149     {kubelet ip-172-31-55-13.ec2.internal}  spec.containers{registry}       Warning         Unhealthy       Readiness probe failed: Get https://10.1.5.120:5000/healthz: http2: no cached connection was available

and Registry pod 2

Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath                   Type            Reason          Message
  ---------     --------        -----   ----                                    -------------                   --------        ------          -------
  6d            55m             35      {kubelet ip-172-31-55-12.ec2.internal}  spec.containers{registry}       Warning         Unhealthy       Liveness probe failed: Get https://10.1.8.182:5000/healthz: http2: no cached connection was available


Version-Release number of selected component (if applicable): 3.5.5.10


How reproducible:  Seems to be happening at least once an hour.   Will monitor.

Comment 1 Mike Fiedler 2017-05-23 15:10:08 UTC
Possibly https://github.com/golang/go/issues/16582?

Comment 2 Derek Carr 2017-05-25 15:39:53 UTC
This should be fixed in the move to go 1.8 per https://github.com/golang/go/commit/7a622740655bb5fcbd160eb96887032314842e6e

As a result, this should be resolved when we move to kube 1.7.

Comment 3 Stefanie Forrester 2017-06-20 15:41:56 UTC
I'm seeing this frequently in prod, on starter-us-east-1 too. It happens each time I deploy the router or registry pods.

Comment 4 Derek Carr 2017-06-28 20:16:05 UTC
*** Bug 1466035 has been marked as a duplicate of this bug. ***

Comment 6 DeShuai Ma 2017-09-13 02:52:56 UTC
Could you help verify the bug? Thanks.

Comment 7 Mike Fiedler 2017-09-18 17:50:49 UTC
There is no online environment available with 3.7 on it yet.   Moving to POST since it is fixed upstream.

Comment 8 Seth Jennings 2017-09-26 20:57:55 UTC
This does not appear to be fixed in Go 1.8.  Furthermore, the linked commit is also included in Go 1.7.

$ oc version
oc v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO
 
Server https://master.lab.variantweb.net:8443
openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd

$ oc get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                      KIND      SUBOBJECT                   TYPE      REASON      SOURCE                              MESSAGE
1h         17h         31        docker-registry-1-rn2c6   Pod       spec.containers{registry}   Warning   Unhealthy   kubelet, infra.lab.variantweb.net   Liveness probe failed: Get https://10.128.0.4:5000/healthz: http2: no cached connection was available
26m        17h         48        docker-registry-1-rn2c6   Pod       spec.containers{registry}   Warning   Unhealthy   kubelet, infra.lab.variantweb.net   Readiness probe failed: Get https://10.128.0.4:5000/healthz: http2: no cached connection was available

Comment 9 Seth Jennings 2017-09-26 21:23:46 UTC
Upstream kube issue:
https://github.com/kubernetes/kubernetes/issues/49740

Comment 12 Mike Fiedler 2017-10-31 19:03:58 UTC
Verified on 3.7.0-0.188.0. During registry stress testing with 250 and 500 concurrent builds, the message is no longer seen.

Comment 15 errata-xmlrpc 2017-11-28 21:56:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

