1454858 – [paid][free][online-int][starter-us-east-1] Registry liveness probe failures for http2: no cached connection was available

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.6.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Seth Jennings
QA Contact:	Mike Fiedler
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1466035
Depends On:
Blocks:	1608360 1633769

Reported:	2017-05-23 15:09 UTC by Mike Fiedler
Modified:	2020-08-13 09:14 UTC (History)
CC List:	14 users (show)
Fixed In Version:
Doc Type:	No Doc Update
Clone Of:
Clones:	1608360 1633769
Environment:
Last Closed:	2017-11-28 21:56:17 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description Mike Fiedler 2017-05-23 15:09:45 UTC

Description of problem:

The registry pods in online-int (paid integration env) are reporting semi-frequent liveness and readiness probe failures with the message "http2: no cached connection was available".

The pods are not restarting, so the problem may be harmless.

https://github.com/golang/go/issues/16582

Registry pod 1
Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath                   Type            Reason          Message
  ---------     --------        -----   ----                                    -------------                   --------        ------          -------
  12d           1h              158     {kubelet ip-172-31-55-13.ec2.internal}  spec.containers{registry}       Warning         Unhealthy       Liveness probe failed: Get https://10.1.5.120:5000/healthz: http2: no cached connection was available
  12d           50m             149     {kubelet ip-172-31-55-13.ec2.internal}  spec.containers{registry}       Warning         Unhealthy       Readiness probe failed: Get https://10.1.5.120:5000/healthz: http2: no cached connection was available

and Registry pod 2

Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath                   Type            Reason          Message
  ---------     --------        -----   ----                                    -------------                   --------        ------          -------
  6d            55m             35      {kubelet ip-172-31-55-12.ec2.internal}  spec.containers{registry}       Warning         Unhealthy       Liveness probe failed: Get https://10.1.8.182:5000/healthz: http2: no cached connection was available


Version-Release number of selected component (if applicable): 3.5.5.10


How reproducible:  Seems to be happening at least once an hour.   Will monitor.

Comment 1 Mike Fiedler 2017-05-23 15:10:08 UTC

Possibly https://github.com/golang/go/issues/16582?

Comment 2 Derek Carr 2017-05-25 15:39:53 UTC

This should be fixed in the move to go 1.8 per https://github.com/golang/go/commit/7a622740655bb5fcbd160eb96887032314842e6e

As a result, this should be resolved when we move to kube 1.7.

Comment 3 Stefanie Forrester 2017-06-20 15:41:56 UTC

I'm seeing this frequently in prod, on starter-us-east-1 too. It happens each time I deploy the router or registry pods.

Comment 4 Derek Carr 2017-06-28 20:16:05 UTC

*** Bug 1466035 has been marked as a duplicate of this bug. ***

Comment 6 DeShuai Ma 2017-09-13 02:52:56 UTC

Could you help verify the bug? Thanks.

Comment 7 Mike Fiedler 2017-09-18 17:50:49 UTC

There is no online environment available with 3.7 on it yet.   Moving to POST since it is fixed upstream.

Comment 8 Seth Jennings 2017-09-26 20:57:55 UTC

This does not appear to be fixed in Go 1.8.  Furthermore, the linked commit is also included in Go 1.7.

$ oc version
oc v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO
 
Server https://master.lab.variantweb.net:8443
openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd

$ oc get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                      KIND      SUBOBJECT                   TYPE      REASON      SOURCE                              MESSAGE
1h         17h         31        docker-registry-1-rn2c6   Pod       spec.containers{registry}   Warning   Unhealthy   kubelet, infra.lab.variantweb.net   Liveness probe failed: Get https://10.128.0.4:5000/healthz: http2: no cached connection was available
26m        17h         48        docker-registry-1-rn2c6   Pod       spec.containers{registry}   Warning   Unhealthy   kubelet, infra.lab.variantweb.net   Readiness probe failed: Get https://10.128.0.4:5000/healthz: http2: no cached connection was available

Comment 9 Seth Jennings 2017-09-26 21:23:46 UTC

Upstream kube issue:
https://github.com/kubernetes/kubernetes/issues/49740

Comment 10 Seth Jennings 2017-10-02 03:50:57 UTC

Kube PR:
https://github.com/kubernetes/kubernetes/pull/53318

Origin PR:
https://github.com/openshift/origin/pull/16633
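
For readers unfamiliar with the error: it comes from Go's HTTP/2 client code, so a probe client that never negotiates HTTP/2 cannot hit it. The following is a minimal sketch of such a client, not a description of what the linked PRs change. The helper names newProbeTransport and probe are hypothetical, and the test server in main exists only to make the example self-contained. It relies on the documented net/http behavior that a non-nil, empty TLSNextProto map disables HTTP/2 on a Transport.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newProbeTransport (hypothetical helper) builds a transport for health
// probes that never negotiates HTTP/2 and never reuses connections.
func newProbeTransport() *http.Transport {
	return &http.Transport{
		// Assumption for this sketch: the probe skips certificate
		// verification, since the test server uses a self-signed cert.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		// Each probe opens a fresh connection instead of using a cached one.
		DisableKeepAlives: true,
		// A non-nil, empty map turns off HTTP/2 for this transport.
		TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
}

// probe (hypothetical helper) issues one GET and reports the negotiated
// protocol. Status codes from 200 to 399 are treated as healthy.
func probe(url string, timeout time.Duration) (string, error) {
	client := &http.Client{Transport: newProbeTransport(), Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 400 {
		return resp.Proto, fmt.Errorf("unhealthy: HTTP status %d", resp.StatusCode)
	}
	return resp.Proto, nil
}

func main() {
	// Local TLS server that offers HTTP/2, standing in for the registry.
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()

	proto, err := probe(srv.URL+"/healthz", time.Second)
	fmt.Println("negotiated protocol:", proto, "error:", err)
}
```

Printing resp.Proto makes it easy to confirm which protocol was actually negotiated against a server that offers HTTP/2.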

Comment 12 Mike Fiedler 2017-10-31 19:03:58 UTC

Verified on 3.7.0-0.188.0. During registry stress testing with 250 and 500 concurrent builds, the message is no longer seen.

Comment 15 errata-xmlrpc 2017-11-28 21:56:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
