Bug 1594187
Summary: | Openshift-on-OpenStack playbook increase watch_retry_timeout for kuryr-cni | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jon Uriarte <juriarte> |
Component: | Installer | Assignee: | Michał Dulko <mdulko> |
Status: | CLOSED ERRATA | QA Contact: | Jon Uriarte <juriarte> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3.10.0 | CC: | aos-bugs, jokerman, juriarte, mmccomas, tsedovic, vlaad |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 3.10.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: Kuryr retried connections to the OpenShift API for only 60 seconds.
Consequence: When an OpenShift API outage lasted longer than 60 seconds, the Kuryr pods stopped retrying but did not exit. The pods stayed alive but were not functional at all.
Fix: Increase the timeout (watch_retry_timeout) from 60 seconds to 3600 seconds.
Result: Kuryr services now retry connections for an hour, which is virtually forever (if the OpenShift API has an hour-long outage, there is definitely some major issue outside of Kuryr).
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-11-11 16:39:10 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jon Uriarte
2018-06-22 10:56:16 UTC
https://github.com/openshift/openshift-ansible/pull/8952

release-3.10 backport already merged

Verified in openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch on OSP 13 2018-10-02.1 puddle.

OCP on OSP installation playbooks do end successfully and all the pods are in Running status.

$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY   STATUS    RESTARTS   AGE   IP              NODE
default           docker-registry-1-gp9qg                             1/1     Running   0          5h    10.11.0.7       infra-node-0.openshift.example.com
default           registry-console-1-pfrcm                            1/1     Running   0          5h    10.11.0.18      master-0.openshift.example.com
default           router-1-hs54c                                      1/1     Running   0          5h    192.168.99.5    infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1     Running   1          5h    192.168.99.14   master-0.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1     Running   1          5h    192.168.99.14   master-0.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1     Running   1          5h    192.168.99.14   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-27tvb                                  2/2     Running   0          5h    192.168.99.14   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-llwgw                                  2/2     Running   0          5h    192.168.99.10   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-ngvcz                                  2/2     Running   0          5h    192.168.99.5    infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-rs2h4                                  2/2     Running   0          5h    192.168.99.13   app-node-1.openshift.example.com
openshift-infra   kuryr-controller-59fc7f478b-q6bxt                   1/1     Running   0          5h    192.168.99.13   app-node-1.openshift.example.com
openshift-node    sync-8nfc9                                          1/1     Running   0          5h    192.168.99.10   app-node-0.openshift.example.com
openshift-node    sync-qlkx6                                          1/1     Running   0          5h    192.168.99.5    infra-node-0.openshift.example.com
openshift-node    sync-t7c7z                                          1/1     Running   0          5h    192.168.99.14   master-0.openshift.example.com
openshift-node    sync-vrldf                                          1/1     Running   0          5h    192.168.99.13   app-node-1.openshift.example.com

$ oc -n openshift-infra get configmap kuryr-config -o yaml | grep watch_retry
watch_retry_timeout = 3600

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2709
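
For readers unfamiliar with how a retry timeout such as watch_retry_timeout behaves, the Cause/Fix in the Doc Text can be illustrated with a minimal Python sketch. This is not Kuryr's actual code: the function name retry_until_deadline, its parameters, and the backoff policy (doubling, capped at 30 seconds) are assumptions made only for illustration. It shows an operation being retried until an overall deadline passes, after which the last connection error is re-raised instead of being silently dropped.

```python
import time


def retry_until_deadline(operation, timeout, interval=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it succeeds or `timeout` seconds have elapsed.

    Hypothetical helper, not part of Kuryr. Returns the operation's result,
    or re-raises the last ConnectionError once the deadline has passed.
    """
    deadline = clock() + timeout
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except ConnectionError:
            if clock() >= deadline:
                # Give up loudly so a supervisor can restart the process,
                # rather than staying alive without retrying.
                raise
            # Exponential backoff (assumed policy), capped at 30 seconds
            # and never sleeping past the deadline.
            delay = min(interval * 2 ** (attempt - 1), 30.0,
                        max(deadline - clock(), 0))
            sleep(delay)
```

Under these assumptions, a simulated 120-second API outage ends in ConnectionError with timeout=60, while the same outage is ridden out with timeout=3600. The clock and sleep parameters exist only so the behaviour can be exercised with a fake clock instead of real waiting.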