Bug 1749209

Summary: [IPI] [OSP] Kuryr - pods not getting annotated in Kuryr Controller due to watcher stopped
Product: OpenShift Container Platform Reporter: Udi Shkalim <ushkalim>
Component: Networking Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: asegurap, ltomasbo
Version: 4.2.0 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-10-16 06:40:33 UTC Type: Bug

Description Udi Shkalim 2019-09-05 07:09:03 UTC
Description of problem:
Installation failed to complete.
In HA setups, kuryr-controller's watchers appear to die after a while (typically when the cluster is left overnight) and are never recreated. As a result, pods stop getting annotated.
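
To illustrate the failure mode described above: a watch loop that simply returns when its event stream ends or errors out stops delivering events unless something reopens it. The sketch below shows one way to keep such a loop alive. It is not the fix merged into kuryr-kubernetes; `run_watcher`, `open_stream`, and `handle` are hypothetical names chosen for this example only.

```python
import time
from typing import Callable, Iterable, Optional


def run_watcher(
    open_stream: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
    max_restarts: Optional[int] = None,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Consume events from a watch stream, reopening it when it ends or fails.

    Hypothetical helper, not kuryr-kubernetes code. Returns the number of
    times the stream was reopened (only reachable when max_restarts is set).
    """
    restarts = 0
    while True:
        try:
            for event in open_stream():
                handle(event)
        except Exception:
            # A dropped connection or handler error must not end the watcher.
            pass
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        sleep(backoff_seconds)  # brief pause before reopening the stream
```

With `max_restarts=None` the loop runs indefinitely, which is the behaviour a long-lived controller would want; the bound exists only so the sketch can be exercised in a test.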

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-04-102339

How reproducible:
1/4

Steps to Reproduce:
1. Deploy IPI OSP + kuryr

Actual results:
Installation failed

Expected results:
Installation passes

Additional info:
[stack@undercloud-0 ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          12h     Unable to apply 4.2.0-0.nightly-2019-09-04-102339: an unknown error has occurred

[stack@undercloud-0 ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                           4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
dns                                        4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
insights                                   4.2.0-0.nightly-2019-09-04-102339   True        False         True       12h
kube-apiserver                             4.2.0-0.nightly-2019-09-04-102339   True        False         False      11h
kube-controller-manager                    4.2.0-0.nightly-2019-09-04-102339   True        False         False      11h
kube-scheduler                             4.2.0-0.nightly-2019-09-04-102339   True        False         False      11h
machine-api                                4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
machine-config                             4.2.0-0.nightly-2019-09-04-102339   True        False         False      11h
network                                    4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
openshift-apiserver                        4.2.0-0.nightly-2019-09-04-102339   False       False         False      11h
openshift-controller-manager               4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-09-04-102339   True        False         False      11h
service-ca                                 4.2.0-0.nightly-2019-09-04-102339   True        False         False      12h
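
In the listing above, the unhealthy operators (insights is degraded, openshift-apiserver is unavailable) are easy to miss by eye. A small sketch for picking them out of captured `oc get co` text follows; `unhealthy_operators` is a hypothetical helper, and it assumes every row has a non-empty VERSION column, as in this output.

```python
from typing import List


def unhealthy_operators(table: str) -> List[str]:
    """Return operators that are not AVAILABLE or are DEGRADED.

    Expects the plain-text table printed by `oc get co`, header included.
    Assumes no column is empty, since rows are split on whitespace.
    """
    lines = [ln for ln in table.splitlines() if ln.strip()]
    bad = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # malformed or truncated row
        name, _version, available, _progressing, degraded = fields[:5]
        if available != "True" or degraded != "False":
            bad.append(name)
    return bad
```

Whitespace splitting misreads rows whose VERSION column is blank, so parsing `oc get co -o json` would be sturdier for anything beyond a quick check.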



[stack@undercloud-0 ~]$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-b5978                        1/1     Running   0          12h
kuryr-cni-fnp9r                        1/1     Running   0          11h
kuryr-cni-gpf9x                        1/1     Running   3          12h
kuryr-cni-hrc5x                        1/1     Running   1          12h
kuryr-cni-qvq7s                        1/1     Running   0          12h
kuryr-cni-vc9v7                        1/1     Running   3          12h
kuryr-controller-77c68665db-vm2m7      1/1     Running   6          12h
kuryr-dns-admission-controller-6ssxj   1/1     Running   0          12h
kuryr-dns-admission-controller-grzkr   1/1     Running   0          12h
kuryr-dns-admission-controller-gw9h7   1/1     Running   0          12h
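
All Kuryr pods report Running here, so the pod list alone does not expose the problem; the symptom is pods that never receive their annotation. A sketch for listing such pods from the parsed output of `oc get pods --all-namespaces -o json` follows. The annotation key `openstack.org/kuryr-vif` and the skipping of host-network pods are assumptions about how Kuryr behaves in this release, and `pods_missing_annotation` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumption: the key kuryr-controller writes on pods it has wired up.
KURYR_VIF_ANNOTATION = "openstack.org/kuryr-vif"


def pods_missing_annotation(
    pod_list: Dict[str, Any], key: str = KURYR_VIF_ANNOTATION
) -> List[str]:
    """Return "namespace/name" for pods that lack the given annotation.

    pod_list is the decoded JSON of a Kubernetes PodList. Pods using
    hostNetwork are skipped on the assumption that Kuryr does not
    annotate them.
    """
    missing = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        if spec.get("hostNetwork"):
            continue
        annotations = meta.get("annotations") or {}
        if key not in annotations:
            missing.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return missing
```

The result could be fed by `json.load` on the saved command output; a non-empty list long after the pods were created would be consistent with the stopped-watcher symptom.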

Comment 1 Michał Dulko 2019-09-10 10:03:01 UTC
The fix is now merged into openshift/kuryr-kubernetes.

Comment 3 Udi Shkalim 2019-10-06 13:10:19 UTC
Verified on 4.2.0-0.nightly-2019-10-02-122541
Installation completed successfully on multiple attempts.

[stack@undercloud-0 ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
cloud-credential                           4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
cluster-autoscaler                         4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
console                                    4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
dns                                        4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
image-registry                             4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
ingress                                    4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
insights                                   4.2.0-0.nightly-2019-10-02-122541   True        False         True       3d20h
kube-apiserver                             4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
kube-controller-manager                    4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
kube-scheduler                             4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
machine-api                                4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
machine-config                             4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
marketplace                                4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d18h
monitoring                                 4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d18h
network                                    4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
node-tuning                                4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
openshift-apiserver                        4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
openshift-controller-manager               4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
openshift-samples                          4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
operator-lifecycle-manager                 4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d1h
service-ca                                 4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d20h
service-catalog-apiserver                  4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
service-catalog-controller-manager         4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h
storage                                    4.2.0-0.nightly-2019-10-02-122541   True        False         False      3d19h

Comment 4 errata-xmlrpc 2019-10-16 06:40:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922