Bug 1882453 - [Kuryr] kuryr controller and kuryr cni pod unstable on fresh ocp kuryr cluster
Summary: [Kuryr] kuryr controller and kuryr cni pod unstable on fresh ocp kuryr cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Maysa Macedo
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-24 15:41 UTC by Anurag saxena
Modified: 2020-10-27 16:45 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:45:08 UTC
Target Upstream Version:
Embargoed:


Attachments
oc logs kuryr-controller-bc446d496-88x6z -p (15.24 KB, text/plain)
2020-09-24 15:50 UTC, Anurag saxena


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 361 0 None closed Bug 1882453: Ensure LB member is updated if a conflict happens 2020-10-01 15:52:53 UTC
OpenStack gerrit 754423 0 None NEW Ensure LB member is updated if a conflict happens 2020-10-01 15:52:44 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:45:22 UTC

Description Anurag saxena 2020-09-24 15:41:54 UTC
Description of problem: Brought up a fresh OCP cluster with the Kuryr network plugin and found the kuryr-cni and kuryr-controller pods crashing frequently.

The controller is restarting because it cannot create another member for the affected load balancer, even though the load balancer quota appears to be fine. This may be a race between the service and endpoints handlers when writing to the CRD, as discussed with Maysa and Luis.
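To illustrate the suspected failure mode and the direction named in the linked fix title ("Ensure LB member is updated if a conflict happens"), here is a minimal Python sketch. It is not the actual kuryr-kubernetes patch. FakeLBClient, Conflict and ensure_member are hypothetical names, and the in-memory client only stands in for a load balancer API that rejects duplicate members.

```python
class Conflict(Exception):
    """Stands in for a 'member already exists' error (hypothetical)."""


class FakeLBClient:
    """In-memory stand-in for a load balancer API; not the Octavia client."""

    def __init__(self):
        self._members = {}  # (pool_id, ip, port) -> member id
        self._next_id = 1

    def create_member(self, pool_id, ip, port):
        key = (pool_id, ip, port)
        if key in self._members:
            raise Conflict(f"member {ip}:{port} already exists in {pool_id}")
        member_id = f"member-{self._next_id}"
        self._next_id += 1
        self._members[key] = member_id
        return member_id

    def find_member(self, pool_id, ip, port):
        return self._members.get((pool_id, ip, port))


def ensure_member(client, status, pool_id, ip, port):
    """Create the member, or adopt the existing one if creation conflicts.

    `status` plays the role of the CRD status: a dict of known members.
    """
    key = (pool_id, ip, port)
    try:
        member_id = client.create_member(pool_id, ip, port)
    except Conflict:
        # Another handler already created it; record it instead of failing.
        member_id = client.find_member(pool_id, ip, port)
        if member_id is None:
            raise
    status[key] = member_id
    return member_id


client = FakeLBClient()
status_a = {}  # view held by the first handler
status_b = {}  # stale view held by the second handler
first = ensure_member(client, status_a, "pool-1", "10.0.0.5", 5353)
second = ensure_member(client, status_b, "pool-1", "10.0.0.5", 5353)
```

In this sketch the second call hits the conflict, looks up the existing member and records it, so both views converge on the same member instead of the second handler failing repeatedly.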

Version-Release number of selected component (if applicable): 4.6.0-0.nightly-2020-09-22-213802


How reproducible: Intermittent to rare


Steps to Reproduce:
1. Deploy an OCP cluster with the Kuryr network plugin

Actual results: Kuryr pods end up in CrashLoopBackOff state


Expected results: The cluster should install successfully


Additional info:

$ oc get pods -n openshift-kuryr
NAME                                   READY   STATUS             RESTARTS   AGE
kuryr-cni-26xx5                        1/1     Running            3          37h
kuryr-cni-669vm                        1/1     Running            5          36h
kuryr-cni-d2spv                        0/1     CrashLoopBackOff   415        36h
kuryr-cni-ghnjq                        1/1     Running            3          37h
kuryr-cni-krvvj                        1/1     Running            3          37h
kuryr-cni-pkd74                        1/1     Running            1          36h
kuryr-controller-bc446d496-88x6z       1/1     Running            7          94m
kuryr-dns-admission-controller-8k7l4   1/1     Running            0          37h
kuryr-dns-admission-controller-f6mk5   1/1     Running            0          37h
kuryr-dns-admission-controller-xx9sg   1/1     Running            0          37h
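
For triage, a listing like the one above can be scanned for pods that are not Running or that have restarted too often. This is a minimal Python sketch, not an existing tool: unstable_pods and the restart threshold of 5 are assumptions chosen for illustration, and the sample rows are copied from the output above. It expects the five-column layout shown here and would need adjusting for oc versions that print extra text in the RESTARTS column.

```python
SAMPLE = """\
NAME                                   READY   STATUS             RESTARTS   AGE
kuryr-cni-d2spv                        0/1     CrashLoopBackOff   415        36h
kuryr-cni-pkd74                        1/1     Running            1          36h
kuryr-controller-bc446d496-88x6z       1/1     Running            7          94m
"""


def unstable_pods(table, max_restarts=5):
    """Return names of pods that are not Running or exceed max_restarts.

    max_restarts is an arbitrary illustrative threshold.
    """
    flagged = []
    for row in table.strip().splitlines()[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed rows
        name, _ready, status, restarts = fields[:4]
        if status != "Running" or int(restarts) > max_restarts:
            flagged.append(name)
    return flagged


flagged = unstable_pods(SAMPLE)
```

With the three sample rows, this flags kuryr-cni-d2spv (CrashLoopBackOff) and kuryr-controller-bc446d496-88x6z (7 restarts).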

kuryr-controller events:

Events:
  Type     Reason     Age                     From                                    Message
  ----     ------     ----                    ----                                    -------
  Normal   Pulled     24m (x411 over 36h)     kubelet, wsun0923-2sp2w-worker-0-lb7h9  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:599379455d7b62b6508c4ce92b65cb4145e7050c139809602103661ec5d4a883" already present on machine
  Warning  BackOff    9m29s (x4923 over 31h)  kubelet, wsun0923-2sp2w-worker-0-lb7h9  Back-off restarting failed container
  Warning  Unhealthy  4m36s (x4138 over 31h)  kubelet, wsun0923-2sp2w-worker-0-lb7h9  Liveness probe failed: HTTP probe failed with statuscode: 500

$ oc logs kuryr-controller-bc446d496-88x6z -n openshift-kuryr
2020-09-24 15:24:05.897 1 INFO kuryr_kubernetes.config [-] Logging enabled!
2020-09-24 15:24:05.905 1 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-k8s-controller version 4.6.0
2020-09-24 15:24:07.912 1 INFO os_vif [-] Loaded VIF plugins: linux_bridge, noop, ovs, noop, sriov
2020-09-24 15:24:07.922 1 INFO kuryr_kubernetes.controller.service [-] Configured handlers: ['vif', 'kuryrport', 'service', 'endpoints', 'kuryrloadbalancer', 'policy', 'pod_label', 'namespace', 'kuryrnetworkpolicy', 'kuryrnetwork']
2020-09-24 15:24:07.958 1 WARNING kuryr_kubernetes.controller.drivers.lbaasv2 [-] [neutron_defaults]resource_tags is set, but Octavia API 2.0 does not support resource tagging. Kuryr will put requested tags in the description field of Octavia resources.
2020-09-24 15:24:08.140 1 INFO kuryr_kubernetes.controller.service [-] Loaded handlers: ['endpoints', 'kuryrloadbalancer', 'kuryrnetwork', 'kuryrnetworkpolicy', 'kuryrport', 'namespace', 'pod_label', 'policy', 'service', 'vif']
2020-09-24 15:24:08.150 1 WARNING oslo_config.cfg [-] Deprecated: Option "sg_mode" from group "octavia_defaults" is deprecated for removal (enforce_sg_rules option can be used instead).  Its value may be silently ignored in the future.
2020-09-24 15:24:08.159 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopped
2020-09-24 15:24:08.160 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' starting
2020-09-24 15:24:08.161 1 INFO kuryr_kubernetes.controller.service [-] Running in non-HA mode, starting watcher immediately.
2020-09-24 15:24:08.177 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2020-09-24 15:24:08.181 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2020-09-24 15:24:08.197 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrports'
2020-09-24 15:24:08.209 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrloadbalancers'
2020-09-24 15:24:08.219 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrnetworkpolicies'
2020-09-24 15:24:08.223 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrnetworks'
2020-09-24 15:24:08.233 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2020-09-24 15:24:08.246 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/namespaces'
2020-09-24 15:24:08.251 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/networking.k8s.io/v1/networkpolicies'
2020-09-24 15:24:24.469 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:25:09.965 1 INFO kuryr_kubernetes.controller.drivers.vif_pool [-] PORTS POOL: pools updated with pre-created ports
2020-09-24 15:25:09.967 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' started
2020-09-24 15:25:09.971 1 INFO kuryr_kubernetes.controller.managers.prometheus_exporter [-] Starting Prometheus exporter
 * Serving Flask app "prometheus-exporter" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2020-09-24 15:25:09.974 1 INFO werkzeug [-]  * Running on http://[::]:9654/ (Press CTRL+C to quit)
2020-09-24 15:25:09.974 1 INFO kuryr_kubernetes.health [-] Starting controller-health health check server on :::8091.
 * Serving Flask app "controller-health" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2020-09-24 15:26:25.880 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:28:26.397 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:30:25.849 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:32:25.872 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53

Comment 1 Anurag saxena 2020-09-24 15:50:11 UTC
Created attachment 1716353
oc logs kuryr-controller-bc446d496-88x6z -p

Comment 2 Luis Tomas Bolivar 2020-09-25 06:57:07 UTC
This is running on OSP 13

Comment 4 Jon Uriarte 2020-10-02 11:09:54 UTC
Verified in:
4.6.0-0.nightly-2020-10-02-001427
OSP 13 2020-09-16.1

on OSASINFRA team hybrid deployments.

The installation was successful and the kuryr pods are running ok without crashloops.
$ oc get pods -n openshift-kuryr
NAME                               READY   STATUS    RESTARTS   AGE
kuryr-cni-2prfj                    1/1     Running   1          45m
kuryr-cni-4l5p9                    1/1     Running   0          45m
kuryr-cni-cgflf                    1/1     Running   0          30m
kuryr-cni-j28qm                    1/1     Running   0          33m
kuryr-cni-k68zp                    1/1     Running   0          33m
kuryr-cni-vtwg2                    1/1     Running   1          45m
kuryr-controller-9999f7ffd-ttsqm   1/1     Running   1          45m

Please feel free to reopen this bug if the issue is seen in a PSI environment.

Comment 7 errata-xmlrpc 2020-10-27 16:45:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

