Description of problem:
Brought up a fresh OCP cluster with the Kuryr network plugin and found the kuryr-cni and kuryr-controller pods crashing frequently. The controller restarts because it cannot create another member for the load balancer, although the load balancer quota appears to be fine. As discussed with Maysa/Luis, this may be a race between the service and endpoints handlers when writing into the CRD.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-22-213802

How reproducible:
Intermittent, rarely

Steps to Reproduce:
1. Deploy an OCP cluster with the Kuryr network plugin

Actual results:
kuryr pods end up in CrashLoopBackOff

Expected results:
Cluster should be installed successfully

Additional info:
$ oc get pods -n openshift-kuryr
NAME                                   READY   STATUS             RESTARTS   AGE
kuryr-cni-26xx5                        1/1     Running            3          37h
kuryr-cni-669vm                        1/1     Running            5          36h
kuryr-cni-d2spv                        0/1     CrashLoopBackOff   415        36h
kuryr-cni-ghnjq                        1/1     Running            3          37h
kuryr-cni-krvvj                        1/1     Running            3          37h
kuryr-cni-pkd74                        1/1     Running            1          36h
kuryr-controller-bc446d496-88x6z       1/1     Running            7          94m
kuryr-dns-admission-controller-8k7l4   1/1     Running            0          37h
kuryr-dns-admission-controller-f6mk5   1/1     Running            0          37h
kuryr-dns-admission-controller-xx9sg   1/1     Running            0          37h

kuryr-controller events:
Events:
  Type     Reason     Age                     From                                    Message
  ----     ------     ----                    ----                                    -------
  Normal   Pulled     24m (x411 over 36h)     kubelet, wsun0923-2sp2w-worker-0-lb7h9  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:599379455d7b62b6508c4ce92b65cb4145e7050c139809602103661ec5d4a883" already present on machine
  Warning  BackOff    9m29s (x4923 over 31h)  kubelet, wsun0923-2sp2w-worker-0-lb7h9  Back-off restarting failed container
  Warning  Unhealthy  4m36s (x4138 over 31h)  kubelet, wsun0923-2sp2w-worker-0-lb7h9  Liveness probe failed: HTTP probe failed with statuscode: 500

[anusaxen@anusaxen verification-tests]$ oc logs kuryr-controller-bc446d496-88x6z -n openshift-kuryr
2020-09-24 15:24:05.897 1 INFO kuryr_kubernetes.config [-] Logging enabled!
2020-09-24 15:24:05.905 1 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-k8s-controller version 4.6.0
2020-09-24 15:24:07.912 1 INFO os_vif [-] Loaded VIF plugins: linux_bridge, noop, ovs, noop, sriov
2020-09-24 15:24:07.922 1 INFO kuryr_kubernetes.controller.service [-] Configured handlers: ['vif', 'kuryrport', 'service', 'endpoints', 'kuryrloadbalancer', 'policy', 'pod_label', 'namespace', 'kuryrnetworkpolicy', 'kuryrnetwork']
2020-09-24 15:24:07.958 1 WARNING kuryr_kubernetes.controller.drivers.lbaasv2 [-] [neutron_defaults]resource_tags is set, but Octavia API 2.0 does not support resource tagging. Kuryr will put requested tags in the description field of Octavia resources.
2020-09-24 15:24:08.140 1 INFO kuryr_kubernetes.controller.service [-] Loaded handlers: ['endpoints', 'kuryrloadbalancer', 'kuryrnetwork', 'kuryrnetworkpolicy', 'kuryrport', 'namespace', 'pod_label', 'policy', 'service', 'vif']
2020-09-24 15:24:08.150 1 WARNING oslo_config.cfg [-] Deprecated: Option "sg_mode" from group "octavia_defaults" is deprecated for removal (enforce_sg_rules option can be used instead).  Its value may be silently ignored in the future.
2020-09-24 15:24:08.159 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopped
2020-09-24 15:24:08.160 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' starting
2020-09-24 15:24:08.161 1 INFO kuryr_kubernetes.controller.service [-] Running in non-HA mode, starting watcher immediately.
2020-09-24 15:24:08.177 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2020-09-24 15:24:08.181 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2020-09-24 15:24:08.197 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrports'
2020-09-24 15:24:08.209 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrloadbalancers'
2020-09-24 15:24:08.219 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrnetworkpolicies'
2020-09-24 15:24:08.223 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrnetworks'
2020-09-24 15:24:08.233 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2020-09-24 15:24:08.246 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/namespaces'
2020-09-24 15:24:08.251 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/networking.k8s.io/v1/networkpolicies'
2020-09-24 15:24:24.469 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:25:09.965 1 INFO kuryr_kubernetes.controller.drivers.vif_pool [-] PORTS POOL: pools updated with pre-created ports
2020-09-24 15:25:09.967 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' started
2020-09-24 15:25:09.971 1 INFO kuryr_kubernetes.controller.managers.prometheus_exporter [-] Starting Prometheus exporter
 * Serving Flask app "prometheus-exporter" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2020-09-24 15:25:09.974 1 INFO werkzeug [-]  * Running on http://[::]:9654/ (Press CTRL+C to quit)
2020-09-24 15:25:09.974 1 INFO kuryr_kubernetes.health [-] Starting controller-health health check server on :::8091.
 * Serving Flask app "controller-health" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2020-09-24 15:26:25.880 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:28:26.397 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:30:25.849 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
2020-09-24 15:32:25.872 1 WARNING kuryr_kubernetes.controller.handlers.loadbalancer [-] Skipping listener creation for openshift-dns/dns-default:UDP as another one already exists with port 53
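The controller log mixes startup messages with a load balancer warning that recurs roughly every two minutes. When triaging logs like this, a small parser can tally repeated warnings per logger. This is a minimal sketch that assumes the line layout seen in the log (timestamp with milliseconds, PID, level, logger name, a bracketed context such as `[-]`, then the message); `parse_line` and `count_warnings` are hypothetical helpers, not part of kuryr-kubernetes or oslo.log.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# Assumed layout, inferred from the kuryr-controller log lines:
# "<date> <time.ms> <pid> <LEVEL> <logger> [<context>] <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[[^\]]*\] (?P<msg>.*)$"
)


class LogRecord(NamedTuple):
    ts: datetime
    level: str
    logger: str
    msg: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Return a LogRecord, or None for non-matching lines (e.g. Flask banners)."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return LogRecord(ts, m["level"], m["logger"], m["msg"].strip())


def count_warnings(lines: Iterable[str]) -> Counter:
    """Count WARNING records per (logger, message) pair."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec.level == "WARNING":
            counts[(rec.logger, rec.msg)] += 1
    return counts
```

Fed the `oc logs` output above (for example, via `count_warnings(open("controller.log"))`), it should count the `Skipping listener creation for openshift-dns/dns-default:UDP` warning five times. Lines that do not match the assumed layout, such as the Flask startup banners, are skipped rather than treated as errors.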
Created attachment 1716353: oc logs kuryr-controller-bc446d496-88x6z -p
This is running on OSP 13.
Verified on 4.6.0-0.nightly-2020-10-02-001427 with OSP 13 2020-09-16.1 on the OSASINFRA team's hybrid deployments. The installation succeeded and the kuryr pods are running without crash loops:

$ oc get pods -n openshift-kuryr
NAME                               READY   STATUS    RESTARTS   AGE
kuryr-cni-2prfj                    1/1     Running   1          45m
kuryr-cni-4l5p9                    1/1     Running   0          45m
kuryr-cni-cgflf                    1/1     Running   0          30m
kuryr-cni-j28qm                    1/1     Running   0          33m
kuryr-cni-k68zp                    1/1     Running   0          33m
kuryr-cni-vtwg2                    1/1     Running   1          45m
kuryr-controller-9999f7ffd-ttsqm   1/1     Running   1          45m

Please feel free to reopen this bug if the issue is seen again in the PSI environment.
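Both pod listings in this report were judged by eye: a CrashLoopBackOff status with hundreds of restarts before the fix, and low restart counts after it. A minimal sketch of automating that check, assuming the default column layout of `oc get pods` (NAME READY STATUS RESTARTS AGE); `parse_get_pods`, `flag_unhealthy`, and the restart threshold of 5 are hypothetical choices for illustration, not part of `oc` or Kuryr.

```python
from typing import List, NamedTuple


class PodRow(NamedTuple):
    name: str
    ready: int      # containers ready
    total: int      # containers in the pod
    status: str
    restarts: int


def parse_get_pods(text: str) -> List[PodRow]:
    """Parse the default table printed by `oc get pods`."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip the header and blank lines
        # Only the first four columns are used; newer clients may print
        # "3 (5m ago)" under RESTARTS, which adds extra trailing tokens.
        name, ready, status, restarts = fields[:4]
        ready_n, total_n = (int(x) for x in ready.split("/"))
        rows.append(PodRow(name, ready_n, total_n, status, int(restarts)))
    return rows


def flag_unhealthy(rows: List[PodRow], max_restarts: int = 5) -> List[str]:
    """Names of pods that are not Running, not fully ready, or restarting often."""
    return [
        r.name
        for r in rows
        if r.status != "Running" or r.ready < r.total or r.restarts >= max_restarts
    ]
```

With the default threshold, the original listing flags kuryr-cni-d2spv (CrashLoopBackOff, 415 restarts) along with kuryr-cni-669vm and the controller pod, whose 5 and 7 restarts meet the threshold; the verification listing flags nothing. The threshold is arbitrary and would need tuning per cluster.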
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196