Bug 1991537 - Kuryr controller in CrashLoopBackOff due to missing loadbalancer
Summary: Kuryr controller in CrashLoopBackOff due to missing loadbalancer
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michał Dulko
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-09 11:39 UTC by Mohammad
Modified: 2024-10-01 19:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-28 15:58:21 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                                   Private  Priority  Status  Summary  Last Updated
Github                  openshift kuryr-kubernetes pull 548  0        None      None    None     2021-08-17 15:40:54 UTC
Red Hat Product Errata  RHSA-2021:3915                       0        None      None    None     2021-10-28 15:58:27 UTC

Description Mohammad 2021-08-09 11:39:09 UTC
Description of problem: Kuryr controller on OCP 3.11.452 in CrashLoopBackOff due to missing loadbalancer.


Version-Release number of selected component (if applicable): OCP 3.11.452 on OSP13z13


How reproducible: Install OCP 3.11.452 on OSP13z13, create a namespace with multiple services and pods, and leave it running. After a while (3 days in my case), I found the Kuryr controller crashing because the load balancers no longer existed.

Expected results: Load balancers should not disappear or, at the very least, should be re-created by Kuryr if possible.


Additional info: Attaching logs of kuryr controller, service and endpoints for a service that doesn't have a corresponding Octavia loadbalancer anymore.
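
A check for this condition could be scripted. The following is a minimal sketch, not Kuryr's own logic. It assumes that Kuryr names each Octavia load balancer `<namespace>/<service name>` (the `default/demo` name used in the verification comment on this bug suggests this) and that the Service list and the load balancer names were collected separately, for example from `oc get svc --all-namespaces` and `openstack loadbalancer list`. The function name `services_missing_lb` is hypothetical.

```python
def services_missing_lb(services, lb_names):
    """Return the expected load balancer names that are absent.

    services: iterable of (namespace, service_name) tuples. The caller is
              expected to pass only Services that should have a load balancer.
    lb_names: iterable of load balancer names as reported by Octavia.

    Assumption: load balancers are named "<namespace>/<service_name>".
    """
    expected = {f"{namespace}/{name}" for namespace, name in services}
    return sorted(expected - set(lb_names))


if __name__ == "__main__":
    # Illustrative data only, not taken from the attached logs.
    services = [("default", "demo"), ("default", "web")]
    lb_names = ["default/demo"]
    for missing in services_missing_lb(services, lb_names):
        print(f"no load balancer found for service {missing}")
```

Any name the function returns is a Service that the controller would expect to find a load balancer for but cannot.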

Comment 11 Itzik Brown 2021-10-04 09:04:53 UTC
- Made sure the Kuryr controller code included the following patch: https://github.com/openshift/kuryr-kubernetes/pull/572
- All tempest tests passed

- Simulated the failure by setting an LB to the ERROR state, then restarting the Kuryr controller and making sure it becomes ready

Created a service:
apiVersion: v1
kind: Service
metadata:
  name: demo
  labels:
    app: demo
spec:
  selector:
    app: demo
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080


Created a deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
  labels:
    app: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: kuryr/demo
        ports:
        - containerPort: 8080


Set the LB state to ERROR:
 source ~/stackrc && ssh heat-admin@$(openstack server list -f value -c Name -c Networks | grep controller-0 | awk -F= '{print $2}')
 sudo docker exec -uroot -it galera-bundle-docker-0 mysql
 MariaDB [(none)]> use octavia;
 MariaDB [(none)]> UPDATE load_balancer SET provisioning_status='ERROR' WHERE name='default/demo';

Restart the Kuryr controller and make sure it's ready
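
The load balancer states can be inspected before and after the restart. The following is a minimal sketch, assuming the output of `openstack loadbalancer list -f json` has been saved to a string and that each entry carries `name` and `provisioning_status` keys. The helper name `lbs_not_active` is hypothetical and is not part of the verification that was actually run.

```python
import json


def lbs_not_active(lb_list_json):
    """Return (name, provisioning_status) pairs for load balancers not in ACTIVE.

    lb_list_json: JSON text holding a list of load balancer objects.
    Assumption: each object has "name" and "provisioning_status" keys.
    """
    load_balancers = json.loads(lb_list_json)
    return [
        (lb.get("name"), lb.get("provisioning_status"))
        for lb in load_balancers
        if lb.get("provisioning_status") != "ACTIVE"
    ]


if __name__ == "__main__":
    # Illustrative input only; real data would come from the openstack CLI.
    sample = json.dumps([
        {"name": "default/demo", "provisioning_status": "ERROR"},
        {"name": "default/other", "provisioning_status": "ACTIVE"},
    ])
    for name, status in lbs_not_active(sample):
        print(f"{name}: {status}")
```

An empty result means every listed load balancer reports ACTIVE.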

Version:
OSP13 2021-09-20.1
OCP v3.11.524

Comment 14 errata-xmlrpc 2021-10-28 15:58:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 3.11.542 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3915

