Bug 1884584 - Kuryr controller continuously restarting due to unable to clean up Network Policy
Summary: Kuryr controller continuously restarting due to unable to clean up Network Po...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.7.0
Assignee: rdobosz
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks: 1896370
TreeView+ depends on / blocked
 
Reported: 2020-10-02 11:43 UTC by Maysa Macedo
Modified: 2021-02-24 15:22 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:22:14 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 398 0 None closed Bug 1884584: Handle None or {} labels in match_selector() 2021-01-18 09:06:00 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:22:54 UTC

Description Maysa Macedo 2020-10-02 11:43:10 UTC
Description of problem:

The following happened while running network policy tests with tempest:

2020-10-02 11:12:10.821 1 INFO kuryr_kubernetes.controller.drivers.vif_pool [-] Not enough time since the last pool update
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'ADDED', 'object': {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrNetworkPolicy', 'metadata': {'annotations': {'networkPolicyLink': '/apis/networking.k8s.io/v1/namespaces/default/networkpolicies/kuryr-network-policy-1207324344'}, 'creationTimestamp': '2020-10-02T09:29:02Z', 'deletionGracePeriodSeconds': 0, 'deletionTimestamp': '2020-10-02T09:29:13Z', 'finalizers': ['kuryr.openstack.org/networkpolicy-finalizer'], 'generation': 12, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:networkPolicyLink': {}}, 'f:finalizers': {'.': {}, 'v:"kuryr.openstack.org/networkpolicy-finalizer"': {}}}, 'f:spec': {'.': {}, 'f:egressSgRules': {}, 'f:ingressSgRules': {}, 'f:podSelector': {'.': {}, 'f:matchLabels': {}}, 'f:policyTypes': {}}, 'f:status': {'.': {}, 'f:podSelector': {'.': {}, 'f:matchLabels': {}}, 'f:securityGroupId': {}, 'f:securityGroupRules': {}}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2020-10-02T09:29:12Z'}], 'name': 'kuryr-network-policy-1207324344', 'namespace': 'default', 'resourceVersion': '71291', 'selfLink': '/apis/openstack.org/v1/namespaces/default/kuryrnetworkpolicies/kuryr-network-policy-1207324344', 'uid': 'ca93b3ec-0562-47bc-b7f1-b5672f858e15'}, 'spec': {'egressSgRules': [{'sgRule': {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'egress', 'ethertype': 'IPv4', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp'}}, {'sgRule': {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'egress', 'ethertype': 'IPv6', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp'}}], 'ingressSgRules': [{'sgRule': {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv4', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp'}}, {'sgRule': {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv6', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp'}}, {'sgRule': {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv4', 'remote_ip_prefix': '172.30.0.0/15'}}, {'sgRule': {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv4', 'remote_ip_prefix': '10.196.0.0/16'}}], 'podSelector': {'matchLabels': {'context': 'demo'}}, 'policyTypes': ['Ingress', 'Egress']}, 'status': {'podSelector': {'matchLabels': {'context': 'demo'}}, 'securityGroupId': '5164f282-7a9c-4e2e-a800-cda96de51203', 'securityGroupRules': [{'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv4', 'id': '5bd83ab9-cfa6-4203-b8e9-58d4f6e550f2', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp', 'security_group_id': '5164f282-7a9c-4e2e-a800-cda96de51203'}, {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv6', 'id': '73628eac-2598-4477-a70a-02e0df99e558', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp', 'security_group_id': '5164f282-7a9c-4e2e-a800-cda96de51203'}, {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv4', 'id': '06030704-4758-44a2-9821-da9ce96d524c', 'remote_ip_prefix': '172.30.0.0/15', 'security_group_id': '5164f282-7a9c-4e2e-a800-cda96de51203'}, {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'ingress', 'ethertype': 'IPv4', 'id': '79584ec1-b570-4789-8dc6-c7be4c4e4569', 'remote_ip_prefix': '10.196.0.0/16', 'security_group_id': '5164f282-7a9c-4e2e-a800-cda96de51203'}, {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'egress', 'ethertype': 'IPv4', 'id': 'a53153bd-d22d-4b37-8178-34ba33a02054', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp', 'security_group_id': '5164f282-7a9c-4e2e-a800-cda96de51203'}, {'description': 'Kuryr-Kubernetes NetPolicy SG rule', 'direction': 'egress', 'ethertype': 'IPv6', 'id': '9fa026b3-e327-4a9f-88bf-6664146b095f', 'port_range_max': 65535, 'port_range_min': 1, 'protocol': 'tcp', 'security_group_id': '5164f282-7a9c-4e2e-a800-cda96de51203'}]}}}: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: '5164f282-7a9c-4e2e-a800-cda96de51203'
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/network_policy.py", line 617, in delete_np_sg
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self.os_net.delete_security_group(sg_id)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 3113, in delete_security_group
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     ignore_missing=ignore_missing, if_revision=if_revision)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 46, in check
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     return method(self, expected, actual, *args, **kwargs)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 75, in _delete
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     rv = res.delete(self, if_revision=if_revision)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/openstack/resource.py", line 1622, in delete
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self._translate_response(response, has_body=False, **kwargs)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/openstack/resource.py", line 1113, in _translate_response
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     exceptions.raise_from_response(response, error_message=error_message)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/openstack/exceptions.py", line 235, in raise_from_response
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     http_status=http_status, request_id=request_id
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.44.75:13696/v2.0/security-groups/5164f282-7a9c-4e2e-a800-cda96de51203, Security Group 5164f282-7a9c-4e2e-a800-cda96de51203 in use.
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging 
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging During handling of the above exception, another exception occurred:
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging 
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 96, in __call__
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self._handler.set_liveness(alive=False)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self.force_reraise()
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     six.reraise(self.type_, self.value, self.tb)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     raise value
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 81, in __call__
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 87, in __call__
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self.on_finalize(obj)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrnetworkpolicy.py", line 313, in on_finalize
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     self._drv_policy.delete_np_sg(crd_sg)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/network_policy.py", line 623, in delete_np_sg
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging     raise exceptions.ResourceNotReady(sg_id)
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: '5164f282-7a9c-4e2e-a800-cda96de51203'
2020-10-02 11:12:44.953 1 ERROR kuryr_kubernetes.handlers.logging 
2020-10-02 11:12:54.657 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrNetworkPolicyHandler is dead.
2020-10-02 11:13:04.659 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrNetworkPolicyHandler is dead.

-----------------------------------
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  creationTimestamp: "2020-10-02T09:29:02Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-10-02T09:29:13Z"
  finalizers:
  - kuryr.openstack.org/networkpolicy-finalizer
  generation: 3
  name: kuryr-network-policy-1207324344
  namespace: default
  resourceVersion: "71289"
  selfLink: /apis/networking.k8s.io/v1/namespaces/default/networkpolicies/kuryr-network-policy-1207324344
  uid: 0b56bc42-259c-4ea7-8a80-dbef767f1c4f
spec:
  egress:
  - {}
  ingress:
  - {}
  podSelector:
    matchLabels:
      context: demo
  policyTypes:
  - Ingress
  - Egress
-----------------------------

apiVersion: openstack.org/v1
kind: KuryrNetworkPolicy
metadata:
  annotations:
    networkPolicyLink: /apis/networking.k8s.io/v1/namespaces/default/networkpolicies/kuryr-network-policy-1207324344
  creationTimestamp: "2020-10-02T09:29:02Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-10-02T09:29:13Z"
  finalizers:
  - kuryr.openstack.org/networkpolicy-finalizer
  name: kuryr-network-policy-1207324344
  namespace: default
  resourceVersion: "71291"
  selfLink: /apis/openstack.org/v1/namespaces/default/kuryrnetworkpolicies/kuryr-network-policy-1207324344
  uid: ca93b3ec-0562-47bc-b7f1-b5672f858e15
spec:
  egressSgRules:
  - sgRule:
      description: Kuryr-Kubernetes NetPolicy SG rule
      direction: egress
      ethertype: IPv4
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
  - sgRule:
      description: Kuryr-Kubernetes NetPolicy SG rule
      direction: egress
      ethertype: IPv6
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
  ingressSgRules:
  - sgRule:
      description: Kuryr-Kubernetes NetPolicy SG rule
      direction: ingress
      ethertype: IPv4
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
  - sgRule:
      description: Kuryr-Kubernetes NetPolicy SG rule
      direction: ingress
      ethertype: IPv6
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
  - sgRule:
      description: Kuryr-Kubernetes NetPolicy SG rule
      direction: ingress
      ethertype: IPv4
      remote_ip_prefix: 172.30.0.0/15
  - sgRule:
      description: Kuryr-Kubernetes NetPolicy SG rule
      direction: ingress
      ethertype: IPv4
      remote_ip_prefix: 10.196.0.0/16
  podSelector:
    matchLabels:
      context: demo
  policyTypes:
  - Ingress
  - Egress
status:
  podSelector:
    matchLabels:
      context: demo
  securityGroupId: 5164f282-7a9c-4e2e-a800-cda96de51203
  securityGroupRules:
  - description: Kuryr-Kubernetes NetPolicy SG rule
    direction: ingress
    ethertype: IPv4
    id: 5bd83ab9-cfa6-4203-b8e9-58d4f6e550f2
    port_range_max: 65535
    port_range_min: 1
    protocol: tcp
    security_group_id: 5164f282-7a9c-4e2e-a800-cda96de51203
  - description: Kuryr-Kubernetes NetPolicy SG rule
    direction: ingress
    ethertype: IPv6
    id: 73628eac-2598-4477-a70a-02e0df99e558
    port_range_max: 65535
    port_range_min: 1
    protocol: tcp
    security_group_id: 5164f282-7a9c-4e2e-a800-cda96de51203
  - description: Kuryr-Kubernetes NetPolicy SG rule
    direction: ingress
    ethertype: IPv4
    id: 06030704-4758-44a2-9821-da9ce96d524c
    remote_ip_prefix: 172.30.0.0/15
    security_group_id: 5164f282-7a9c-4e2e-a800-cda96de51203
  - description: Kuryr-Kubernetes NetPolicy SG rule
    direction: ingress
    ethertype: IPv4
    id: 79584ec1-b570-4789-8dc6-c7be4c4e4569
    remote_ip_prefix: 10.196.0.0/16
    security_group_id: 5164f282-7a9c-4e2e-a800-cda96de51203
  - description: Kuryr-Kubernetes NetPolicy SG rule
    direction: egress
    ethertype: IPv4
    id: a53153bd-d22d-4b37-8178-34ba33a02054
    port_range_max: 65535
    port_range_min: 1
    protocol: tcp
    security_group_id: 5164f282-7a9c-4e2e-a800-cda96de51203
  - description: Kuryr-Kubernetes NetPolicy SG rule
    direction: egress
    ethertype: IPv6
    id: 9fa026b3-e327-4a9f-88bf-6664146b095f
    port_range_max: 65535
    port_range_min: 1
    protocol: tcp
    security_group_id: 5164f282-7a9c-4e2e-a800-cda96de51203

[stack@undercloud-0 ~]$ oc get po -A --show-labels |grep demo |wc -l
0

(shiftstack) [stack@undercloud-0 ~]$ openstack port show 570a0cbe-c87d-44cd-97c5-3cc6921f4a85 -c security_group_ids
+--------------------+--------------------------------------+
| Field              | Value                                |
+--------------------+--------------------------------------+
| security_group_ids | 5164f282-7a9c-4e2e-a800-cda96de51203 |
+--------------------+--------------------------------------+

Version-Release number of selected component (if applicable):

4.6.0-0.nightly-2020-10-02-001427
OSP13

How reproducible:


Steps to Reproduce:
1. 
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 rlobillo 2020-11-13 16:16:52 UTC
Verified on 4.7.0-0.nightly-2020-11-11-055454 over OSP13 with Amphoras (2020-11-04.1)

Tempest test run several times without observing restarts on kuryr-controller. Results: https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp13-ocp4.7/20//artifact/tempest-results/

Comment 5 errata-xmlrpc 2021-02-24 15:22:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633


Note You need to log in before you can comment on or make changes to this bug.