Bug 1924917

Summary: kuryr-controller in crash loop if IP is removed from secondary interfaces
Product: OpenShift Container Platform
Reporter: Robert Heinzmann <rheinzma>
Component: Networking
Networking sub component: kuryr
Assignee: Michał Dulko <mdulko>
QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: rlobillo
Version: 4.6
Target Release: 4.8.0
Hardware: All
OS: All
Doc Type: No Doc Update
Type: Bug
Last Closed: 2021-07-27 22:40:58 UTC
Bug Blocks: 1928029

Description Robert Heinzmann 2021-02-03 21:23:06 UTC
Description of problem:

When the fixed IP of a secondary (Multus) network interface is removed and the kuryr-controller is restarted, the controller crash-loops with the following error message:

~~~
2021-02-03 17:44:59.534 1 ERROR oslo_service.service [-] Error starting thread.: IndexError: list index out of range
2021-02-03 17:44:59.534 1 ERROR oslo_service.service Traceback (most recent call last):
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/oslo_service/service.py", line 810, in run_service
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     service.start()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/service.py", line 110, in start
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     self.pool_driver.sync_pools()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1241, in sync_pools
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     vif_drv.sync_pools()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 994, in sync_pools
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     self._recover_precreated_ports()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 999, in _recover_precreated_ports
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     self._precreated_ports(action='recover')
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1022, in _precreated_ports
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     parent_ports, available_subports, subnets = self._get_trunks_info()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 412, in _get_trunks_info
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     'ip': port.fixed_ips[0]['ip_address'],
2021-02-03 17:44:59.534 1 ERROR oslo_service.service IndexError: list index out of range
2021-02-03 17:44:59.534 1 ERROR oslo_service.service 
~~~

The use case for this scenario is a setup with a limited number of IP addresses. Multus secondary interfaces with ipvlan are configured with Whereabouts IPAM, so only the pods need IP addresses from the network, not the nodes. This is why the deployment automation removes the node IPs after deployment.
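The failing code path can be illustrated with a minimal sketch. Note the `Port` class below is a hypothetical stand-in, not the actual kuryr or openstacksdk classes: `_get_trunks_info()` indexes `port.fixed_ips[0]` unconditionally, so any trunk parent port whose fixed IPs have been removed raises IndexError, while a guarded variant would simply skip such ports.

```python
# Hypothetical stand-in for an OpenStack port object (illustration only).
class Port:
    def __init__(self, fixed_ips):
        # List of {'ip_address': ..., 'subnet_id': ...} dicts, as in Neutron.
        self.fixed_ips = fixed_ips

def get_trunk_info_unguarded(port):
    # Mirrors the failing line in _get_trunks_info (vif_pool.py:412):
    # indexing [0] on an empty fixed_ips list raises IndexError.
    return {'ip': port.fixed_ips[0]['ip_address']}

def get_trunk_info_guarded(port):
    # Defensive variant: skip ports that have no fixed IP.
    if not port.fixed_ips:
        return None
    return {'ip': port.fixed_ips[0]['ip_address']}

normal = Port([{'ip_address': '192.168.123.68',
                'subnet_id': '8ea4d2d7-5541-4bd3-8828-86b441ae06f9'}])
stripped = Port([])  # IP removed via `openstack port set --no-fixed-ip`

assert get_trunk_info_guarded(normal) == {'ip': '192.168.123.68'}
assert get_trunk_info_guarded(stripped) is None
try:
    get_trunk_info_unguarded(stripped)
except IndexError:
    pass  # the crash seen in the traceback above
```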

Version-Release number of selected component (if applicable):

OpenShift 4.6.12
OpenStack 16.1.3 (AIO)
Instances Deployed with additional network interfaces 

How reproducible:

Always

Steps to Reproduce:

1. Deploy Cluster with Kuryr
2. Configure MachineSet with additional network interfaces

~~~
[stack@osp16amd ocp-test1]$ oc get machineset -n openshift-machine-api
NAME                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ocp-phnb2-worker-0   2         2         2       2           8d
ocp-phnb2-worker-1   1         1         1       1           3h27m

[stack@osp16amd ocp-test1]$ oc get machineset ocp-phnb2-worker-1 -n openshift-machine-api -o json | jq -r .spec.template.spec.providerSpec.value.networks
[
  {
    "filter": {},
    "subnets": [
      {
        "filter": {
          "name": "ocp-phnb2-nodes",
          "tags": "openshiftClusterID=ocp-phnb2"
        }
      }
    ]
  },
  {
    "filter": {},
    "noAllowedAddressPairs": true,
    "subnets": [
      {
        "filter": {
          "name": "additional-network-subnet"
        }
      }
    ]
  }
]
~~~

3. Remove the IP from the node's secondary network interface

~~~
[stack@osp16amd ocp-test1]$ openstack port list --network additional-network
+--------------------------------------+--------------------------+-------------------+-------------------------------------------------------------------------------+--------+
| ID                                   | Name                     | MAC Address       | Fixed IP Addresses                                                            | Status |
+--------------------------------------+--------------------------+-------------------+-------------------------------------------------------------------------------+--------+
| 52a86be6-763c-497e-9d06-21cff0fa4dab | ocp-phnb2-worker-1-9wpjw | fa:16:3e:bd:58:45 | ip_address='192.168.123.68', subnet_id='8ea4d2d7-5541-4bd3-8828-86b441ae06f9' | ACTIVE |
| 66a3e17b-b111-4e06-a709-083c96cf57e6 |                          | fa:16:3e:d6:02:3f | ip_address='192.168.123.10', subnet_id='8ea4d2d7-5541-4bd3-8828-86b441ae06f9' | DOWN   |
| aaf9493c-bcfd-42bf-9cb3-b8eedbb5cb69 |                          | fa:16:3e:66:7e:2f | ip_address='192.168.123.1', subnet_id='8ea4d2d7-5541-4bd3-8828-86b441ae06f9'  | ACTIVE |
+--------------------------------------+--------------------------+-------------------+-------------------------------------------------------------------------------+--------+
[stack@osp16amd ocp-test1]$ openstack port set 52a86be6-763c-497e-9d06-21cff0fa4dab --no-fixed-ip --no-allowed-address  --allowed-address ip-address=192.168.123.0/24

[stack@osp16amd ocp-test1]$ openstack port list --network additional-network
+--------------------------------------+--------------------------+-------------------+-------------------------------------------------------------------------------+--------+
| ID                                   | Name                     | MAC Address       | Fixed IP Addresses                                                            | Status |
+--------------------------------------+--------------------------+-------------------+-------------------------------------------------------------------------------+--------+
| 52a86be6-763c-497e-9d06-21cff0fa4dab | ocp-phnb2-worker-1-9wpjw | fa:16:3e:bd:58:45 |                                                                               | ACTIVE |
| 66a3e17b-b111-4e06-a709-083c96cf57e6 |                          | fa:16:3e:d6:02:3f | ip_address='192.168.123.10', subnet_id='8ea4d2d7-5541-4bd3-8828-86b441ae06f9' | DOWN   |
| aaf9493c-bcfd-42bf-9cb3-b8eedbb5cb69 |                          | fa:16:3e:66:7e:2f | ip_address='192.168.123.1', subnet_id='8ea4d2d7-5541-4bd3-8828-86b441ae06f9'  | ACTIVE |
+--------------------------------------+--------------------------+-------------------+-------------------------------------------------------------------------------+--------+
~~~

4. Restart the kuryr-controller pod

~~~
[stack@osp16amd ocp-test1]$ oc delete pods -n openshift-kuryr -l app=kuryr-controller
pod "kuryr-controller-75957cd77d-fd444" deleted
~~~

5. Check the kuryr-controller pod status

~~~
[stack@osp16amd ocp-test1]$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS             RESTARTS   AGE
kuryr-cni-57vnm                     1/1     Running            0          8d
kuryr-cni-59wzz                     1/1     Running            0          59m
kuryr-cni-6cjsx                     1/1     Running            0          28h
kuryr-cni-hgftg                     1/1     Running            0          8d
kuryr-cni-hlkq4                     1/1     Running            0          29h
kuryr-cni-xmf4k                     1/1     Running            0          8d
kuryr-controller-75957cd77d-4wtn5   0/1     CrashLoopBackOff   1          68s

[stack@osp16amd ocp-test1]$ oc logs -n openshift-kuryr kuryr-controller-75957cd77d-4wtn5
2021-02-03 17:44:33.247 1 INFO kuryr_kubernetes.config [-] Logging enabled!
2021-02-03 17:44:33.248 1 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-k8s-controller version 4.6.0
2021-02-03 17:44:34.335 1 INFO os_vif [-] Loaded VIF plugins: linux_bridge, noop, ovs, noop, sriov
2021-02-03 17:44:34.344 1 INFO kuryr_kubernetes.controller.service [-] Configured handlers: ['vif', 'kuryrport', 'service', 'endpoints', 'kuryrloadbalancer', 'policy', 'pod_label', 'namespace', 'kuryrnetworkpolicy', 'kuryrnetwork']
2021-02-03 17:44:34.384 1 INFO kuryr_kubernetes.controller.drivers.lbaasv2 [-] Octavia supports ACLs for Amphora provider.
2021-02-03 17:44:34.384 1 INFO kuryr_kubernetes.controller.drivers.lbaasv2 [-] Octavia supports double listeners (different protocol, same port) for Amphora provider.
2021-02-03 17:44:34.384 1 INFO kuryr_kubernetes.controller.drivers.lbaasv2 [-] Octavia supports resource tags.
2021-02-03 17:44:34.462 1 INFO kuryr_kubernetes.controller.service [-] Loaded handlers: ['endpoints', 'kuryrloadbalancer', 'kuryrnetwork', 'kuryrnetworkpolicy', 'kuryrport', 'namespace', 'pod_label', 'policy', 'service', 'vif']
2021-02-03 17:44:34.472 1 WARNING oslo_config.cfg [-] Deprecated: Option "sg_mode" from group "octavia_defaults" is deprecated for removal (enforce_sg_rules option can be used instead).  Its value may be silently ignored in the future.
2021-02-03 17:44:34.481 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopped
2021-02-03 17:44:34.482 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' starting
2021-02-03 17:44:34.483 1 INFO kuryr_kubernetes.controller.service [-] Running in non-HA mode, starting watcher immediately.
2021-02-03 17:44:34.487 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2021-02-03 17:44:34.492 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrloadbalancers'
2021-02-03 17:44:34.502 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrports'
2021-02-03 17:44:34.509 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/namespaces'
2021-02-03 17:44:34.512 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2021-02-03 17:44:34.517 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/networking.k8s.io/v1/networkpolicies'
2021-02-03 17:44:34.522 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrnetworkpolicies'
2021-02-03 17:44:34.527 1 INFO kuryr_kubernetes.watcher [-] Started watching '/apis/openstack.org/v1/kuryrnetworks'
2021-02-03 17:44:34.531 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2021-02-03 17:44:59.534 1 ERROR oslo_service.service [-] Error starting thread.: IndexError: list index out of range
2021-02-03 17:44:59.534 1 ERROR oslo_service.service Traceback (most recent call last):
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/oslo_service/service.py", line 810, in run_service
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     service.start()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/service.py", line 110, in start
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     self.pool_driver.sync_pools()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1241, in sync_pools
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     vif_drv.sync_pools()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 994, in sync_pools
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     self._recover_precreated_ports()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 999, in _recover_precreated_ports
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     self._precreated_ports(action='recover')
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1022, in _precreated_ports
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     parent_ports, available_subports, subnets = self._get_trunks_info()
2021-02-03 17:44:59.534 1 ERROR oslo_service.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 412, in _get_trunks_info
2021-02-03 17:44:59.534 1 ERROR oslo_service.service     'ip': port.fixed_ips[0]['ip_address'],
2021-02-03 17:44:59.534 1 ERROR oslo_service.service IndexError: list index out of range
2021-02-03 17:44:59.534 1 ERROR oslo_service.service 
2021-02-03 17:44:59.536 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping
2021-02-03 17:44:59.536 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/endpoints'
2021-02-03 17:44:59.537 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrloadbalancers'
2021-02-03 17:44:59.537 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrports'
2021-02-03 17:44:59.538 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/namespaces'
2021-02-03 17:44:59.539 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/pods'
2021-02-03 17:44:59.539 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/networking.k8s.io/v1/networkpolicies'
2021-02-03 17:44:59.540 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworkpolicies'
2021-02-03 17:44:59.540 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworks'
2021-02-03 17:44:59.541 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/services'
2021-02-03 17:44:59.541 1 INFO kuryr_kubernetes.watcher [-] No remaining active watchers, Exiting...
~~~


Actual results:

kuryr-controller is in a crash loop.

Expected results:

kuryr-controller runs normally.

Additional info:

Re-adding the fixed IP on the port fixes the issue:

Add IP:
~~~
openstack port set 52a86be6-763c-497e-9d06-21cff0fa4dab --no-fixed-ip --fixed-ip subnet=additional-network-subnet,ip-address=192.168.123.68 --no-security-group  --no-allowed-address  --allowed-address ip-address=192.168.123.0/24
~~~

Restart Kuryr: 
~~~
[stack@osp16amd ocp-test1]$ oc delete pods -n openshift-kuryr -l app=kuryr-controller
pod "kuryr-controller-75957cd77d-4wtn5" deleted
~~~

Check Status:
~~~
[stack@osp16amd ocp-test1]$ oc get pods -n openshift-kuryr -l app=kuryr-controller
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-controller-75957cd77d-n7mbp   1/1     Running   0          44s
~~~

Comment 1 Luis Tomas Bolivar 2021-02-04 07:39:26 UTC
Looks like Kuryr is configured to use that interface (the VM trunk port), and having the VM's trunk port without an IP is not supported, as that port is the one used by the subports attached to the containers.

Comment 2 Robert Heinzmann 2021-02-04 07:52:46 UTC
Actually, only the IP of the SECONDARY interface (ens4) was removed, not the IP of the PRIMARY interface (ens3) used by Kuryr and the subports.

Note: the port whose IP was removed was 52a86be6-763c-497e-9d06-21cff0fa4dab

~~~
[stack@osp16amd ocp-test1]$ openstack network trunk list | grep ocp-phnb2-worker-1
| 0b491376-57a3-44c4-9576-40581c62b6b5 | ocp-phnb2-worker-1-9wpjw | 52a86be6-763c-497e-9d06-21cff0fa4dab |             |
| cf9547e0-5ce4-4120-9b70-36112c7b359e | ocp-phnb2-worker-1-9wpjw | 40607804-4fca-4db0-9191-8552481d61bf |             |

# This is the port where the IP was removed
[stack@osp16amd ocp-test1]$ openstack port show 52a86be6-763c-497e-9d06-21cff0fa4dab -f value -c mac_address -c name -c trunk_details
fa:16:3e:bd:58:45
ocp-phnb2-worker-1-9wpjw
{'trunk_id': '0b491376-57a3-44c4-9576-40581c62b6b5', 'sub_ports': []}

# Here NO IP was removed
[stack@osp16amd ocp-test1]$ openstack port show 40607804-4fca-4db0-9191-8552481d61bf -f value -c mac_address -c name -c trunk_details
fa:16:3e:7b:87:db
ocp-phnb2-worker-1-9wpjw
{'trunk_id': 'cf9547e0-5ce4-4120-9b70-36112c7b359e', 'sub_ports': [{'segmentation_id': 6, 'segmentation_type': 'vlan', 'port_id': '953352a7-04e1-4837-875f-87002d6dd9a4', 'mac_address': 'fa:16:3e:66:bd:2c'}, {'segmentation_id': 57, 'segmentation_type': 'vlan', 'port_id': '7a9e7cd5-f2e8-4e5f-8146-49a504c6f119', 'mac_address': 'fa:16:3e:2d:eb:10'}, {'segmentation_id': 802, 'segmentation_type': 'vlan', 'port_id': '5bfbbaa8-97ea-415b-977d-3d8c2e089e6c', 'mac_address': 'fa:16:3e:c0:64:c1'}, {'segmentation_id': 878, 'segmentation_type': 'vlan', 'port_id': '3c317835-d264-4bf7-b7dc-511c4db6c9e3', 'mac_address': 'fa:16:3e:65:7b:a3'}, {'segmentation_id': 920, 'segmentation_type': 'vlan', 'port_id': 'e75716eb-5fe9-4828-98ae-1ffba35a2b44', 'mac_address': 'fa:16:3e:62:7c:03'}, {'segmentation_id': 1653, 'segmentation_type': 'vlan', 'port_id': 'fce27960-2151-48c8-98e2-db174971ecc1', 'mac_address': 'fa:16:3e:2c:f4:03'}, {'segmentation_id': 1699, 'segmentation_type': 'vlan', 'port_id': '573a65e0-0b6b-461b-b5b2-8e80ecc37258', 'mac_address': 'fa:16:3e:9b:26:b8'}, {'segmentation_id': 2009, 'segmentation_type': 'vlan', 'port_id': '6c98ab03-6175-4047-9caf-94ca0ff43baa', 'mac_address': 'fa:16:3e:30:1c:ea'}, {'segmentation_id': 2138, 'segmentation_type': 'vlan', 'port_id': '1d618b48-fdaa-4651-abb2-d33e67964916', 'mac_address': 'fa:16:3e:d1:3e:5c'}, {'segmentation_id': 2222, 'segmentation_type': 'vlan', 'port_id': 'f8b667d2-5ca3-4b09-80b9-429549939ec9', 'mac_address': 'fa:16:3e:f9:0a:fb'}, {'segmentation_id': 2280, 'segmentation_type': 'vlan', 'port_id': '32bc7b34-6fbf-46cb-ac73-c31943bcdcaa', 'mac_address': 'fa:16:3e:64:ed:4b'}, {'segmentation_id': 2302, 'segmentation_type': 'vlan', 'port_id': 'f2560925-e025-4c2e-b0ac-1c80e2769891', 'mac_address': 'fa:16:3e:d8:1a:e0'}, {'segmentation_id': 2428, 'segmentation_type': 'vlan', 'port_id': 'c922ba07-261e-441e-a76c-8c631299cf07', 'mac_address': 'fa:16:3e:61:32:af'}, {'segmentation_id': 2499, 'segmentation_type': 'vlan', 'port_id': 
'a5a49479-89ae-4bcf-b450-1666cbfb7a83', 'mac_address': 'fa:16:3e:80:39:30'}, {'segmentation_id': 2598, 'segmentation_type': 'vlan', 'port_id': '46d19092-a5a9-4f33-9f83-39ef7124d09b', 'mac_address': 'fa:16:3e:4c:5d:08'}, {'segmentation_id': 2656, 'segmentation_type': 'vlan', 'port_id': '5ce9b3ed-d615-4427-a673-18de43f6753b', 'mac_address': 'fa:16:3e:af:22:bd'}, {'segmentation_id': 2935, 'segmentation_type': 'vlan', 'port_id': 'c75c3e74-e24d-457e-ac8d-7264cfdd8973', 'mac_address': 'fa:16:3e:72:5e:df'}, {'segmentation_id': 3011, 'segmentation_type': 'vlan', 'port_id': '6691528a-b062-4eb8-a532-e4e90c817880', 'mac_address': 'fa:16:3e:94:5a:4c'}, {'segmentation_id': 3203, 'segmentation_type': 'vlan', 'port_id': 'dc4fee62-e4fd-48a0-ba49-03d3f99a8fc9', 'mac_address': 'fa:16:3e:65:e8:30'}, {'segmentation_id': 3436, 'segmentation_type': 'vlan', 'port_id': '5adf3b0f-26f7-42af-91d2-08b34b5f1ab0', 'mac_address': 'fa:16:3e:bd:94:71'}, {'segmentation_id': 3475, 'segmentation_type': 'vlan', 'port_id': '762e55b8-fd28-43fa-b086-51560802f640', 'mac_address': 'fa:16:3e:34:7c:8b'}, {'segmentation_id': 3593, 'segmentation_type': 'vlan', 'port_id': 'ba9b5ed1-aa89-4183-bc0b-0b9af4890039', 'mac_address': 'fa:16:3e:9a:45:14'}, {'segmentation_id': 3628, 'segmentation_type': 'vlan', 'port_id': 'ada2be75-2edd-4c42-877b-f08b86133174', 'mac_address': 'fa:16:3e:3b:0f:29'}, {'segmentation_id': 3634, 'segmentation_type': 'vlan', 'port_id': '2336f374-1c4b-4d15-bf19-dcf4508ed236', 'mac_address': 'fa:16:3e:b5:0f:64'}, {'segmentation_id': 3650, 'segmentation_type': 'vlan', 'port_id': '899c5761-4bd6-4f9e-9c77-d5a3a132e95a', 'mac_address': 'fa:16:3e:79:1b:9b'}, {'segmentation_id': 3715, 'segmentation_type': 'vlan', 'port_id': '7d7899e8-a0dd-4d38-b865-15544c2415b3', 'mac_address': 'fa:16:3e:69:36:27'}, {'segmentation_id': 3822, 'segmentation_type': 'vlan', 'port_id': 'e535223e-2fa6-475a-b34f-d332c8084e10', 'mac_address': 'fa:16:3e:b4:a3:17'}, {'segmentation_id': 3937, 'segmentation_type': 'vlan', 
'port_id': '2e7ca23a-93dc-4402-a318-6dffc6b2294c', 'mac_address': 'fa:16:3e:eb:59:ea'}, {'segmentation_id': 3977, 'segmentation_type': 'vlan', 'port_id': '7b3dd5db-d1f6-4e1a-b719-3e26e2bffb31', 'mac_address': 'fa:16:3e:42:dd:4e'}, {'segmentation_id': 4005, 'segmentation_type': 'vlan', 'port_id': '2df82508-9f57-4e58-8871-531401a7a9e1', 'mac_address': 'fa:16:3e:b5:84:c4'}]}

[stack@osp16amd ocp-test1]$ oc debug node/ocp-phnb2-worker-1-9wpjw -- ip link
Creating debug namespace/openshift-debug-node-rkknj ...
Starting pod/ocp-phnb2-worker-1-9wpjw-debug ...
To use host binaries, run `chroot /host`
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:7b:87:db brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:bd:58:45 brd ff:ff:ff:ff:ff:ff

Removing debug pod ...
Removing debug namespace/openshift-debug-node-rkknj ...
~~~

Comment 3 Michał Dulko 2021-02-04 16:27:18 UTC
I can confirm this happens. I filed BZ [1] because of this; the culprit is that the trunk port for the worker's secondary interfaces should not be created by machine-api/CAPO in the first place.

This means that a possible (untested, but should work unless trunks are recreated by CAPO) workaround for the problem would be to remove the trunks on secondary interfaces.

We'll add a workaround in Kuryr anyway.

[1]  https://bugzilla.redhat.com/show_bug.cgi?id=1925233
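The workaround described above (untested, per comment 3) could be applied roughly as follows; the port and trunk IDs shown are the ones from this report, so substitute your own:

```shell
# Untested sketch of the comment-3 workaround: delete the trunk whose
# parent port is the IP-less secondary interface. Only works as long as
# CAPO does not recreate the trunk afterwards.

# 1. Find the trunk attached to the secondary interface's port.
openstack network trunk list

# 2. Confirm the parent port has an empty fixed_ips list.
openstack port show 52a86be6-763c-497e-9d06-21cff0fa4dab -f value -c fixed_ips

# 3. Delete that trunk (in this report it has no subports, so nothing
#    else is affected).
openstack network trunk delete 0b491376-57a3-44c4-9576-40581c62b6b5
```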

Comment 7 rlobillo 2021-02-23 12:17:59 UTC
Verified on OCP4.8.0-0.nightly-2021-02-21-102854 over OSP13 (2021-01-20.1) with amphora provider.


Steps:

1. Create extra network and subnet:

$ openstack network create data-network
$ openstack subnet create data-subnet --network data-network --gateway 10.196.0.1 --subnet-range 10.196.0.0/16 --dns-nameserver 10.46.0.31

2. Create new machineset including 1 worker with 2 interfaces (https://gist.github.com/rlobillo/4e80b1bdf1c5da995378db4aea01c76a)

3. Wait until the new worker is up, then remove the secondary IP manually:

$ openstack server list
+--------------------------------------+-----------------------------+--------+----------------------------------------------------------------+---------------------------------------+-----------+
| ID                                   | Name                        | Status | Networks                                                       | Image                                 | Flavor    |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------------------+---------------------------------------+-----------+
| 1b8ebbbd-c460-451c-ab22-ff139ac62b58 | ostest-dzghr-data-0-d7rzc   | ACTIVE | data-network=10.196.0.71; installer_host-network=172.16.40.235 | ostest-dzghr-rhcos                    | m4.xlarge |
| d70b20e0-48fb-4df8-90f0-380bd4eb749e | ostest-dzghr-worker-0-wkn9v | ACTIVE | installer_host-network=172.16.40.187                           | ostest-dzghr-rhcos                    | m4.xlarge |
| b9a6e2bb-562f-46bc-b973-ef19db04a1f7 | ostest-dzghr-master-2       | ACTIVE | installer_host-network=172.16.40.84                            | ostest-dzghr-rhcos                    | m4.xlarge |
| 276f710a-af2d-4c7c-83dc-6bf69c45488c | ostest-dzghr-master-1       | ACTIVE | installer_host-network=172.16.40.216                           | ostest-dzghr-rhcos                    | m4.xlarge |
| a2bfb533-49cb-4155-8572-f4257de47c33 | ostest-dzghr-master-0       | ACTIVE | installer_host-network=172.16.40.156                           | ostest-dzghr-rhcos                    | m4.xlarge |
| f19bc5aa-4232-43a8-9827-17312294b997 | installer_host              | ACTIVE | installer_host-network=172.16.40.120, 10.46.22.245             | rhel-guest-image-8.3-401.x86_64.qcow2 | m1.medium |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------------------+---------------------------------------+-----------+

$ openstack port list --network data-network | grep 10.196.0.71
| 323ac436-e40b-4c6f-aa87-e41bde227a7a | ostest-dzghr-data-0-d7rzc | fa:16:3e:7a:0c:94 | ip_address='10.196.0.71', subnet_id='dd2c3046-6a23-4e82-94a3-57a556e03fff' | ACTIVE |
$ openstack port set 323ac436-e40b-4c6f-aa87-e41bde227a7a --no-fixed-ip --no-allowed-address  --allowed-address ip-address=10.196.0.0/16
$ openstack server list
+--------------------------------------+-----------------------------+--------+----------------------------------------------------+---------------------------------------+-----------+
| ID                                   | Name                        | Status | Networks                                           | Image                                 | Flavor    |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------+---------------------------------------+-----------+
| 1b8ebbbd-c460-451c-ab22-ff139ac62b58 | ostest-dzghr-data-0-d7rzc   | ACTIVE | installer_host-network=172.16.40.235               | ostest-dzghr-rhcos                    | m4.xlarge |
| d70b20e0-48fb-4df8-90f0-380bd4eb749e | ostest-dzghr-worker-0-wkn9v | ACTIVE | installer_host-network=172.16.40.187               | ostest-dzghr-rhcos                    | m4.xlarge |
| b9a6e2bb-562f-46bc-b973-ef19db04a1f7 | ostest-dzghr-master-2       | ACTIVE | installer_host-network=172.16.40.84                | ostest-dzghr-rhcos                    | m4.xlarge |
| 276f710a-af2d-4c7c-83dc-6bf69c45488c | ostest-dzghr-master-1       | ACTIVE | installer_host-network=172.16.40.216               | ostest-dzghr-rhcos                    | m4.xlarge |
| a2bfb533-49cb-4155-8572-f4257de47c33 | ostest-dzghr-master-0       | ACTIVE | installer_host-network=172.16.40.156               | ostest-dzghr-rhcos                    | m4.xlarge |
| f19bc5aa-4232-43a8-9827-17312294b997 | installer_host              | ACTIVE | installer_host-network=172.16.40.120, 10.46.22.245 | rhel-guest-image-8.3-401.x86_64.qcow2 | m1.medium |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------+---------------------------------------+-----------+
$ oc delete pods -n openshift-kuryr -l app=kuryr-controller
pod "kuryr-controller-566f9cf79f-8794k" deleted

kuryr-controller remains stable after that:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-6jtlc                     1/1     Running   0          123m
kuryr-cni-f9rk7                     1/1     Running   0          18m
kuryr-cni-n6lf4                     1/1     Running   0          106m
kuryr-cni-qlrdv                     1/1     Running   0          123m
kuryr-cni-v4csm                     1/1     Running   0          123m
kuryr-controller-566f9cf79f-dq24h   1/1     Running   0          12m

Furthermore, kuryr-tempest tests, NP tests, and conformance tests passed for this build. Please refer to the attachment on
https://bugzilla.redhat.com/show_bug.cgi?id=1927244#c6

Comment 10 errata-xmlrpc 2021-07-27 22:40:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438