Bug 1924917
Summary: | kuryr-controller in crash loop if IP is removed from secondary interfaces | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Robert Heinzmann <rheinzma> |
Component: | Networking | Assignee: | Michał Dulko <mdulko> |
Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | rlobillo |
Version: | 4.6 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:40:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1928029 |
Description
Robert Heinzmann
2021-02-03 21:23:06 UTC
Looks like kuryr is configured to use that interface (the VM trunk port), and it is not supported to have the trunk port of the VM without IP, as that one is the one used by the subports attached to the containers.

Actually only the IP of the SECONDARY interface (ens4) is removed, not the ip of the PRIMARY interface (ens3) used for kuryr and the subports.

Note: Port where IP was removed was 52a86be6-763c-497e-9d06-21cff0fa4dab

~~~
[stack@osp16amd ocp-test1]$ openstack network trunk list | grep ocp-phnb2-worker-1
| 0b491376-57a3-44c4-9576-40581c62b6b5 | ocp-phnb2-worker-1-9wpjw | 52a86be6-763c-497e-9d06-21cff0fa4dab | |
| cf9547e0-5ce4-4120-9b70-36112c7b359e | ocp-phnb2-worker-1-9wpjw | 40607804-4fca-4db0-9191-8552481d61bf | |

# This is one where the IP was removed
[stack@osp16amd ocp-test1]$ openstack port show 52a86be6-763c-497e-9d06-21cff0fa4dab -f value -c mac_address -c name -c trunk_details
fa:16:3e:bd:58:45
ocp-phnb2-worker-1-9wpjw
{'trunk_id': '0b491376-57a3-44c4-9576-40581c62b6b5', 'sub_ports': []}

# Here NO IP was removed
[stack@osp16amd ocp-test1]$ openstack port show 40607804-4fca-4db0-9191-8552481d61bf -f value -c mac_address -c name -c trunk_details
fa:16:3e:7b:87:db
ocp-phnb2-worker-1-9wpjw
{'trunk_id': 'cf9547e0-5ce4-4120-9b70-36112c7b359e', 'sub_ports': [{'segmentation_id': 6, 'segmentation_type': 'vlan', 'port_id': '953352a7-04e1-4837-875f-87002d6dd9a4', 'mac_address': 'fa:16:3e:66:bd:2c'}, {'segmentation_id': 57, 'segmentation_type': 'vlan', 'port_id': '7a9e7cd5-f2e8-4e5f-8146-49a504c6f119', 'mac_address': 'fa:16:3e:2d:eb:10'}, {'segmentation_id': 802, 'segmentation_type': 'vlan', 'port_id': '5bfbbaa8-97ea-415b-977d-3d8c2e089e6c', 'mac_address': 'fa:16:3e:c0:64:c1'}, {'segmentation_id': 878, 'segmentation_type': 'vlan', 'port_id': '3c317835-d264-4bf7-b7dc-511c4db6c9e3', 'mac_address': 'fa:16:3e:65:7b:a3'}, {'segmentation_id': 920, 'segmentation_type': 'vlan', 'port_id': 'e75716eb-5fe9-4828-98ae-1ffba35a2b44', 'mac_address': 'fa:16:3e:62:7c:03'},
{'segmentation_id': 1653, 'segmentation_type': 'vlan', 'port_id': 'fce27960-2151-48c8-98e2-db174971ecc1', 'mac_address': 'fa:16:3e:2c:f4:03'}, {'segmentation_id': 1699, 'segmentation_type': 'vlan', 'port_id': '573a65e0-0b6b-461b-b5b2-8e80ecc37258', 'mac_address': 'fa:16:3e:9b:26:b8'}, {'segmentation_id': 2009, 'segmentation_type': 'vlan', 'port_id': '6c98ab03-6175-4047-9caf-94ca0ff43baa', 'mac_address': 'fa:16:3e:30:1c:ea'}, {'segmentation_id': 2138, 'segmentation_type': 'vlan', 'port_id': '1d618b48-fdaa-4651-abb2-d33e67964916', 'mac_address': 'fa:16:3e:d1:3e:5c'}, {'segmentation_id': 2222, 'segmentation_type': 'vlan', 'port_id': 'f8b667d2-5ca3-4b09-80b9-429549939ec9', 'mac_address': 'fa:16:3e:f9:0a:fb'}, {'segmentation_id': 2280, 'segmentation_type': 'vlan', 'port_id': '32bc7b34-6fbf-46cb-ac73-c31943bcdcaa', 'mac_address': 'fa:16:3e:64:ed:4b'}, {'segmentation_id': 2302, 'segmentation_type': 'vlan', 'port_id': 'f2560925-e025-4c2e-b0ac-1c80e2769891', 'mac_address': 'fa:16:3e:d8:1a:e0'}, {'segmentation_id': 2428, 'segmentation_type': 'vlan', 'port_id': 'c922ba07-261e-441e-a76c-8c631299cf07', 'mac_address': 'fa:16:3e:61:32:af'}, {'segmentation_id': 2499, 'segmentation_type': 'vlan', 'port_id': 'a5a49479-89ae-4bcf-b450-1666cbfb7a83', 'mac_address': 'fa:16:3e:80:39:30'}, {'segmentation_id': 2598, 'segmentation_type': 'vlan', 'port_id': '46d19092-a5a9-4f33-9f83-39ef7124d09b', 'mac_address': 'fa:16:3e:4c:5d:08'}, {'segmentation_id': 2656, 'segmentation_type': 'vlan', 'port_id': '5ce9b3ed-d615-4427-a673-18de43f6753b', 'mac_address': 'fa:16:3e:af:22:bd'}, {'segmentation_id': 2935, 'segmentation_type': 'vlan', 'port_id': 'c75c3e74-e24d-457e-ac8d-7264cfdd8973', 'mac_address': 'fa:16:3e:72:5e:df'}, {'segmentation_id': 3011, 'segmentation_type': 'vlan', 'port_id': '6691528a-b062-4eb8-a532-e4e90c817880', 'mac_address': 'fa:16:3e:94:5a:4c'}, {'segmentation_id': 3203, 'segmentation_type': 'vlan', 'port_id': 'dc4fee62-e4fd-48a0-ba49-03d3f99a8fc9', 'mac_address': 
'fa:16:3e:65:e8:30'}, {'segmentation_id': 3436, 'segmentation_type': 'vlan', 'port_id': '5adf3b0f-26f7-42af-91d2-08b34b5f1ab0', 'mac_address': 'fa:16:3e:bd:94:71'}, {'segmentation_id': 3475, 'segmentation_type': 'vlan', 'port_id': '762e55b8-fd28-43fa-b086-51560802f640', 'mac_address': 'fa:16:3e:34:7c:8b'}, {'segmentation_id': 3593, 'segmentation_type': 'vlan', 'port_id': 'ba9b5ed1-aa89-4183-bc0b-0b9af4890039', 'mac_address': 'fa:16:3e:9a:45:14'}, {'segmentation_id': 3628, 'segmentation_type': 'vlan', 'port_id': 'ada2be75-2edd-4c42-877b-f08b86133174', 'mac_address': 'fa:16:3e:3b:0f:29'}, {'segmentation_id': 3634, 'segmentation_type': 'vlan', 'port_id': '2336f374-1c4b-4d15-bf19-dcf4508ed236', 'mac_address': 'fa:16:3e:b5:0f:64'}, {'segmentation_id': 3650, 'segmentation_type': 'vlan', 'port_id': '899c5761-4bd6-4f9e-9c77-d5a3a132e95a', 'mac_address': 'fa:16:3e:79:1b:9b'}, {'segmentation_id': 3715, 'segmentation_type': 'vlan', 'port_id': '7d7899e8-a0dd-4d38-b865-15544c2415b3', 'mac_address': 'fa:16:3e:69:36:27'}, {'segmentation_id': 3822, 'segmentation_type': 'vlan', 'port_id': 'e535223e-2fa6-475a-b34f-d332c8084e10', 'mac_address': 'fa:16:3e:b4:a3:17'}, {'segmentation_id': 3937, 'segmentation_type': 'vlan', 'port_id': '2e7ca23a-93dc-4402-a318-6dffc6b2294c', 'mac_address': 'fa:16:3e:eb:59:ea'}, {'segmentation_id': 3977, 'segmentation_type': 'vlan', 'port_id': '7b3dd5db-d1f6-4e1a-b719-3e26e2bffb31', 'mac_address': 'fa:16:3e:42:dd:4e'}, {'segmentation_id': 4005, 'segmentation_type': 'vlan', 'port_id': '2df82508-9f57-4e58-8871-531401a7a9e1', 'mac_address': 'fa:16:3e:b5:84:c4'}]}

[stack@osp16amd ocp-test1]$ oc debug node/ocp-phnb2-worker-1-9wpjw -- ip link
Creating debug namespace/openshift-debug-node-rkknj ...
Starting pod/ocp-phnb2-worker-1-9wpjw-debug ...
To use host binaries, run `chroot /host`
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:7b:87:db brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:bd:58:45 brd ff:ff:ff:ff:ff:ff

Removing debug pod ...
Removing debug namespace/openshift-debug-node-rkknj ...
~~~

I think I can confirm this happens. I filed BZ [1] because of this; the culprit is that the trunk port for the worker secondary interfaces should not get created by machine-api/CAPO in the first place. This means that a possible workaround (untested, but it should work unless trunks are recreated by CAPO) would be to remove the trunks on secondary interfaces. We'll add a workaround in Kuryr anyway.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1925233

Verified on OCP4.8.0-0.nightly-2021-02-21-102854 over OSP13 (2021-01-20.1) with amphora provider.

Steps:

1. Create extra network and subnet:

$ openstack network create data-network
$ openstack subnet create data-subnet --network data-network --gateway 10.196.0.1 --subnet-range 10.196.0.0/16 --dns-nameserver 10.46.0.31

2. Create new machineset including 1 worker with 2 interfaces (https://gist.github.com/rlobillo/4e80b1bdf1c5da995378db4aea01c76a)

3. Wait until the new worker is up and remove the secondary IP manually:

$ openstack server list
+--------------------------------------+-----------------------------+--------+----------------------------------------------------------------+---------------------------------------+-----------+
| ID                                   | Name                        | Status | Networks                                                       | Image                                 | Flavor    |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------------------+---------------------------------------+-----------+
| 1b8ebbbd-c460-451c-ab22-ff139ac62b58 | ostest-dzghr-data-0-d7rzc   | ACTIVE | data-network=10.196.0.71; installer_host-network=172.16.40.235 | ostest-dzghr-rhcos                    | m4.xlarge |
| d70b20e0-48fb-4df8-90f0-380bd4eb749e | ostest-dzghr-worker-0-wkn9v | ACTIVE | installer_host-network=172.16.40.187                           | ostest-dzghr-rhcos                    | m4.xlarge |
| b9a6e2bb-562f-46bc-b973-ef19db04a1f7 | ostest-dzghr-master-2       | ACTIVE | installer_host-network=172.16.40.84                            | ostest-dzghr-rhcos                    | m4.xlarge |
| 276f710a-af2d-4c7c-83dc-6bf69c45488c | ostest-dzghr-master-1       | ACTIVE | installer_host-network=172.16.40.216                           | ostest-dzghr-rhcos                    | m4.xlarge |
| a2bfb533-49cb-4155-8572-f4257de47c33 | ostest-dzghr-master-0       | ACTIVE | installer_host-network=172.16.40.156                           | ostest-dzghr-rhcos                    | m4.xlarge |
| f19bc5aa-4232-43a8-9827-17312294b997 | installer_host              | ACTIVE | installer_host-network=172.16.40.120, 10.46.22.245             | rhel-guest-image-8.3-401.x86_64.qcow2 | m1.medium |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------------------+---------------------------------------+-----------+

$ openstack port list --network data-network | grep 10.196.0.71
| 323ac436-e40b-4c6f-aa87-e41bde227a7a | ostest-dzghr-data-0-d7rzc | fa:16:3e:7a:0c:94 | ip_address='10.196.0.71', subnet_id='dd2c3046-6a23-4e82-94a3-57a556e03fff' | ACTIVE |

$ openstack port set 323ac436-e40b-4c6f-aa87-e41bde227a7a --no-fixed-ip --no-allowed-address --allowed-address ip-address=10.196.0.0/16

$ openstack server list
+--------------------------------------+-----------------------------+--------+----------------------------------------------------+---------------------------------------+-----------+
| ID                                   | Name                        | Status | Networks                                           | Image                                 | Flavor    |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------+---------------------------------------+-----------+
| 1b8ebbbd-c460-451c-ab22-ff139ac62b58 | ostest-dzghr-data-0-d7rzc   | ACTIVE | installer_host-network=172.16.40.235               | ostest-dzghr-rhcos                    | m4.xlarge |
| d70b20e0-48fb-4df8-90f0-380bd4eb749e | ostest-dzghr-worker-0-wkn9v | ACTIVE | installer_host-network=172.16.40.187               | ostest-dzghr-rhcos                    | m4.xlarge |
| b9a6e2bb-562f-46bc-b973-ef19db04a1f7 | ostest-dzghr-master-2       | ACTIVE | installer_host-network=172.16.40.84                | ostest-dzghr-rhcos                    | m4.xlarge |
| 276f710a-af2d-4c7c-83dc-6bf69c45488c | ostest-dzghr-master-1       | ACTIVE | installer_host-network=172.16.40.216               | ostest-dzghr-rhcos                    | m4.xlarge |
| a2bfb533-49cb-4155-8572-f4257de47c33 | ostest-dzghr-master-0       | ACTIVE | installer_host-network=172.16.40.156               | ostest-dzghr-rhcos                    | m4.xlarge |
| f19bc5aa-4232-43a8-9827-17312294b997 | installer_host              | ACTIVE | installer_host-network=172.16.40.120, 10.46.22.245 | rhel-guest-image-8.3-401.x86_64.qcow2 | m1.medium |
+--------------------------------------+-----------------------------+--------+----------------------------------------------------+---------------------------------------+-----------+

$ oc delete pods -n openshift-kuryr -l app=kuryr-controller
pod "kuryr-controller-566f9cf79f-8794k" deleted

kuryr-controller remains stable after that:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-6jtlc                     1/1     Running   0          123m
kuryr-cni-f9rk7                     1/1     Running   0          18m
kuryr-cni-n6lf4                     1/1     Running   0          106m
kuryr-cni-qlrdv                     1/1     Running   0          123m
kuryr-cni-v4csm                     1/1     Running   0          123m
kuryr-controller-566f9cf79f-dq24h   1/1     Running   0          12m

Furthermore, kuryr-tempest tests, NP tests and conformance tests passed for this build. Please refer to the attachment on https://bugzilla.redhat.com/show_bug.cgi?id=1927244#c6

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
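
For anyone trying the untested workaround mentioned in the comments (removing the trunks on secondary interfaces), the following is a minimal sketch of how candidate trunks could be picked out. It assumes port data has already been loaded into Python dicts shaped like the `trunk_details` output shown in the description; the function name `trunks_without_subports` and the `ports` sample are hypothetical, and the sub_ports list of the primary port is shortened to one entry. The script only prints `openstack network trunk delete` commands, it does not run them.

```python
from typing import Dict, List


def trunks_without_subports(ports: List[Dict]) -> List[str]:
    """Return the IDs of trunks whose parent port carries no subports.

    Hypothetical helper: each item is expected to look like the
    `openstack port show ... -c trunk_details` output in this report.
    """
    candidates = []
    for port in ports:
        details = port.get("trunk_details") or {}
        if details.get("trunk_id") and not details.get("sub_ports"):
            candidates.append(details["trunk_id"])
    return candidates


# Sample data taken from the description (second sub_ports list shortened).
ports = [
    {
        "id": "52a86be6-763c-497e-9d06-21cff0fa4dab",  # ens4, IP removed
        "trunk_details": {
            "trunk_id": "0b491376-57a3-44c4-9576-40581c62b6b5",
            "sub_ports": [],
        },
    },
    {
        "id": "40607804-4fca-4db0-9191-8552481d61bf",  # ens3, used by kuryr
        "trunk_details": {
            "trunk_id": "cf9547e0-5ce4-4120-9b70-36112c7b359e",
            "sub_ports": [
                {
                    "segmentation_id": 6,
                    "segmentation_type": "vlan",
                    "port_id": "953352a7-04e1-4837-875f-87002d6dd9a4",
                    "mac_address": "fa:16:3e:66:bd:2c",
                }
            ],
        },
    },
]

# Print the commands for review instead of executing them.
for trunk_id in trunks_without_subports(ports):
    print(f"openstack network trunk delete {trunk_id}")
```

An empty `sub_ports` list alone does not prove that a trunk belongs to a secondary interface (a freshly created node with no pods scheduled yet would look the same on its primary port), so each candidate should be checked against the node's interfaces before anything is deleted.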