Bug 1688323 - Namespace isolation fails due to race on neutron
Summary: Namespace isolation fails due to race on neutron
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: Nate Johnston
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On: 1800847
Blocks: 1779369 1779374
Reported: 2019-03-13 14:39 UTC by GenadiC
Modified: 2020-03-10 11:27 UTC
CC List: 13 users

Fixed In Version: openstack-neutron-12.1.0-6.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1779369 1779374
Environment:
Last Closed: 2020-03-10 11:26:10 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1853603 0 None None None 2019-11-22 23:54:56 UTC
OpenStack gerrit 689257 0 'None' MERGED Log OVS firewall conjunction creation 2021-01-08 11:18:01 UTC
OpenStack gerrit 696236 0 'None' MERGED Add more condition to check sg member exist 2021-01-08 11:17:23 UTC
OpenStack gerrit 696977 0 'None' MERGED Add more condition to check sg member exist 2021-01-08 11:17:21 UTC
Red Hat Product Errata RHBA-2020:0770 0 None None None 2020-03-10 11:27:01 UTC

Description GenadiC 2019-03-13 14:39:50 UTC
Description of problem:
When running a test in an OCP-on-OSP environment, the namespace isolation test fails: connectivity between two namespaces succeeds when it should not.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create two namespaces, each with a pod and a service
2. From the pod in one namespace, try to reach the pod/service in the other namespace
3.

Actual results:
The connectivity between 2 namespaces succeeds

Expected results:
The connectivity between 2 namespaces should fail

Additional info:
This happens in about 50% of attempts when I run our automated test from kuryr-tempest-plugin.

Debugging output from our R&D team:

Pod IPs and SGs:

$ openstack port list | grep -e 10.11.11.11 -e 10.11.12.13
| 1d6ee8b6-7112-4dcd-b5ce-da2e47c36b00 |                                                                                   | fa:16:3e:51:45:cb | ip_address='10.11.11.11', subnet_id='40c2e3b5-ebe9-466a-bf28-841b226f6a36'    | ACTIVE |
| 264d16ae-40d6-4ff2-a2e2-2af23951dab4 |                                                                                   | fa:16:3e:98:b1:8d | ip_address='10.11.12.13', subnet_id='3064009d-f447-4d83-98db-a3b2aee69cb4'    | ACTIVE |

$ openstack port show 1d6ee8b6-7112-4dcd-b5ce-da2e47c36b00
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                                            |
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| admin_state_up        | UP                                                                                                               |
| allowed_address_pairs |                                                                                                                  |
| binding_host_id       | compute-0.localdomain                                                                                            |
| binding_profile       |                                                                                                                  |
| binding_vif_details   | datapath_type='system', ovs_hybrid_plug='False', port_filter='True'                                              |
| binding_vif_type      | ovs                                                                                                              |
| binding_vnic_type     | normal                                                                                                           |
| created_at            | 2019-03-12T15:27:59Z                                                                                             |
| data_plane_status     | None                                                                                                             |
| description           |                                                                                                                  |
| device_id             |                                                                                                                  |
| device_owner          | trunk:subport                                                                                                    |
| dns_assignment        | None                                                                                                             |
| dns_domain            | None                                                                                                             |
| dns_name              | None                                                                                                             |
| extra_dhcp_opts       |                                                                                                                  |
| fixed_ips             | ip_address='10.11.11.11', subnet_id='40c2e3b5-ebe9-466a-bf28-841b226f6a36'                                       |
| id                    | 1d6ee8b6-7112-4dcd-b5ce-da2e47c36b00                                                                             |
| mac_address           | fa:16:3e:51:45:cb                                                                                                |
| name                  |                                                                                                                  |
| network_id            | 0482639a-0d91-41c7-af63-504ae65992a4                                                                             |
| port_security_enabled | True                                                                                                             |
| project_id            | a6e6e3aafb5c4847918c9c86074c9d1a                                                                                 |
| qos_policy_id         | None                                                                                                             |
| revision_number       | 5                                                                                                                |
| security_group_ids    | 0c86db57-aac2-40ec-9cd3-c2de252eb291, 870ad8ad-f548-4bbc-99c5-787c7633d405, 9b37f7be-da6d-4a7b-9841-08e5f7d9da45 |
| status                | ACTIVE                                                                                                           |
| tags                  |                                                                                                                  |
| trunk_details         | None                                                                                                             |
| updated_at            | 2019-03-12T15:28:12Z                                                                                             |
+-----------------------+------------------------------------------------------------------------------------------------------------------+

$ openstack port show 264d16ae-40d6-4ff2-a2e2-2af23951dab4
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                                            |
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| admin_state_up        | UP                                                                                                               |
| allowed_address_pairs |                                                                                                                  |
| binding_host_id       | compute-0.localdomain                                                                                            |
| binding_profile       |                                                                                                                  |
| binding_vif_details   | datapath_type='system', ovs_hybrid_plug='False', port_filter='True'                                              |
| binding_vif_type      | ovs                                                                                                              |
| binding_vnic_type     | normal                                                                                                           |
| created_at            | 2019-03-12T15:28:10Z                                                                                             |
| data_plane_status     | None                                                                                                             |
| description           |                                                                                                                  |
| device_id             |                                                                                                                  |
| device_owner          | trunk:subport                                                                                                    |
| dns_assignment        | None                                                                                                             |
| dns_domain            | None                                                                                                             |
| dns_name              | None                                                                                                             |
| extra_dhcp_opts       |                                                                                                                  |
| fixed_ips             | ip_address='10.11.12.13', subnet_id='3064009d-f447-4d83-98db-a3b2aee69cb4'                                       |
| id                    | 264d16ae-40d6-4ff2-a2e2-2af23951dab4                                                                             |
| mac_address           | fa:16:3e:98:b1:8d                                                                                                |
| name                  |                                                                                                                  |
| network_id            | 313b52de-fda8-454f-aa6a-0ba7071838bb                                                                             |
| port_security_enabled | True                                                                                                             |
| project_id            | a6e6e3aafb5c4847918c9c86074c9d1a                                                                                 |
| qos_policy_id         | None                                                                                                             |
| revision_number       | 5                                                                                                                |
| security_group_ids    | 7bf37e7e-4942-4e72-b009-2d236c89b272, 870ad8ad-f548-4bbc-99c5-787c7633d405, 9b37f7be-da6d-4a7b-9841-08e5f7d9da45 |
| status                | ACTIVE                                                                                                           |
| tags                  |                                                                                                                  |
| trunk_details         | None                                                                                                             |
| updated_at            | 2019-03-12T15:28:21Z                                                                                             |
+-----------------------+------------------------------------------------------------------------------------------------------------------+


LoadBalancer SG:
$ openstack security group show 68d81bbc-ce2f-4a2a-bcf8-61eed09cc247 | grep ingress


| rules           | created_at='2019-03-12T15:29:40Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='24c9bb8a-e61b-4633-b761-8c163e017b2b', port_range_max='80', port_range_min='80', protocol='tcp', remote_ip_prefix='192.168.99.0/24', updated_at='2019-03-12T15:29:40Z'                     |
|                 | created_at='2019-03-12T15:29:39Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='2a8cc114-c46d-432f-89ec-563d29e02173', port_range_max='80', port_range_min='80', protocol='tcp', remote_group_id='aeb7071e-9af4-4383-b8b9-7ac7c8b36c40', updated_at='2019-03-12T15:29:39Z' |
|                 | created_at='2019-03-12T15:29:40Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='3ff46605-7b07-4749-b9eb-c002ba95c006', port_range_max='80', port_range_min='80', protocol='tcp', remote_ip_prefix='172.30.0.0/16', updated_at='2019-03-12T15:29:40Z'                       |
|                 | created_at='2019-03-12T15:29:39Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='6fcea2b7-134a-48f0-80ed-ae8391e78f83', port_range_max='80', port_range_min='80', protocol='tcp', remote_group_id='0c86db57-aac2-40ec-9cd3-c2de252eb291', updated_at='2019-03-12T15:29:39Z' |



List of SG ids:
$ openstack security group list | grep 0c86db57-aac2-40ec-9cd3-c2de252eb291
| 0c86db57-aac2-40ec-9cd3-c2de252eb291 | ns/test-sg                                                   |                                                                                                                                                       | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep 7bf37e7e-4942-4e72-b009-2d236c89b272
| 7bf37e7e-4942-4e72-b009-2d236c89b272 | ns/test2-sg                                                  |                                                                                                                                                       | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep 9b37f7be-da6d-4a7b-9841-08e5f7d9da45
| 9b37f7be-da6d-4a7b-9841-08e5f7d9da45 | openshift-ansible-openshift.example.com-allow_from_default   | Give access to the services and pods from the default namespace                                                                                       | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep 870ad8ad-f548-4bbc-99c5-787c7633d405
| 870ad8ad-f548-4bbc-99c5-787c7633d405 | openshift-ansible-openshift.example.com-pod-service-secgrp   | Give services and nodes access to the pods                                                                                                            | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep aeb7071e-9af4-4383-b8b9-7ac7c8b36c40
| aeb7071e-9af4-4383-b8b9-7ac7c8b36c40 | openshift-ansible-openshift.example.com-allow_from_namespace | Give access to the services and pods on the default namespace from the other namespaces                                                               | a6e6e3aafb5c4847918c9c86074c9d1a | []   |



curl to the loadbalancer VIP works from port 10.11.11.11 (as expected, since that port has SG ID 0c86db57-aac2-40ec-9cd3-c2de252eb291, which is allowed by the loadbalancer SG rules), but it also works from 10.11.12.13. That should not work, because it is not allowed by any of the ingress rules in the loadbalancer SG: the port is not in 192.168.99.0/24 or 172.30.0.0/16, and it does not have SG ID aeb7071e-9af4-4383-b8b9-7ac7c8b36c40 or 0c86db57-aac2-40ec-9cd3-c2de252eb291.
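
To make the expectation explicit, here is a minimal Python sketch that checks the two ports against the loadbalancer SG ingress rules listed above. It is only a simplified model, not how neutron or the OVS firewall driver evaluates rules: it looks only at remote_ip_prefix and remote_group_id and ignores protocol and port range (all four rules are TCP/80). The function name ingress_allowed and the data layout are made up for illustration; the IPs, prefixes and SG IDs are copied from the output above.

```python
import ipaddress

# Ingress rules of the loadbalancer SG (68d81bbc-...), reduced to their remote match.
LB_INGRESS_RULES = [
    {"remote_ip_prefix": "192.168.99.0/24"},
    {"remote_group_id": "aeb7071e-9af4-4383-b8b9-7ac7c8b36c40"},
    {"remote_ip_prefix": "172.30.0.0/16"},
    {"remote_group_id": "0c86db57-aac2-40ec-9cd3-c2de252eb291"},
]

# Source ports and the SG IDs attached to them (from "openstack port show").
PORTS = {
    "10.11.11.11": {
        "0c86db57-aac2-40ec-9cd3-c2de252eb291",
        "870ad8ad-f548-4bbc-99c5-787c7633d405",
        "9b37f7be-da6d-4a7b-9841-08e5f7d9da45",
    },
    "10.11.12.13": {
        "7bf37e7e-4942-4e72-b009-2d236c89b272",
        "870ad8ad-f548-4bbc-99c5-787c7633d405",
        "9b37f7be-da6d-4a7b-9841-08e5f7d9da45",
    },
}


def ingress_allowed(src_ip, src_sg_ids, rules):
    """Return True if any rule matches the source by IP prefix or by SG membership."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        prefix = rule.get("remote_ip_prefix")
        if prefix and addr in ipaddress.ip_network(prefix):
            return True
        if rule.get("remote_group_id") in src_sg_ids:
            return True
    return False


if __name__ == "__main__":
    for ip, sg_ids in PORTS.items():
        print(ip, ingress_allowed(ip, sg_ids, LB_INGRESS_RULES))
```

Under this model only 10.11.11.11 matches a rule (through SG 0c86db57-aac2-40ec-9cd3-c2de252eb291), so the successful curl from 10.11.12.13 is not explained by the configured rules.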

Comment 5 GenadiC 2019-03-19 12:01:07 UTC
Managed to reproduce it again.
It happened on OSP14 -p 2019-02-22.2

Comment 6 Nate Johnston 2019-03-19 14:18:43 UTC
Genadi,

Can you share what procedure you followed to reproduce this on OSP 14?

Thanks,

Nate

Comment 7 Luis Tomas Bolivar 2019-03-20 10:38:38 UTC
(In reply to Nate Johnston from comment #6)
> Genadi,
> 
> Can you share what procedure you followed to reproduce this on OSP 14?
> 
> Thanks,
> 
> Nate

Just to give a bit more context... OCP defines namespace isolation as follows:
- Pods in a namespace can only reach other pods in their own namespace plus the ones in the default namespace.
- Pods in a namespace can only be reached by other pods in their own namespace plus the ones in the default namespace.

In kuryr we translate that into OpenStack security groups: we create a specific SG for each namespace and attach it to the pods in that namespace. This SG allows all traffic from other ports that include that SG ID.

To handle the default namespace exception, we also create a couple of extra SGs. The first (SG_default) is attached to the pods in the default namespace and allows all traffic from ports with the other SG (SG_namespace). The second (SG_namespace) is attached to all pods except the ones in the default namespace and allows all traffic from ports that have SG_default.
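
The scheme above can be sketched as a small Python model. This is only an illustration of the intended reachability, not kuryr code: the function names are hypothetical, the "ns/<name>-sg" naming is taken from the SG list in the description, and it assumes that pods in the default namespace also get their own per-namespace SG.

```python
SG_DEFAULT = "SG_default"      # attached to pods in the default namespace
SG_NAMESPACE = "SG_namespace"  # attached to pods in every other namespace


def namespace_sg(namespace):
    """Name of the per-namespace SG (naming follows ns/test-sg, ns/test2-sg)."""
    return f"ns/{namespace}-sg"


def pod_sgs(namespace):
    """SGs attached to a pod created in the given namespace."""
    shared = SG_DEFAULT if namespace == "default" else SG_NAMESPACE
    return {namespace_sg(namespace), shared}


def allowed_remote_groups(sg):
    """Remote SGs from which the given SG accepts all traffic."""
    if sg == SG_DEFAULT:
        return {SG_NAMESPACE}
    if sg == SG_NAMESPACE:
        return {SG_DEFAULT}
    return {sg}  # a per-namespace SG only allows ports carrying the same SG


def can_reach(src_namespace, dst_namespace):
    """True if some SG on the destination pod allows one of the source pod's SGs."""
    src = pod_sgs(src_namespace)
    return any(allowed_remote_groups(sg) & src for sg in pod_sgs(dst_namespace))


if __name__ == "__main__":
    for src, dst in [("test", "test"), ("test", "test2"),
                     ("default", "test"), ("test", "default")]:
        print(f"{src} -> {dst}: {can_reach(src, dst)}")
```

In this model test -> test2 is the only pair that is rejected, which is the behaviour the failing isolation test expects.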

Comment 8 GenadiC 2019-03-21 06:59:12 UTC
I used the automated test from kuryr-tempest-plugin to reproduce the problem.

Run the test test_namespace_sg_svc_isolation in kuryr-tempest-plugin/kuryr_tempest_plugin/tests/scenario/test_namespace.py

Comment 9 Nate Johnston 2019-04-09 20:34:20 UTC
Genadi,

Can you let me have access to a reproducer environment?  I've tried a couple of ways to reproduce this and I keep having issues, and now I'm about to lose my DSAL host.

Thanks,

Nate

Comment 10 GenadiC 2019-04-10 12:31:10 UTC
Nate,
I am trying to create an environment so you can see the problem.
Will update when ready.

Comment 17 GenadiC 2019-05-28 06:30:20 UTC
Provided an environment to Nate as we agreed, so he can take all the logs and see the problem live

Comment 24 GenadiC 2019-06-12 13:48:12 UTC
Agreed with Luis to let the Networking team decide whether this is a blocker.

Comment 30 GenadiC 2019-07-17 12:12:35 UTC
Tried to reproduce for the last week without any success, so closing it for now

Comment 33 Itzik Brown 2019-07-25 12:43:03 UTC
I'm running kuryr_tempest_plugin.tests.scenario.test_namespace.TestNamespaceScenario.test_namespace_sg_svc_isolation after running a test that restarts Neutron.
From the pod in one namespace, I'm able to reach the pod/service in another namespace.
I gave Nate a setup to check.

Comment 56 errata-xmlrpc 2020-03-10 11:26:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0770

