Bug 1688323

Summary: Namespace isolation fails due to race on neutron
Product: Red Hat OpenStack Reporter: GenadiC <gcheresh>
Component: openstack-neutron Assignee: Nate Johnston <njohnston>
Status: CLOSED ERRATA QA Contact: Jon Uriarte <juriarte>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: akatz, amuller, bhaley, chrisw, ekuris, itbrown, jlibosva, juriarte, ltomasbo, njohnston, racedoro, scohen, slinaber
Target Milestone: z11 Keywords: Reopened, Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-neutron-12.1.0-6.el7ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1779369 1779374 (view as bug list) Environment:
Last Closed: 2020-03-10 11:26:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1800847    
Bug Blocks: 1779369, 1779374    

Description GenadiC 2019-03-13 14:39:50 UTC
Description of problem:
When running a test in an OCP on OSP environment, the namespace isolation test fails: connectivity between 2 namespaces succeeds when it should not.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create 2 namespaces with pod/service on each of them
2. From the pod in one namespace, try to reach the pod/service in the other namespace

Actual results:
The connectivity between 2 namespaces succeeds

Expected results:
The connectivity between 2 namespaces should fail

Additional info:
For me it happens in 50% of attempts when running our automation test from kuryr-tempest-plugin.

Debugging done by our R&D:

Pod IPs and SGs:

$ openstack port list | grep -e 10.11.11.11 -e 10.11.12.13
| 1d6ee8b6-7112-4dcd-b5ce-da2e47c36b00 |                                                                                   | fa:16:3e:51:45:cb | ip_address='10.11.11.11', subnet_id='40c2e3b5-ebe9-466a-bf28-841b226f6a36'    | ACTIVE |
| 264d16ae-40d6-4ff2-a2e2-2af23951dab4 |                                                                                   | fa:16:3e:98:b1:8d | ip_address='10.11.12.13', subnet_id='3064009d-f447-4d83-98db-a3b2aee69cb4'    | ACTIVE |

$ openstack port show 1d6ee8b6-7112-4dcd-b5ce-da2e47c36b00
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                                            |
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| admin_state_up        | UP                                                                                                               |
| allowed_address_pairs |                                                                                                                  |
| binding_host_id       | compute-0.localdomain                                                                                            |
| binding_profile       |                                                                                                                  |
| binding_vif_details   | datapath_type='system', ovs_hybrid_plug='False', port_filter='True'                                              |
| binding_vif_type      | ovs                                                                                                              |
| binding_vnic_type     | normal                                                                                                           |
| created_at            | 2019-03-12T15:27:59Z                                                                                             |
| data_plane_status     | None                                                                                                             |
| description           |                                                                                                                  |
| device_id             |                                                                                                                  |
| device_owner          | trunk:subport                                                                                                    |
| dns_assignment        | None                                                                                                             |
| dns_domain            | None                                                                                                             |
| dns_name              | None                                                                                                             |
| extra_dhcp_opts       |                                                                                                                  |
| fixed_ips             | ip_address='10.11.11.11', subnet_id='40c2e3b5-ebe9-466a-bf28-841b226f6a36'                                       |
| id                    | 1d6ee8b6-7112-4dcd-b5ce-da2e47c36b00                                                                             |
| mac_address           | fa:16:3e:51:45:cb                                                                                                |
| name                  |                                                                                                                  |
| network_id            | 0482639a-0d91-41c7-af63-504ae65992a4                                                                             |
| port_security_enabled | True                                                                                                             |
| project_id            | a6e6e3aafb5c4847918c9c86074c9d1a                                                                                 |
| qos_policy_id         | None                                                                                                             |
| revision_number       | 5                                                                                                                |
| security_group_ids    | 0c86db57-aac2-40ec-9cd3-c2de252eb291, 870ad8ad-f548-4bbc-99c5-787c7633d405, 9b37f7be-da6d-4a7b-9841-08e5f7d9da45 |
| status                | ACTIVE                                                                                                           |
| tags                  |                                                                                                                  |
| trunk_details         | None                                                                                                             |
| updated_at            | 2019-03-12T15:28:12Z                                                                                             |
+-----------------------+------------------------------------------------------------------------------------------------------------------+

$ openstack port show 264d16ae-40d6-4ff2-a2e2-2af23951dab4
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                                            |
+-----------------------+------------------------------------------------------------------------------------------------------------------+
| admin_state_up        | UP                                                                                                               |
| allowed_address_pairs |                                                                                                                  |
| binding_host_id       | compute-0.localdomain                                                                                            |
| binding_profile       |                                                                                                                  |
| binding_vif_details   | datapath_type='system', ovs_hybrid_plug='False', port_filter='True'                                              |
| binding_vif_type      | ovs                                                                                                              |
| binding_vnic_type     | normal                                                                                                           |
| created_at            | 2019-03-12T15:28:10Z                                                                                             |
| data_plane_status     | None                                                                                                             |
| description           |                                                                                                                  |
| device_id             |                                                                                                                  |
| device_owner          | trunk:subport                                                                                                    |
| dns_assignment        | None                                                                                                             |
| dns_domain            | None                                                                                                             |
| dns_name              | None                                                                                                             |
| extra_dhcp_opts       |                                                                                                                  |
| fixed_ips             | ip_address='10.11.12.13', subnet_id='3064009d-f447-4d83-98db-a3b2aee69cb4'                                       |
| id                    | 264d16ae-40d6-4ff2-a2e2-2af23951dab4                                                                             |
| mac_address           | fa:16:3e:98:b1:8d                                                                                                |
| name                  |                                                                                                                  |
| network_id            | 313b52de-fda8-454f-aa6a-0ba7071838bb                                                                             |
| port_security_enabled | True                                                                                                             |
| project_id            | a6e6e3aafb5c4847918c9c86074c9d1a                                                                                 |
| qos_policy_id         | None                                                                                                             |
| revision_number       | 5                                                                                                                |
| security_group_ids    | 7bf37e7e-4942-4e72-b009-2d236c89b272, 870ad8ad-f548-4bbc-99c5-787c7633d405, 9b37f7be-da6d-4a7b-9841-08e5f7d9da45 |
| status                | ACTIVE                                                                                                           |
| tags                  |                                                                                                                  |
| trunk_details         | None                                                                                                             |
| updated_at            | 2019-03-12T15:28:21Z                                                                                             |
+-----------------------+------------------------------------------------------------------------------------------------------------------+


LoadBalancer SG:
$ openstack security group show 68d81bbc-ce2f-4a2a-bcf8-61eed09cc247 | grep ingress


| rules           | created_at='2019-03-12T15:29:40Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='24c9bb8a-e61b-4633-b761-8c163e017b2b', port_range_max='80', port_range_min='80', protocol='tcp', remote_ip_prefix='192.168.99.0/24', updated_at='2019-03-12T15:29:40Z'                     |
|                 | created_at='2019-03-12T15:29:39Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='2a8cc114-c46d-432f-89ec-563d29e02173', port_range_max='80', port_range_min='80', protocol='tcp', remote_group_id='aeb7071e-9af4-4383-b8b9-7ac7c8b36c40', updated_at='2019-03-12T15:29:39Z' |
|                 | created_at='2019-03-12T15:29:40Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='3ff46605-7b07-4749-b9eb-c002ba95c006', port_range_max='80', port_range_min='80', protocol='tcp', remote_ip_prefix='172.30.0.0/16', updated_at='2019-03-12T15:29:40Z'                       |
|                 | created_at='2019-03-12T15:29:39Z', description='test/demo:TCP:80', direction='ingress', ethertype='IPv4', id='6fcea2b7-134a-48f0-80ed-ae8391e78f83', port_range_max='80', port_range_min='80', protocol='tcp', remote_group_id='0c86db57-aac2-40ec-9cd3-c2de252eb291', updated_at='2019-03-12T15:29:39Z' |



List of SG ids:
$ openstack security group list | grep 0c86db57-aac2-40ec-9cd3-c2de252eb291
| 0c86db57-aac2-40ec-9cd3-c2de252eb291 | ns/test-sg                                                   |                                                                                                                                                       | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep 7bf37e7e-4942-4e72-b009-2d236c89b272
| 7bf37e7e-4942-4e72-b009-2d236c89b272 | ns/test2-sg                                                  |                                                                                                                                                       | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep 9b37f7be-da6d-4a7b-9841-08e5f7d9da45
| 9b37f7be-da6d-4a7b-9841-08e5f7d9da45 | openshift-ansible-openshift.example.com-allow_from_default   | Give access to the services and pods from the default namespace                                                                                       | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep 870ad8ad-f548-4bbc-99c5-787c7633d405
| 870ad8ad-f548-4bbc-99c5-787c7633d405 | openshift-ansible-openshift.example.com-pod-service-secgrp   | Give services and nodes access to the pods                                                                                                            | a6e6e3aafb5c4847918c9c86074c9d1a | []   |
(shiftstack) [stack@undercloud-0 ~]$ openstack security group list | grep aeb7071e-9af4-4383-b8b9-7ac7c8b36c40
| aeb7071e-9af4-4383-b8b9-7ac7c8b36c40 | openshift-ansible-openshift.example.com-allow_from_namespace | Give access to the services and pods on the default namespace from the other namespaces                                                               | a6e6e3aafb5c4847918c9c86074c9d1a | []   |



curl to the loadbalancer VIP works from port 10.11.11.11 (as expected, since that port has SG id 0c86db57-aac2-40ec-9cd3-c2de252eb291, which is allowed by the loadbalancer SG rules), but it also works from 10.11.12.13. That should not work, because it is not allowed by any of the ingress rules of the loadbalancer SG: the port is not in 192.168.99.0/24 nor in 172.30.0.0/16, and it includes neither SG id aeb7071e-9af4-4383-b8b9-7ac7c8b36c40 nor 0c86db57-aac2-40ec-9cd3-c2de252eb291.
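
The expectation above can be checked with a minimal sketch that evaluates the four loadbalancer SG ingress rules against the two source ports. This is a simplified model for illustration only, not Neutron's or the OVS firewall driver's implementation: the function name ingress_allowed and the dictionary layout are hypothetical, and protocol/port matching is ignored because all four rules are TCP:80. The rule and SG ids are copied from the output above.

```python
import ipaddress

# Ingress rules of the loadbalancer SG (all TCP:80), copied from the
# "security group show" output in this report.
LB_INGRESS_RULES = [
    {"remote_ip_prefix": "192.168.99.0/24"},
    {"remote_group_id": "aeb7071e-9af4-4383-b8b9-7ac7c8b36c40"},
    {"remote_ip_prefix": "172.30.0.0/16"},
    {"remote_group_id": "0c86db57-aac2-40ec-9cd3-c2de252eb291"},
]

# Security groups attached to each pod port, copied from "port show".
PORTS = {
    "10.11.11.11": {
        "0c86db57-aac2-40ec-9cd3-c2de252eb291",
        "870ad8ad-f548-4bbc-99c5-787c7633d405",
        "9b37f7be-da6d-4a7b-9841-08e5f7d9da45",
    },
    "10.11.12.13": {
        "7bf37e7e-4942-4e72-b009-2d236c89b272",
        "870ad8ad-f548-4bbc-99c5-787c7633d405",
        "9b37f7be-da6d-4a7b-9841-08e5f7d9da45",
    },
}


def ingress_allowed(src_ip, src_sgs, rules):
    """Hypothetical helper: True if any rule matches the source port."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        prefix = rule.get("remote_ip_prefix")
        if prefix and addr in ipaddress.ip_network(prefix):
            return True
        group = rule.get("remote_group_id")
        if group and group in src_sgs:
            return True
    return False


for ip, sgs in PORTS.items():
    print(ip, ingress_allowed(ip, sgs, LB_INGRESS_RULES))
```

Under this model only 10.11.11.11 matches a rule (through remote_group_id 0c86db57-aac2-40ec-9cd3-c2de252eb291), so the successful curl from 10.11.12.13 is not explained by the configured rules.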

Comment 5 GenadiC 2019-03-19 12:01:07 UTC
Managed to reproduce it again.
It happened on OSP14   -p 2019-02-22.2

Comment 6 Nate Johnston 2019-03-19 14:18:43 UTC
Genadi,

Can you share what procedure you followed to reproduce this on OSP 14?

Thanks,

Nate

Comment 7 Luis Tomas Bolivar 2019-03-20 10:38:38 UTC
(In reply to Nate Johnston from comment #6)
> Genadi,
> 
> Can you share what procedure you followed to reproduce this on OSP 14?
> 
> Thanks,
> 
> Nate

Just to give a bit more context... The way OCP defines namespace isolation is as follows:
- Pods in a namespace can only reach other pods in their own namespace plus the ones in the default namespace.
- Pods in a namespace can only be reached by other pods in their own namespace plus the ones in the default namespace.

In kuryr we translate that into OpenStack security groups: we create a specific SG for each namespace and attach it to the pods in that namespace. This SG allows all traffic from other ports that include that SG id.

To handle the default namespace exception, we also create a couple of extra SGs. The first (SG_default) is attached to the pods in the default namespace and allows all traffic from ports with the other SG (SG_namespace). The second (SG_namespace) is attached to all pods except the ones in the default namespace and allows all traffic from ports that have the SG_default SG.
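
The SG scheme described above can be sketched as a small model. This is an illustration under stated assumptions, not kuryr's actual code: the function names are hypothetical, the per-namespace SG name follows the ns/<name>-sg pattern seen in the SG listing, and it is assumed that the default namespace also gets its own per-namespace SG.

```python
def pod_security_groups(namespace):
    """Hypothetical: SGs attached to a pod port in the given namespace."""
    if namespace == "default":
        return {"ns/default-sg", "SG_default"}
    return {f"ns/{namespace}-sg", "SG_namespace"}


def allowed_remote_groups(sg):
    """Hypothetical: remote SGs from which the given SG accepts ingress."""
    if sg == "SG_default":
        return {"SG_namespace"}
    if sg == "SG_namespace":
        return {"SG_default"}
    # A per-namespace SG accepts traffic from ports carrying the same SG.
    return {sg}


def can_reach(src_ns, dst_ns):
    """True if some SG on the destination accepts an SG on the source."""
    src = pod_security_groups(src_ns)
    dst = pod_security_groups(dst_ns)
    return any(src & allowed_remote_groups(sg) for sg in dst)


for src, dst in [("test", "test"), ("test", "test2"),
                 ("default", "test"), ("test", "default")]:
    print(f"{src} -> {dst}: {can_reach(src, dst)}")
```

In this model traffic between test and test2 is denied while traffic to and from the default namespace is allowed, which is the behaviour the failing test expects.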

Comment 8 GenadiC 2019-03-21 06:59:12 UTC
I used the automation test from kuryr-tempest-plugin to reproduce the problem.

Run the test test_namespace_sg_svc_isolation in kuryr-tempest-plugin/kuryr_tempest_plugin/tests/scenario/test_namespace.py.

Comment 9 Nate Johnston 2019-04-09 20:34:20 UTC
Genadi,

Can you let me have access to a reproducer environment?  I've tried a couple of ways to reproduce this and I keep having issues, and now I'm about to lose my DSAL host.

Thanks,

Nate

Comment 10 GenadiC 2019-04-10 12:31:10 UTC
Nate,
I am trying to create an environment so you can see the problem.
Will update when ready.

Comment 17 GenadiC 2019-05-28 06:30:20 UTC
Provided an environment to Nate as we agreed, so he can take all the logs and see the problem live

Comment 24 GenadiC 2019-06-12 13:48:12 UTC
Agreed with Luis to let the Networking team decide whether this is a blocker or not.

Comment 30 GenadiC 2019-07-17 12:12:35 UTC
Tried to reproduce it for the last week without any success, so closing it for now.

Comment 33 Itzik Brown 2019-07-25 12:43:03 UTC
I'm running kuryr_tempest_plugin.tests.scenario.test_namespace.TestNamespaceScenario.test_namespace_sg_svc_isolation after running a test that restarts Neutron.
I'm able to reach the pod/service in one namespace from the pod in another namespace.
I gave Nate a setup to check.

Comment 56 errata-xmlrpc 2020-03-10 11:26:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0770