Bug 1613280 - Provisioning two APB services temporarily broke networking in the namespace
Summary: Provisioning two APB services temporarily broke networking in the namespace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 3.9.z
Assignee: Jesus M. Rodriguez
QA Contact: Zihan Tang
URL:
Whiteboard:
Depends On:
Blocks: 1643300 1643301 1643303
TreeView+ depends on / blocked
 
Reported: 2018-08-07 11:11 UTC by Birol Bilgin
Modified: 2019-06-06 06:56 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Automation Broker always created a network policy to give the transient namespace access to the target namespace. Consequence: Adding a network policy to a namespace that does not have any other network policies in place causes the namespace to be locked down to the newly created policy. Before the network policy, everything was open and namespaces could communicate with each other. Fix: The Automation Broker looks to see if there are any network policies in place for the target namespace. If there are none, the broker will not create a new network policy. The broker will assume that things are open enough to allow the transient namespace we create to communicate with the target namespace. The broker will still create a network policy giving the transient namespace access to the target namespace, if there are other network policies in place for the target namespace. Result: The fix allows the broker to perform the APB actions without affecting existing services running on the target namespace.
Clone Of:
: 1643300 1643301 1643303 (view as bug list)
Environment:
Last Closed: 2019-06-06 06:56:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
broker log showing a successful provision (22.43 KB, text/plain)
2018-09-28 18:08 UTC, Jesus M. Rodriguez
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0788 0 None None None 2019-06-06 06:56:14 UTC

Comment 2 Zhang Cheng 2018-09-13 02:55:21 UTC
I noticed John move target to 4.0 since 3.11 code freeze is nearly, but don't know if developer team accept and think it is a valid bug. 
Suggest you to ask developer team if it is enough you provided information for debugging, and then to check root cause.

You can needinfo to jmatthew

Comment 5 Jesus M. Rodriguez 2018-09-28 17:55:37 UTC
Adding my comment from the support ticket to the bugzilla:

I was able to recreate the scenario on our end using the openshift-ovs-networkpolicy plugin. I had two pods in a test namespace. One pod served up files, the other one fetched files. Then I provisioned a postgresql database into the same test namespace. The fetching and serving pods stopped talking to each other until the provision process was finished. I'm in the process of working on a solution to this problem.

Comment 6 Jesus M. Rodriguez 2018-09-28 18:06:45 UTC
Initial testing of a possible fix seems to work.

With 2 pods talking to each other every 2 seconds and no active network policies on the namespace, I was able to provision a postgresql db using the broker. No new network policy was added, and postgresql db was deployed successfully. The 2 other pods continued to talk.

My test server application logs a request for every hit. The fetcher pod makes a request to the server every 2 seconds.

2018/09/28 15:41:54 10.128.0.133:56400 GET /testfile
2018/09/28 15:41:56 10.128.0.133:56412 GET /testfile
==========     Provision started at 15:41:57        
2018/09/28 15:41:58 10.128.0.133:56420 GET /testfile
2018/09/28 15:42:00 10.128.0.133:56446 GET /testfile
2018/09/28 15:42:02 10.128.0.133:56458 GET /testfile
2018/09/28 15:42:04 10.128.0.133:56492 GET /testfile
2018/09/28 15:42:06 10.128.0.133:56510 GET /testfile
2018/09/28 15:42:08 10.128.0.133:56520 GET /testfile
2018/09/28 15:42:10 10.128.0.133:56534 GET /testfile
2018/09/28 15:42:12 10.128.0.133:56548 GET /testfile
==========     Provision finished at 15:42:12       
2018/09/28 15:42:14 10.128.0.133:56570 GET /testfile
2018/09/28 15:42:16 10.128.0.133:56580 GET /testfile

The broker log snippet, with the provision call present shows that we didn't create a new network policy (that's the change I made recently).

...

time="2018-09-28T15:41:57Z" level=debug msg="authorize decision: allowed"
time="2018-09-28T15:41:57Z" level=debug msg="get spec: 1dda1477cace09730bd8ed7a6505607e"
time="2018-09-28T15:41:57Z" level=debug msg="Injecting PlanID as parameter: { _apb_plan_id: dev }"
time="2018-09-28T15:41:57Z" level=debug msg="Injecting ServiceClassID as parameter: { _apb_service_class_id: 1dda1477cace09730bd8ed7a6505607e }"
time="2018-09-28T15:41:57Z" level=debug msg="Injecting ServiceInstanceID as parameter: { _apb_service_instance_id: 07db1f40-c335-11e8-a7fc-0a580a800009 }"
time="2018-09-28T15:41:57Z" level=debug msg="Injecting lastRequestingUserKey as parameter: { _apb_last_requesting_user: admin }"
time="2018-09-28T15:41:57Z" level=debug msg="get service instance: 07db1f40-c335-11e8-a7fc-0a580a800009"
time="2018-09-28T15:41:57Z" level=debug msg="set service instance: 07db1f40-c335-11e8-a7fc-0a580a800009"
time="2018-09-28T15:41:57Z" level=info msg="ASYNC provisioning in progress"
time="2018-09-28T15:41:57Z" level=debug msg="set job state for instance: 07db1f40-c335-11e8-a7fc-0a580a800009 token: c06b37db-2c8c-4017-b892-507dd37fa3c0"
10.128.0.10 - - [28/Sep/2018:15:41:57 +0000] "PUT /osb/v2/service_instances/07db1f40-c335-11e8-a7fc-0a580a800009?accepts_incomplete=true HTTP/1.1" 202 58
time="2018-09-28T15:41:57Z" level=info msg="============================================================"
time="2018-09-28T15:41:57Z" level=info msg="                       PROVISIONING                         "
time="2018-09-28T15:41:57Z" level=info msg="============================================================"
time="2018-09-28T15:41:57Z" level=info msg="Spec.ID: 1dda1477cace09730bd8ed7a6505607e"
time="2018-09-28T15:41:57Z" level=info msg="Spec.Name: dh-postgresql-apb"
time="2018-09-28T15:41:57Z" level=info msg="Spec.Image: asb-registry.usersys.redhat.com:5000/openshift3/postgresql-apb:v3.11"
time="2018-09-28T15:41:57Z" level=info msg="Spec.Description: SCL PostgreSQL apb implementation"
time="2018-09-28T15:41:57Z" level=info msg="============================================================"
time="2018-09-28T15:41:57Z" level=debug msg="executor::actionStarted"
time="2018-09-28T15:41:57Z" level=debug msg="JobStateSubscriber Notify : msg state {c06b37db-2c8c-4017-b892-507dd37fa3c0 in progress  provision  action started} "
time="2018-09-28T15:41:57Z" level=debug msg="set job state for instance: 07db1f40-c335-11e8-a7fc-0a580a800009 token: c06b37db-2c8c-4017-b892-507dd37fa3c0"
time="2018-09-28T15:41:57Z" level=debug msg="XXX Policies {[]v1.NetworkPolicy{}}\n"
time="2018-09-28T15:41:57Z" level=info msg="No network policies found. Assuming things are open, skip network policy creation"
time="2018-09-28T15:41:57Z" level=debug msg="Trying to create apb sandbox: [ bundle-effa9925-77b3-4af7-8f69-53103fbc1350 ], with edit permissions in namespace dh-postgresql-apb-prov-q2ct4"
time="2018-09-28T15:41:57Z" level=info msg="Creating RoleBinding bundle-effa9925-77b3-4af7-8f69-53103fbc1350"
time="2018-09-28T15:41:57Z" level=info msg="Creating RoleBinding bundle-effa9925-77b3-4af7-8f69-53103fbc1350"
time="2018-09-28T15:41:57Z" level=info msg="Successfully created apb sandbox: [ bundle-effa9925-77b3-4af7-8f69-53103fbc1350 ], with edit permissions in namespace [ dh-postgresql-apb-prov-q2ct4 ]"
time="2018-09-28T15:41:57Z" level=debug msg="Running post create sandbox functions if defined."
time="2018-09-28T15:41:57Z" level=debug msg="ExecutingApb:"
time="2018-09-28T15:41:57Z" level=debug msg="name:[ dh-postgresql-apb ]"
time="2018-09-28T15:41:57Z" level=debug msg="image:[ asb-registry.usersys.redhat.com:5000/openshift3/postgresql-apb:v3.11 ]"
time="2018-09-28T15:41:57Z" level=debug msg="action:[ provision ]"
time="2018-09-28T15:41:57Z" level=debug msg="pullPolicy:[ Always ]"
time="2018-09-28T15:41:57Z" level=debug msg="role:[ edit ]"
time="2018-09-28T15:41:57Z" level=warning msg="Proxy env vars found, but no values configured."
client returned err  configmaps "07db1f40-c335-11e8-a7fc-0a580a800009-state" not found
time="2018-09-28T15:41:57Z" level=info msg="Creating pod \"bundle-effa9925-77b3-4af7-8f69-53103fbc1350\" in the dh-postgresql-apb-prov-q2ct4 namespace"
time="2018-09-28T15:41:57Z" level=debug msg="Watching pod [ bundle-effa9925-77b3-4af7-8f69-53103fbc1350 ] in namespace [ dh-postgresql-apb-prov-q2ct4 ] for completion"
...
time="2018-09-28T15:42:12Z" level=info msg="Request: \"GET /osb/v2/service_instances/07db1f40-c335-11e8-a7fc-0a580a800009/last_operation?operation=c06b37db-2c8c-4017-b892-507dd37fa3c0&plan_id=7f4a5e35e4af2beb70076e72fab0b7ff&service_id=1dda1477cace09730bd8ed7a6505607e HTTP/1.1\\r\\nHost: asb.openshift-ansible-service-broker.svc:1338\\r\\nAccept-Encoding: gzip\\r\\nUser-Agent: Go-http-client/1.1\\r\\nX-Broker-Api-Originating-Identity: kubernetes eyJ1c2VybmFtZSI6ImFkbWluIiwidWlkIjoiIiwiZ3JvdXBzIjpbInN5c3RlbTphdXRoZW50aWNhdGVkOm9hdXRoIiwic3lzdGVtOmF1dGhlbnRpY2F0ZWQiXSwiZXh0cmEiOnsic2NvcGVzLmF1dGhvcml6YXRpb24ub3BlbnNoaWZ0LmlvIjpbInVzZXI6ZnVsbCJdfX0=\\r\\nX-Broker-Api-Version: 2.13\\r\\n\\r\\n\""
time="2018-09-28T15:42:12Z" level=debug msg="service_id: 1dda1477cace09730bd8ed7a6505607e"
time="2018-09-28T15:42:12Z" level=debug msg="plan_id: 7f4a5e35e4af2beb70076e72fab0b7ff"
time="2018-09-28T15:42:12Z" level=debug msg="operation:  c06b37db-2c8c-4017-b892-507dd37fa3c0"
time="2018-09-28T15:42:12Z" level=debug msg="state: succeeded"
time="2018-09-28T15:42:12Z" level=debug msg="description: provision job completed"
10.128.0.10 - - [28/Sep/2018:15:42:12 +0000] "GET /osb/v2/service_instances/07db1f40-c335-11e8-a7fc-0a580a800009/last_operation?operation=c06b37db-2c8c-4017-b892-507dd37fa3c0&plan_id=7f4a5e35e4af2beb70076e72fab0b7ff&service_id=1dda1477cace09730bd8ed7a6505607e HTTP/1.1" 200 71

The provision started at 2018-09-28T15:41:57Z and completed at 2018-09-28T15:42:12Z. And in the sample server log at the top, it continued to receive requests every 2 seconds from the fetcher.

Comment 7 Jesus M. Rodriguez 2018-09-28 18:08:00 UTC
Created attachment 1488183 [details]
broker log showing a successful provision

This log shows a successful provision while not creating a new network policy.

Comment 15 Zihan Tang 2018-11-30 08:40:47 UTC
With ansible-service-broker:v3.9.55, version:1.1.18,
When provision and deprovision, it will not create new networkpolicy, 
but at the end of provision or deprovision , it will try to delete the networkpolicy, and log errors in asb pod:

provision:

[2018-11-30T08:32:27.285Z] [INFO] - No network policies found. Assuming things are open, skip network policy creation
[2018-11-30T08:32:27.285Z] [INFO] - Successfully created apb sandbox: [ apb-415777e0-a438-4c6d-9574-7609111edb97 ], with edit permissions in namespace rh-mediawiki-apb-prov-r9jh5
.....
[2018-11-30T08:33:12.457Z] [DEBUG] - Deleting network policy for pod: apb-415777e0-a438-4c6d-9574-7609111edb97 to grant network access to ns: debug
[2018-11-30T08:33:12.46Z] [ERROR] - unable to delete the network policy object - networkpolicies.networking.k8s.io "apb-415777e0-a438-4c6d-9574-7609111edb97" not found
[2018-11-30T08:33:12.46Z] [DEBUG] - received provision message from buffer

deprovision:
[2018-11-30T08:33:43.858Z] [INFO] - No network policies found. Assuming things are open, skip network policy creation
[2018-11-30T08:33:43.858Z] [INFO] - Successfully created apb sandbox: [ apb-1d94403a-e1dd-4982-b4a5-65146cb8cfc6 ], with edit permissions in namespace rh-mediawiki-apb-depr-jfj9v
...
[2018-11-30T08:34:33.962Z] [DEBUG] - Deleting network policy for pod: apb-1d94403a-e1dd-4982-b4a5-65146cb8cfc6 to grant network access to ns: debug
[2018-11-30T08:34:33.965Z] [ERROR] - unable to delete the network policy object - networkpolicies.networking.k8s.io "apb-1d94403a-e1dd-4982-b4a5-65146cb8cfc6" not found

It's better to remove the delete networkpolicy step or checking before delete.

Comment 17 Jesus M. Rodriguez 2019-03-07 18:58:37 UTC
Fixed by PR https://github.com/openshift/ansible-service-broker/pull/1179
Broker version 1.1.19 or newer

Comment 19 Zihan Tang 2019-04-18 09:59:55 UTC
VERIFIED.
ASB version: 1.1.20

Comment 21 errata-xmlrpc 2019-06-06 06:56:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0788


Note You need to log in before you can comment on or make changes to this bug.