Bug 1613280
Summary: | Provisioning two APB services temporarily broke networking in the namespace | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Birol Bilgin <bbilgin> | ||||
Component: | Service Broker | Assignee: | Jesus M. Rodriguez <jesusr> | ||||
Status: | CLOSED ERRATA | QA Contact: | Zihan Tang <zitang> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 3.9.0 | CC: | aos-bugs, chezhang, dcaldwel, jdesousa, jiazha, jmatthew, shurley, zitang | ||||
Target Milestone: | --- | ||||||
Target Release: | 3.9.z | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Cause: The Automation Broker always created a network policy to give the transient namespace access to the target namespace.
Consequence: Adding a network policy to a namespace that does not have any other network policies in place causes the namespace to be locked down to the newly created policy. Before the network policy, everything was open and namespaces could communicate with each other.
Fix: The Automation Broker looks to see if there are any network policies in place for the target namespace. If there are none, the broker will not create a new network policy. The broker will assume that things are open enough to allow the transient namespace we create to communicate with the target namespace.
The broker will still create a network policy giving the transient namespace access to the target namespace, if there are other network policies in place for the target namespace.
Result: The fix allows the broker to perform the APB actions without affecting existing services running on the target namespace.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 1643300 1643301 1643303 (view as bug list) | Environment: | |||||
Last Closed: | 2019-06-06 06:56:05 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1643300, 1643301, 1643303 | ||||||
Attachments: |
|
Comment 2
Zhang Cheng
2018-09-13 02:55:21 UTC
Adding my comment from the support ticket to the bugzilla: I was able to recreate the scenario on our end using the openshift-ovs-networkpolicy plugin. I had two pods in a test namespace. One pod served up files, the other one fetched files. Then I provisioned a postgresql database into the same test namespace. The fetching and serving pods stopped talking to each other until the provision process was finished. I'm in the process of working on a solution to this problem. Initial testing of a possible fix seems to work. With 2 pods talking to each other every 2 seconds and no active network policies on the namespace, I was able to provision a postgresql db using the broker. No new network policy was added, and postgresql db was deployed successfully. The 2 other pods continued to talk. My test server application logs a request for every hit. The fetcher pod makes a request to the server every 2 seconds. 2018/09/28 15:41:54 10.128.0.133:56400 GET /testfile 2018/09/28 15:41:56 10.128.0.133:56412 GET /testfile ========== Provision started at 15:41:57 2018/09/28 15:41:58 10.128.0.133:56420 GET /testfile 2018/09/28 15:42:00 10.128.0.133:56446 GET /testfile 2018/09/28 15:42:02 10.128.0.133:56458 GET /testfile 2018/09/28 15:42:04 10.128.0.133:56492 GET /testfile 2018/09/28 15:42:06 10.128.0.133:56510 GET /testfile 2018/09/28 15:42:08 10.128.0.133:56520 GET /testfile 2018/09/28 15:42:10 10.128.0.133:56534 GET /testfile 2018/09/28 15:42:12 10.128.0.133:56548 GET /testfile ========== Provision finished at 15:42:12 2018/09/28 15:42:14 10.128.0.133:56570 GET /testfile 2018/09/28 15:42:16 10.128.0.133:56580 GET /testfile The broker log snippet, with the provision call present shows that we didn't create a new network policy (that's the change I made recently). ... time="2018-09-28T15:41:57Z" level=debug msg="authorize decision: allowed" time="2018-09-28T15:41:57Z" level=debug msg="get spec: 1dda1477cace09730bd8ed7a6505607e" time="2018-09-28T15:41:57Z" level=debug msg="Injecting PlanID as parameter: { _apb_plan_id: dev }" time="2018-09-28T15:41:57Z" level=debug msg="Injecting ServiceClassID as parameter: { _apb_service_class_id: 1dda1477cace09730bd8ed7a6505607e }" time="2018-09-28T15:41:57Z" level=debug msg="Injecting ServiceInstanceID as parameter: { _apb_service_instance_id: 07db1f40-c335-11e8-a7fc-0a580a800009 }" time="2018-09-28T15:41:57Z" level=debug msg="Injecting lastRequestingUserKey as parameter: { _apb_last_requesting_user: admin }" time="2018-09-28T15:41:57Z" level=debug msg="get service instance: 07db1f40-c335-11e8-a7fc-0a580a800009" time="2018-09-28T15:41:57Z" level=debug msg="set service instance: 07db1f40-c335-11e8-a7fc-0a580a800009" time="2018-09-28T15:41:57Z" level=info msg="ASYNC provisioning in progress" time="2018-09-28T15:41:57Z" level=debug msg="set job state for instance: 07db1f40-c335-11e8-a7fc-0a580a800009 token: c06b37db-2c8c-4017-b892-507dd37fa3c0" 10.128.0.10 - - [28/Sep/2018:15:41:57 +0000] "PUT /osb/v2/service_instances/07db1f40-c335-11e8-a7fc-0a580a800009?accepts_incomplete=true HTTP/1.1" 202 58 time="2018-09-28T15:41:57Z" level=info msg="============================================================" time="2018-09-28T15:41:57Z" level=info msg=" PROVISIONING " time="2018-09-28T15:41:57Z" level=info msg="============================================================" time="2018-09-28T15:41:57Z" level=info msg="Spec.ID: 1dda1477cace09730bd8ed7a6505607e" time="2018-09-28T15:41:57Z" level=info msg="Spec.Name: dh-postgresql-apb" time="2018-09-28T15:41:57Z" level=info msg="Spec.Image: asb-registry.usersys.redhat.com:5000/openshift3/postgresql-apb:v3.11" time="2018-09-28T15:41:57Z" level=info msg="Spec.Description: SCL PostgreSQL apb implementation" time="2018-09-28T15:41:57Z" level=info msg="============================================================" time="2018-09-28T15:41:57Z" level=debug msg="executor::actionStarted" time="2018-09-28T15:41:57Z" level=debug msg="JobStateSubscriber Notify : msg state {c06b37db-2c8c-4017-b892-507dd37fa3c0 in progress provision action started} " time="2018-09-28T15:41:57Z" level=debug msg="set job state for instance: 07db1f40-c335-11e8-a7fc-0a580a800009 token: c06b37db-2c8c-4017-b892-507dd37fa3c0" time="2018-09-28T15:41:57Z" level=debug msg="XXX Policies {[]v1.NetworkPolicy{}}\n" time="2018-09-28T15:41:57Z" level=info msg="No network policies found. Assuming things are open, skip network policy creation" time="2018-09-28T15:41:57Z" level=debug msg="Trying to create apb sandbox: [ bundle-effa9925-77b3-4af7-8f69-53103fbc1350 ], with edit permissions in namespace dh-postgresql-apb-prov-q2ct4" time="2018-09-28T15:41:57Z" level=info msg="Creating RoleBinding bundle-effa9925-77b3-4af7-8f69-53103fbc1350" time="2018-09-28T15:41:57Z" level=info msg="Creating RoleBinding bundle-effa9925-77b3-4af7-8f69-53103fbc1350" time="2018-09-28T15:41:57Z" level=info msg="Successfully created apb sandbox: [ bundle-effa9925-77b3-4af7-8f69-53103fbc1350 ], with edit permissions in namespace [ dh-postgresql-apb-prov-q2ct4 ]" time="2018-09-28T15:41:57Z" level=debug msg="Running post create sandbox functions if defined." time="2018-09-28T15:41:57Z" level=debug msg="ExecutingApb:" time="2018-09-28T15:41:57Z" level=debug msg="name:[ dh-postgresql-apb ]" time="2018-09-28T15:41:57Z" level=debug msg="image:[ asb-registry.usersys.redhat.com:5000/openshift3/postgresql-apb:v3.11 ]" time="2018-09-28T15:41:57Z" level=debug msg="action:[ provision ]" time="2018-09-28T15:41:57Z" level=debug msg="pullPolicy:[ Always ]" time="2018-09-28T15:41:57Z" level=debug msg="role:[ edit ]" time="2018-09-28T15:41:57Z" level=warning msg="Proxy env vars found, but no values configured." client returned err configmaps "07db1f40-c335-11e8-a7fc-0a580a800009-state" not found time="2018-09-28T15:41:57Z" level=info msg="Creating pod \"bundle-effa9925-77b3-4af7-8f69-53103fbc1350\" in the dh-postgresql-apb-prov-q2ct4 namespace" time="2018-09-28T15:41:57Z" level=debug msg="Watching pod [ bundle-effa9925-77b3-4af7-8f69-53103fbc1350 ] in namespace [ dh-postgresql-apb-prov-q2ct4 ] for completion" ... time="2018-09-28T15:42:12Z" level=info msg="Request: \"GET /osb/v2/service_instances/07db1f40-c335-11e8-a7fc-0a580a800009/last_operation?operation=c06b37db-2c8c-4017-b892-507dd37fa3c0&plan_id=7f4a5e35e4af2beb70076e72fab0b7ff&service_id=1dda1477cace09730bd8ed7a6505607e HTTP/1.1\\r\\nHost: asb.openshift-ansible-service-broker.svc:1338\\r\\nAccept-Encoding: gzip\\r\\nUser-Agent: Go-http-client/1.1\\r\\nX-Broker-Api-Originating-Identity: kubernetes eyJ1c2VybmFtZSI6ImFkbWluIiwidWlkIjoiIiwiZ3JvdXBzIjpbInN5c3RlbTphdXRoZW50aWNhdGVkOm9hdXRoIiwic3lzdGVtOmF1dGhlbnRpY2F0ZWQiXSwiZXh0cmEiOnsic2NvcGVzLmF1dGhvcml6YXRpb24ub3BlbnNoaWZ0LmlvIjpbInVzZXI6ZnVsbCJdfX0=\\r\\nX-Broker-Api-Version: 2.13\\r\\n\\r\\n\"" time="2018-09-28T15:42:12Z" level=debug msg="service_id: 1dda1477cace09730bd8ed7a6505607e" time="2018-09-28T15:42:12Z" level=debug msg="plan_id: 7f4a5e35e4af2beb70076e72fab0b7ff" time="2018-09-28T15:42:12Z" level=debug msg="operation: c06b37db-2c8c-4017-b892-507dd37fa3c0" time="2018-09-28T15:42:12Z" level=debug msg="state: succeeded" time="2018-09-28T15:42:12Z" level=debug msg="description: provision job completed" 10.128.0.10 - - [28/Sep/2018:15:42:12 +0000] "GET /osb/v2/service_instances/07db1f40-c335-11e8-a7fc-0a580a800009/last_operation?operation=c06b37db-2c8c-4017-b892-507dd37fa3c0&plan_id=7f4a5e35e4af2beb70076e72fab0b7ff&service_id=1dda1477cace09730bd8ed7a6505607e HTTP/1.1" 200 71 The provision started at 2018-09-28T15:41:57Z and completed at 2018-09-28T15:42:12Z. And in the sample server log at the top, it continued to receive requests every 2 seconds from the fetcher. Created attachment 1488183 [details]
broker log showing a successful provision
This log shows a successful provision while not creating a new network policy.
Fixed in bundle-lib 0.2.15 and 0.1.9 https://github.com/automationbroker/bundle-lib/pull/172 https://github.com/automationbroker/bundle-lib/pull/173 Updated 3.10, 3.11 & 4.0 brokers to use latest bundle-lib https://github.com/openshift/ansible-service-broker/pull/1111 https://github.com/openshift/ansible-service-broker/pull/1112 https://github.com/openshift/ansible-service-broker/pull/1113 Also fixed in 3.9 broker https://github.com/openshift/ansible-service-broker/pull/1102 With ansible-service-broker:v3.9.55, version:1.1.18, When provision and deprovision, it will not create new networkpolicy, but at the end of provision or deprovision , it will try to delete the networkpolicy, and log errors in asb pod: provision: [2018-11-30T08:32:27.285Z] [INFO] - No network policies found. Assuming things are open, skip network policy creation [2018-11-30T08:32:27.285Z] [INFO] - Successfully created apb sandbox: [ apb-415777e0-a438-4c6d-9574-7609111edb97 ], with edit permissions in namespace rh-mediawiki-apb-prov-r9jh5 ..... [2018-11-30T08:33:12.457Z] [DEBUG] - Deleting network policy for pod: apb-415777e0-a438-4c6d-9574-7609111edb97 to grant network access to ns: debug [2018-11-30T08:33:12.46Z] [ERROR] - unable to delete the network policy object - networkpolicies.networking.k8s.io "apb-415777e0-a438-4c6d-9574-7609111edb97" not found [2018-11-30T08:33:12.46Z] [DEBUG] - received provision message from buffer deprovision: [2018-11-30T08:33:43.858Z] [INFO] - No network policies found. Assuming things are open, skip network policy creation [2018-11-30T08:33:43.858Z] [INFO] - Successfully created apb sandbox: [ apb-1d94403a-e1dd-4982-b4a5-65146cb8cfc6 ], with edit permissions in namespace rh-mediawiki-apb-depr-jfj9v ... [2018-11-30T08:34:33.962Z] [DEBUG] - Deleting network policy for pod: apb-1d94403a-e1dd-4982-b4a5-65146cb8cfc6 to grant network access to ns: debug [2018-11-30T08:34:33.965Z] [ERROR] - unable to delete the network policy object - networkpolicies.networking.k8s.io "apb-1d94403a-e1dd-4982-b4a5-65146cb8cfc6" not found It's better to remove the delete networkpolicy step or checking before delete. Fixed by PR https://github.com/openshift/ansible-service-broker/pull/1179 Broker version 1.1.19 or newer VERIFIED. ASB version: 1.1.20 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0788 |