Bug 1512430

Summary: [ASB]postgresql APB provision fails in multitenant environment
Product: OpenShift Container Platform
Reporter: Weihua Meng <wmeng>
Component: Service Broker
Assignee: Jason Montleon <jmontleo>
Status: CLOSED ERRATA
QA Contact: Weihua Meng <wmeng>
Severity: high
Docs Contact:
Priority: high
Version: 3.7.0
CC: aos-bugs, jiazha, jmatthew, jmontleo, pweil, wmeng, xtian
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-18 13:23:56 UTC
Type: Bug

Comment 1 Jian Zhang 2017-11-13 10:03:14 UTC
I went through the related code and found that those warnings ("Bind credentials not available yet") come from here: https://github.com/openshift/ansible-service-broker/blob/5426d5e808665a823f42609406b83496d4310eb5/pkg/apb/ext_creds.go#L60.

It seems "Extract Credentials" failed. Just for reference.

Comment 2 Jason Montleon 2017-11-13 18:00:05 UTC
You're failing here:

TASK [rhscl-postgresql-apb-openshift : Wait for postgres to come up] ***********
fatal: [localhost]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for 172.30.145.236:5432"}
	to retry, use: --limit @/opt/apb/actions/provision.retry

It does look like the pod is running:
NAME                    READY     STATUS    RESTARTS   AGE
po/postgresql-1-d4jpp   1/1       Running   0          12m

The Ansible wait_for task is trying to poll the port. Is it possible the port is unreachable due to firewall policy or a misconfiguration of the OpenShift networking?
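
For reference, the kind of check such a wait performs can be sketched as below. This is a minimal illustration, not the Ansible module's actual implementation; the function name `wait_for_port` and its parameters are hypothetical, and the 300-second default is chosen only to match the "elapsed": 300 in the log above.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0, connect_timeout=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Hypothetical helper for illustration. Returns True if the port accepted
    a connection, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening and reachable.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Connection refused, host unreachable, or the connect attempt
            # timed out (e.g. packets silently dropped by network policy).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A check like this cannot distinguish "the database is not up yet" from "the network drops my packets": in both cases it simply runs until the deadline and reports a timeout, even though the pod itself shows as Running.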

Comment 5 Jason Montleon 2017-11-14 13:53:25 UTC
Looking at https://docs.openshift.com/enterprise/3.1/architecture/additional_concepts/sdn.html I see it states:

"The ovs-multitenant plug-in provides OpenShift Enterprise project level isolation for pods and services. Each project receives a unique Virtual Network ID (VNID) that identifies traffic from pods assigned to the project. Pods from different projects cannot send packets to or receive packets from pods and services of a different project.

However, projects which receive VNID 0 are more privileged in that they are allowed to communicate with all other pods, and all other pods can communicate with them. In OpenShift Enterprise clusters, the default project has VNID 0. This facilitates certain services like the load balancer, etc. to communicate with all other pods in the cluster and vice versa."

So it makes sense to me that we're failing in a multitenant environment. We have an APB pod in one project that is attempting to communicate with a pod in another project. By the nature of this plugin, that is disallowed.

If we need APBs to communicate with the pods they're working to provision, update, or deprovision, we may need to have them launch within the same namespace, or ensure that when they launch in a multitenant environment the APB projects receive VNID 0 so they can talk to the pods they launch. This may be a security concern, however.
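
The isolation rule quoted above can be modeled in a few lines. This is only a sketch of the policy as the documentation describes it, not how the ovs-multitenant plug-in is implemented; the function name and the VNID values are hypothetical.

```python
def can_communicate(vnid_a: int, vnid_b: int) -> bool:
    """Return True if pods in projects with these VNIDs may exchange traffic."""
    # VNID 0 is privileged: it can reach, and be reached by, every project.
    if vnid_a == 0 or vnid_b == 0:
        return True
    # Otherwise traffic is only allowed within the same project (same VNID).
    return vnid_a == vnid_b


# Hypothetical VNIDs, for illustration only.
APB_PROJECT_VNID = 101     # project where the APB pod runs
TARGET_PROJECT_VNID = 202  # project where postgresql is provisioned

print(can_communicate(APB_PROJECT_VNID, TARGET_PROJECT_VNID))     # False
print(can_communicate(0, TARGET_PROJECT_VNID))                    # True
print(can_communicate(TARGET_PROJECT_VNID, TARGET_PROJECT_VNID))  # True
```

The first case corresponds to the failure in this bug; the second and third correspond to the two options mentioned above (giving the APB project VNID 0, or launching the APB in the same namespace as the pods it manages).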

Comment 6 Jason Montleon 2017-11-14 13:59:28 UTC
Looking at the APB code, the only thing happening after the wait is encoding the binding credentials. After talking with cchase, it sounds like there is no need to wait, so as an immediate fix we can probably just remove the check/wait. I honestly do not know why the wait is there; it is quite possibly an artifact from one of the demos that was seeding data as part of the database provision.

Comment 11 Jason Montleon 2017-11-14 15:21:34 UTC
postgresql-apb 3.7.7-2 or later should fix this, with package version postgresql-apb-role-1.0.14-1.el7.

When you have the opportunity please also verify that mediawiki, mariadb, and mysql provision properly. Nothing is standing out to me that should prevent this from working, but I'd like to be certain.

Comment 14 Weihua Meng 2017-11-15 02:56:47 UTC
The fix works fine.

# docker images
REPOSITORY                                                                    TAG                 IMAGE ID            CREATED             SIZE
docker.io/ansibleplaybookbundle/rhscl-postgresql-apb                          canary              fd1bdf95874d        10 hours ago        1.225 GB

Please help push them to stage. Once I have verified with images on the stage registry, I will move the status to verified.

Thanks.

Comment 17 Weihua Meng 2017-11-16 03:06:44 UTC
Fixed.
Verified with the latest v3.7 image on redhat/openshift-ovs-multitenant.
openshift3/postgresql-apb:v3.7.7-2

mediawiki, mariadb, and mysql also work fine.

Comment 20 errata-xmlrpc 2017-12-18 13:23:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3464