Bug 1504927

Summary: ASB Failed provision marked successful even on pod error
Product: OpenShift Container Platform Reporter: David Zager <dzager>
Component: Service BrokerAssignee: Jesus M. Rodriguez <jesusr>
Status: CLOSED ERRATA QA Contact: Zhang Cheng <chezhang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.7.0CC: aos-bugs, jesusr, shurley, wmeng
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Ignoring errors from the provisioning of APBs. Consequence: Failed provisions are marked as successful. Fix: We propagate the error to the appropriate level in the broker, which will send the error message to the service-catalog. Result: Failed provisions will now be marked as failed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:18:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Zager 2017-10-20 19:09:54 UTC
Description of problem:

When provisioning an APB that fails on provision, the ansible service broker still marks the provision as successful.

How reproducible: Always


Steps to Reproduce:
1. Bring up OpenShift cluster + Service Catalog + Ansible Service Broker (broker in developer mode)
2. Push APB that will fail on provision (https://github.com/djzager/fail-apb)
3. Provision APB

Actual results:

## Pod Status

$ oc get pods -n apb-push-fail-apb-prov-vfcv2
NAME                                       READY     STATUS    RESTARTS   AGE
apb-5fd7da56-c557-4263-b142-600966964fd2   0/1       Error     0          2m

## Broker

[2017-10-20T19:04:44.838Z] [NOTICE] [apb-5fd7da56-c557-4263-b142-600966964fd2] APB completed


## Service Catalog

I1020 19:04:51.514189      61 controller_instance.go:1182] ServiceInstance "testproject/apb-push-fail-apb-7dgtx": Polling last operation
I1020 19:04:51.549832      61 controller_instance.go:1315] ServiceInstance "testproject/apb-push-fail-apb-7dgtx": Poll returned "succeeded" : %!q(*string=<nil>)


Expected results:

Broker should report failure of provision.

Comment 1 Jesus M. Rodriguez 2017-10-21 04:33:32 UTC
Seems we are allowing an error to be marked as done:

https://github.com/openshift/ansible-service-broker/blob/master/pkg/apb/ext_creds.go#L70-L72

		} else if podCompleted && err.Error() == failedToExec.Error() {
			log.Notice("[%s] APB completed", podname)
return nil, nil

So we need to decide what's really a "failed" provision or not.

Comment 2 Jesus M. Rodriguez 2017-10-21 05:03:45 UTC
So the ext_creds.go tries to exec into the failed pod:

$ oc exec apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52
b6 broker-bind-creds --namespace=dh-fail-apb-prov-gm9w1                                       
error: cannot exec into a container in a completed pod; current phase is Failed

And looking at ext_creds, we will mark the podCompleted 

podCompleted := strings.Contains(string(output), "current phase is Succeeded") ||      
    strings.Contains(string(output), "cannot exec into a container in a completed pod")

Notice when I ran oc exec I get "completed pod". So this combined with the if condition from comment #1 we will end up marking it successful.

Comment 3 Jesus M. Rodriguez 2017-10-21 05:15:24 UTC
I believe the problem is that we get this same condition when we are waiting for the pod to come up initially while it is still provisioning. So we'll have to balance that with an actual failed provision.

Comment 4 Jesus M. Rodriguez 2017-10-21 14:40:10 UTC
[2017-10-21T04:46:47.425Z] [INFO] Successfully created apb sandbox: [ apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6 ], with edit permissions in namespace dh-fail-apb-prov-gm9w1
[2017-10-21T04:46:47.426Z] [NOTICE] Creating pod "apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6" in the dh-fail-apb-prov-gm9w1 namespace
[2017-10-21T04:46:47.485Z] [DEBUG] Calling monitorOutput on apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6
[2017-10-21T04:46:47.772Z] [INFO] error: unable to upgrade connection: pod does not exist

[2017-10-21T04:46:47.772Z] [WARNING] [apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6] Retry attempt 1: Failed to exec into the container
[2017-10-21T04:46:47.772Z] [WARNING] [apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6] Retry attempt 1: exec into apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6 failed
[2017-10-21T04:46:51.258Z] [DEBUG] service_id: 3b18d5937f0cbb99ee6693993a525be7
[2017-10-21T04:46:51.259Z] [DEBUG] plan_id: e969ce59fdde47f0fda9c871a5e2ef0b
[2017-10-21T04:46:51.259Z] [DEBUG] operation:  61d15ee4-fe22-4ad9-b044-c325e9123a5d
172.17.0.3 - - [21/Oct/2017:04:46:51 +0000] "GET /ansible-service-broker/v2/service_instances/1fb8eff3-e365-46cc-af40-9891df0811f9/last_operation?operation=61d15ee4-fe22-4ad9-b044-c325e9123a
5d&plan_id=e969ce59fdde47f0fda9c871a5e2ef0b&service_id=3b18d5937f0cbb99ee6693993a525be7 HTTP/1.1" 200 29
[2017-10-21T04:46:52.997Z] [NOTICE] [apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6] APB completed
[2017-10-21T04:46:52.998Z] [INFO] Destroying APB sandbox...

Comment 5 Jesus M. Rodriguez 2017-11-02 21:44:24 UTC
Fixed by PR 534: https://github.com/openshift/ansible-service-broker/pull/534

Comment 6 Jesus M. Rodriguez 2017-11-03 13:33:34 UTC
*** Bug 1470851 has been marked as a duplicate of this bug. ***

Comment 8 Zhang Cheng 2017-11-04 08:37:25 UTC
PR was not merged in latest image v3.7.0-0.191.0.0, I will retest while PR is merged.

Comment 9 Zhang Cheng 2017-11-05 15:46:51 UTC
Verified and LGTM with svc-catalog & asb image v3.7.0-0.194.0.0

Logs in asb:
[2017-11-05T15:42:55.213Z] [NOTICE] [apb-5fce1aec-0625-4e34-bca9-1bd968361774] APB failed
[2017-11-05T15:42:55.213Z] [INFO] Destroying APB sandbox...
[2017-11-05T15:42:55.584Z] [ERROR] apb::provision error occurred
[2017-11-05T15:42:55.584Z] [ERROR] APB failed
[2017-11-05T15:42:55.584Z] [ERROR] broker::Provision error occurred.
[2017-11-05T15:42:55.584Z] [ERROR] APB failed
[2017-11-05T15:42:55.584Z] [ERROR] Provision job reporting error: APB failed

Logs in controller-manager:
I1105 15:41:34.708240       1 controller.go:326] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Creating client for ClusterServiceBroker ansible-service-broker, URL: https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker
I1105 15:41:34.708470       1 controller_instance.go:1174] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Polling last operation
I1105 15:41:34.725917       1 controller_instance.go:1303] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Poll returned "in progress" : Response description: <nil>
I1105 15:41:34.725951       1 controller_instance.go:1406] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Last operation not completed (still in progress)

Comment 12 errata-xmlrpc 2017-11-28 22:18:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188