Bug 1504927 - ASB Failed provision marked successful even on pod error
Summary: ASB Failed provision marked successful even on pod error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 3.7.0
Assignee: Jesus M. Rodriguez
QA Contact: Zhang Cheng
URL:
Whiteboard:
: 1470851 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-10-20 19:09 UTC by David Zager
Modified: 2017-11-28 22:18 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Ignoring errors from the provisioning of APBs. Consequence: Failed provisions are marked as successful. Fix: We propagate the error to the appropriate level in the broker, which will send the error message to the service-catalog. Result: Failed provisions will now be marked as failed.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:18:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description David Zager 2017-10-20 19:09:54 UTC
Description of problem:

When provisioning an APB that fails on provision, the ansible service broker still marks the provision as successful.

How reproducible: Always


Steps to Reproduce:
1. Bring up OpenShift cluster + Service Catalog + Ansible Service Broker (broker in developer mode)
2. Push APB that will fail on provision (https://github.com/djzager/fail-apb)
3. Provision APB

Actual results:

## Pod Status

$ oc get pods -n apb-push-fail-apb-prov-vfcv2
NAME                                       READY     STATUS    RESTARTS   AGE
apb-5fd7da56-c557-4263-b142-600966964fd2   0/1       Error     0          2m

## Broker

[2017-10-20T19:04:44.838Z] [NOTICE] [apb-5fd7da56-c557-4263-b142-600966964fd2] APB completed


## Service Catalog

I1020 19:04:51.514189      61 controller_instance.go:1182] ServiceInstance "testproject/apb-push-fail-apb-7dgtx": Polling last operation
I1020 19:04:51.549832      61 controller_instance.go:1315] ServiceInstance "testproject/apb-push-fail-apb-7dgtx": Poll returned "succeeded" : %!q(*string=<nil>)


Expected results:

Broker should report failure of provision.

Comment 1 Jesus M. Rodriguez 2017-10-21 04:33:32 UTC
Seems we are allowing an error to be marked as done:

https://github.com/openshift/ansible-service-broker/blob/master/pkg/apb/ext_creds.go#L70-L72

		} else if podCompleted && err.Error() == failedToExec.Error() {
			log.Notice("[%s] APB completed", podname)
return nil, nil

So we need to decide what's really a "failed" provision or not.

Comment 2 Jesus M. Rodriguez 2017-10-21 05:03:45 UTC
So the ext_creds.go tries to exec into the failed pod:

$ oc exec apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52
b6 broker-bind-creds --namespace=dh-fail-apb-prov-gm9w1                                       
error: cannot exec into a container in a completed pod; current phase is Failed

And looking at ext_creds, we will mark the podCompleted 

podCompleted := strings.Contains(string(output), "current phase is Succeeded") ||      
    strings.Contains(string(output), "cannot exec into a container in a completed pod")

Notice when I ran oc exec I get "completed pod". So this combined with the if condition from comment #1 we will end up marking it successful.

Comment 3 Jesus M. Rodriguez 2017-10-21 05:15:24 UTC
I believe the problem is that we get this same condition when we are waiting for the pod to come up initially while it is still provisioning. So we'll have to balance that with an actual failed provision.

Comment 4 Jesus M. Rodriguez 2017-10-21 14:40:10 UTC
[2017-10-21T04:46:47.425Z] [INFO] Successfully created apb sandbox: [ apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6 ], with edit permissions in namespace dh-fail-apb-prov-gm9w1
[2017-10-21T04:46:47.426Z] [NOTICE] Creating pod "apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6" in the dh-fail-apb-prov-gm9w1 namespace
[2017-10-21T04:46:47.485Z] [DEBUG] Calling monitorOutput on apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6
[2017-10-21T04:46:47.772Z] [INFO] error: unable to upgrade connection: pod does not exist

[2017-10-21T04:46:47.772Z] [WARNING] [apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6] Retry attempt 1: Failed to exec into the container
[2017-10-21T04:46:47.772Z] [WARNING] [apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6] Retry attempt 1: exec into apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6 failed
[2017-10-21T04:46:51.258Z] [DEBUG] service_id: 3b18d5937f0cbb99ee6693993a525be7
[2017-10-21T04:46:51.259Z] [DEBUG] plan_id: e969ce59fdde47f0fda9c871a5e2ef0b
[2017-10-21T04:46:51.259Z] [DEBUG] operation:  61d15ee4-fe22-4ad9-b044-c325e9123a5d
172.17.0.3 - - [21/Oct/2017:04:46:51 +0000] "GET /ansible-service-broker/v2/service_instances/1fb8eff3-e365-46cc-af40-9891df0811f9/last_operation?operation=61d15ee4-fe22-4ad9-b044-c325e9123a
5d&plan_id=e969ce59fdde47f0fda9c871a5e2ef0b&service_id=3b18d5937f0cbb99ee6693993a525be7 HTTP/1.1" 200 29
[2017-10-21T04:46:52.997Z] [NOTICE] [apb-7bcfd4a5-8e15-4883-8cb7-39d2511b52b6] APB completed
[2017-10-21T04:46:52.998Z] [INFO] Destroying APB sandbox...

Comment 5 Jesus M. Rodriguez 2017-11-02 21:44:24 UTC
Fixed by PR 534: https://github.com/openshift/ansible-service-broker/pull/534

Comment 6 Jesus M. Rodriguez 2017-11-03 13:33:34 UTC
*** Bug 1470851 has been marked as a duplicate of this bug. ***

Comment 8 Zhang Cheng 2017-11-04 08:37:25 UTC
PR was not merged in latest image v3.7.0-0.191.0.0, I will retest while PR is merged.

Comment 9 Zhang Cheng 2017-11-05 15:46:51 UTC
Verified and LGTM with svc-catalog & asb image v3.7.0-0.194.0.0

Logs in asb:
[2017-11-05T15:42:55.213Z] [NOTICE] [apb-5fce1aec-0625-4e34-bca9-1bd968361774] APB failed
[2017-11-05T15:42:55.213Z] [INFO] Destroying APB sandbox...
[2017-11-05T15:42:55.584Z] [ERROR] apb::provision error occurred
[2017-11-05T15:42:55.584Z] [ERROR] APB failed
[2017-11-05T15:42:55.584Z] [ERROR] broker::Provision error occurred.
[2017-11-05T15:42:55.584Z] [ERROR] APB failed
[2017-11-05T15:42:55.584Z] [ERROR] Provision job reporting error: APB failed

Logs in controller-manager:
I1105 15:41:34.708240       1 controller.go:326] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Creating client for ClusterServiceBroker ansible-service-broker, URL: https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker
I1105 15:41:34.708470       1 controller_instance.go:1174] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Polling last operation
I1105 15:41:34.725917       1 controller_instance.go:1303] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Poll returned "in progress" : Response description: <nil>
I1105 15:41:34.725951       1 controller_instance.go:1406] ServiceInstance "fail-prov/rh-fail-apb-v8vdd": Last operation not completed (still in progress)

Comment 12 errata-xmlrpc 2017-11-28 22:18:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188


Note You need to log in before you can comment on or make changes to this bug.