Description of problem: when redeploy asb pod , then it will recover the 'in-progress' provision jobs, and report error logs as : [2018-04-26T08:10:06.73Z] [ERROR] - Unable to update the job state: 59ea6174-e5a8-4ae6-9128-ceffe7d5f331 - Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again [2018-04-26T08:10:06.73Z] [ERROR] - Error JobStateSubscriber failed to set state after action provision completed with state in progress err: Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again. but the provision job is recovered at end. When recover deprovision job , asb report error , and not deprovision not recovered at end. Version-Release number of selected component (if applicable): asb: 1.2.7 How reproducible: always Steps to Reproduce: 1. provision mediawiki apb 2. when it is in progress, delete asb pod. 3. the asb will start a new pod ,the provision job will recover, check the asb log 4. after the provision is recovered, then deprovision the mediawiki apb, when derpovision is in progress, delete asb pod again 5. after asb pod restart, check asb log ,jobstates, serviceinstance Actual results: step 3. asb logs : [2018-04-26T08:10:06.73Z] [ERROR] - Unable to update the job state: 59ea6174-e5a8-4ae6-9128-ceffe7d5f331 - Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again [2018-04-26T08:10:06.73Z] [ERROR] - Error JobStateSubscriber failed to set state after action provision completed with state in progress err: Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again. after few minutes, the job is recovered, [root@host-172-16-120-126 ~]# oc describe jobstates 59ea6174-e5a8-4ae6-9128-ceffe7d5f331 Name: 59ea6174-e5a8-4ae6-9128-ceffe7d5f331 Namespace: openshift-ansible-service-broker Labels: instanceId=1e222b51-4929-11e8-ab88-0a580a800003 state=succeeded Annotations: <none> API Version: automationbroker.io/v1 Kind: JobState Metadata: Cluster Name: Creation Timestamp: 2018-04-26T08:09:20Z Generation: 1 Resource Version: 42045 Self Link: /apis/automationbroker.io/v1/namespaces/openshift-ansible-service-broker/jobstates/59ea6174-e5a8-4ae6-9128-ceffe7d5f331 UID: 1fac8075-4929-11e8-a067-fa163eeb6a5a Spec: Description: provision job completed Error: Method: provision Pod Name: State: succeeded Events: <none> step 5 : asb log: [2018-04-26T08:39:38.899Z] [INFO] - Initiating Recovery Process [2018-04-26T08:39:38.909Z] [INFO] - No podname. Attempting to restart job: 1e222b51-4929-11e8-ab88-0a580a800003 [2018-04-26T08:39:38.909Z] [WARNING] - Attempted to recover job 078f21e6-eb09-4a77-b6d5-c60046b7c681, but found an unrecognized MethodType: deprovision, skipping... [2018-04-26T08:39:38.909Z] [ERROR] - invalid work topic jobstates: [root@host-172-16-120-126 ~]# oc describe jobstates 1e222b51-4929-11e8-ab88-0a580a800003 Error from server (NotFound): jobstates.automationbroker.io "1e222b51-4929-11e8-ab88-0a580a800003" not found [root@host-172-16-120-126 ~]# oc describe jobstates 078f21e6-eb09-4a77-b6d5-c60046b7c681 Name: 078f21e6-eb09-4a77-b6d5-c60046b7c681 Namespace: openshift-ansible-service-broker Labels: instanceId=1e222b51-4929-11e8-ab88-0a580a800003 state=in-progress Annotations: <none> API Version: automationbroker.io/v1 Kind: JobState Metadata: Cluster Name: Creation Timestamp: 2018-04-26T08:39:29Z Generation: 1 Resource Version: 45491 Self Link: /apis/automationbroker.io/v1/namespaces/openshift-ansible-service-broker/jobstates/078f21e6-eb09-4a77-b6d5-c60046b7c681 UID: 558a6dd8-492d-11e8-a067-fa163eeb6a5a Spec: Description: action started Error: Method: deprovision Pod Name: State: in-progress Events: <none> [root@host-172-16-120-126 ~]# oc describe serviceinstance.servicecatalog.k8s.io rh-mediawiki-apb-6zhvn -n post-m Name: rh-mediawiki-apb-6zhvn Namespace: post-m Labels: <none> Annotations: <none> API Version: servicecatalog.k8s.io/v1beta1 Kind: ServiceInstance Metadata: Creation Timestamp: 2018-04-26T08:09:18Z Deletion Grace Period Seconds: 0 Deletion Timestamp: 2018-04-26T08:39:28Z Finalizers: kubernetes-incubator/service-catalog Generate Name: rh-mediawiki-apb- Generation: 2 Resource Version: 45544 Self Link: /apis/servicecatalog.k8s.io/v1beta1/namespaces/post-m/serviceinstances/rh-mediawiki-apb-6zhvn UID: 1e222e99-4929-11e8-ab88-0a580a800003 Spec: Cluster Service Class External Name: rh-mediawiki-apb Cluster Service Class Ref: Name: 03b69500305d9859bb9440d9f9023784 Cluster Service Plan External Name: default Cluster Service Plan Ref: Name: 43d3e23d214c26dbebc0879e44425db4 External ID: 1e222b51-4929-11e8-ab88-0a580a800003 Parameters From: Secret Key Ref: Key: parameters Name: rh-mediawiki-apb-parameterslkvbf Update Requests: 0 User Info: Extra: Scopes . Authorization . Openshift . Io: user:full Groups: system:authenticated:oauth system:authenticated UID: Username: zitang Status: Async Op In Progress: true Conditions: Last Transition Time: 2018-04-26T08:39:28Z Message: The instance is being deprovisioned asynchronously (action started) Reason: Deprovisioning Status: False Type: Ready Current Operation: Deprovision Deprovision Status: Required External Properties: Cluster Service Plan External ID: 43d3e23d214c26dbebc0879e44425db4 Cluster Service Plan External Name: default Parameter Checksum: 35560e239ca5d4ed18cd1bea6cc0fc21d139efab17bb109ff3343fc3738342d4 Parameters: Mediawiki _ Admin _ Pass: <redacted> Mediawiki _ Admin _ User: <redacted> Mediawiki _ Db _ Schema: <redacted> Mediawiki _ Site _ Lang: <redacted> Mediawiki _ Site _ Name: <redacted> User Info: Extra: Scopes . Authorization . Openshift . Io: user:full Groups: system:authenticated:oauth system:authenticated UID: Username: zitang In Progress Properties: Cluster Service Plan External ID: 43d3e23d214c26dbebc0879e44425db4 Cluster Service Plan External Name: default Parameter Checksum: 35560e239ca5d4ed18cd1bea6cc0fc21d139efab17bb109ff3343fc3738342d4 Parameters: Mediawiki _ Admin _ Pass: <redacted> Mediawiki _ Admin _ User: <redacted> Mediawiki _ Db _ Schema: <redacted> Mediawiki _ Site _ Lang: <redacted> Mediawiki _ Site _ Name: <redacted> User Info: Extra: Scopes . Authorization . Openshift . Io: user:full Groups: system:authenticated:oauth system:authenticated UID: Username: zitang Last Operation: 078f21e6-eb09-4a77-b6d5-c60046b7c681 Observed Generation: 2 Operation Start Time: 2018-04-26T08:39:28Z Orphan Mitigation In Progress: false Provision Status: Provisioned Reconciled Generation: 1 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning ErrorWithParameters 44m service-catalog-controller-manager failed to prepare parameters nil: secrets "rh-mediawiki-apb-parameterslkvbf" not found Normal Provisioning 44m service-catalog-controller-manager The instance is being provisioned asynchronously Normal Provisioning 44m (x5 over 44m) service-catalog-controller-manager The instance is being provisioned asynchronously (action started) Warning ErrorPollingLastOperation 43m service-catalog-controller-manager Error polling last operation: Get https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker/v2/service_instances/1e222b51-4929-11e8-ab88-0a580a800003/last_operation?operation=59ea6174-e5a8-4ae6-9128-ceffe7d5f331&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784: dial tcp 172.30.224.51:1338: getsockopt: no route to host Normal ProvisionedSuccessfully 40m service-catalog-controller-manager The instance was provisioned successfully Normal Deprovisioning 14m service-catalog-controller-manager The instance is being deprovisioned asynchronously Warning ErrorPollingLastOperation 14m service-catalog-controller-manager Error polling last operation: Get https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker/v2/service_instances/1e222b51-4929-11e8-ab88-0a580a800003/last_operation?operation=078f21e6-eb09-4a77-b6d5-c60046b7c681&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784: dial tcp 172.30.224.51:1338: getsockopt: no route to host Normal Deprovisioning 5m (x8 over 14m) service-catalog-controller-manager The instance is being deprovisioned asynchronously (action started) Expected results: provision and deprovision job can recover successfully. Additional info:
the provision can not always recover successfully, sometimes is in pending status. Status: Async Op In Progress: true Conditions: Last Transition Time: 2018-04-26T10:01:54Z Message: The instance is being provisioned asynchronously (action started) Reason: Provisioning Status: False Type: Ready Current Operation: Provision
Fixed by PR 927 https://github.com/openshift/ansible-service-broker/pull/927
[2018-05-04T02:57:29.055Z] [INFO] - Initiating Recovery Process [2018-05-04T02:57:29.068Z] [DEBUG] - get service instance: dea8214c-4f46-11e8-b11d-0242ac110009 [2018-05-04T02:57:29.077Z] [DEBUG] - get spec: 1dd62d51c52cc2ac404d58abc0c8fa94 [2018-05-04T02:57:29.089Z] [INFO] - No podname. Attempting to restart job: dea8214c-4f46-11e8-b11d-0242ac110009 [2018-05-04T02:57:29.089Z] [DEBUG] - &{dea8214c-4f46-11e8-b11d-0242ac110009 0xc42028f4a0 0xc4205ba0a0 0xc42048a310 map[] } [2018-05-04T02:57:29.089Z] [INFO] - Recovery complete [2018-05-04T02:57:29.089Z] [NOTICE] - recover called
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15985275
Image is ready ,change it ON_QA
Verify failed. ASB version: 1.2.10 provision job will be recovered successfully. deprovision job still recover failed. asb log: [2018-05-08T05:58:13.837Z] [INFO] - Initiating Recovery Process [2018-05-08T05:58:13.859Z] [INFO] - No podname. Attempting to restart job: bd547602-5283-11e8-8906-0a580a80000b [2018-05-08T05:58:13.859Z] [WARNING] - Attempted to recover job 22862f2d-5613-4ee5-9dd7-4c6f649cf13d, but found an unrecognized MethodType: deprovision, skipping... [2018-05-08T05:58:13.86Z] [ERROR] - invalid work topic [root@host-172-16-120-28 ~]# oc describe bundleinstance bd547602-5283-11e8-8906-0a580a80000b Name: bd547602-5283-11e8-8906-0a580a80000b Namespace: openshift-ansible-service-broker Labels: <none> Annotations: <none> API Version: automationbroker.io/v1alpha1 Kind: BundleInstance Metadata: Cluster Name: Creation Timestamp: 2018-05-08T05:50:41Z Generation: 1 Resource Version: 130577 Self Link: /apis/automationbroker.io/v1alpha1/namespaces/openshift-ansible-service-broker/bundleinstances/bd547602-5283-11e8-8906-0a580a80000b UID: bdb22752-5283-11e8-8260-fa163e868f10 Spec: Bundle: Name: 03b69500305d9859bb9440d9f9023784 Context: Namespace: wiki Platform: kubernetes Dashboard URL: Parameters: {"_apb_last_requesting_user":"zitang","_apb_plan_id":"default","_apb_service_class_id":"03b69500305d9859bb9440d9f9023784","_apb_service_instance_id":"bd547602-5283-11e8-8906-0a580a80000b","mediawiki_admin_pass":"dddd","mediawiki_admin_user":"admin","mediawiki_db_schema":"mediawiki","mediawiki_site_lang":"en","mediawiki_site_name":"MediaWiki","mediawiki_volume_size":"1Gi"} Status: Bindings: Jobs: 22862 F 2 D - 5613 - 4 Ee 5 - 9 Dd 7 - 4 C 6 F 649 Cf 13 D: Description: action started Last Modified Time: 2018-05-08T05:58:03Z Method: deprovision Podname: State: in-progress 2 F 365 E 8 C - 25 D 0 - 4378 - Bff 5 - C 609 C 64 F 7 E 2 E: Description: provision job completed Last Modified Time: 2018-05-08T05:52:25Z Method: provision Podname: State: succeeded Last Description: action started State: in-progress Events: <none> step: 1. deprovision mediawiki 2. delete asb pod during deprovision.
PR https://github.com/openshift/ansible-service-broker/pull/953
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=695427
image in upstream and downstream is still 1.2.11, not 1.2.12 for test. marked as MODIFIED.
image is ready ,change it to ON_QA
verified. asb : 1.2.12