Bug 1572129 - ASB report error when recover in-progress mediawiki provision and deprovision jobs
Summary: ASB report error when recover in-progress mediawiki provision and deprovision...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 3.10.0
Assignee: Jesus M. Rodriguez
QA Contact: Zihan Tang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-26 09:03 UTC by Zihan Tang
Modified: 2018-12-20 21:46 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
Environment:
Last Closed: 2018-12-20 21:12:10 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Zihan Tang 2018-04-26 09:03:51 UTC
Description of problem:
when redeploy asb pod , then it will recover the 'in-progress' provision jobs,  and report error logs as : 
[2018-04-26T08:10:06.73Z] [ERROR] - Unable to update the job state: 59ea6174-e5a8-4ae6-9128-ceffe7d5f331 - Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again
[2018-04-26T08:10:06.73Z] [ERROR] - Error JobStateSubscriber failed to set state after action provision completed with state in progress err: Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again.

but the provision job is recovered at end. 

When recover deprovision job , asb report error , and not deprovision not recovered at end.

Version-Release number of selected component (if applicable):
asb: 1.2.7

How reproducible:
always

Steps to Reproduce:
1. provision mediawiki apb
2. when it is in progress, delete asb pod.
3. the asb will start a new pod ,the provision job will recover, check the asb log
4. after the provision is recovered, then deprovision the mediawiki apb, when derpovision is in progress, delete asb pod again
5. after asb pod restart, check asb log ,jobstates, serviceinstance

Actual results:
step 3. asb logs : 
[2018-04-26T08:10:06.73Z] [ERROR] - Unable to update the job state: 59ea6174-e5a8-4ae6-9128-ceffe7d5f331 - Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again
[2018-04-26T08:10:06.73Z] [ERROR] - Error JobStateSubscriber failed to set state after action provision completed with state in progress err: Operation cannot be fulfilled on jobstates.automationbroker.io "59ea6174-e5a8-4ae6-9128-ceffe7d5f331": the object has been modified; please apply your changes to the latest version and try again.

after few minutes, the job is recovered, 
[root@host-172-16-120-126 ~]# oc describe jobstates 59ea6174-e5a8-4ae6-9128-ceffe7d5f331
Name:         59ea6174-e5a8-4ae6-9128-ceffe7d5f331
Namespace:    openshift-ansible-service-broker
Labels:       instanceId=1e222b51-4929-11e8-ab88-0a580a800003
              state=succeeded
Annotations:  <none>
API Version:  automationbroker.io/v1
Kind:         JobState
Metadata:
  Cluster Name:        
  Creation Timestamp:  2018-04-26T08:09:20Z
  Generation:          1
  Resource Version:    42045
  Self Link:           /apis/automationbroker.io/v1/namespaces/openshift-ansible-service-broker/jobstates/59ea6174-e5a8-4ae6-9128-ceffe7d5f331
  UID:                 1fac8075-4929-11e8-a067-fa163eeb6a5a
Spec:
  Description:  provision job completed
  Error:        
  Method:       provision
  Pod Name:     
  State:        succeeded
Events:         <none>

step 5 : 
asb log:
[2018-04-26T08:39:38.899Z] [INFO] - Initiating Recovery Process
[2018-04-26T08:39:38.909Z] [INFO] - No podname. Attempting to restart job: 1e222b51-4929-11e8-ab88-0a580a800003
[2018-04-26T08:39:38.909Z] [WARNING] - Attempted to recover job 078f21e6-eb09-4a77-b6d5-c60046b7c681, but found an unrecognized MethodType: deprovision, skipping...
[2018-04-26T08:39:38.909Z] [ERROR] - invalid work topic

jobstates:
[root@host-172-16-120-126 ~]# oc describe jobstates 1e222b51-4929-11e8-ab88-0a580a800003 
Error from server (NotFound): jobstates.automationbroker.io "1e222b51-4929-11e8-ab88-0a580a800003" not found
[root@host-172-16-120-126 ~]# oc describe jobstates 078f21e6-eb09-4a77-b6d5-c60046b7c681
Name:         078f21e6-eb09-4a77-b6d5-c60046b7c681
Namespace:    openshift-ansible-service-broker
Labels:       instanceId=1e222b51-4929-11e8-ab88-0a580a800003
              state=in-progress
Annotations:  <none>
API Version:  automationbroker.io/v1
Kind:         JobState
Metadata:
  Cluster Name:        
  Creation Timestamp:  2018-04-26T08:39:29Z
  Generation:          1
  Resource Version:    45491
  Self Link:           /apis/automationbroker.io/v1/namespaces/openshift-ansible-service-broker/jobstates/078f21e6-eb09-4a77-b6d5-c60046b7c681
  UID:                 558a6dd8-492d-11e8-a067-fa163eeb6a5a
Spec:
  Description:  action started
  Error:        
  Method:       deprovision
  Pod Name:     
  State:        in-progress
Events:         <none>

[root@host-172-16-120-126 ~]# oc describe serviceinstance.servicecatalog.k8s.io rh-mediawiki-apb-6zhvn -n post-m
Name:         rh-mediawiki-apb-6zhvn
Namespace:    post-m
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:             2018-04-26T08:09:18Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2018-04-26T08:39:28Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     rh-mediawiki-apb-
  Generation:        2
  Resource Version:  45544
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/post-m/serviceinstances/rh-mediawiki-apb-6zhvn
  UID:               1e222e99-4929-11e8-ab88-0a580a800003
Spec:
  Cluster Service Class External Name:  rh-mediawiki-apb
  Cluster Service Class Ref:
    Name:                              03b69500305d9859bb9440d9f9023784
  Cluster Service Plan External Name:  default
  Cluster Service Plan Ref:
    Name:       43d3e23d214c26dbebc0879e44425db4
  External ID:  1e222b51-4929-11e8-ab88-0a580a800003
  Parameters From:
    Secret Key Ref:
      Key:          parameters
      Name:         rh-mediawiki-apb-parameterslkvbf
  Update Requests:  0
  User Info:
    Extra:
      Scopes . Authorization . Openshift . Io:
        user:full
    Groups:
      system:authenticated:oauth
      system:authenticated
    UID:       
    Username:  zitang
Status:
  Async Op In Progress:  true
  Conditions:
    Last Transition Time:  2018-04-26T08:39:28Z
    Message:               The instance is being deprovisioned asynchronously (action started)
    Reason:                Deprovisioning
    Status:                False
    Type:                  Ready
  Current Operation:       Deprovision
  Deprovision Status:      Required
  External Properties:
    Cluster Service Plan External ID:    43d3e23d214c26dbebc0879e44425db4
    Cluster Service Plan External Name:  default
    Parameter Checksum:                  35560e239ca5d4ed18cd1bea6cc0fc21d139efab17bb109ff3343fc3738342d4
    Parameters:
      Mediawiki _ Admin _ Pass:  <redacted>
      Mediawiki _ Admin _ User:  <redacted>
      Mediawiki _ Db _ Schema:   <redacted>
      Mediawiki _ Site _ Lang:   <redacted>
      Mediawiki _ Site _ Name:   <redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:       
      Username:  zitang
  In Progress Properties:
    Cluster Service Plan External ID:    43d3e23d214c26dbebc0879e44425db4
    Cluster Service Plan External Name:  default
    Parameter Checksum:                  35560e239ca5d4ed18cd1bea6cc0fc21d139efab17bb109ff3343fc3738342d4
    Parameters:
      Mediawiki _ Admin _ Pass:  <redacted>
      Mediawiki _ Admin _ User:  <redacted>
      Mediawiki _ Db _ Schema:   <redacted>
      Mediawiki _ Site _ Lang:   <redacted>
      Mediawiki _ Site _ Name:   <redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:                        
      Username:                   zitang
  Last Operation:                 078f21e6-eb09-4a77-b6d5-c60046b7c681
  Observed Generation:            2
  Operation Start Time:           2018-04-26T08:39:28Z
  Orphan Mitigation In Progress:  false
  Provision Status:               Provisioned
  Reconciled Generation:          1
Events:
  Type     Reason                     Age                From                                Message
  ----     ------                     ----               ----                                -------
  Warning  ErrorWithParameters        44m                service-catalog-controller-manager  failed to prepare parameters nil: secrets "rh-mediawiki-apb-parameterslkvbf" not found
  Normal   Provisioning               44m                service-catalog-controller-manager  The instance is being provisioned asynchronously
  Normal   Provisioning               44m (x5 over 44m)  service-catalog-controller-manager  The instance is being provisioned asynchronously (action started)
  Warning  ErrorPollingLastOperation  43m                service-catalog-controller-manager  Error polling last operation: Get https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker/v2/service_instances/1e222b51-4929-11e8-ab88-0a580a800003/last_operation?operation=59ea6174-e5a8-4ae6-9128-ceffe7d5f331&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784: dial tcp 172.30.224.51:1338: getsockopt: no route to host
  Normal   ProvisionedSuccessfully    40m                service-catalog-controller-manager  The instance was provisioned successfully
  Normal   Deprovisioning             14m                service-catalog-controller-manager  The instance is being deprovisioned asynchronously
  Warning  ErrorPollingLastOperation  14m                service-catalog-controller-manager  Error polling last operation: Get https://asb.openshift-ansible-service-broker.svc:1338/ansible-service-broker/v2/service_instances/1e222b51-4929-11e8-ab88-0a580a800003/last_operation?operation=078f21e6-eb09-4a77-b6d5-c60046b7c681&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784: dial tcp 172.30.224.51:1338: getsockopt: no route to host
  Normal   Deprovisioning             5m (x8 over 14m)   service-catalog-controller-manager  The instance is being deprovisioned asynchronously (action started)


Expected results:
provision and deprovision job can recover successfully.

Additional info:

Comment 1 Zihan Tang 2018-04-26 10:18:30 UTC
the provision can not always recover successfully, sometimes is in pending status.

Status:
  Async Op In Progress:  true
  Conditions:
    Last Transition Time:  2018-04-26T10:01:54Z
    Message:               The instance is being provisioned asynchronously (action started)
    Reason:                Provisioning
    Status:                False
    Type:                  Ready
  Current Operation:       Provision

Comment 2 Jesus M. Rodriguez 2018-05-04 03:01:56 UTC
Fixed by PR 927
https://github.com/openshift/ansible-service-broker/pull/927

Comment 3 Jesus M. Rodriguez 2018-05-04 03:03:30 UTC
[2018-05-04T02:57:29.055Z] [INFO] - Initiating Recovery Process                                                                                                                                                   
[2018-05-04T02:57:29.068Z] [DEBUG] - get service instance: dea8214c-4f46-11e8-b11d-0242ac110009                                                                                                                   
[2018-05-04T02:57:29.077Z] [DEBUG] - get spec: 1dd62d51c52cc2ac404d58abc0c8fa94                                                                                                                                   
[2018-05-04T02:57:29.089Z] [INFO] - No podname. Attempting to restart job: dea8214c-4f46-11e8-b11d-0242ac110009                                                                                                   
[2018-05-04T02:57:29.089Z] [DEBUG] - &{dea8214c-4f46-11e8-b11d-0242ac110009 0xc42028f4a0 0xc4205ba0a0 0xc42048a310 map[] }                                                                                        
[2018-05-04T02:57:29.089Z] [INFO] - Recovery complete                                                                                                                                                             
[2018-05-04T02:57:29.089Z] [NOTICE] - recover called

Comment 5 Zihan Tang 2018-05-08 05:59:32 UTC
Image is ready ,change it ON_QA

Comment 6 Zihan Tang 2018-05-08 06:06:01 UTC
Verify failed.
ASB version: 1.2.10

provision job will be recovered successfully. 
deprovision job still recover failed.

asb log:
[2018-05-08T05:58:13.837Z] [INFO] - Initiating Recovery Process
[2018-05-08T05:58:13.859Z] [INFO] - No podname. Attempting to restart job: bd547602-5283-11e8-8906-0a580a80000b
[2018-05-08T05:58:13.859Z] [WARNING] - Attempted to recover job 22862f2d-5613-4ee5-9dd7-4c6f649cf13d, but found an unrecognized MethodType: deprovision, skipping...
[2018-05-08T05:58:13.86Z] [ERROR] - invalid work topic

[root@host-172-16-120-28 ~]# oc describe bundleinstance  bd547602-5283-11e8-8906-0a580a80000b
Name:         bd547602-5283-11e8-8906-0a580a80000b
Namespace:    openshift-ansible-service-broker
Labels:       <none>
Annotations:  <none>
API Version:  automationbroker.io/v1alpha1
Kind:         BundleInstance
Metadata:
  Cluster Name:        
  Creation Timestamp:  2018-05-08T05:50:41Z
  Generation:          1
  Resource Version:    130577
  Self Link:           /apis/automationbroker.io/v1alpha1/namespaces/openshift-ansible-service-broker/bundleinstances/bd547602-5283-11e8-8906-0a580a80000b
  UID:                 bdb22752-5283-11e8-8260-fa163e868f10
Spec:
  Bundle:
    Name:  03b69500305d9859bb9440d9f9023784
  Context:
    Namespace:    wiki
    Platform:     kubernetes
  Dashboard URL:  
  Parameters:     {"_apb_last_requesting_user":"zitang","_apb_plan_id":"default","_apb_service_class_id":"03b69500305d9859bb9440d9f9023784","_apb_service_instance_id":"bd547602-5283-11e8-8906-0a580a80000b","mediawiki_admin_pass":"dddd","mediawiki_admin_user":"admin","mediawiki_db_schema":"mediawiki","mediawiki_site_lang":"en","mediawiki_site_name":"MediaWiki","mediawiki_volume_size":"1Gi"}
Status:
  Bindings:
  Jobs:
    22862 F 2 D - 5613 - 4 Ee 5 - 9 Dd 7 - 4 C 6 F 649 Cf 13 D:
      Description:         action started
      Last Modified Time:  2018-05-08T05:58:03Z
      Method:              deprovision
      Podname:             
      State:               in-progress
    2 F 365 E 8 C - 25 D 0 - 4378 - Bff 5 - C 609 C 64 F 7 E 2 E:
      Description:         provision job completed
      Last Modified Time:  2018-05-08T05:52:25Z
      Method:              provision
      Podname:             
      State:               succeeded
  Last Description:        action started
  State:                   in-progress
Events:                    <none>

step:
1. deprovision mediawiki
2. delete asb pod during deprovision.

Comment 7 Jesus M. Rodriguez 2018-05-16 00:02:22 UTC
PR https://github.com/openshift/ansible-service-broker/pull/953

Comment 9 Zihan Tang 2018-05-21 02:30:42 UTC
image in upstream and downstream is still 1.2.11,  not 1.2.12 for test. 
marked as MODIFIED.

Comment 10 Zihan Tang 2018-05-22 07:52:50 UTC
image is ready ,change it to ON_QA

Comment 11 Zihan Tang 2018-05-22 08:02:20 UTC
verified.
asb : 1.2.12


Note You need to log in before you can comment on or make changes to this bug.