Bug 1268080

Summary: ChangeMembersDomainOp are not cleared by oo-admin-clear-pending-ops
Product: OpenShift Container Platform Reporter: Eric Rich <erich>
Component: Node Assignee: Abhishek Gupta <abhgupta>
Status: CLOSED ERRATA QA Contact: Jianwei Hou <jhou>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 2.2.0 CC: adellape, aos-bugs, erich, jokerman, mbarrett, mmccomas, pep, tiwillia, xiama
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rubygem-openshift-origin-controller-1.38.4.1-1.el6op Doc Type: Bug Fix
Doc Text:
Previously, pending operations failed to run and were added back to the job queue if the parent operation did not exist. When a parent operation was missing, child operations would never complete. This bug fix ensures that if a parent operation does not exist, the discrepancy is logged and the child operation moves on. As a result, if a parent operation is deleted or is otherwise missing, the remaining child operations can still be completed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-17 17:10:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1273542    

Description Eric Rich 2015-10-01 18:49:04 UTC
Description of problem:

Pending ChangeMembersDomainOp operations do not appear to be cleared by oo-admin-clear-pending-ops.

Version-Release number of selected component (if applicable): 2.2

How reproducible: Intermittent

Steps to Reproduce: Unknown

Actual results:

See the follow-up comment (it contains sensitive information).

   Summary: 
       # oo-admin-clear-pending-ops
       Failed to clear op for domain (DOMAIN) - #<ChangeMembersDomainOp _id: UUID, created_at: 2015-09-30 16:20:47 UTC, parent_op_id: nil, state: "init", queued_at: 0, completed_app_ids: nil, on_completion_method: nil, _type: "ChangeMembersDomainOp", members_added: nil, members_removed: [["UUID", "user"]], roles_changed: nil> 
       ...
      0 applications were cleaned up. 0 users were cleaned up. 1 domains were cleaned up. 0 teams were cleaned up.

Expected results:

The message printed by (puts "Failed to clear op for domain (#{d.namespace}) - #{op.inspect} ") in the code below should not appear, because the operation should already have been deleted by one of the two ( dlist.each { |op| op.delete } ) calls.

Additional info:

The workaround is to set pending_ops for the affected domain to "[ ]" directly in MongoDB.

The code referred to above:

$domain_count = 0
def clean_domain(d)
  $domain_count += 1
  d.reload
  dlist = d.pending_ops.select { |op| op._type.nil? }
  dlist.each { |op| op.delete }
  d.pending_ops.delete_if { |op| op.nil? }
  d.run_jobs rescue nil
  d.reload
  dlist = d.pending_ops.select { |op| op.completed? }
  dlist.each { |op| op.delete }
  d.pending_ops.each { |op|
    unless op.completed?
      puts "Failed to clear op for domain (#{d.namespace}) - #{op.inspect} "
      op.state = :queued if op.state == :init
    end
  }
  d.save!
end
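
To make the failure mode easier to follow, here is a minimal, self-contained sketch of the same decision logic. StubOp, StubDomain and clean_domain_sketch are hypothetical stand-ins invented for illustration, not the broker's real models, and the sketch assumes run_jobs fails for the stuck op (which is what the output above suggests). It skips persistence (reload, save!) entirely.

```ruby
# Hypothetical stand-in for a pending op; not the broker's real model.
StubOp = Struct.new(:_type, :state) do
  def completed?
    state == :completed
  end
end

# Hypothetical stand-in for a domain with embedded pending ops.
class StubDomain
  attr_reader :namespace, :pending_ops

  def initialize(namespace, ops)
    @namespace = namespace
    @pending_ops = ops
  end

  # Assumption: running the jobs fails, so no op changes state.
  def run_jobs
    raise "simulated failure while running pending ops"
  end
end

# Mirrors the order of steps in clean_domain, without any database access.
def clean_domain_sketch(d)
  # Step 1: drop nil ops and ops with no _type.
  d.pending_ops.reject! { |op| op.nil? || op._type.nil? }
  # Step 2: try to run the remaining ops; errors are swallowed.
  d.run_jobs rescue nil
  # Step 3: drop ops that are now completed.
  d.pending_ops.reject! { |op| op.completed? }
  # Step 4: anything left is reported and re-queued, not deleted.
  d.pending_ops.each do |op|
    puts "Failed to clear op for domain (#{d.namespace}) - #{op.inspect}"
    op.state = :queued if op.state == :init
  end
  d.pending_ops
end

domain = StubDomain.new("example", [
  StubOp.new(nil, :init),                           # removed in step 1
  StubOp.new("ChangeMembersDomainOp", :completed),  # removed in step 3
  StubOp.new("ChangeMembersDomainOp", :init),       # survives, re-queued
])
remaining = clean_domain_sketch(domain)
```

Under these assumptions, an op that has a _type and never reaches the completed state is matched by neither delete pass, so it stays in pending_ops and only has its state moved from init to queued.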

Comment 14 openshift-github-bot 2015-10-20 20:13:48 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/8826681b7eca191934338dacebd819ef3dd84d74
Bug 1268080: Handling missing parent domain ops during app op execution
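
The behavior described in the Doc Text (log the discrepancy and let the child operation proceed when its parent is missing) could look roughly like the sketch below. This is not the code from the commit above: SketchOp, run_child_op and the parent lookup hash are hypothetical names chosen for illustration only.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for a pending op; not the broker's real model.
SketchOp = Struct.new(:id, :parent_op_id, :state)

# parent_ops_by_id is an assumed lookup table of parent ops keyed by id.
def run_child_op(op, parent_ops_by_id, logger)
  if op.parent_op_id && !parent_ops_by_id.key?(op.parent_op_id)
    # Parent is gone: record the discrepancy instead of re-queueing forever.
    logger.warn("parent op #{op.parent_op_id} missing for op #{op.id}; continuing")
  end
  # The child op proceeds whether or not its parent was found.
  op.state = :completed
  op
end

log = StringIO.new
orphan = SketchOp.new("child-1", "parent-gone", :init)
run_child_op(orphan, {}, Logger.new(log))
```

The point of the sketch is only the control flow: a missing parent produces a log entry rather than a failure that puts the child back on the queue.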

Comment 23 Ma xiaoqiang 2015-11-24 00:30:40 UTC
Thanks for your help. I got the expected result, so I am moving this to VERIFIED.

Comment 25 errata-xmlrpc 2015-12-17 17:10:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2666.html