
Bug 1129877

Summary: After killing the worker running a task the task status is successful
Product: Red Hat Satellite
Reporter: Brian Bouterse <bmbouter>
Component: Other
Assignee: Stephen Benjamin <stbenjam>
Status: CLOSED CURRENTRELEASE
QA Contact: Tazim Kolhar <tkolhar>
Severity: high
Docs Contact:
Priority: high
Version: Unspecified
CC: bbuckingham, bkearney, cduryee, cwelton, ipanova, kbidarka, mhrivnak, mmccune, pulp-bugs, pulp-qe-list, stbenjam, sthirugn, tkolhar
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1120270
Environment:
Last Closed: 2015-08-12 13:58:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1120270
Bug Blocks: 1145795

Description Brian Bouterse 2014-08-13 20:19:43 UTC
+++ This bug was initially created as a clone of Bug #1120270 +++

Description of problem:
After killing the worker running a task, the task halts immediately and the worker dies, but the status of the task is reported as successful.

before kill -9

Operations:  sync
Resources:   pup (repository)
State:       Running
Start Time:  2014-07-16T14:06:44Z
Finish Time: Incomplete
Task Id:     e5cc116b-9266-4afa-b144-b425bb7450cd

Operations:  sync
Resources:   pup1 (repository)
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete



right after kill -9 

Operations:  sync
Resources:   pup (repository)
State:       Successful
Start Time:  2014-07-16T14:07:36Z
Finish Time: 2014-07-16T14:07:37Z
Task Id:     e5cc116b-9266-4afa-b144-b425bb7450cd

Operations:  sync
Resources:   pup1 (repository)
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     091a2bec-5c45-4494-b367-e1370bd23398

Operations:  publish
Resources:   pup (repository)
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     0c83b413-d55d-4549-8c34-0b1f6dcb497a



after 5 minutes

Operations:  sync
Resources:   pup (repository)
State:       Successful
Start Time:  2014-07-16T14:07:36Z
Finish Time: 2014-07-16T14:07:37Z
Task Id:     e5cc116b-9266-4afa-b144-b425bb7450cd

Operations:  sync
Resources:   pup1 (repository)
State:       Cancelled
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     091a2bec-5c45-4494-b367-e1370bd23398

Operations:  publish
Resources:   pup (repository)
State:       Cancelled
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     0c83b413-d55d-4549-8c34-0b1f6dcb497a

Version-Release number of selected component (if applicable):
2.4.0-0.24.beta


How reproducible:
always

Steps to Reproduce:
1. have 1 worker
2. create and sync 2 repos
3. kill the worker
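
Step 3 above can be scripted. A minimal sketch of SIGKILLing every process belonging to a named worker, demonstrated here on a dummy `sleep` process rather than a live Pulp worker (the `kill_worker` helper name is illustrative, not part of the original report):

```shell
# Illustrative helper: SIGKILL every process whose full command line
# matches a pattern, mirroring "ps -Af | grep <worker>" + "kill -9".
# On a real system the pattern would be the Pulp worker name, e.g.
# "reserved_resource_worker-0"; here a dummy sleep stands in for it.
kill_worker() {
    pkill -9 -f "$1"   # -f matches against the full command line
}

sleep 300 &             # stand-in for the worker process
worker_pid=$!
sleep 1                 # give the stand-in time to exec

kill_worker "sleep 300"

wait "$worker_pid" 2>/dev/null
echo "worker exit status: $?"   # 137 = 128 + SIGKILL(9)
```

The bug observed here is that even after such a kill, the interrupted task's record still reads Successful instead of cancelled.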

Actual results:
the state of the task is 'successful'

Expected results:

the state of the task is 'cancelled'
Additional info:

--- Additional comment from  on 2014-08-13 16:16:25 EDT ---

This BZ is closely related to [0], but it describes the killed worker leaving the task state incorrect in a different way (by marking the task successful when it should be cancelled). Given the importance of Pulp 2.4.1 having all task states accurate, I'm raising the priority of this BZ to high so it can be fixed along with [0].

[0]:  https://bugzilla.redhat.com/show_bug.cgi?id=1129858

Comment 2 Brad Buckingham 2014-08-28 14:54:33 UTC
Completing the triage of this bug and moving it to ON_QA (since it should be included as part of Snap7).

Comment 3 Kedar Bidarkar 2014-09-01 09:26:51 UTC
This looks to be a Pulp bug; please describe what needs to be tested on the Satellite 6 side to verify it.

Comment 4 Kedar Bidarkar 2014-09-01 09:29:13 UTC
I am assuming that cancelling a sync on the sync status page should show the state as "cancelled" instead of "sync complete". Is that correct?

Comment 6 Stephen Benjamin 2014-09-01 09:49:20 UTC
@Kedar - The verification steps are to reduce Pulp to one worker:

1. Comment out the existing DEFAULT_PULP_CONCURRENCY line in /etc/init.d/pulp_workers and add:
  DEFAULT_PULP_CONCURRENCY=1

2. katello-service restart

3. Sync a new large repo

4. Go to the Dynflow page, and look for the worker running the task on Actions::Pulp::Repository::Sync:

  queue: reserved_resource_worker-0.lab.bos.redhat.com.dq


5. ps -Af | grep reserved_resource_worker-0

6. kill -9 all the processes for that worker

7. Task should go to stopped/error (although it doesn't: the Pulp::Sync succeeds, and the task itself goes to paused/error).
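
The task state checked in step 7 can also be read from Pulp 2's REST API, e.g. `curl -k -u admin:admin https://localhost/pulp/api/v2/tasks/<task-id>/` (the credentials and host are assumptions; substitute your own). A sketch of pulling the `state` field out of such a payload, using a trimmed sample response in place of a live call:

```shell
# Trimmed sample of a Pulp 2 task payload (fields beyond task_id and
# state omitted). A correctly handled worker kill should leave "state"
# as "canceled" or "error", never the success value "finished".
payload='{"task_id": "e5cc116b-9266-4afa-b144-b425bb7450cd", "state": "finished"}'

# Extract the state field (python3 used here as a jq substitute)
state=$(printf '%s' "$payload" |
    python3 -c 'import json, sys; print(json.load(sys.stdin)["state"])')
echo "task state: $state"   # prints "task state: finished" for this sample
```

For this bug, the payload of the killed sync task would incorrectly report the success state rather than a cancelled or error state.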

Comment 8 Brad Buckingham 2014-09-02 13:17:40 UTC
I am no longer able to access the task referenced in #5; however, I would generally expect it to be in a final state (e.g. stopped) with a result indicating warning or error.  I agree that 'paused' state would be misleading as the user is not able to resume the task.

From the behavior, it sounds like there may be a Sat6 change needed to ensure the proper state in dynflow.

Comment 9 Stephen Benjamin 2014-09-02 13:32:06 UTC
The task isn't available anymore because testing this bug really messed up my pulp instance for other testing and I needed to do a katello-reset (and upgrade to compose 3 anyway).

We have some work to do on cleaning up after bad things happen to our orchestration tasks, including this case where a worker is killed -- it takes a while for us to notice, we don't really do anything about it beyond pausing the task, and resuming isn't sufficient to correct the failure either.

Comment 16 Tazim Kolhar 2015-06-02 09:07:37 UTC
VERIFIED:
# rpm -qa | grep foreman
foreman-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.5-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ibm-hs21-04.lab.bos.redhat.com-foreman-proxy-1.0-1.noarch
ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
foreman-debug-1.7.2.25-1.el7sat.noarch
foreman-ovirt-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.1.0-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-vmware-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-proxy-1.7.2.4-1.el7sat.noarch
ibm-hs21-04.lab.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-hs21-04.lab.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-gce-1.7.2.25-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.12-1.el7sat.noarch
foreman-compute-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.14-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
foreman-libvirt-1.7.2.25-1.el7sat.noarch
foreman-postgresql-1.7.2.25-1.el7sat.noarch

steps:
1. Comment out the existing DEFAULT_PULP_CONCURRENCY line in /etc/init.d/pulp_workers and add:
  DEFAULT_PULP_CONCURRENCY=1
2. katello-service restart
3. Sync a new large repo
4. Go to the Dynflow page, and look for the worker running the task on Actions::Pulp::Repository::Sync:

  queue: reserved_resource_worker-0.lab.bos.redhat.com.dq
5. ps -Af | grep reserved_resource_worker-0
6. kill -9 all the processes for that worker
7. Task should go to stopped/error

Comment 17 Bryan Kearney 2015-08-11 13:23:53 UTC
This bug is slated to be released with Satellite 6.1.

Comment 18 Bryan Kearney 2015-08-12 13:58:54 UTC
This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August 2015.