Bug 1129877
| Summary: | After killing the worker running a task the task status is successful | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Brian Bouterse <bmbouter> |
| Component: | Other | Assignee: | Stephen Benjamin <stbenjam> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Tazim Kolhar <tkolhar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | Unspecified | CC: | bbuckingham, bkearney, cduryee, cwelton, ipanova, kbidarka, mhrivnak, mmccune, pulp-bugs, pulp-qe-list, stbenjam, sthirugn, tkolhar |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1120270 | Environment: | |
| Last Closed: | 2015-08-12 13:58:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1120270 | ||
| Bug Blocks: | 1145795 | ||
Description
Brian Bouterse
2014-08-13 20:19:43 UTC
Completing the triage of this bug and moving it to ON_QA (since it should be included as part of Snap7).

This looks to be a Pulp bug; please provide what needs to be tested on Satellite 6 to verify it. I am assuming that canceling a sync at the sync status page should show the state as "cancelled" instead of "sync complete". Is that so?

@Kedar - The verification steps are to reduce Pulp to one worker:

1. Comment out the existing DEFAULT_PULP_CONCURRENCY line in /etc/init.d/pulp_workers and add: DEFAULT_PULP_CONCURRENCY=1
2. katello-service restart
3. Sync a new large repo
4. Go to the Dynflow page, and look for the worker running the task on Actions::Pulp::Repository::Sync: queue: reserved_resource_worker-0.lab.bos.redhat.com.dq
5. ps -Af | grep reserved_resource_worker-0
6. kill -9 all the processes for that worker
7. Task should go to stopped/error (although it currently doesn't: the Pulp::Sync succeeds, and the task itself goes to paused/error).

I am no longer able to access the task referenced in #5; however, I would generally expect it to be in a final state (e.g. stopped) with a result indicating warning or error. I agree that a 'paused' state would be misleading, as the user is not able to resume the task. From the behavior, it sounds like a Sat6 change may be needed to ensure the proper state in dynflow.

The task isn't available anymore because testing this bug really messed up my pulp instance for other testing, and I needed to do a katello-reset (and upgrade to compose 3 anyway). We have some work to do on cleaning up after bad things happen to some of our orchestration tasks, including this one when a worker is killed: it takes a while for us to notice, and we don't really do anything about it; we just pause the task. Resuming isn't sufficient to correct the failure, either.
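
The expectation stated in the comments above (a final state such as stopped, with a result of warning or error, rather than paused/error) can be written down as a small check. This is only a sketch: the helper names are hypothetical and are not part of Satellite, foreman-tasks, or Dynflow, and the "state/result" strings are assumed to follow the notation used in this bug.

```python
# Hypothetical helpers for illustration only; not a Satellite or Dynflow API.
# Assumption: a task status is written as "state/result", as in this bug.

FINAL_STATES = {"stopped"}               # "final state (e.g. stopped)"
FAILURE_RESULTS = {"warning", "error"}   # "result indicating warning or error"


def parse_status(status):
    """Split a 'state/result' string such as 'paused/error' into two parts."""
    state, _, result = status.partition("/")
    return state.strip().lower(), result.strip().lower()


def is_expected_outcome(status):
    """Return True if a killed worker's task ended the way the comments expect."""
    state, result = parse_status(status)
    return state in FINAL_STATES and result in FAILURE_RESULTS


if __name__ == "__main__":
    # Illustrative status strings, not taken from a real system.
    for status in ("stopped/error", "stopped/warning", "paused/error"):
        print(status, is_expected_outcome(status))
```

With this check, paused/error (the behavior reported here) is rejected, while stopped/error and stopped/warning are accepted.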
VERIFIED:

# rpm -qa | grep foreman
foreman-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.5-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ibm-hs21-04.lab.bos.redhat.com-foreman-proxy-1.0-1.noarch
ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
foreman-debug-1.7.2.25-1.el7sat.noarch
foreman-ovirt-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.1.0-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-vmware-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-proxy-1.7.2.4-1.el7sat.noarch
ibm-hs21-04.lab.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-hs21-04.lab.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-gce-1.7.2.25-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.12-1.el7sat.noarch
foreman-compute-1.7.2.25-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.14-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
foreman-libvirt-1.7.2.25-1.el7sat.noarch
foreman-postgresql-1.7.2.25-1.el7sat.noarch

steps:

1. Comment out the existing DEFAULT_PULP_CONCURRENCY line in /etc/init.d/pulp_workers and add: DEFAULT_PULP_CONCURRENCY=1
2. katello-service restart
3. Sync a new large repo
4. Go to the Dynflow page, and look for the worker running the task on Actions::Pulp::Repository::Sync: queue: reserved_resource_worker-0.lab.bos.redhat.com.dq
5. ps -Af | grep reserved_resource_worker-0
6. kill -9 all the processes for that worker
7. Task should go to stopped/error

This bug is slated to be released with Satellite 6.1.

This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August, 2015.
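
For anyone repeating the verification, steps 5 and 6 above (find every process belonging to the worker, then kill -9 them) could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported tool: the function names are hypothetical, the sample ps listing is made up for illustration (real command lines will differ), and it assumes the standard `ps -Af` column layout where the PID is the second column and the command is the eighth.

```python
import os
import signal
import subprocess
from typing import List

# Made-up sample of `ps -Af` output, for illustration only.
SAMPLE_PS_OUTPUT = """\
UID        PID  PPID  C STIME TTY          TIME CMD
apache    2101     1  0 10:00 ?        00:00:05 celery worker -n reserved_resource_worker-0@host.example.com
apache    2150  2101  0 10:00 ?        00:00:01 celery worker -n reserved_resource_worker-0@host.example.com
apache    2200     1  0 10:00 ?        00:00:02 celery worker -n resource_manager@host.example.com
root      2300  1999  0 10:05 pts/0    00:00:00 grep reserved_resource_worker-0
"""


def find_worker_pids(ps_output: str, worker_name: str) -> List[int]:
    """Return PIDs of processes whose command line mentions worker_name."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed line
        command = fields[7]
        if command.split()[0].endswith("grep"):
            continue  # skip the grep from a `ps | grep` pipeline
        if worker_name in command:
            pids.append(int(fields[1]))
    return pids


def kill_worker(worker_name: str) -> List[int]:
    """Equivalent of `kill -9` on every process of the named worker."""
    ps = subprocess.run(["ps", "-Af"], capture_output=True, text=True, check=True)
    pids = [p for p in find_worker_pids(ps.stdout, worker_name) if p != os.getpid()]
    for pid in pids:
        os.kill(pid, signal.SIGKILL)  # SIGKILL is what `kill -9` sends
    return pids


if __name__ == "__main__":
    # Only parses the sample; call kill_worker() deliberately on a test box.
    print(find_worker_pids(SAMPLE_PS_OUTPUT, "reserved_resource_worker-0"))
```

Parsing is kept separate from killing so that the match can be checked against a listing before any signal is sent.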