Bug 1651367
| Summary: | Actions::Candlepin::ListenOnCandlepinEvents occasionally not starting after unclean shutdown of the executor | |||
|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Ivan Necas <inecas> | |
| Component: | Tasks Plugin | Assignee: | Adam Ruzicka <aruzicka> | |
| Status: | CLOSED ERRATA | QA Contact: | jcallaha | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 6.4 | CC: | andrew.schofield, aruzicka, avroy, inecas, ktordeur, mmccune, pcreech, pmoravec, rbertolj | |
| Target Milestone: | 6.5.0 | Keywords: | Triaged | |
| Target Release: | Unused | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | tfm-rubygem-dynflow-1.1.3-1 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1666893 (view as bug list) | Environment: | ||
| Last Closed: | 2019-05-14 12:38:54 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1666893 | |||
Created redmine issue http://projects.theforeman.org/issues/25526 from this bug

To recapitulate: if an execution plan is deleted while it still holds the singleton action lock (needed for ListenOnCandlepinEvents (LOCE) and EventQueue::Monitor (EQM)), the singleton action lock is not released during invalidation. As a result, LOCE and EQM do not start when the dynflow executor is restarted.

New steps to reproduce, based on previous findings:
1. force-kill the dynflow_executor
2. Find which execution plan holds the singleton lock and delete the execution plan
3. systemctl restart foreman-tasks

/sbin/service foreman-tasks stop
/sbin/foreman-rake foreman_tasks:cleanup:run TASK_SEARCH='label=Actions::Candlepin::ListenOnCandlepinEvents' STATES="running,paused"
/sbin/foreman-rake foreman_tasks:cleanup:run TASK_SEARCH='label=Actions::Katello::EventQueue::Monitor' STATES="running,paused"
# Delete the action locks
/bin/echo "delete from dynflow_coordinator_records where class = 'Dynflow::Coordinator::SingletonActionLock';" | /bin/sudo -u postgres psql -d foreman
# Start foreman-tasks
/bin/systemctl start foreman-tasks
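
Before deleting the lock records, it can be useful to confirm that stale singleton locks are actually present while foreman-tasks is stopped. The following is only a sketch under stated assumptions, not part of the original workaround: the function names `stale_locks_present` and `count_singleton_locks` are hypothetical, and the database name `foreman` and the `postgres` user are taken from the commands above.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of foreman-tasks.

# Succeeds (exit 0) when the given row count is a positive integer.
# Returns 2 when the argument is empty or not a plain number.
stale_locks_present() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$1" -gt 0 ]
}

# Counts singleton action lock records. Intended to be run only while
# foreman-tasks is stopped. -A (unaligned) and -t (tuples only) make psql
# print just the number.
count_singleton_locks() {
  echo "select count(*) from dynflow_coordinator_records where class = 'Dynflow::Coordinator::SingletonActionLock';" \
    | sudo -u postgres psql -d foreman -At
}

# Possible usage (not executed here):
#   service foreman-tasks stop
#   if stale_locks_present "$(count_singleton_locks)"; then
#     echo "stale singleton action locks found"
#   fi
```

Keeping the numeric check separate from the database call means the decision logic can be exercised without access to the Satellite database.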
To determine whether the system is hitting the issue:

service foreman-tasks stop
/bin/echo "select * from dynflow_coordinator_records where class = 'Dynflow::Coordinator::SingletonActionLock';" | /bin/sudo -u postgres psql -d foreman

If the query returns a non-empty table, the issue has most probably been hit and https://bugzilla.redhat.com/show_bug.cgi?id=1651367#c8 should be applied.

Verified in Satellite 6.5.0 Snap 11

Followed the reproducer steps:

-bash-4.2# ps -aux | grep dynflow
foreman+ 14365 0.0 0.2 1961072 53392 ? Sl 20:49 0:02 ruby /usr/bin/smart_proxy_dynflow_core -d -p /var/run/foreman-proxy/smart_proxy_dynflow_core.pid
foreman 14520 2.7 2.5 3236688 635688 ? Sl 20:49 2:29 dynflow_executor
foreman 14523 0.0 0.4 504352 117068 ? Sl 20:49 0:00 dynflow_executor_monitor
root 32069 0.0 0.0 112712 980 pts/0 S+ 22:21 0:00 grep --color=auto dynflow
-bash-4.2#
-bash-4.2# kill 14520
-bash-4.2#
-bash-4.2# ps -aux | grep dynflow
foreman+ 14365 0.0 0.2 1961072 53392 ? Sl 20:49 0:02 ruby /usr/bin/smart_proxy_dynflow_core -d -p /var/run/foreman-proxy/smart_proxy_dynflow_core.pid
foreman 14523 0.0 0.4 504352 117068 ? Sl 20:49 0:00 dynflow_executor_monitor
root 32090 0.0 0.0 112712 980 pts/0 S+ 22:21 0:00 grep --color=auto dynflow
-bash-4.2#
-bash-4.2# systemctl restart foreman-tasks
-bash-4.2#

After the restart, the ListenOnCandlepinEvents task is shown as running, as is the event queue monitor.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222
Description of problem:
Occasionally (there is no reliable reproducer at the moment), invalid lock records can be present, preventing the ListenOnCandlepinEvents task from starting again after the foreman-tasks service has been stopped.

Version-Release number of selected component (if applicable):

How reproducible:
occasionally

Steps to Reproduce:
1. force-kill the dynflow_executor
2. systemctl restart foreman-tasks

Actual results:
The ListenOnCandlepinEvents task does not get started

Expected results:
The task gets started

Additional info:
One indicator is that, when the foreman-tasks service is stopped, the following query still returns some results (even though it should not):

select * from dynflow_coordinator_records where class = 'Dynflow::Coordinator::SingletonActionLock';

singleton-action:Actions::Candlepin::ListenOnCandlepinEvents | Dynflow::Coordinator::SingletonActionLock | execution-plan:f16e072d-04e4-4b88-9db7-a9349bc7476c | {"class":" Dynflow::Coordinator::SingletonActionLock","owner_id":"execution-plan:f16e072d-04e4-4b88-9db7-a9349bc7476c","execution_plan_id":"f16e072d-04e4-4b88-9db7-a9349bc7476c","id": "singleton-action:Actions::Candlepin::ListenOnCandlepinEvents"}
singleton-action:Actions::Katello::EventQueue::Monitor | Dynflow::Coordinator::SingletonActionLock | execution-plan:b340536b-5b55-4891-ad7a-b65a012bf5ff | {"class":" Dynflow::Coordinator::SingletonActionLock","owner_id":"execution-plan:b340536b-5b55-4891-ad7a-b65a012bf5ff","execution_plan_id":"b340536b-5b55-4891-ad7a-b65a012bf5ff","id": "singleton-action:Actions::Katello::EventQueue::Monitor"}

Seems related to https://bugzilla.redhat.com/show_bug.cgi?id=1642369
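
In the query output shown in this description, the first column names the singleton action and the third column names the execution plan that owns the lock. A minimal sketch for listing which execution plan holds each lock could look like the following. It assumes that column order (taken from the output above, not from the Dynflow schema documentation), and the helper names `parse_singleton_locks` and `list_singleton_locks` are hypothetical.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and the column order
# (id | class | owner_id | data) is assumed from the output in this report.

# Reads unaligned psql rows ("id|class|owner_id|data") on stdin and prints
# "<action label> <execution plan uuid>" for each singleton action lock.
parse_singleton_locks() {
  awk -F'|' '$2 == "Dynflow::Coordinator::SingletonActionLock" {
    action = $1; sub(/^singleton-action:/, "", action)
    plan = $3;   sub(/^execution-plan:/, "", plan)
    print action, plan
  }'
}

# Runs the query from this report in unaligned, tuples-only mode (-A -t)
# so that each lock is printed as one "|"-separated row, then parses it.
list_singleton_locks() {
  echo "select * from dynflow_coordinator_records where class = 'Dynflow::Coordinator::SingletonActionLock';" \
    | sudo -u postgres psql -d foreman -At \
    | parse_singleton_locks
}

# Possible usage (not executed here), while foreman-tasks is stopped:
#   list_singleton_locks
```

The parsing step is kept apart from the database call so that it can be checked against sample rows without a running PostgreSQL instance.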