Bug 1203050
| Summary: | "Generate applicability" tasks pile up (pulp?) | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Corey Welton <cwelton> |
| Component: | Pulp | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Katello QA List <katello-qa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.2.9 | CC: | adrianol.redhat, arydekul, bbuckingham, bkearney, cwelton, dzhukous, egolov, ekin.meroglu, elavarde, inecas, jhunt, jnikolak, jorgen.langgat, katello-bugs, ktrufano, mhrivnak, mmccune, nicholas.tian, pgozart, rhbgs.10.bigi_gigi, sauchter, schamilt, stbenjam, ttereshc, xdmoon |
| Target Milestone: | Unspecified | Keywords: | Reopened, Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-27 19:53:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Action:
Actions::Pulp::Consumer::GenerateApplicability
Input:
{"uuids"=>["805890e3-97f2-4938-9a1f-9bef422f69e0"],
"remote_user"=>"admin",
"remote_cp_user"=>"foreman_admin",
"locale"=>"en"}
Output:
{"pulp_tasks"=>
[{"exception"=>nil,
"task_type"=>
"pulp.server.managers.consumer.applicability.regenerate_applicability_for_consumers",
"_href"=>"/pulp/api/v2/tasks/7e16610e-e66b-4a4c-b955-67e9e7ae70c6/",
"task_id"=>"7e16610e-e66b-4a4c-b955-67e9e7ae70c6",
"tags"=>["pulp:action:content_applicability_regeneration"],
"finish_time"=>nil,
"start_time"=>nil,
"traceback"=>nil,
"spawned_tasks"=>[],
"progress_report"=>{},
"queue"=>
"reserved_resource_worker-2.lab.eng.bos.redhat.com.dq",
"state"=>"waiting",
"worker_name"=>
"reserved_resource_worker-2.lab.eng.bos.redhat.com",
"result"=>nil,
"error"=>nil,
"_id"=>{"$oid"=>"550884f3ce8460f5cb567502"},
"id"=>"550884f307c6a9275dc0b69f"}],
"poll_attempts"=>{"total"=>1470, "failed"=>0}}
Please retest with snap7, compose 2.

This does seem to be fixed in the latest compose. VERIFIED; will reopen if things begin showing up again.

This bug is slated to be released with Satellite 6.1.

This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August 2015.

Dear friends,
I think the bug has come back.
I installed the latest version, Satellite 6.1.1, enabled the repositories, and set all of them to sync.
The content has been downloaded, but no sync task has completed yet.
The logs are below:
====> Plan
Actions::Katello::Repository::Sync
Actions::Pulp::Repository::Sync
Actions::ElasticSearch::Repository::IndexContent
Actions::ElasticSearch::Reindex
Actions::Katello::Foreman::ContentUpdate
Actions::Katello::Repository::CorrectChecksum
Actions::Katello::Repository::UpdateMedia
Actions::Katello::Repository::ErrataMail
Actions::Pulp::Repository::RegenerateApplicability
====> Run
3: Actions::Pulp::Repository::Sync (success) [ 2097.38s / 46.67s ]
5: Actions::ElasticSearch::Repository::IndexContent (success) [ 4.27s / 4.27s ]
15: Actions::Katello::Repository::ErrataMail (success) [ 0.15s / 0.15s ]
17: Actions::Katello::Repository::Sync (success) [ 4.28s / 4.28s ]
***************
20: Actions::Pulp::Repository::RegenerateApplicability (waiting for Pulp to start the task) [ 160514.40s / 639.45s ] Cancel
Started at: 2015-09-01 21:05:38 UTC
Ended at: 2015-09-03 17:40:52 UTC
Real time: 160514.40s
Execution time (excluding suspended state): 639.45s
Started at: 2015-09-01 21:05:38 UTC
Ended at: 2015-09-03 17:45:51 UTC
Real time: 160812.90s
Execution time (excluding suspended state): 639.55s
Input:
---
pulp_id: Default_Organization-JBoss_Enterprise_Application_Platform-JBoss_Enterprise_Application_Platform_5_RHEL_6_Server_RPMs_i386_6_1
remote_user: admin-ffd85ed4
remote_cp_user: admin
locale: en
Output:
---
pulp_tasks:
- exception:
task_type: pulp.server.managers.consumer.applicability.regenerate_applicability_for_repos
_href: /pulp/api/v2/tasks/c0b8a99b-3bbf-438b-8063-2e812ad05591/
task_id: c0b8a99b-3bbf-438b-8063-2e812ad05591
tags:
- pulp:action:content_applicability_regeneration
finish_time:
start_time:
traceback:
spawned_tasks: []
progress_report: {}
queue: None.dq
state: waiting
worker_name:
result:
error:
_id:
$oid: 55e61322eeb2aa1f61bb9d2f
id: 55e61322e7db850ab1aa0de9
poll_attempts:
total: 2576
failed: 0
***************
====> Finalize
7: Actions::ElasticSearch::Reindex (pending)
9: Actions::Katello::Foreman::ContentUpdate (pending)
11: Actions::Katello::Repository::CorrectChecksum (pending)
13: Actions::Katello::Repository::UpdateMedia (pending)
16: Actions::Katello::Repository::ErrataMail (pending)
18: Actions::Katello::Repository::Sync (pending)
Best Regards,
Adriano Oliveira
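The step lines in the log above report two durations, real time and execution time, and the gap between them is the time the step spent suspended while waiting on Pulp. A minimal sketch of how that gap could be extracted from pasted task output follows. The function names and the one-hour threshold are illustrative assumptions, not part of Satellite or Dynflow.

```python
import re

# Matches step lines as pasted in this thread, for example:
# "20: Actions::Pulp::Repository::RegenerateApplicability (waiting ...) [ 160514.40s / 639.45s ]"
STEP_RE = re.compile(
    r"^\s*(?P<step>\d+):\s+(?P<action>\S+)\s+\((?P<state>[^)]*)\)\s+"
    r"\[\s*(?P<real>[\d.]+)s\s*/\s*(?P<exec>[\d.]+)s\s*\]"
)


def parse_step(line):
    """Return a dict describing one step line, or None if the line is not a step."""
    m = STEP_RE.match(line)
    if m is None:
        return None
    real = float(m.group("real"))
    execution = float(m.group("exec"))
    return {
        "step": int(m.group("step")),
        "action": m.group("action"),
        "state": m.group("state"),
        "real_s": real,
        "exec_s": execution,
        "idle_s": real - execution,  # time spent suspended
    }


def long_idle(steps, min_idle_s=3600.0):
    """Steps that were suspended for at least min_idle_s seconds (arbitrary threshold)."""
    return [s for s in steps if s["idle_s"] >= min_idle_s]


lines = [
    "====> Run",
    "3: Actions::Pulp::Repository::Sync (success) [ 2097.38s / 46.67s ]",
    "20: Actions::Pulp::Repository::RegenerateApplicability "
    "(waiting for Pulp to start the task) [ 160514.40s / 639.45s ] Cancel",
]
steps = [s for s in map(parse_step, lines) if s is not None]
stalled = long_idle(steps)
for s in stalled:
    # Only step 20 exceeds the one-hour threshold in this sample.
    print(s["step"], s["action"], round(s["idle_s"], 2))
```

With the two step lines quoted from the log, only the RegenerateApplicability step is flagged; the Sync step was suspended for well under an hour.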
Reopened the bug, because I've also experienced this with 6.1.3 and with a customer on case 01544959.

This causes the following bug: https://bugzilla.redhat.com/1285276

I resolved this by rebooting and then resuming the tasks.

More info: the tasks pop up whenever a system installs new packages or binds to a new repo.

I've just run into this in 6.1.5. I was syncing a lot of repos during the night, and when I came into work this morning none had finished; they all just stay at pending, and no new task will run, it just gets put to pending.

D'oh, I read the date wrong; the new cases are from last year. I'll leave it closed, as it seems to be fixed in 6.2.

I'm running 6.1.9, and there are more than 450 of these jobs piling up, so I think the issue is still there.

Nicholas: We closed this as NOTABUG but are going to update it to CURRENTRELEASE, as this is no longer occurring in our latest 6.2 release. If you are still experiencing this on 6.1.X, we recommend an upgrade to 6.2. If you need assistance with planning and executing this upgrade, feel free to reach out to Red Hat support.

Version 6.2.11: I regularly see tasks spending a lot of time in suspended state.
17: Actions::Pulp::Repository::RegenerateApplicability (suspended) [ 13089.58s / 45.12s ]
Started at: 2017-08-28 03:05:14 UTC
Ended at: 2017-08-28 06:43:24 UTC
Real time: 13089.58s
Execution time (excluding suspended state): 45.12s
Input:
---
pulp_id: ID-SD-VM-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_-_Optional_RPMs_x86_64_7Server
contents_changed: true
remote_user: admin
remote_cp_user: admin
Output:
---
pulp_task_group:
group_id: ddf7d165-0db8-4112-89cc-e9d0cbfc2b4d
accepted: 0
finished: 0
running: 0
canceled: 0
waiting: 1
skipped: 0
suspended: 0
error: 0
total: 1
poll_attempts:
total: 835
failed: 0

(In reply to Bengt Giger from comment #31)
> Version 6.2.11, I regularly see tasks spending a lot of time in suspended
> state.

Hi Bengt,
You said you see them regularly. What do you do with such tasks?
Are they completed at some point, or do you cancel them somehow? Do you know if any Katello/Pulp services were restarted after this task was scheduled? Thanks!

We let them end cleanly; we are too afraid of introducing inconsistencies. I don't know if the tasks in question use system resources. We have tried to optimize the sync plans of our 19 organizations throughout the night so they do not interfere too much. During the syncs our server, with 16 CPUs and 48 GB memory, is almost saturated, and the load of a single such task is unclear to me.

The services are stopped and restarted for backup before the heavy sync work starts, so no: there is no restart during the sync tasks.

But I sometimes see a product taking 15 minutes to sync, and for the next org it takes one hour, during the same night. I tried to find the principles behind this differing behavior, without success.

(In reply to Bengt Giger from comment #35)
> We let them end cleanly, too much fear to introduce inconsistencies. I don't
> know if the tasks in questions use system resources. We have tried to
> optimize the sync plans of our 19 organization throughout the night so they
> do not interfere too much. During the syncs our server with 16 CPUs and 48
> GB memory is almost saturated and the load of such a single task is unclear
> to me.
>
> The services are stopped and restarted for backup before the heavy sync work
> starts, so no: there is no restart during the sync tasks.
>
> But I see sometimes a product taking 15 minutes to sync, and for the next
> org it takes one hour, during the same night. I tried to find principles
> behind this differing behavior, without success.

So the applicability regeneration task is not stuck forever and is completed eventually. This can happen for several reasons; one of them is that the workers are busy with syncs or something else and this task is just in a queue, waiting to be picked up.
This behavior is unrelated to this BZ, where the issue is with *consumer* applicability tasks that are never completed.

This BZ was opened initially for consumer applicability regeneration tasks which are stuck. In 6.2.10 the way those tasks are scheduled was changed (https://bugzilla.redhat.com/show_bug.cgi?id=1255901), so they won't pile up, because they are no longer waiting on each other, just for workers to pick them up.

Some of the reports here refer to another applicability task (the repository one, not the consumer one). This one is usually scheduled after a sync is complete. To be able to regenerate applicability in parallel, many repo applicability tasks can be created (tens or hundreds of them). That's OK and expected. They will be completed eventually. There is a case where a few of those applicability tasks can get stuck, and the fix is on the way: https://bugzilla.redhat.com/show_bug.cgi?id=1468078

If you are on 6.2.9, I suggest upgrading for the improvement/fix mentioned above.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1203050#c38, I am closing this out as NEXTRELEASE. If you are still seeing this after 6.3 is released, please feel free to re-open.
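To tell the two kinds of applicability task apart when triaging, the `task_type` field in the pasted Pulp task output can be checked: the consumer dump earlier in this bug ends in `regenerate_applicability_for_consumers`, and the repository dump ends in `regenerate_applicability_for_repos`. A minimal sketch follows, assuming the task records are already available as dictionaries shaped like those dumps. The helper names are hypothetical, and nothing here calls Pulp itself.

```python
from collections import Counter

# Suffixes taken from the task_type values quoted in this bug.
CONSUMER_SUFFIX = "regenerate_applicability_for_consumers"
REPO_SUFFIX = "regenerate_applicability_for_repos"


def applicability_kind(task):
    """Classify a task dict as 'consumer', 'repo', or 'other' by its task_type."""
    task_type = task.get("task_type") or ""
    if task_type.endswith(CONSUMER_SUFFIX):
        return "consumer"
    if task_type.endswith(REPO_SUFFIX):
        return "repo"
    return "other"


def summarize(tasks):
    """Count tasks per (kind, state) pair."""
    return Counter((applicability_kind(t), t.get("state")) for t in tasks)


# Two records trimmed down from the dumps pasted in this thread.
tasks = [
    {
        "task_type": "pulp.server.managers.consumer.applicability."
                     "regenerate_applicability_for_consumers",
        "task_id": "7e16610e-e66b-4a4c-b955-67e9e7ae70c6",
        "state": "waiting",
    },
    {
        "task_type": "pulp.server.managers.consumer.applicability."
                     "regenerate_applicability_for_repos",
        "task_id": "c0b8a99b-3bbf-438b-8063-2e812ad05591",
        "state": "waiting",
    },
]
summary = summarize(tasks)
for (kind, state), count in sorted(summary.items()):
    print(kind, state, count)
```

A large count of waiting consumer tasks would point at the issue tracked here, while waiting repo tasks after a sync match the expected behavior described above.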
Description of problem:
When running a populated capsule long enough, eventually you will get a very large queue of "Generate applicability {"system_ids"=>[7], "locale"=>"en"}". At some point one task seems to hang/wedge (no real errors thrown?) but others keep piling on and never complete, presumably because they are waiting on this one to complete.

Version-Release number of selected component (if applicable):
Satellite-6.1.0-RHEL-7-20150311.1

How reproducible:
Not sure exactly, but run an instance long enough and you'll probably see it.

Steps to Reproduce:
1. Populate a satellite with a variety of repo content, etc., and register capsules to it, sync capsules, etc. -- general content management tasks. If it is perhaps useful, register clients direct to capsules (not main sat itself)
2. Use capsule for a couple days
3. View Tasks

Actual results:
Eventually you will get a large number of tasks queued up like
Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending 2015-03-17 20:30:04 UTC foreman_admin
Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending 2015-03-17 20:29:58 UTC foreman_admin
Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending 2015-03-17 20:24:13 UTC foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"} running pending 2015-03-17 20:17:14 UTC foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"} running pending 2015-03-17 20:17:02 UTC foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"} running pending 2015-03-17 20:16:55 UTC foreman_admin
At some point, one of these wedged on my system (at this point, around 8 hours ago) and subsequently no more thereafter are completing, and I have about 1.5 pages worth of "pending" dynflow tasks.
This apparently comes somewhere from "Actions::Pulp::Consumer::GenerateApplicability"

Expected results:
Generate applicability (in pulp?) does not wedge.

Additional info:
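The queued task rows quoted under Actual results can be tallied per system to see which hosts the backlog is building up for. This is a rough sketch that works on text copied from the Tasks page. The regular expression assumes the row format shown in this report, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pulls the system ID list out of rows such as:
# Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending ...
ROW_RE = re.compile(r'Generate applicability \{"system_ids"=>\[(?P<ids>[\d,\s]*)\]')


def rows_per_system(text):
    """Count 'Generate applicability' rows per system ID found in the pasted text."""
    counts = Counter()
    for m in ROW_RE.finditer(text):
        for sid in m.group("ids").split(","):
            sid = sid.strip()
            if sid:
                counts[int(sid)] += 1
    return counts


pasted = """
Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending 2015-03-17 20:30:04 UTC foreman_admin
Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending 2015-03-17 20:29:58 UTC foreman_admin
Generate applicability {"system_ids"=>[14], "locale"=>"en"} running pending 2015-03-17 20:24:13 UTC foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"} running pending 2015-03-17 20:17:14 UTC foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"} running pending 2015-03-17 20:17:02 UTC foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"} running pending 2015-03-17 20:16:55 UTC foreman_admin
"""
counts = rows_per_system(pasted)
for system_id, n in counts.most_common():
    print(system_id, n)
```

For the six rows quoted in this report, systems 14 and 7 each account for three queued tasks.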