Bug 1203050

Summary:	"Generate applicability" tasks pile up (pulp?)
Product:	Red Hat Satellite	Reporter:	Corey Welton <cwelton>
Component:	Pulp	Assignee:	satellite6-bugs <satellite6-bugs>
Status:	CLOSED NEXTRELEASE	QA Contact:	Katello QA List <katello-qa-list>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	6.2.9	CC:	adrianol.redhat, arydekul, bbuckingham, bkearney, cwelton, dzhukous, egolov, ekin.meroglu, elavarde, inecas, jhunt, jnikolak, jorgen.langgat, katello-bugs, ktrufano, mhrivnak, mmccune, nicholas.tian, pgozart, rhbgs.10.bigi_gigi, sauchter, schamilt, stbenjam, ttereshc, xdmoon
Target Milestone:	Unspecified	Keywords:	Reopened, Triaged
Target Release:	Unused
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-09-27 19:53:35 UTC	Type:	Bug

Description Corey Welton 2015-03-18 02:17:59 UTC

Description of problem:

When running a populated capsule long enough, you will eventually get a very large queue of "Generate applicability {"system_ids"=>[7], "locale"=>"en"}" tasks.  At some point one task seems to hang/wedge (no real errors thrown?), but others keep piling up and never complete, presumably because they are waiting on this one to finish.

Version-Release number of selected component (if applicable):
Satellite-6.1.0-RHEL-7-20150311.1

How reproducible:
Not sure exactly, but run an instance long enough and you'll probably see it.

Steps to Reproduce:
1.  Populate a Satellite with a variety of repo content, register capsules to it, sync the capsules, and so on (general content management tasks).  It may also help to register clients directly to the capsules rather than to the main Satellite.
2.  Use the capsule for a couple of days
3.  View Tasks

Actual results:

Eventually you will get a large number of tasks queued up like

Generate applicability {"system_ids"=>[14], "locale"=>"en"}	running	pending	2015-03-17 20:30:04 UTC	foreman_admin
Generate applicability {"system_ids"=>[14], "locale"=>"en"}	running	pending	2015-03-17 20:29:58 UTC	foreman_admin
Generate applicability {"system_ids"=>[14], "locale"=>"en"}	running	pending	2015-03-17 20:24:13 UTC	foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"}	running	pending	2015-03-17 20:17:14 UTC	foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"}	running	pending	2015-03-17 20:17:02 UTC	foreman_admin
Generate applicability {"system_ids"=>[7], "locale"=>"en"}	running	pending	2015-03-17 20:16:55 UTC	foreman_admin

At some point one of these wedged on my system (around 8 hours ago as of this writing). None of the later ones have completed since, and I now have about 1.5 pages' worth of "pending" dynflow tasks.

This apparently comes from somewhere in "Actions::Pulp::Consumer::GenerateApplicability"


Expected results:


Generate applicability (in pulp?) does not wedge.


Additional info:

Comment 1 Corey Welton 2015-03-18 02:19:47 UTC

Action:
Actions::Pulp::Consumer::GenerateApplicability
Input:
{"uuids"=>["805890e3-97f2-4938-9a1f-9bef422f69e0"],
 "remote_user"=>"admin",
 "remote_cp_user"=>"foreman_admin",
 "locale"=>"en"}
Output:
{"pulp_tasks"=>
  [{"exception"=>nil,
    "task_type"=>
     "pulp.server.managers.consumer.applicability.regenerate_applicability_for_consumers",
    "_href"=>"/pulp/api/v2/tasks/7e16610e-e66b-4a4c-b955-67e9e7ae70c6/",
    "task_id"=>"7e16610e-e66b-4a4c-b955-67e9e7ae70c6",
    "tags"=>["pulp:action:content_applicability_regeneration"],
    "finish_time"=>nil,
    "start_time"=>nil,
    "traceback"=>nil,
    "spawned_tasks"=>[],
    "progress_report"=>{},
    "queue"=>
     "reserved_resource_worker-2.lab.eng.bos.redhat.com.dq",
    "state"=>"waiting",
    "worker_name"=>
     "reserved_resource_worker-2.lab.eng.bos.redhat.com",
    "result"=>nil,
    "error"=>nil,
    "_id"=>{"$oid"=>"550884f3ce8460f5cb567502"},
    "id"=>"550884f307c6a9275dc0b69f"}],
 "poll_attempts"=>{"total"=>1470, "failed"=>0}}
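
One way to spot a task like the one above is to look for Pulp task records that are still in state "waiting" with no start_time after many poll attempts. The following is a minimal Python sketch, assuming the action output has already been loaded into plain dicts. The function name find_unstarted_tasks and the min_polls threshold are hypothetical and are not part of Katello or Pulp.

```python
def find_unstarted_tasks(action_output, min_polls=100):
    """Return Pulp tasks that were polled many times but never started.

    action_output mirrors the Dynflow action output shown above: a dict
    with "pulp_tasks" (list of task dicts) and "poll_attempts".
    min_polls is an arbitrary, hypothetical threshold.
    """
    polls = action_output.get("poll_attempts", {}).get("total", 0)
    if polls < min_polls:
        return []
    return [
        task
        for task in action_output.get("pulp_tasks", [])
        if task.get("state") == "waiting" and task.get("start_time") is None
    ]


# Trimmed copy of the output from this comment.
output = {
    "pulp_tasks": [
        {
            "task_id": "7e16610e-e66b-4a4c-b955-67e9e7ae70c6",
            "state": "waiting",
            "start_time": None,
            "finish_time": None,
            "worker_name": "reserved_resource_worker-2.lab.eng.bos.redhat.com",
        }
    ],
    "poll_attempts": {"total": 1470, "failed": 0},
}

for task in find_unstarted_tasks(output):
    print(task["task_id"], "is assigned to", task["worker_name"], "but never started")
```

This only flags candidates. A task that is legitimately queued behind busy workers looks the same in this data, so the result needs to be checked against how long the task has been waiting.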

Comment 3 Bryan Kearney 2015-03-20 15:12:57 UTC

Please retest with snap7, compose 2.

Comment 4 Corey Welton 2015-03-23 23:42:23 UTC

This does seem to be fixed in latest compose.  VERIFIED, will reopen if things begin showing up again.

Comment 12 Bryan Kearney 2015-08-11 13:36:07 UTC

This bug is slated to be released with Satellite 6.1.

Comment 13 Bryan Kearney 2015-08-12 14:01:16 UTC

This bug was fixed in version 6.1.1 of Satellite which was released on 12 August, 2015.

Comment 14 Adriano Oliveira 2015-09-03 17:52:21 UTC

Dear friends,

I think the bug has come back.
I installed the latest version, Satellite 6.1.1, enabled the repositories, and set them all to sync.
The content has been downloaded, but no sync task has finished yet.

The logs are below:

====> Plan

Actions::Katello::Repository::Sync

Actions::Pulp::Repository::Sync

Actions::ElasticSearch::Repository::IndexContent

Actions::ElasticSearch::Reindex

Actions::Katello::Foreman::ContentUpdate

Actions::Katello::Repository::CorrectChecksum

Actions::Katello::Repository::UpdateMedia

Actions::Katello::Repository::ErrataMail

Actions::Pulp::Repository::RegenerateApplicability

====> Run

3: Actions::Pulp::Repository::Sync (success) [ 2097.38s / 46.67s ]
5: Actions::ElasticSearch::Repository::IndexContent (success) [ 4.27s / 4.27s ]
15: Actions::Katello::Repository::ErrataMail (success) [ 0.15s / 0.15s ]
17: Actions::Katello::Repository::Sync (success) [ 4.28s / 4.28s ]

***************
20: Actions::Pulp::Repository::RegenerateApplicability (waiting for Pulp to start the task) [ 160514.40s / 639.45s ]
Started at: 2015-09-01 21:05:38 UTC

Ended at: 2015-09-03 17:40:52 UTC

Real time: 160514.40s

Execution time (excluding suspended state): 639.45s

Started at: 2015-09-01 21:05:38 UTC

Ended at: 2015-09-03 17:45:51 UTC

Real time: 160812.90s

Execution time (excluding suspended state): 639.55s

Input:

---
pulp_id: Default_Organization-JBoss_Enterprise_Application_Platform-JBoss_Enterprise_Application_Platform_5_RHEL_6_Server_RPMs_i386_6_1
remote_user: admin-ffd85ed4
remote_cp_user: admin
locale: en
Output:

---
pulp_tasks:
- exception: 
  task_type: pulp.server.managers.consumer.applicability.regenerate_applicability_for_repos
  _href: /pulp/api/v2/tasks/c0b8a99b-3bbf-438b-8063-2e812ad05591/
  task_id: c0b8a99b-3bbf-438b-8063-2e812ad05591
  tags:
  - pulp:action:content_applicability_regeneration
  finish_time: 
  start_time: 
  traceback: 
  spawned_tasks: []
  progress_report: {}
  queue: None.dq
  state: waiting
  worker_name: 
  result: 
  error: 
  _id:
    $oid: 55e61322eeb2aa1f61bb9d2f
  id: 55e61322e7db850ab1aa0de9
poll_attempts:
  total: 2576
  failed: 0

***************


====> Finalize

7: Actions::ElasticSearch::Reindex (pending)
9: Actions::Katello::Foreman::ContentUpdate (pending)
11: Actions::Katello::Repository::CorrectChecksum (pending)
13: Actions::Katello::Repository::UpdateMedia (pending)
16: Actions::Katello::Repository::ErrataMail (pending)
18: Actions::Katello::Repository::Sync (pending)


Best Regards,
Adriano Oliveira
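
The two numbers in brackets for step 20 are the real time and the execution time excluding the suspended state, so their difference is how long the step sat suspended waiting on Pulp. A small Python sketch of that arithmetic follows, using the first set of timestamps above. The helper names are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Timestamp layout used in the Dynflow step details above.
FMT = "%Y-%m-%d %H:%M:%S UTC"


def wall_clock_seconds(started_at, ended_at):
    """Seconds elapsed between the 'Started at' and 'Ended at' values."""
    start = datetime.strptime(started_at, FMT)
    end = datetime.strptime(ended_at, FMT)
    return (end - start).total_seconds()


def suspended_seconds(real_time, execution_time):
    """Time spent suspended: real time minus execution time."""
    return real_time - execution_time


wall = wall_clock_seconds("2015-09-01 21:05:38 UTC", "2015-09-03 17:40:52 UTC")
idle = suspended_seconds(160514.40, 639.45)

print("wall clock from timestamps: %.0f s" % wall)
print("suspended: %.2f s (about %.1f hours)" % (idle, idle / 3600))
```

The timestamps agree with the reported real time to within a second, and almost all of it (roughly 44 hours) was spent suspended rather than executing.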

Comment 17 jnikolak 2015-11-26 08:08:13 UTC

Reopened the bug because I have also seen this on 6.1.3, and with a customer on case 01544959.

This causes the following bug:
https://bugzilla.redhat.com/1285276

Comment 18 jnikolak 2015-12-03 07:58:01 UTC

I resolved this by rebooting and then resuming the tasks.

More info:
The tasks pop up whenever a system installs new packages or binds to a new repo.

Comment 20 Jorgen Langgat 2016-01-06 08:23:58 UTC

I've just run into this in 6.1.5.

I was syncing a lot of repos during the night, and when I came into work this morning none had finished. They all just stay at pending, and no new task will run; new tasks just get put into pending as well.

Comment 22 Stephen Benjamin 2017-01-16 17:18:36 UTC

D'oh, I read the date wrong, the new cases are from last year, I'll leave it closed, seems like it's fixed in 6.2.

Comment 23 nicholas.tian 2017-04-26 14:39:17 UTC

I'm running 6.1.9. There are more than 450 of these jobs piling up, so I think the issue is still there.

Comment 26 Mike McCune 2017-05-30 15:09:42 UTC

Nicholas:

We closed this as NOTABUG but are going to update it to CURRENTRELEASE, as this is no longer occurring in our latest 6.2 release.

If you are still experiencing this on 6.1.X we would recommend an upgrade to 6.2. If you need assistance with planning and execution of this upgrade, feel free to reach out to Red Hat support.

Comment 31 Bengt Giger 2017-08-28 07:00:07 UTC

Version 6.2.11, I regularly see tasks spending a lot of time in suspended state. 

17: Actions::Pulp::Repository::RegenerateApplicability (suspended) [ 13089.58s / 45.12s ]

Started at: 2017-08-28 03:05:14 UTC

Ended at: 2017-08-28 06:43:24 UTC

Real time: 13089.58s

Execution time (excluding suspended state): 45.12s

Input:

---
pulp_id: ID-SD-VM-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_-_Optional_RPMs_x86_64_7Server
contents_changed: true
remote_user: admin
remote_cp_user: admin

Output:

---
pulp_task_group:
  group_id: ddf7d165-0db8-4112-89cc-e9d0cbfc2b4d
  accepted: 0
  finished: 0
  running: 0
  canceled: 0
  waiting: 1
  skipped: 0
  suspended: 0
  error: 0
  total: 1
poll_attempts:
  total: 835
  failed: 0
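
The pulp_task_group counters above can be read mechanically: here the only task in the group is still waiting and none is running, which points to a queued task rather than a failed one. The Python sketch below shows one way to summarize such a block. The function name describe_task_group and the three labels are hypothetical, and treating finished, skipped, canceled and error as the terminal states is an assumption of this sketch.

```python
# Assumption: these four counters represent tasks that will not run again.
TERMINAL_STATES = ("finished", "skipped", "canceled", "error")


def describe_task_group(summary):
    """Classify a pulp_task_group summary as complete, in progress or queued."""
    done = sum(summary.get(state, 0) for state in TERMINAL_STATES)
    if summary["total"] > 0 and done == summary["total"]:
        return "complete"
    if summary.get("running", 0) > 0:
        return "in progress"
    # Nothing running and not everything done: tasks are waiting for a worker.
    return "queued"


# Counters copied from the output in this comment.
group = {
    "accepted": 0, "finished": 0, "running": 0, "canceled": 0,
    "waiting": 1, "skipped": 0, "suspended": 0, "error": 0, "total": 1,
}

print(describe_task_group(group))
```

A "queued" result on its own does not say whether the task is stuck or just waiting behind other work; that depends on whether workers are busy and whether the task eventually completes.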

Comment 34 Tanya Tereshchenko 2017-09-01 10:40:56 UTC

(In reply to Bengt Giger from comment #31)
> Version 6.2.11, I regularly see tasks spending a lot of time in suspended
> state. 

Hi Bengt, 
You said you see them regularly. What do you do with such tasks? Do they complete at some point, or do you cancel them somehow?
Do you know if some Katello/Pulp services were restarted after this task was scheduled?

Thanks!

Comment 35 Bengt Giger 2017-09-01 12:12:16 UTC

We let them end cleanly, for fear of introducing inconsistencies. I don't know if the tasks in question use system resources. We have tried to optimize the sync plans of our 19 organizations throughout the night so they do not interfere too much. During the syncs our server with 16 CPUs and 48 GB memory is almost saturated and the load of such a single task is unclear to me. 

The services are stopped and restarted for backup before the heavy sync work starts, so no: there is no restart during the sync tasks.

But I sometimes see a product take 15 minutes to sync, and for the next org it takes one hour, during the same night. I tried to find principles behind this differing behavior, without success.

Comment 37 Tanya Tereshchenko 2017-09-04 01:41:18 UTC

(In reply to Bengt Giger from comment #35)
> We let them end cleanly, for fear of introducing inconsistencies. I don't
> know if the tasks in question use system resources. We have tried to
> optimize the sync plans of our 19 organizations throughout the night so they
> do not interfere too much. During the syncs our server with 16 CPUs and 48
> GB memory is almost saturated and the load of such a single task is unclear
> to me. 
> 
> The services are stopped and restarted for backup before the heavy sync work
> starts, so no: there is no restart during the sync tasks.
> 
> But I sometimes see a product take 15 minutes to sync, and for the next
> org it takes one hour, during the same night. I tried to find principles
> behind this differing behavior, without success.

So the applicability regeneration task is not stuck forever; it does complete eventually. A long wait like this can have several causes. One is that the workers are busy with syncs or other work, so the task simply sits in the queue waiting to be picked up.

This behavior is unrelated to this BZ where the issue is with *consumer* applicability tasks which are never completed.
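
The two kinds of task can be told apart by the task_type reported in the Pulp output earlier in this bug: comment 1 shows regenerate_applicability_for_consumers, while comment 14 shows regenerate_applicability_for_repos. A minimal Python sketch of that check follows; the function name applicability_kind is hypothetical.

```python
def applicability_kind(task_type):
    """Map a Pulp task_type string to 'consumer', 'repository' or 'other'."""
    if task_type.endswith("regenerate_applicability_for_consumers"):
        return "consumer"
    if task_type.endswith("regenerate_applicability_for_repos"):
        return "repository"
    return "other"


# task_type values as they appear in comment 1 and comment 14.
consumer_type = (
    "pulp.server.managers.consumer.applicability."
    "regenerate_applicability_for_consumers"
)
repo_type = (
    "pulp.server.managers.consumer.applicability."
    "regenerate_applicability_for_repos"
)

print(applicability_kind(consumer_type))
print(applicability_kind(repo_type))
```

Only reports of the "consumer" kind match the problem this BZ was opened for.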

Comment 38 Tanya Tereshchenko 2017-09-04 02:02:39 UTC

This BZ was initially opened for consumer applicability regeneration tasks that get stuck. In 6.2.10 the way those tasks are scheduled was changed (https://bugzilla.redhat.com/show_bug.cgi?id=1255901), so they won't pile up: they no longer wait on each other, only for workers to pick them up.

Some of the reports here refer to a different applicability task (the repository one, not the consumer one). This one is usually scheduled after a sync is complete. To be able to regenerate applicability in parallel, many repo applicability tasks can be created (tens or hundreds of them). That's OK and expected. They will complete eventually.

There is a case where a few of those applicability tasks can get stuck; the fix is on the way: https://bugzilla.redhat.com/show_bug.cgi?id=1468078

If you are on 6.2.9, I suggest upgrading to get the improvement/fix mentioned above.

Comment 43 Bryan Kearney 2017-09-27 19:53:35 UTC

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1203050#c38, I am closing this out as NEXTRELEASE. If you are still seeing this after 6.3 is released, please feel free to re-open.