Bug 1300811 - Unacked messages in resource_manager queue
Summary: Unacked messages in resource_manager queue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: Unspecified
Assignee: Mike McCune
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-21 19:00 UTC by Brian Bouterse
Modified: 2022-07-09 07:40 UTC
CC: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-15 15:52:41 UTC
Target Upstream Version:
Embargoed:




Links
Pulp Redmine 1564 | Private: 0 | Priority: High | Status: CLOSED - CURRENTRELEASE | Summary: Unacked messages in resource_manager queue | Last Updated: 2016-05-27 16:00:48 UTC
Red Hat Product Errata RHSA-2016:0174 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: Moderate: Satellite 6.1.7 security, bug and enhancement fix update | Last Updated: 2016-02-15 20:50:32 UTC

Description Brian Bouterse 2016-01-21 19:00:34 UTC
After executing a "heavy" workload, the resource_manager queue may have unacked messages in it. To reproduce you can:

1. Execute the heavy workload and let it finish. We've triggered this with a `hammer product synchronize --id xx` which would sync ~30 repos (often with no new packages) and then publish those repos.

2. List the qpid queue statistics:   `qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671`

3. Observe output like the following:

queue                                                                             dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind 
=================================================================================================================================================================
<snip>
resource_manager                                                                  Y                     19   338    319    33.4k   567k     534k        1     2
<snip>

4. Observe that all Pulp tasks have completed with state 'finished', and yet messages remain in the 'msg' column (19 in the example above). You can verify that all Pulp tasks are complete by connecting to mongo, switching to the pulp_database, and running `db.task_status.find({ state: { $ne: "finished" } }).count()`. If all Pulp tasks are finished, that query will return 0.

This has been observed on both EL6 and EL7.
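The checks in steps 2-4 can be combined into one quick look, roughly as follows (a sketch, assuming the certificate path and broker URL from step 2):

#!/bin/bash
# Show the resource_manager queue row and count Pulp tasks that are not
# yet in the "finished" state.
CERT=/etc/pki/katello/qpid_client_striped.crt
BROKER=amqps://localhost:5671

echo "== resource_manager queue =="
qpid-stat -q --ssl-certificate="$CERT" -b "$BROKER" | grep -w resource_manager

echo "== pulp tasks not finished =="
mongo pulp_database --quiet \
  --eval 'print(db.task_status.find({ state: { $ne: "finished" } }).count())'

If the task count is 0 but the 'msg' column in the queue row is still non-zero, the reproducer has hit this bug.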

Comment 1 Brian Bouterse 2016-01-21 19:04:25 UTC
These unacked messages are being processed correctly, so the user should not be able to observe the problem while using Satellite. It is not expected to cause incorrect operation. Based on input from the Qpid team, and from my own experiments, this bug will not cause deadlock.

Fixing this is important because the leftover messages make it harder to determine whether a system has actually hit the possible deadlocking bug[0].

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1288223
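One rough way to tell the two situations apart (a sketch, not a supported procedure): poll the queue and watch whether the counters are still moving.

# Poll the resource_manager row every 30 seconds; if msgOut keeps growing
# while a workload runs, messages are still being consumed and the leftover
# 'msg' count is this bug rather than a deadlock.
while true; do
  date
  qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt \
            -b amqps://localhost:5671 | grep -w resource_manager
  sleep 30
done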

Comment 2 pulp-infra@redhat.com 2016-01-21 19:24:50 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 3 pulp-infra@redhat.com 2016-01-21 19:24:52 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2016-01-22 16:30:19 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 5 pulp-infra@redhat.com 2016-01-26 15:00:27 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 6 Brian Bouterse 2016-01-27 02:23:49 UTC
I've tested this upstream, and the builds completed. I then installed it on the reproducer machine and it fully resolved the issue**.

Downstream should build python-kombu to be the same as what is inside upstream python-kombu-3.0.24-11.pulp. That was created with this commit[0], which I believe is the only commit that is not already applied to downstream. The downstream python-kombu should contain exactly the same source and patches as [1]. This is for 6.1.z.

For 6.2.0, upstream will be building a newer version of python-kombu which will be included in an upcoming beta and will later be built in downstream. I'll ensure that version will also contain this fix.

[0]: https://github.com/pulp/pulp/commit/0d2de5d5272bab25c25e392b5b887665b5f89465
[1]: https://github.com/pulp/pulp/tree/0d2de5d5272bab25c25e392b5b887665b5f89465/deps/python-kombu

**: 
1) 2 tasks failed, but they ended up in an error state and Pulp reported a
normal Pulp error (i.e. a connection error).

2) After the reproducer was run, there were 0 tasks left in the
resource_manager queue (or any worker queue), so everything got processed. This is very different from what happened before.

3) I verified that the resource_manager emitted a 'task xxx started' and a
corresponding 'task xxx stopped' for each task. None were "unrun" according to the logs. This is also different from before.
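A quick way to sanity-check which build is installed on a 6.1.z box (a sketch; the exact downstream name-version-release may differ from the upstream python-kombu-3.0.24-11.pulp build named above):

# Show the installed python-kombu build and scan its changelog for the
# qpid-related fix.
rpm -q python-kombu
rpm -q --changelog python-kombu | head -n 20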

Comment 7 pulp-infra@redhat.com 2016-01-27 02:30:18 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 10 Kedar Bidarkar 2016-02-09 05:49:05 UTC
VERIFIED with sat6.1.7 compose1 



~]# qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 | grep -ie queue -ie resource_manager
Queues
  queue                                                                                dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  
 resource_manager                                                                     Y                      0   207    207       0    343k     343k        1     2

-----------------------------------------------------------------------------
 ~]# mongo pulp_database
MongoDB shell version: 2.4.9
connecting to: pulp_database
Server has startup warnings: 
Fri Feb  5 04:36:00.647 [initandlisten] 
Fri Feb  5 04:36:00.647 [initandlisten] ** WARNING: Readahead for /var/lib/mongodb is set to 4096KB
Fri Feb  5 04:36:00.647 [initandlisten] **          We suggest setting it to 256KB (512 sectors) or less
Fri Feb  5 04:36:00.647 [initandlisten] **          http://dochub.mongodb.org/core/readahead
> db.task_status.find({ state:{ $ne: "finished" } } ).count()
0
-----------------------------------------------------------------------------
The count of task_status entries not in the 'finished' state is 0, and the
msg column is 0.

Comment 12 errata-xmlrpc 2016-02-15 15:52:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0174

Comment 13 pulp-infra@redhat.com 2016-02-23 21:00:16 UTC
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Comment 14 pulp-infra@redhat.com 2016-05-27 16:00:49 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

