Bug 1300811 - Unacked messages in resource_manager queue
Summary: Unacked messages in resource_manager queue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: Unspecified
Assignee: Mike McCune
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-21 19:00 UTC by Brian Bouterse
Modified: 2022-07-09 07:40 UTC
CC: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-15 15:52:41 UTC
Target Upstream Version:
Embargoed:




Links
Pulp Redmine 1564 | Private: 0 | Priority: High | Status: CLOSED - CURRENTRELEASE | Summary: Unacked messages in resource_manager queue | Last Updated: 2016-05-27 16:00:48 UTC
Red Hat Product Errata RHSA-2016:0174 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: Moderate: Satellite 6.1.7 security, bug and enhancement fix update | Last Updated: 2016-02-15 20:50:32 UTC

Description Brian Bouterse 2016-01-21 19:00:34 UTC
After executing a "heavy" workload, the resource_manager queue may have unacked messages in it. To reproduce you can:

1. Execute the heavy workload and let it finish. We've triggered this with a `hammer product synchronize --id xx` which would sync ~30 repos (often with no new packages) and then publish those repos.

2. List the qpid queue statistics:   `qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671`

3. Observe output like the following:

queue                                                                             dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind 
=================================================================================================================================================================
<snip>
resource_manager                                                                  Y                     19   338    319    33.4k   567k     534k        1     2
<snip>

4. Observe that all Pulp tasks have completed with state 'finished', and yet messages remain in the 'msg' column (19 in the example above). You can verify that all Pulp tasks are complete by connecting to mongo, switching to the pulp_database, and running `db.task_status.find({ state: { $ne: "finished" } }).count()`. If all Pulp tasks are finished, that query will return 0.

This has been observed on both EL6 and EL7.
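The checks in steps 2-4 can be combined into one quick look, roughly as follows (a sketch, assuming the certificate path and broker URL from step 2):

#!/bin/bash
# Show the resource_manager queue row and count Pulp tasks that are not
# yet in the "finished" state.
CERT=/etc/pki/katello/qpid_client_striped.crt
BROKER=amqps://localhost:5671

echo "== resource_manager queue =="
qpid-stat -q --ssl-certificate="$CERT" -b "$BROKER" | grep -w resource_manager

echo "== pulp tasks not finished =="
mongo pulp_database --quiet \
  --eval 'print(db.task_status.find({ state: { $ne: "finished" } }).count())'

If the task count is 0 but the 'msg' column in the queue row is still non-zero, the reproducer has hit this bug.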

Comment 1 Brian Bouterse 2016-01-21 19:04:25 UTC
These unacked messages are being processed correctly, so the user should not be able to observe the problem while using Satellite. It is not expected to cause incorrect operation. Based on input from the Qpid team, and from my own experiments, this bug will not cause deadlock.

Fixing this is important because the leftover messages make it harder to determine whether a system has actually hit the possible deadlocking bug[0].

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1288223
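One rough way to tell the two situations apart (a sketch, not a supported procedure): poll the queue and watch whether the counters are still moving.

# Poll the resource_manager row every 30 seconds; if msgOut keeps growing
# while a workload runs, messages are still being consumed and the leftover
# 'msg' count is this bug rather than a deadlock.
while true; do
  date
  qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt \
            -b amqps://localhost:5671 | grep -w resource_manager
  sleep 30
done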

Comment 2 pulp-infra@redhat.com 2016-01-21 19:24:50 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 3 pulp-infra@redhat.com 2016-01-21 19:24:52 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2016-01-22 16:30:19 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 5 pulp-infra@redhat.com 2016-01-26 15:00:27 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 6 Brian Bouterse 2016-01-27 02:23:49 UTC
I've tested this upstream, and the builds completed. I then installed it on the reproducer machine and it fully resolved the issue**.

Downstream should build python-kombu to be the same as what is inside upstream python-kombu-3.0.24-11.pulp. That was created with this commit[0], which I believe is the only commit that is not already applied to downstream. The downstream python-kombu should contain exactly the same source and patches as [1]. This is for 6.1.z.

For 6.2.0, upstream will be building a newer version of python-kombu which will be included in an upcoming beta and will later be built in downstream. I'll ensure that version will also contain this fix.

[0]: https://github.com/pulp/pulp/commit/0d2de5d5272bab25c25e392b5b887665b5f89465
[1]: https://github.com/pulp/pulp/tree/0d2de5d5272bab25c25e392b5b887665b5f89465/deps/python-kombu

**: 
1) 2 tasks failed, but they ended up in an error state and Pulp reported a
normal Pulp error (i.e. a connection error).

2) After the reproducer was run, there were 0 tasks left in the
resource_manager queue (or any worker queue), so everything got processed. This is very different from what happened before.

3) I verified that the resource_manager emitted a 'task xxx started' and a
corresponding 'task xxx stopped' for each task. None were "unrun" according to the logs. This is also different from before.
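A quick way to sanity-check which build is installed on a 6.1.z box (a sketch; the exact downstream name-version-release may differ from the upstream python-kombu-3.0.24-11.pulp build named above):

# Show the installed python-kombu build and scan its changelog for the
# qpid-related fix.
rpm -q python-kombu
rpm -q --changelog python-kombu | head -n 20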

Comment 7 pulp-infra@redhat.com 2016-01-27 02:30:18 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 10 Kedar Bidarkar 2016-02-09 05:49:05 UTC
VERIFIED with sat6.1.7 compose1 



~]# qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 | grep -ie queue -ie resource_manager
Queues
  queue                                                                                dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  
 resource_manager                                                                     Y                      0   207    207       0    343k     343k        1     2

-----------------------------------------------------------------------------
 ~]# mongo pulp_database
MongoDB shell version: 2.4.9
connecting to: pulp_database
Server has startup warnings: 
Fri Feb  5 04:36:00.647 [initandlisten] 
Fri Feb  5 04:36:00.647 [initandlisten] ** WARNING: Readahead for /var/lib/mongodb is set to 4096KB
Fri Feb  5 04:36:00.647 [initandlisten] **          We suggest setting it to 256KB (512 sectors) or less
Fri Feb  5 04:36:00.647 [initandlisten] **          http://dochub.mongodb.org/core/readahead
> db.task_status.find({ state:{ $ne: "finished" } } ).count()
0
-----------------------------------------------------------------------------
The count of task_status entries not in the 'finished' state is 0, and the
msg column is 0.

Comment 12 errata-xmlrpc 2016-02-15 15:52:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0174

Comment 13 pulp-infra@redhat.com 2016-02-23 21:00:16 UTC
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Comment 14 pulp-infra@redhat.com 2016-05-27 16:00:49 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

