Bug 1159961 - pulp.agent.<uuid> queue not deleted when consumer is deleted
Summary: pulp.agent.<uuid> queue not deleted when consumer is deleted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.0.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Corey Welton
URL:
Whiteboard:
Depends On: 1159303
Blocks: sat6-pulp-blocker
 
Reported: 2014-11-03 18:02 UTC by Justin Sherrill
Modified: 2021-04-06 18:03 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1159303
Environment:
Last Closed: 2015-08-12 16:03:31 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System: Pulp Redmine   ID: 603   Priority: High   Status: CLOSED - CURRENTRELEASE   Last Updated: Never
Summary: pulp.agent.<uuid> queue not deleted when consumer is deleted

Description Justin Sherrill 2014-11-03 18:02:35 UTC
+++ This bug was initially created as a clone of Bug #1159303 +++

I heard this second hand, so this BZ is about investigating whether this is an issue or not. The observation suggesting a problem comes from a Pulp installation where the broker has many (100+) pulp.agent.<uuid> queues, but only a few consumers registered to the system.

Expected Behavior:
I expect that when I create a consumer a pulp.agent.<uuid> queue is created, and when I delete a consumer the corresponding pulp.agent.<uuid> queue is also removed from the broker.

Why is this important:
Lingering, unnecessary queues could cause qpid to run out of file descriptors.

--- Additional comment from  on 2014-10-31 10:20:11 EDT ---

After some investigation it was determined that Pulp does not remove old consumer queues during unregistration. Deleting the pulp.agent.<uuid> queue on unregistration is the expected behavior, so we should fix this.

After some discussion with jortel, we determined the following changes need to be made:

1) pulp.agent.<uuid> queues need to be 100% managed server side. This includes the creation and deletion of the queues. The 'manager' code for the consumer can handle this responsibility. Goferd needs to be adjusted to optionally not declare queues from the consumer side, and pulp-agent and katello-agent will need small changes so that they know the server will create and delete the consumer queues.

2) A reaper task that removes these queues needs to be added. It runs periodically and removes queues that have been orphaned by an unregister event for longer than X amount of time. Both the run frequency and the orphan age will be specified in server.conf and will come with defaults.

3) Docs need to be written on this indicating how cleanup occurs.

The reaper design is necessary versus a "delete right now during unregistration" design because there is one final message that needs to be delivered to the consumer, and we should allow a reasonable amount of time for that to occur before the server force-deletes the queue.
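
For illustration only, here is a minimal sketch (using the qpid.messaging API, with a placeholder broker URL and UUID) of what the server-side force-delete of an orphaned queue could look like; it is not Pulp's actual reaper code:

# Illustrative sketch only -- not Pulp's reaper implementation. Assumes
# python-qpid (qpid.messaging) and a reachable broker.
from qpid.messaging import Connection

def force_delete_agent_queue(broker_url, consumer_uuid):
    """Attach to pulp.agent.<uuid> and have the broker delete it on detach."""
    conn = Connection(broker_url)
    conn.open()
    try:
        session = conn.session()
        # The 'delete: always' link policy asks the broker to delete the
        # node (the queue) when this link is closed.
        address = "pulp.agent.%s; {delete: always}" % consumer_uuid
        sender = session.sender(address)
        sender.close()
    finally:
        conn.close()

if __name__ == "__main__":
    force_delete_agent_queue("localhost:5672",
                             "1204f585-f9ac-4871-953f-a3190da3f541")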

--- Additional comment from Pavel Moravec on 2014-10-31 13:51:44 EDT ---

(In reply to bbouters from comment #1)
> 2) A reaper task that removes these queues needs to be added. This runs
> periodically with some frequency and removes "orphaned_queues" that have
> been orphaned after an unregister event longer than X amount of time. These
> two options will be specified in server.conf and will come with defaults.

Note that you can use auto-delete queues with a deletion timeout (x-declare argument qpid.auto_delete_timeout). If a queue loses its last consumer, the qpid broker waits for the timeout (in seconds) and then deletes the queue itself. This might simplify the implementation. Note, however, that if the queue never has a consumer, it won't be deleted at all (the auto-delete timer is triggered _only_ when the last consumer unsubscribes). Also, I am not sure if the same functionality is available in RabbitMQ.
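
As a rough illustration of this suggestion (not code from Pulp or gofer; the queue name and the 600-second timeout are examples), a consumer-side subscription could declare its queue with auto-delete and the qpid.auto_delete_timeout argument through a qpid.messaging address:

# Rough illustration of an auto-delete queue with a deletion timeout; the
# queue name and 600 s timeout are examples, not Pulp/gofer defaults.
from qpid.messaging import Connection

conn = Connection("localhost:5672")
conn.open()
session = conn.session()

# x-declare passes queue.declare options: auto-delete plus the
# qpid.auto_delete_timeout argument, so the broker waits 600 seconds after
# the last consumer detaches before removing the queue.
address = (
    "pulp.agent.example-uuid; "
    "{create: always, node: {type: queue, x-declare: {"
    "auto-delete: True, "
    "arguments: {'qpid.auto_delete_timeout': 600}}}}"
)
receiver = session.receiver(address)

# ... consume messages here; once this (last) receiver closes, the 600 s
# auto-delete timer starts.
receiver.close()
conn.close()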

--- Additional comment from  on 2014-10-31 16:20:58 EDT ---

Auto-delete would be an elegant way to solve this, but I'm not sure how it would allow for consumers to be powered off for longer than the auto_delete timeout. A consumer that was running, but then had its agent service stopped or was powered off, will wake up at some point in the future to find its queue missing.

Today, if an agent finds its queue missing it recreates it, so it would recover, but any meaningful messages that were issued to the consumer (i.e. bind or unbind) while it was turned off would be missing.

This could cause some very unexpected behavior for the user: they remember performing an action such as binding a group of consumers to a given repo, but any consumer that wasn't powered on at that moment doesn't receive the config.

Suggestions for how we can make the auto_delete approach work are welcome, because it is more elegant, but I'm not sure how to resolve this one lingering problem.

--- Additional comment from Pavel Moravec on 2014-11-03 05:48:14 EST ---

(In reply to bbouters from comment #3)
> Auto delete would be an elegant way to solve this, but I'm not sure how it
> would allow for consumers to be powered off for longer than the auto_delete
> timeout. A consumer that was running, but then had its agent service stopped
> or is powered off will wake up at some point in the future to find its queue
> missing.

Good point. What about an auto_delete queue (optionally with some timeout) that has an alternate exchange set, where the exchange routing messages to these queues has the same alt.exchange set as well? Plus one durable auxiliary queue (aux.queue) that gets all messages routed via the alt.exchange. Then:

- when the queue is deleted due to the auto-delete parameter, all messages in the queue are redirected via the alt.exchange to the aux.queue
- when a consumer is off and the broker should deliver a message to its (deleted) queue, the broker finds that the original exchange no longer has a matching binding (to the deleted queue), so it re-routes the message to the alt.exchange, i.e. to the aux.queue

When a consumer is starting, if it detects that its queue is gone, it would have to call the QMF method "queueMoveMessages", which moves messages from the aux.queue to the newly created pulp.agent.<uuid> queue with a suitable filter (to move just the messages relevant to that pulp consumer). A rough wiring sketch follows the gotchas below.

Gotchas:
- needs some more testing and probably bigger code changes
- if some default exchange (like the "" one) is used for distributing the messages to the pulp.agent.<uuid> queues, you can't set an alternate exchange for such an exchange. But you can create a new exchange for this traffic.
- this solution does not cover situations where a pulp consumer won't power on at all (not sure if that is an allowed scenario). Then the aux.queue would keep messages for this consumer forever. But IMHO this problem is common to any solution - is there a way to identify a consumer that won't ever power on?
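
To make the proposal concrete, here is a hedged wiring sketch using qpid.messaging addresses; the names pulp.agent.alt, pulp.agent.aux and pulp.agent.direct are hypothetical and do not exist in Pulp, and the final queueMoveMessages step is only noted in a comment because it is a QMF call against the broker rather than part of this API:

# Illustrative wiring for the alternate-exchange idea; all names here are
# hypothetical. Assumes python-qpid (qpid.messaging) and a local broker.
from qpid.messaging import Connection

conn = Connection("localhost:5672")
conn.open()
session = conn.session()

# 1) Fanout alternate exchange plus a durable auxiliary queue bound to it;
#    anything routed to pulp.agent.alt lands in pulp.agent.aux.
session.sender(
    "pulp.agent.alt; {create: always, node: {type: topic, "
    "x-declare: {type: fanout}}}").close()
session.sender(
    "pulp.agent.aux; {create: always, node: {type: queue, durable: True, "
    "x-bindings: [{exchange: 'pulp.agent.alt', queue: 'pulp.agent.aux'}]}}"
).close()

# 2) A dedicated exchange for agent traffic whose alternate exchange is
#    pulp.agent.alt, so messages aimed at a deleted consumer queue are kept.
session.sender(
    "pulp.agent.direct; {create: always, node: {type: topic, "
    "x-declare: {type: direct, alternate-exchange: 'pulp.agent.alt'}}}"
).close()

# 3) A per-consumer auto-delete queue that also names pulp.agent.alt as its
#    alternate exchange, so messages still queued when it is deleted are
#    redirected to pulp.agent.aux.
session.receiver(
    "pulp.agent.example-uuid; {create: always, node: {type: queue, "
    "x-declare: {auto-delete: True, alternate-exchange: 'pulp.agent.alt', "
    "arguments: {'qpid.auto_delete_timeout': 600}}, "
    "x-bindings: [{exchange: 'pulp.agent.direct', "
    "queue: 'pulp.agent.example-uuid', key: 'example-uuid'}]}}").close()

# On restart, a consumer that finds its queue missing would recreate it and
# then invoke the broker QMF method queueMoveMessages (e.g. via qpid-tool)
# to move its messages from pulp.agent.aux back into the new queue.
conn.close()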

Comment 1 RHEL Program Management 2014-11-03 18:02:55 UTC
Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 4 pulp-infra@redhat.com 2015-04-23 16:39:58 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 5 Bryan Kearney 2015-04-27 20:43:56 UTC
Upstream has marked this as resolved in pulp 2.6.0 which is delivered by Satellite 6.1.0. I am moving this to ON_QA.

Comment 6 Justin Sherrill 2015-05-05 21:11:12 UTC
So to test this you can run this command:

qpid-stat --ssl-certificate /etc/pki/katello/certs/java-client.crt --ssl-key /etc/pki/katello/private/java-client.key -b "amqps://$(hostname -f):5671" -q | grep pulp.agent

This will print out all the qpid queues belonging to client agents. An example:

  pulp.agent.1204f585-f9ac-4871-953f-a3190da3f541                       Y                      0     0      0       0      0        0         1     1

The UUID here is the client consumer UUID (which you can get by running subscription-manager identity on a client).

So, to test this:

1.  Register a content host with subscription-manager
2.  Run 'subscription-manager identity' and record the UUID
3.  Install katello-agent and make sure goferd is running (service goferd start)
4.  Run the above command and verify that there is a queue with that UUID
5.  Delete the content host, either from the UI or by running 'subscription-manager unregister'
6.  Re-run that command and see if the queue is removed. It may take up to 10 minutes for that to happen (a small polling helper is sketched below).
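
If you want to script the wait in step 6, here is a rough polling helper (illustrative only) that re-runs the same qpid-stat command until the queue for a given UUID disappears; the 10-minute timeout mirrors the note above:

# Rough polling helper for step 6; illustrative only. Reuses the qpid-stat
# command shown above and waits up to 10 minutes for the queue to vanish.
import subprocess
import time

QPID_STAT = (
    "qpid-stat --ssl-certificate /etc/pki/katello/certs/java-client.crt "
    "--ssl-key /etc/pki/katello/private/java-client.key "
    '-b "amqps://$(hostname -f):5671" -q'
)

def agent_queue_gone(consumer_uuid, timeout=600, interval=30):
    """Return True once pulp.agent.<uuid> no longer shows up in qpid-stat."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        output = subprocess.check_output(QPID_STAT, shell=True).decode()
        if ("pulp.agent.%s" % consumer_uuid) not in output:
            return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    print(agent_queue_gone("1204f585-f9ac-4871-953f-a3190da3f541"))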


You may also want to simply re-register the same client a number of times (such as 20) and verify after 15 minutes that there are not 20 extra queues. The consumers being deleted should have their queues cleaned up.

Comment 8 Corey Welton 2015-07-23 01:39:47 UTC
Verified in SNAP14


* ran the following in a loop on a client box.

[root@mgmt5 ~]# cat multiregister.sh 
echo -n "Loop how many times? "
read loopcount
increment="0"
while  [ $increment -lt $loopcount ] 
do 
  subscription-manager register --org="cswiii" --activationkey="ak-sat6-tools-rhel5-abc" --force
  subscription-manager identity >> identities.txt
  subscription-manager unregister
  ((increment+=1))
done

* ran the re-registration loop 15 times.
* ran the following on a sat host

watch "qpid-stat --ssl-certificate /etc/pki/katello/certs/java-client.crt --ssl-key /etc/pki/katello/private/java-client.key -b "amqps://$(hostname -f):5671" -q | grep pulp.agent|wc -l"


The number of UUIDs steadily rose for a while and then eventually began to drop.

I also cross-referenced the UUIDs recorded in 'identities.txt' by the script above and ensured they were no longer showing up in the output from the `qpid-stat` command above.

Comment 9 Bryan Kearney 2015-08-12 16:03:31 UTC
This bug was fixed in Satellite 6.1.1, which was delivered on 12 August 2015.

