Bug 1159303 - pulp.agent.<uuid> queue not deleted when consumer is deleted
Summary: pulp.agent.<uuid> queue not deleted when consumer is deleted
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Pulp
Classification: Retired
Component: consumers
Version: 2.4.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.6.0
Assignee: Jeff Ortel
QA Contact: Irina Gulina
URL:
Whiteboard:
Depends On: 1174361 1175512
Blocks: 1139277 1159281 1159961
Reported: 2014-10-31 12:46 UTC by Brian Bouterse
Modified: 2019-05-20 11:20 UTC

Fixed In Version:
Clone Of:
Clones: 1159961
Environment:
Last Closed: 2015-02-28 22:42:48 UTC
Embargoed:


Links
System                            | ID      | Private | Priority | Status | Summary | Last Updated
Pulp Redmine                      | 603     | 0       | None     | None   | None    | Never
Red Hat Knowledge Base (Solution) | 1342303 | 0       | None     | None   | None    | Never

Description Brian Bouterse 2014-10-31 12:46:27 UTC
I heard this secondhand, so this BZ is about investigating whether it is a real issue. The observation suggesting a problem comes from a Pulp installation where the broker has many (100+) pulp.agent.<uuid> queues, but only a few consumers are registered to the system.

Expected Behavior:
I expect that when I create a consumer, a pulp.agent.<uuid> queue is created, and when I delete a consumer, the corresponding pulp.agent.<uuid> queue is also removed from the broker.

Why is this important:
Lingering, unnecessary queues could cause qpid to run out of file descriptors.

Comment 1 Brian Bouterse 2014-10-31 14:20:11 UTC
After some investigation, it was determined that Pulp does not remove old consumer subscribers during unregistration. Deleting the pulp.agent.<uuid> queue is the expected behavior, so we should fix this.

After some discussion with jortel, we determined that the following changes need to be made:

1) The pulp.agent.<uuid> queues need to be managed entirely on the server side. This includes both creation and deletion of the queues. The 'manager' code for the consumer can handle this responsibility. Goferd needs to be adjusted to optionally not declare queues from the consumer side, and pulp-agent and katello-agent will need small changes so that they know the server will create and delete the consumer queues.

2) A reaper task that removes these queues needs to be added. It runs periodically and removes queues that have been orphaned for longer than X amount of time since the unregister event. These two options (how often the reaper runs and the time X) will be specified in server.conf and will come with defaults.

3) Docs need to be written on this indicating how cleanup occurs.

The reaper design is necessary versus a "delete right now during unregistration" design because there is one final message that needs to be delivered to the consumer, and we should allow a reasonable amount of time for that to occur before the server force-deletes the queue.
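
A minimal sketch of the selection step such a reaper could perform, in Python. It is not taken from the Pulp code base: `find_reapable_queues`, the `unregistered_at` bookkeeping, and the `grace_period` parameter are hypothetical names, and the actual broker call that deletes a queue is deliberately left out.

```python
from datetime import datetime, timedelta


def find_reapable_queues(unregistered_at, now, grace_period):
    """Return the names of queues that are old enough to delete.

    unregistered_at: dict mapping queue name -> datetime of the unregister
        event (hypothetical bookkeeping the server would have to keep).
    now: current time, passed in so the function is easy to test.
    grace_period: timedelta the server waits so the final message can
        still be delivered to the consumer.
    """
    return sorted(
        name
        for name, when in unregistered_at.items()
        if now - when >= grace_period
    )


if __name__ == "__main__":
    now = datetime(2014, 10, 31, 15, 0, 0)
    pending = {
        "pulp.agent.aaa": now - timedelta(minutes=30),  # past the grace period
        "pulp.agent.bbb": now - timedelta(minutes=2),   # final message may still be in flight
    }
    print(find_reapable_queues(pending, now, timedelta(minutes=10)))
```

A periodic task would call this with the configured grace period and then delete each returned queue; both the period and the grace period correspond to the two server.conf options mentioned in item 2.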

Comment 2 Pavel Moravec 2014-10-31 17:51:44 UTC
(In reply to bbouters from comment #1)
> 2) A reaper task that removes these queues needs to be added. This runs
> periodically with some frequency and removes "orphaned_queues" that have
> been orphaned after an unregister event longer than X amount of time. These
> two options will be specified in server.conf and will come with defaults.

Note that you can use auto-delete queues with a deletion timeout (x-declare argument qpid.auto_delete_timeout). If a queue loses its last consumer, the qpid broker waits for the timeout (in seconds) and then deletes the queue itself. This might simplify the implementation. Note, however, that if the queue never has a consumer, it will never be deleted (the auto-delete timer is triggered _only_ when the last consumer unsubscribes). Also, I am not sure if the same functionality is available in RabbitMQ.
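
As an illustration of the suggestion above, here is a small Python sketch that builds a Qpid address string declaring such a queue. The helper name `auto_delete_address` and the 60-second value are assumptions made for this example; only the `qpid.auto_delete_timeout` argument comes from the comment itself.

```python
def auto_delete_address(queue_name, timeout_seconds):
    """Build a Qpid address string for an auto-delete queue.

    The queue is created on demand and, once its last consumer
    unsubscribes, the broker waits `timeout_seconds` before deleting it.
    """
    return (
        "%s; {create: always, node: {type: queue, x-declare: "
        "{auto-delete: True, arguments: "
        "{'qpid.auto_delete_timeout': %d}}}}" % (queue_name, timeout_seconds)
    )


if __name__ == "__main__":
    # Hypothetical queue name; a real agent would use its consumer UUID.
    print(auto_delete_address("pulp.agent.example", 60))
```

The resulting string would be handed to whatever messaging client opens the receiver; how gofer actually declares its queues is not shown in this report.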

Comment 3 Brian Bouterse 2014-10-31 20:20:58 UTC
Auto delete would be an elegant way to solve this, but I'm not sure how it would allow for consumers to be powered off for longer than the auto_delete timeout. A consumer that was running, but then had its agent service stopped or is powered off will wake up at some point in the future to find its queue missing.

Today, if an agent finds its queue missing, it recreates it, so it would recover, but any meaningful messages that were issued to the consumer (i.e., bind or unbind) while it was turned off would be lost.

This could cause some very unexpected behaviors for the user where they remember doing an action such as binding a group of consumers to a given repo, but whoever wasn't powered on at that moment doesn't receive the config.

Suggestions for how we can make the auto_delete approach work are welcome because it is more elegant but I'm not sure how to resolve this one lingering problem.

Comment 4 Pavel Moravec 2014-11-03 10:48:14 UTC
(In reply to bbouters from comment #3)
> Auto delete would be an elegant way to solve this, but I'm not sure how it
> would allow for consumers to be powered off for longer than the auto_delete
> timeout. A consumer that was running, but then had its agent service stopped
> or is powered off will wake up at some point in the future to find its queue
> missing.

Good point. What about an auto_delete queue (optionally with some timeout) that has an alternate exchange set, where the exchange routing messages to these queues has the same alt.exchange set as well, plus one durable auxiliary queue that gets all messages routed via the alt.exchange? Then:

- when deleting the queue due to auto-delete parameter, all messages in the queue are redirected via the alt.exchange to the aux.queue
- when a consumer is off and broker should deliver a message to its (deleted) queue, it finds the original exchange does not have a matching binding (to the deleted queue) so the broker re-routes the message to the alt.exchange, i.e. to the aux.queue

When a consumer starts and detects that its queue is gone, it would have to call the QMF method "queueMoveMessages", which moves messages from aux.queue to the newly created pulp.agent.<uuid> queue, with a suitable filter (to move just the messages relevant to that pulp consumer).

Gotchas:
- needs some more testing and probably bigger code changes
- if some default exchange (like the "" one) is used for distributing the messages to pulp.agent.<uuid> queues, you cannot set an alternate exchange for such an exchange. But you can create a new exchange for this traffic.
- this solution does not cover situations where a pulp consumer never powers on again (not sure if that is an allowed scenario). The aux.queue would then keep messages for this consumer forever. But IMHO this problem is common to any solution: is there a way to identify a consumer that will never power on?
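
The routing idea in this comment can be modelled with a toy, in-memory Python sketch. It does not use any Qpid API: `ToyBroker` and its methods are hypothetical stand-ins, and `recover` only imitates what a filtered "queueMoveMessages" call would achieve.

```python
class ToyBroker:
    """In-memory model of the alternate-exchange proposal (not a Qpid API)."""

    AUX = "aux.queue"

    def __init__(self):
        # The durable auxiliary queue always exists.
        self.queues = {self.AUX: []}

    def declare(self, name):
        self.queues.setdefault(name, [])

    def publish(self, queue_name, message):
        # No matching binding means the alternate exchange takes over
        # and the message lands in aux.queue.
        target = queue_name if queue_name in self.queues else self.AUX
        self.queues[target].append((queue_name, message))

    def auto_delete(self, name):
        # Messages still in a deleted queue are redirected to aux.queue.
        if name == self.AUX:
            return
        leftover = self.queues.pop(name, [])
        self.queues[self.AUX].extend(leftover)

    def recover(self, name):
        # The agent recreates its queue and moves only its own messages back.
        self.declare(name)
        aux = self.queues[self.AUX]
        self.queues[name].extend(m for m in aux if m[0] == name)
        self.queues[self.AUX] = [m for m in aux if m[0] != name]


if __name__ == "__main__":
    broker = ToyBroker()
    broker.declare("pulp.agent.x")
    broker.publish("pulp.agent.x", "bind")
    broker.auto_delete("pulp.agent.x")        # consumer went away
    broker.publish("pulp.agent.x", "unbind")  # sent while the queue is gone
    broker.recover("pulp.agent.x")
    print(broker.queues["pulp.agent.x"])
```

The last gotcha is visible in the model as well: messages for a consumer that never calls `recover` stay in aux.queue indefinitely.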

Comment 5 Brian Bouterse 2014-11-03 19:48:37 UTC
This design introduces a lot of complexity, yet it doesn't fully address all of the gotchas. Marking a queue as safe to delete no sooner than X minutes after the unregistration event seems much simpler and correctly handles all cases.

Comment 6 Brian Bouterse 2014-11-04 15:47:37 UTC
Raising the priority given the bugs this BZ blocks.

Comment 10 Pavel Moravec 2014-11-30 09:36:19 UTC
FYI, a trivial reproducer for this is to register and unregister a content host via subscription-manager, like:

subscription-manager register --org="Default_Organization" --environment="Library" --username=admin --password=<Sat6_admin_password>
subscription-manager unregister
subscription-manager clean

Doing so, one extra pulp.agent.<uuid> queue is created and not deleted.

Note that this reproducer covers just one use case, not all of them.
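
To check for leftovers after running a reproducer like this, a small Python helper could compare the broker's queue list with the registered consumers. This is only a sketch: `orphaned_agent_queues` is a hypothetical name, and it assumes that the first whitespace-separated column of `qpid-stat -q` output is the queue name and that the set of registered consumer IDs is obtained separately.

```python
def orphaned_agent_queues(qpid_stat_output, registered_consumer_ids):
    """Return pulp.agent.* queue names with no matching registered consumer.

    qpid_stat_output: text captured from `qpid-stat -q`.
    registered_consumer_ids: set of consumer IDs known to the server.
    """
    prefix = "pulp.agent."
    orphans = []
    for line in qpid_stat_output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and queues that are not agent queues.
        if not fields or not fields[0].startswith(prefix):
            continue
        consumer_id = fields[0][len(prefix):]
        if consumer_id not in registered_consumer_ids:
            orphans.append(fields[0])
    return orphans


if __name__ == "__main__":
    # Illustrative input only; the IDs are made up for this example.
    sample = (
        "  pulp.agent.one    Y    0  0  0  0  0  0  1  1\n"
        "  pulp.agent.two    Y    0  0  0  0  0  0  0  1\n"
    )
    print(orphaned_agent_queues(sample, {"one"}))
```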

Comment 11 Jeff Ortel 2015-01-29 17:26:44 UTC
https://github.com/pulp/pulp/pull/1583

Comment 12 Chris Duryee 2015-02-10 22:33:02 UTC
2.6.0-0.7.beta

Comment 13 Irina Gulina 2015-02-20 10:59:05 UTC
>> rpm -qa pulp-server
pulp-server-2.6.0-0.7.beta.fc20.noarch

>> pulp-consumer -u admin -p admin register --consumer-id bobik
Consumer [bobik] successfully registered

>> qpid-stat -q | grep bobik
  pulp.agent.bobik                                                                     Y                      0     0      0       0      0        0         1     1

>> pulp-consumer unregister 
Consumer [bobik] successfully unregistered

>>  qpid-stat -q | grep bobik
#

Comment 14 Irina Gulina 2015-02-20 16:49:07 UTC
The previous comment was with goferd running.
Here is test #2, with the agent not running:

>> pulp-consumer -u admin -p admin register --consumer-id lelik
Consumer [lelik] successfully registered

>> qpid-stat -q | grep lelik
  pulp.agent.lelik                                                                     Y                      0     0      0       0      0        0         0     1


>> systemctl stop goferd
>> systemctl status goferd
goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled)
   Active: inactive (dead) since Fri 2015-02-20 16:01:27 UTC; 2s ago
 Main PID: 31303 (code=killed, signal=TERM)
...
>> qpid-stat -q | grep lelik
  pulp.agent.lelik                                                                     Y                      0     0      0       0      0        0         0     1


>> pulp-consumer unregister
Consumer [lelik] successfully unregistered

>> date
Fri Feb 20 16:11:22 UTC 2015

>>  qpid-stat -q | grep lelik
  pulp.agent.lelik                                                                     Y                      1     1      0     763    763        0         0     1

>> date
Fri Feb 20 16:20:52 UTC 2015

>> qpid-stat -q | grep lelik
  pulp.agent.lelik                                                                     Y                      1     1      0     763    763        0         0     1

>>  date
Fri Feb 20 16:24:39 UTC 2015

>> qpid-stat -q | grep lelik
#

Comment 15 Brian Bouterse 2015-02-28 22:42:48 UTC
Moved to https://pulp.plan.io/issues/603

