Bug 1154559 - goferd NotFound: no such queue: pulp.agent error handling missing, too frequent log messages
Summary: goferd NotFound: no such queue: pulp.agent error handling missing, too frequent log messages
Keywords:
Status: CLOSED DUPLICATE of bug 1115988
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: katello-agent
Version: 6.0.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks: 1122832 1175448
 
Reported: 2014-10-20 07:43 UTC by Peter Vreman
Modified: 2019-07-11 08:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-08 14:19:35 UTC
Target Upstream Version:
Embargoed:



Description Peter Vreman 2014-10-20 07:43:30 UTC
Description of problem:
After restoring a Satellite6 server, some nodes could no longer connect to their original qpid queue. This error is reported every 10 seconds:

goferd: [ERROR][pulp.agent.13728a81-ffb6-42c2-8d34-96d345c750e4] gofer.transport.qpid.consumer:117 - 5f8645bc-2047-4644-bb44-c02121a0edb2
goferd: [ERROR][pulp.agent.13728a81-ffb6-42c2-8d34-96d345c750e4] gofer.transport.qpid.consumer:117 - Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
    return self.__receiver.fetch(timeout=timeout)
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1030, in fetch
    self._ecwait(lambda: self.linked)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 994, in _ewait
    self.check_error()
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 983, in check_error
    raise self.error
NotFound: no such queue: pulp.agent.13728a81-ffb6-42c2-8d34-96d345c750e4

After restarting goferd (service goferd restart), it connects to a queue again without reporting further issues:
goferd: [INFO][pulp.agent.13728a81-ffb6-42c2-8d34-96d345c750e4] gofer.transport.qpid.broker:83 - {li-lc-1017.hag.hilti.com:5671} connected to AMQP
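
For reference, the same NotFound can be reproduced outside goferd with a few lines of python-qpid pointed at the agent queue. A minimal Python 2 sketch, not goferd code: the broker URL is a placeholder (the real Satellite listener is SSL on port 5671 with client certificates) and the queue name is taken from the log above.

from qpid.messaging import Connection
from qpid.messaging.exceptions import Empty, NotFound

BROKER = "localhost:5672"  # placeholder broker URL
QUEUE = "pulp.agent.13728a81-ffb6-42c2-8d34-96d345c750e4"  # queue name from the log above

connection = Connection(BROKER)
connection.open()
try:
    # Attaching a receiver to a queue that no longer exists on the broker
    # raises NotFound, the same error goferd logs every 10 seconds.
    receiver = connection.session().receiver(QUEUE)
    print receiver.fetch(timeout=10)
except Empty:
    print "queue exists but had no messages within the timeout"
except NotFound as error:
    print "broker reports the queue is gone:", error
finally:
    connection.close()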


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install Sat6 on a VM
2. Create a Sat6 VM snapshot
3. Register the System to Sat6
4. Install katello-agent on the System
5. Revert to the Sat6 VM snapshot taken before the system was registered
6. Watch syslog on the System
7. Restart goferd on the System

Actual results:
Errors every 10 seconds pollute the syslog. After a goferd restart the error is gone.


Expected results:
Handle "NotFound: no such queue" as a permanent error. Because the Server is running again and the client can establish a connection. Some error handling alternatives are:
- Stop trying this old queue, allocate a new queue
- Reduce frequency of error
- Stop the goferd process
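
Illustrative only, a rough Python 2 sketch of the second alternative (reduce the frequency of the error), assuming the consumer loop has a python-qpid receiver; this is not the actual gofer code:

import time
from qpid.messaging.exceptions import NotFound

def fetch_with_backoff(receiver, initial_delay=10, max_delay=600):
    # Keep retrying the fetch, but back off exponentially on
    # "no such queue" instead of logging the same error every 10 seconds.
    delay = initial_delay
    while True:
        try:
            return receiver.fetch()
        except NotFound:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)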



Additional info:

Comment 1 RHEL Program Management 2014-10-20 08:03:01 UTC
Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 3 Jeff Ortel 2014-12-19 00:28:24 UTC
Here is what I think is happening. During the broker restart that happens as part of the sat6 revert, the agent tries to reconnect. After it has successfully reconnected, the agent resumes fetching messages. Except, because of the reset on the sat6 server, the queue no longer exists on the broker. So, the fetch fails.

If not, I don't see how this situation can happen in the field under normal circumstances.  The agent declares the queue just prior to fetching messages.  Once the receiver starts fetching, the qpid broker will not permit the queue to be deleted.  For just this reason.

Do we support sat6 customers reverting their sat6 servers to earlier VM snapshots like this?

Comment 4 Jeff Ortel 2014-12-19 00:40:56 UTC
I suspect this can be re-created by running the broker without persistence and restarting it with an agent attached.  It's probably reasonable for the agent to make an effort to re-declare the queue when this exception is raised while fetching.
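
A rough sketch of that idea, assuming python-qpid address options ('create: always') and a hypothetical handle() dispatch function; this is not the actual gofer consumer code:

from qpid.messaging import Connection
from qpid.messaging.exceptions import NotFound

QUEUE = "pulp.agent.13728a81-ffb6-42c2-8d34-96d345c750e4"
# The 'create: always' address option asks the broker to (re)create the
# queue if it is missing when the receiver attaches.
ADDRESS = "%s; {create: always, node: {type: queue, durable: True}}" % QUEUE

def handle(message):
    # placeholder for the real dispatch of agent requests
    print "received:", message.content

def consume(connection):
    session = connection.session()
    receiver = session.receiver(ADDRESS)
    while True:
        try:
            message = receiver.fetch()
            handle(message)
            session.acknowledge(message)
        except NotFound:
            # The queue vanished (e.g. the satellite was restored from a
            # snapshot): open a fresh session and re-declare the queue once
            # instead of retrying the dead receiver forever.
            session = connection.session()
            receiver = session.receiver(ADDRESS)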

Comment 5 Michael Hrivnak 2015-10-01 17:35:00 UTC
Jeff, since a lot has changed since you last commented on this, can you refresh your analysis?

This seems like a bigger issue than just a missing queue: if you register a system and then revert satellite, your satellite will have no knowledge of that system. Would it be reasonable to just re-register the system? Is it straightforward to tell katello-agent to forget its previous registration and start from scratch?

Comment 6 Jeff Ortel 2015-10-08 14:52:36 UTC
Yes, if the system is re-registered and the agent restarted, the situation would be remedied. In 6.0, the agent would think it's registered (even though it's not) when the consumer certificate is detected. The queue would be re-declared and it would resume fetching messages. However, in 6.1, in addition to detecting the certificate, registration is confirmed with a REST call to the satellite. In this case (sat has been reverted), registration would not be confirmed. Essentially, the agent would know that it is no longer registered. As a result, the agent would not re-create the queue and fetch messages.

In all cases, restarting the affected agents is required to clear things up. If we want to support reverting the satellite without an agent restart, we'd need to add some additional functionality. We'd need goferd to treat the missing queue as an event resulting in the reload of the plugin. During the reload, the condition of no longer being registered would be detected and handled appropriately.

Comment 7 Peter Vreman 2016-03-08 14:19:35 UTC

*** This bug has been marked as a duplicate of bug 1115988 ***

