Bug 1353458 - qdrouterd stops responding on any connection while waiting for hung DNS PTR query
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: katello-agent
Version: 6.1.9
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Ted Ross
QA Contact: jcallaha
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-07 07:37 UTC by Pavel Moravec
Modified: 2023-09-14 03:28 UTC (History)

Fixed In Version: qpid-dispatch-0.4-17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-10 08:13:23 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Apache JIRA DISPATCH-443 0 None None None 2016-07-07 07:39:16 UTC
Red Hat Issue Tracker SAT-20055 0 None None None 2023-09-14 03:28:34 UTC
Red Hat Knowledge Base (Solution) 2429011 0 None None None 2016-07-07 07:38:59 UTC
Red Hat Product Errata RHBA-2016:2699 0 normal SHIPPED_LIVE Satellite 6.2.4 Async Bug Release 2016-11-10 13:12:22 UTC

Description Pavel Moravec 2016-07-07 07:37:33 UTC
Description of problem:
Assume scenario:
- a goferd client connects to qdrouterd
- qdrouterd does reverse DNS lookup (PTR query) for client's IP address against a DNS server
- assume the DNS server is broken, such that it does not reply to the query or replies only after, say, one minute
- while waiting for the response, qdrouterd stops sending any AMQP data on any other connection
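
As an illustration of the scenario above, here is a minimal Python sketch of one general way a server can obtain a peer's address without triggering a PTR query: asking getnameinfo for the numeric form only. This is an assumption-laden example, not the qdrouterd code, and this report does not state whether the upstream fix takes this approach. The helper name peer_name_numeric, the address 192.0.2.10 (a documentation address) and the port 5647 are chosen purely for illustration.

```python
import socket

def peer_name_numeric(sockaddr):
    """Return (host, port) as strings without issuing a DNS PTR query.

    NI_NUMERICHOST / NI_NUMERICSERV ask the resolver library to format the
    address and port numerically instead of looking up names.
    """
    flags = socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    host, port = socket.getnameinfo(sockaddr, flags)
    return host, port

# Illustrative peer address only (hypothetical client).
print(peer_name_numeric(("192.0.2.10", 5647)))
```

Because no name lookup is requested, the call returns immediately regardless of the state of the DNS server; the trade-off is that logs show IP addresses rather than host names.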

Consequences:
- inter-qdrouterd connections time out because heartbeats go unanswered
- any communication between Satellite and katello-agent is delayed, possibly causing task timeouts

Please backport https://issues.apache.org/jira/browse/DISPATCH-443 once the fix is available.


Version-Release number of selected component (if applicable):
qpid-dispatch-router-0.4-11.el7.x86_64


How reproducible:
100%


Steps to Reproduce:
0. Set up Satellite with a Capsule, either one using an external DNS server
1. Break your DNS server such that it does not respond (in time) to some DNS PTR queries (e.g. remove some IP range from its managed ranges)
2. Kick off goferd on a client such that the DNS PTR query for its IP address is answered only after a long time or never.
3. Observe that no communication can flow through the qdrouterd that goferd is connecting to, including inter-qdrouterd communication or a new task (package install) request.
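
To confirm that steps 1 and 2 really leave the PTR query hanging, a small probe along the following lines may help. It is only a sketch: the helper name timed_ptr_lookup, its timeout default and the injectable resolver parameter are hypothetical and not part of any Satellite tooling. It runs the reverse lookup in a worker thread so the caller can give up after a deadline.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def timed_ptr_lookup(ip, timeout=5.0, resolver=socket.gethostbyaddr):
    """Run a reverse lookup in a worker thread; return (status, elapsed seconds).

    The resolver argument exists only so the probe can be exercised without
    a real DNS server; by default the system resolver is used.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(resolver, ip)
    try:
        result = future.result(timeout=timeout)
        status = "answered: %s" % result[0]
    except FutureTimeout:
        # The worker thread keeps waiting in the background until the
        # resolver itself returns; only the caller stops waiting.
        status = "no answer within %.1fs" % timeout
    except OSError as exc:
        # socket.herror / socket.gaierror (e.g. no PTR record) land here.
        status = "lookup failed: %s" % exc
    finally:
        pool.shutdown(wait=False)
    return status, time.monotonic() - start

# Example call (performs a real DNS query against the configured resolver):
#   print(timed_ptr_lookup("192.0.2.10", timeout=5.0))
```

Running the probe on the Capsule host against the client's IP address shows whether the resolver answers, fails quickly, or hangs past the deadline; only the last case matches the scenario in this report.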


Actual results:
- Package installs to other clients will time out (assuming the DNS query is still being "processed").
- inter-qdrouterd connection flapping (see https://access.redhat.com/solutions/2429011 for particular logs)


Expected results:
- other clients can communicate with the qdrouterd, they can accept and acknowledge tasks (to install a package) etc.
- inter-qdrouterd connection is stable


Additional info:

Comment 2 Ted Ross 2016-07-07 20:01:04 UTC
A fix for this issue has been committed to the master branch upstream.

https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;a=patch;h=cf3c874

This is a low-risk update and is ready for back-port to the product builds if approved.

Comment 8 Pavel Moravec 2016-11-09 09:27:48 UTC
Hi Matej,
having Interconnect / qdrouterd knowledge, would you be able to reproduce (or even verify) this?

Comment 10 Bryan Kearney 2016-11-09 21:04:59 UTC
I am moving this to VERIFIED. We have not been able to reproduce the issue, and we have already deployed this code at certain customers with no negative impact. Therefore, we are marking this as verified to deliver with 6.2.4. If you are still seeing this issue after 6.2.4, please feel free to re-open and provide additional information on how to reproduce.

Comment 12 errata-xmlrpc 2016-11-10 08:13:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2699

Comment 13 Red Hat Bugzilla 2023-09-14 03:27:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

