Bug 728631

Summary: matahari agents fail to determine domain name for DNS SRV record query
Product: Red Hat Enterprise Linux 6 Reporter: Dave Johnson <dajohnso>
Component: matahari Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA QA Contact: Dave Johnson <dajohnso>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.2 CC: matahari-maint, rbryant, whayutin
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: matahari-0.4.2-4.el6 Doc Type: Bug Fix
Doc Text:
No description needed
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 11:39:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dns config tgz file none

Description Dave Johnson 2011-08-05 21:01:33 UTC
Created attachment 516959 [details]
dns config tgz file

Description of problem:
========================
With a default matahari configuration and a resolvable _matahari DNS SRV record, the matahari agents fail to find the broker specified through DNS.  The agent connected to the broker only after I set MATAHARI_BROKER to the domain name (in this case, test.com).  It seems that the fallback DNS SRV query fails to determine the domain name and ends up querying for _matahari._tcp.127.0.0.1 instead of _matahari._tcp.test.com.  This is seen in the DNS server's /var/log/messages:

named[13961]: client 10.16.44.45#57274: query (cache) '_matahari._tcp.127.0.0.1/SRV/IN' denied
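To make the expected behaviour concrete, here is a minimal sketch of how the SRV owner name could be chosen. It is not the matahari implementation (which is C); the function names `is_ip_literal` and `srv_query_name` and the `local_domain` parameter are assumptions made for illustration. The idea is only that an IP literal such as the default MATAHARI_BROKER=127.0.0.1 should never be appended to `_matahari._tcp.`, and the host's DNS domain should be used instead.

```python
import ipaddress


def is_ip_literal(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def srv_query_name(configured: str, local_domain: str,
                   service: str = "_matahari", proto: str = "_tcp") -> str:
    """Build the SRV owner name to query (hypothetical helper).

    An IP literal or an empty setting cannot name a DNS zone, so the
    host's own DNS domain is used in that case.
    """
    if not configured or is_ip_literal(configured):
        domain = local_domain
    else:
        domain = configured
    domain = domain.rstrip(".")
    if not domain:
        raise ValueError("no DNS domain available for SRV lookup")
    return f"{service}.{proto}.{domain}"


if __name__ == "__main__":
    # Default config (127.0.0.1) on a host whose DNS domain is test.com
    print(srv_query_name("127.0.0.1", "test.com"))
```

With this approach, both the default setting and MATAHARI_BROKER=test.com would lead to the same query name, _matahari._tcp.test.com.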

I set up three boxes: one for the DNS server, one for the broker, and one for the agent.  The DNS configuration is attached; it is a simple configuration with forward and reverse lookups for the broker and agent hosts, plus a DNS SRV record:

    _matahari._tcp			86400 IN SRV 1 1 49000 broker.test.com.
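For readers unfamiliar with the SRV record layout, the following sketch splits a zone-file line like the one above into its fields (TTL, class, type, priority, weight, port, target) and derives a broker URL in the `amqp:tcp:host:port` form. The `SrvRecord` type and both helper names are hypothetical, and the parser only handles this simple single-line form, not full zone-file syntax.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    owner: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse 'owner ttl IN SRV priority weight port target' (simple form only)."""
    fields = line.split()
    if len(fields) != 8 or fields[2].upper() != "IN" or fields[3].upper() != "SRV":
        raise ValueError(f"not an SRV record line: {line!r}")
    owner, ttl, _cls, _rtype, priority, weight, port, target = fields
    # Drop the trailing dot that marks the target as fully qualified.
    return SrvRecord(owner, int(ttl), int(priority), int(weight),
                     int(port), target.rstrip("."))


def broker_url(record: SrvRecord) -> str:
    """Format the SRV target and port as an amqp:tcp URL."""
    return f"amqp:tcp:{record.target}:{record.port}"


if __name__ == "__main__":
    rec = parse_srv_line("_matahari._tcp\t\t\t86400 IN SRV 1 1 49000 broker.test.com.")
    print(rec)
    print(broker_url(rec))
```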

I updated the broker and agent DNS config and made sure `hostname -f` returned correctly (like broker.test.com) and `domainname` returned test.com.  Both the broker and the agent could resolve test.com: 

    [root@ibm-hs21-03 init.d]# nslookup test.com
    Server:		10.16.65.39
    Address:	10.16.65.39#53

    Name:	test.com
    Address: 10.16.65.39
    
    [root@ibm-hs21-03 init.d]# hostname
    agent
    [root@ibm-hs21-03 init.d]# domainname
    test.com
    [root@ibm-hs21-03 init.d]# hostname -f
    agent.test.com
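One detail that may matter here: on Linux, `domainname` reports the NIS/YP domain, while the DNS domain is what `dnsdomainname` (or `hostname -d`) prints, i.e. the part of the FQDN after the first dot. A minimal sketch of deriving the DNS domain from the FQDN follows; the helper names are hypothetical, and `socket.getfqdn()` depends on the local resolver configuration, so its result may differ from `hostname -f` on some systems.

```python
import socket


def dns_domain(fqdn: str) -> str:
    """Return everything after the first label of an FQDN, or '' if none."""
    _host, _sep, domain = fqdn.rstrip(".").partition(".")
    return domain


def local_dns_domain() -> str:
    """Best-effort DNS domain of this host, based on socket.getfqdn()."""
    return dns_domain(socket.getfqdn())


if __name__ == "__main__":
    print(dns_domain("agent.test.com"))
    print(local_dns_domain())
```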

Both could also query DNS for the matahari service:

    [root@ibm-hs21-03 init.d]# nslookup
    > set type=SRV
    > _matahari._tcp
    Server:		10.16.65.39
    Address:	10.16.65.39#53

    _matahari._tcp.test.com	service = 1 1 49000 broker.test.com.


I spoke to Adam, who discovered that the same thing seems to happen when starting the agent without a broker.  I am fairly sure this is related, if not the same issue.

[root@ibm-hs21-03 init.d]# QPID_LOG_ENABLE=debug+ matahari-qmf-networkd
2011-08-05 16:07:46 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it.
2011-08-05 16:07:46 debug Created connection amqp:tcp:127.0.0.1:5672 with {}
2011-08-05 16:07:46 debug Created connection amqp:tcp:localhost:49000 with {reconnect:False}
2011-08-05 16:07:46 info Trying to connect to amqp:tcp:localhost:49000...
2011-08-05 16:07:46 debug Created IO thread: 0
2011-08-05 16:07:46 debug TCPConnector created for 0-10
2011-08-05 16:07:46 info Connection  connected to tcp:localhost:49000
2011-08-05 16:07:46 warning Connect failed: Connection refused
2011-08-05 16:07:46 warning Connection  closed
2011-08-05 16:07:46 debug Exception constructed: Connection  closed
2011-08-05 16:07:46 info Failed to connect to amqp:tcp:localhost:49000: Connection  closed
2011-08-05 16:07:46 debug Created connection amqp:tcp::49000 with {reconnect:False}
2011-08-05 16:07:46 info Trying to connect to amqp:tcp::49000...
2011-08-05 16:07:46 debug Exception constructed: Invalid URL: amqp:tcp::49000 (qpid/Url.cpp:237)
2011-08-05 16:07:46 debug Created connection amqp:tcp::49000amqp:tcp:localhost:49000 with {reconnect:False}
2011-08-05 16:07:46 info Trying to connect to amqp:tcp::49000amqp:tcp:localhost:49000...
2011-08-05 16:07:46 debug Exception constructed: Invalid URL: amqp:tcp::49000amqp:tcp:localhost:49000 (qpid/Url.cpp:237)
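The last lines of this log show two separate symptoms: a URL with an empty host (`amqp:tcp::49000`) and two URLs run together into one string. As an illustration only, and not a description of the actual fix, the sketch below builds each candidate URL as its own list entry and skips empty hosts. The function name `broker_urls` is an assumption.

```python
from typing import Iterable, List, Optional


def broker_urls(hosts: Iterable[Optional[str]], port: int) -> List[str]:
    """Return one amqp:tcp URL per non-empty host, never a concatenation."""
    urls = []
    for host in hosts:
        host = (host or "").strip()
        if not host:
            # An empty host would produce the invalid 'amqp:tcp::PORT' form.
            continue
        urls.append(f"amqp:tcp:{host}:{port}")
    return urls


if __name__ == "__main__":
    # The empty entry stands in for a failed domain lookup.
    for url in broker_urls(["localhost", ""], 49000):
        print("Trying:", url)
```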


Version-Release number of selected component (if applicable):
matahari-0.4.2-2.el6.x86_64
matahari-agent-lib-0.4.2-2.el6.x86_64
matahari-broker-0.4.2-2.el6.x86_64
matahari-host-0.4.2-2.el6.x86_64
matahari-lib-0.4.2-2.el6.x86_64
matahari-network-0.4.2-2.el6.x86_64
matahari-service-0.4.2-2.el6.x86_64
matahari-sysconfig-0.4.2-2.el6.x86_64
qpid-cpp-client-0.10-6.el6.x86_64
qpid-cpp-client-devel-0.10-6.el6.x86_64
qpid-cpp-client-ssl-0.10-6.el6.x86_64
qpid-cpp-server-0.10-6.el6.x86_64
qpid-cpp-server-ssl-0.10-6.el6.x86_64
qpid-qmf-0.10-6.el6.x86_64
sigar-1.6.5-0.1.git833ca18.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1.  set up the attached DNS config (you will need to update the IP addresses and associated subnet zones for the hosts used)
2.  configure broker to use DNS
3.  configure agent to use DNS
4.  verify hostname, domainname, and hostname -f return the correct values
5.  verify nslookup test.com resolves
6.  verify the matahari service record returns correctly (example above)
7.  verify the default agent config in /etc/sysconfig/matahari has MATAHARI_BROKER=127.0.0.1
8.  start matahari-broker on broker server
9.  start matahari-agent on agent system
10.  see agent's /var/log/messages: agent does not find a broker and keeps retrying
11. see DNS server's /var/log/messages error: '_matahari._tcp.127.0.0.1/SRV/IN' denied
12. stop matahari agent
13. set MATAHARI_BROKER=test.com in /etc/sysconfig/matahari
14. start matahari-agent on agent system
15. see agent's /var/log/messages: agent connects to broker
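The DNS-side check in step 11 can be automated. The sketch below scans log lines for denied SRV queries whose name is the service prefix followed by an IP address, which is the signature of this bug. The regular expression is based only on the single named log line quoted in the description, so other BIND versions or log formats may need adjustments; the function name is hypothetical.

```python
import ipaddress
import re
from typing import Iterable, List

# Matches e.g.: query (cache) '_matahari._tcp.127.0.0.1/SRV/IN' denied
DENIED_SRV = re.compile(r"query \(cache\) '(?P<name>[^/']+)/SRV/IN' denied")


def ip_suffixed_srv_queries(log_lines: Iterable[str],
                            prefix: str = "_matahari._tcp.") -> List[str]:
    """Return denied SRV query names of the form <prefix><IP address>."""
    hits = []
    for line in log_lines:
        match = DENIED_SRV.search(line)
        if not match:
            continue
        name = match.group("name")
        if not name.startswith(prefix):
            continue
        try:
            ipaddress.ip_address(name[len(prefix):])
        except ValueError:
            continue  # suffix is a domain name, not an IP literal
        hits.append(name)
    return hits


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for name in ip_suffixed_srv_queries(log):
            print("bad SRV query:", name)
```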

Comment 2 Andrew Beekhof 2011-08-09 04:52:34 UTC
A related patch has been committed upstream: https://github.com/beekhof/matahari/commit/e878490

Comment 3 Andrew Beekhof 2011-08-09 07:57:43 UTC
A related patch has been committed upstream: https://github.com/beekhof/matahari/commit/e878490

Comment 5 Dave Johnson 2011-09-13 13:48:09 UTC
Good to go in v0.4.2-2

[root@agent ~]# /usr/sbin/matahari-qmf-networkd -vvv
mh_hostname: Got hostname: agent.test.com
mh_dnsdomainname: Got dnsdomainname: 'test.com'
mh_connect: SRV query successful: _matahari._tcp.test.com
mh_connect: Trying: amqp:tcp:broker.test.com:49000
mh_os_uuid: Got uuid: ce76da54cbe7c1d9ed1d44ce00000023
mh_hostname: Got hostname: agent.test.com
mh_hostname: Got hostname: agent.test.com
mh_os_uuid: Got uuid: ce76da54cbe7c1d9ed1d44ce00000023
mainloop_add_qmf: Added source: 1
run: Starting agent mainloop

Comment 6 Russell Bryant 2011-11-16 21:28:39 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No description needed

Comment 7 errata-xmlrpc 2011-12-06 11:39:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1569.html