Bug 731858

Summary: matahari agents fail to use broker from DNS SRV record
Product: Red Hat Enterprise Linux 6 Reporter: Dave Johnson <dajohnso>
Component: matahari Assignee: Russell Bryant <rbryant>
Status: CLOSED ERRATA QA Contact: Dave Johnson <dajohnso>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.2 CC: matahari-maint, rbryant
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: matahari-0.4.3-1.el6 Doc Type: Bug Fix
Doc Text:
No description required
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 11:40:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 743047    

Description Dave Johnson 2011-08-18 20:01:31 UTC
Description of problem:
===================================
Trying to verify bug 728631, but it is not working. I can see that the agent pulls the correct broker hostname from the DNS SRV record, but it does not use that value for the connection attempt; it tries localhost instead. If I pass the broker hostname to the agent with -b, it connects successfully. I'm not sure, but I wonder whether the changes from bug 731534 are related.

[root@agent ~]# matahari-qmf-hostd --reconnect=yes -vvv
mh_domainname: Got domainname: test.com
mh_connect: SRV record resolved to: broker.test.com
mh_connect: Trying: amqp:tcp:localhost:49000
2011-08-18 15:43:44 warning Connect failed: Connection refused
2011-08-18 15:43:44 warning Connection  closed
2011-08-18 15:43:47 warning Connect failed: Connection refused
2011-08-18 15:43:47 warning Connection  closed
^C[root@agent ~]# matahari-qmf-hostd --reconnect=yes -vvv -b broker.test.com
mh_connect: Trying: amqp:tcp:broker.test.com:49000
mh_uuid: Got uuid: 2d7a1209f4aa98ef0b2ed34a00000027
mh_hostname: Got hostname: agent.test.com
mh_uuid: Got uuid: 2d7a1209f4aa98ef0b2ed34a00000027
mh_hostname: Got hostname: agent.test.com
mainloop_add_qmf: Added source: 1
heartbeat: Updating stats: 1 5
mh_hostname: Got hostname: agent.test.com
run: Starting agent mainloop
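The first run above resolves broker.test.com from the SRV record and then still tries amqp:tcp:localhost:49000. The sketch below illustrates the precedence the reporter expected: an explicit -b host wins, otherwise the SRV result is used, and only if both are missing does the agent fall back to localhost. It is not Matahari's code; the function name choose_broker_url and the constants are hypothetical, and the default port 49000 is taken from the logs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical defaults, matching what the logs above show. */
#define DEFAULT_BROKER_HOST "localhost"
#define DEFAULT_BROKER_PORT 49000

/*
 * Hypothetical helper: build the AMQP URL the agent should try.
 *   cli_host - broker given with -b, or NULL
 *   srv_host - target resolved from the DNS SRV record, or NULL
 *   srv_port - port from the SRV record (ignored if <= 0)
 * Precedence: -b, then SRV result, then localhost.
 * Returns whatever snprintf returns.
 */
int choose_broker_url(char *buf, size_t len, const char *cli_host,
                      const char *srv_host, int srv_port)
{
    const char *host = DEFAULT_BROKER_HOST;
    int port = DEFAULT_BROKER_PORT;

    if (cli_host != NULL && cli_host[0] != '\0') {
        host = cli_host;
    } else if (srv_host != NULL && srv_host[0] != '\0') {
        /* Use the SRV result instead of discarding it (the reported bug). */
        host = srv_host;
        if (srv_port > 0)
            port = srv_port;
    }
    return snprintf(buf, len, "amqp:tcp:%s:%d", host, port);
}
```

With srv_host set to "broker.test.com" and no -b host, this yields amqp:tcp:broker.test.com:49000, the URL the second (working) run uses.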


Version-Release number of selected component (if applicable):
=================================================================
v0.4.2-7

How reproducible:
==================
100%

Steps to Reproduce:
========================
1.  Configure two systems with DNS.
2.  Add a DNS SRV record for matahari.
3.  Install/start the matahari broker.
4.  On a separate system, install the matahari agent.
5.  Start the matahari agent.
6.  On the broker, see that the agent is not connected.
7.  On the agent, stop the agent.
8.  On the agent, see that it can connect to the broker: matahari-qmf-hostd --reconnect=yes -vvv -b <broker_addr>
9.  See that it pulls the DNS SRV record but doesn't use it: matahari-qmf-hostd --reconnect=yes -vvv

  
Actual results:
==================
Fails to connect

Expected results:
==================
Successful connection

Comment 2 Russell Bryant 2011-08-19 11:43:27 UTC
I merged a patch to fix this upstream:

https://github.com/matahari/matahari/commit/bc4fc3f8cc2961329666d5e54248e76401383fe0

Comment 4 Dave Johnson 2011-08-29 17:50:47 UTC
This still doesn't seem to work with or without passing -D <domain>

[root@dell-pem610-01 matahari]# ping broker.test.com
PING broker.test.com (10.16.65.39) 56(84) bytes of data.
64 bytes from broker.test.com (10.16.65.39): icmp_seq=1 ttl=64 time=1.68 ms

[root@dell-pem610-01 matahari]# dnsdomainname
test.com

[root@dell-pem610-01 matahari]# hostname -f
agent2.test.com

[root@dell-pem610-01 matahari]# dig SRV _matahari._tcp.test.com

; <<>> DiG 9.7.3-P3-RedHat-9.7.3-6.P3.el6 <<>> SRV _matahari._tcp.test.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46571
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2

;; QUESTION SECTION:
;_matahari._tcp.test.com.	IN	SRV

;; ANSWER SECTION:
_matahari._tcp.test.com. 86400	IN	SRV	1 1 49000 broker.test.com.

;; AUTHORITY SECTION:
test.com.		172800	IN	NS	ns.test.com.

;; ADDITIONAL SECTION:
broker.test.com.	172800	IN	A	10.16.65.39
ns.test.com.		172800	IN	A	10.16.120.65

;; Query time: 1 msec
;; SERVER: 10.16.120.65#53(10.16.120.65)
;; WHEN: Mon Aug 29 13:44:59 2011
;; MSG SIZE  rcvd: 125
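The answer section holds the SRV RDATA in RFC 2782 order: priority 1, weight 1, port 49000, target broker.test.com. (with a trailing root dot). The sketch below parses that presentation form into its fields purely for illustration; parse_srv_rdata and struct srv_rr are hypothetical names, and a real agent would read these fields from the resolver's binary answer rather than from dig's text output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one SRV record, fields in RFC 2782 order. */
struct srv_rr {
    unsigned priority;
    unsigned weight;
    unsigned port;
    char target[256];
};

/*
 * Parse SRV RDATA as dig prints it, e.g. "1 1 49000 broker.test.com.",
 * and strip the trailing root dot so the target is usable as a hostname.
 * A bare "." target is left as-is.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_srv_rdata(const char *text, struct srv_rr *rr)
{
    if (sscanf(text, "%u %u %u %255s", &rr->priority, &rr->weight,
               &rr->port, rr->target) != 4)
        return -1;
    /* All three numeric fields are 16-bit values on the wire. */
    if (rr->priority > 65535 || rr->weight > 65535 || rr->port > 65535)
        return -1;

    size_t n = strlen(rr->target);
    if (n > 1 && rr->target[n - 1] == '.')
        rr->target[n - 1] = '\0';
    return 0;
}
```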


[root@dell-pem610-01 matahari]# /usr/sbin/matahari-qmf-hostd -vvv
mh_connect: Could not resolve SRV record for _matahari._tcp.
mh_connect: Trying: amqp:tcp:localhost:49000
2011-08-29 13:43:28 warning Connect failed: Connection refused
2011-08-29 13:43:28 warning Connection  closed
2011-08-29 13:43:31 warning Connect failed: Connection refused
2011-08-29 13:43:31 warning Connection  closed

[root@dell-pem610-01 matahari]# /usr/sbin/matahari-qmf-hostd -vvv -D test.com
mh_connect: Could not resolve SRV record for _matahari._tcp.
mh_connect: Trying: amqp:tcp:localhost:49000
2011-08-29 13:45:31 warning Connect failed: Connection refused
2011-08-29 13:45:31 warning Connection  closed
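Both runs report "Could not resolve SRV record for _matahari._tcp." with nothing after the final dot, which suggests the domain used to build the query name was empty, so the lookup could never match the record in test.com. A minimal sketch of building the query name while refusing an empty domain (build_srv_query is a hypothetical helper, not Matahari's API):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "_matahari._tcp.<domain>" into buf.
 * An empty or missing domain is rejected instead of producing the
 * useless query "_matahari._tcp." seen in the logs.
 * Returns 0 on success, -1 if the domain is missing or buf is too small.
 */
int build_srv_query(char *buf, size_t len, const char *domain)
{
    if (domain == NULL || domain[0] == '\0')
        return -1;

    int n = snprintf(buf, len, "_matahari._tcp.%s", domain);
    if (n < 0 || (size_t)n >= len)
        return -1;   /* encoding error or truncated name */
    return 0;
}
```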

Comment 5 Dave Johnson 2011-08-29 17:52:35 UTC
comment 4 should have included this as well

[root@dell-pem610-01 matahari]# /usr/sbin/matahari-qmf-hostd -vvv -b broker.test.com
mh_connect: Trying: amqp:tcp:broker.test.com:49000
mh_uuid: Got uuid: 9c12c5c52efedcfa7d43d68400000040
mh_hostname: Got hostname: agent2.test.com
mh_uuid: Got uuid: 9c12c5c52efedcfa7d43d68400000040
mh_hostname: Got hostname: agent2.test.com
mainloop_add_qmf: Added source: 1
heartbeat: Updating stats: 1 5
mh_hostname: Got hostname: agent2.test.com
run: Starting agent mainloop
^C2011-08-29 13:51:39 warning Connection [10.16.66.71:54955-broker.test.com:49000] closed

[root@dell-pem610-01 matahari]# rpm -qa | grep matahari
matahari-lib-0.4.2-9.el6.x86_64
matahari-agent-lib-0.4.2-9.el6.x86_64
matahari-host-0.4.2-9.el6.x86_64

Comment 6 Russell Bryant 2011-08-29 18:15:41 UTC
Please try this:

# matahari-qmf-hostd -vvv -b test.com -D

Comment 7 Russell Bryant 2011-08-30 17:42:34 UTC
I verified that the code in Matahari is correct and that the failure occurs when we do not get a domain name from sigar.

I looked at the sigar code, and it simply calls getdomainname(). So the system must have a domain name set for this to work. You can check it with the domainname command:

    # domainname

You can set it (temporarily, anyway) via the same command:

    # domainname example.com

and then it works as expected.

However, domainname and dnsdomainname are not the same thing: domainname reports the NIS/YP domain name, while dnsdomainname reports the DNS domain taken from the host's fully qualified name. dnsdomainname is really what we want here. I think what sigar provides is correct; it's just not the information we want. I will need to figure out the most appropriate way to get the dnsdomainname and update Matahari to use that instead.
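One way to obtain what dnsdomainname reports is to take the host's canonical name and keep everything after the first dot. The sketch below assumes the FQDN can be found via gethostname() plus getaddrinfo() with AI_CANONNAME (roughly what hostname -f does); it is an illustration only, not necessarily what the merged patch does, and domain_from_fqdn and get_dns_domainname are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: DNS domain of a fully qualified name, i.e. the
 * part after the first dot ("agent2.test.com" -> "test.com").
 * Returns NULL if there is no domain part.
 */
const char *domain_from_fqdn(const char *fqdn)
{
    const char *dot = strchr(fqdn, '.');
    if (dot == NULL || dot[1] == '\0')
        return NULL;
    return dot + 1;
}

/*
 * Hypothetical helper: resolve this host's canonical name and copy its
 * DNS domain into buf. Unlike getdomainname(), this does not read the
 * NIS/YP domain. Returns 0 on success, -1 on failure.
 */
int get_dns_domainname(char *buf, size_t len)
{
    char host[256];
    struct addrinfo hints, *res = NULL;
    int rc = -1;

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';   /* gethostname may not terminate */

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONNAME;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    /* Fall back to the plain hostname if no canonical name came back. */
    const char *canon = res->ai_canonname ? res->ai_canonname : host;
    const char *domain = domain_from_fqdn(canon);
    if (domain != NULL) {
        int n = snprintf(buf, len, "%s", domain);
        if (n >= 0 && (size_t)n < len)
            rc = 0;
    }
    freeaddrinfo(res);
    return rc;
}
```

On the agent in comment 4 (hostname -f reports agent2.test.com), this approach would yield test.com, matching dnsdomainname, regardless of what getdomainname() returns.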

Comment 8 Russell Bryant 2011-08-30 19:11:30 UTC
I have posted a patch to the mailing list.  I'm going to wait a little bit for feedback before I merge to the upstream repo.

https://github.com/russellb/matahari/commit/3f9905a21f7147e77313e0821a13b2f87e007ee1

Comment 9 Russell Bryant 2011-08-30 21:00:04 UTC
Well, I was going to wait longer but accidentally pushed this while pushing something else.  So, it's done unless someone has additional feedback.

https://github.com/matahari/matahari/commit/de1d1425a7e0b0a9e3c1c5e49d7c3d3d7a2dd8b4

Comment 10 Dave Johnson 2011-09-13 13:44:47 UTC
Good to go in v0.4.2-2.

[root@agent ~]# /usr/sbin/matahari-qmf-hostd -vvv
mh_hostname: Got hostname: agent.test.com
mh_dnsdomainname: Got dnsdomainname: 'test.com'
mh_connect: SRV query successful: _matahari._tcp.test.com
mh_connect: Trying: amqp:tcp:broker.test.com:49000
mh_os_uuid: Got uuid: ce76da54cbe7c1d9ed1d44ce00000023
mh_hostname: Got hostname: agent.test.com
mh_os_uuid: Got uuid: ce76da54cbe7c1d9ed1d44ce00000023
mh_hostname: Got hostname: agent.test.com
mainloop_add_qmf: Added source: 1
heartbeat: Updating stats: 1 5
mh_hostname: Got hostname: agent.test.com

Comment 11 Russell Bryant 2011-11-16 21:45:34 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No description required

Comment 12 errata-xmlrpc 2011-12-06 11:40:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1569.html