Bug 1241131

Summary: DNS server is not accessible by different overcloud hosts
Product: Red Hat OpenStack
Reporter: Ofer Blaut <oblaut>
Component: rhosp-director
Assignee: James Slagle <jslagle>
Status: CLOSED ERRATA
QA Contact: Ofer Blaut <oblaut>
Severity: urgent
Docs Contact:
Priority: high
Version: Director
CC: calfonso, dmacpher, dprince, dsneddon, ggillies, hbrock, jdonohue, jslagle, mburns, oblaut, ohochman, rhel-osp-director-maint, rrosa
Target Milestone: ga
Keywords: Triaged
Target Release: Director
Hardware: Unspecified
OS: Unspecified
Fixed In Version: instack-undercloud-2.1.2-21.el7ost
Doc Type: Bug Fix
Doc Text:
Nodes would end up out of synchronization due to a lack of access to NTP servers. This was because not all nodes routed access to the required servers (NTP, DNS, etc). This fix sets the Undercloud as a gateway for non-Controller nodes. This provides non-Controller nodes with access to external services such as DNS and NTP, which aids synchronization.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-05 13:58:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 1191185, 1243520

Description Ofer Blaut 2015-07-08 14:11:03 UTC
Description of problem:

Found while trying to validate the NTP server bug: https://bugzilla.redhat.com/show_bug.cgi?id=1233916#c17

It seems that if the DNS server address is located on the external or internal network, some overcloud hosts cannot reach it, since not every host is attached to the external or internal network.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Cannot ping the DNS server configured during cloud deployment from the Ceph/Compute nodes.

Actual results:

Expected results:

Additional info:

Comment 1 Mike Burns 2015-07-08 17:20:19 UTC
Dan, any ideas for how to resolve this?

Comment 3 Dan Sneddon 2015-07-09 21:06:20 UTC
One possible fix for this is a system management network. I'm tracking that in BZ 1240395.

There is an upstream patch for that bug which is in review: https://review.openstack.org/#/c/199800/

That probably won't make it into GA, but there is a slim chance.

Comment 4 Omri Hochman 2015-07-14 17:08:54 UTC
One bad outcome of this is that ntpd on the nodes cannot sync correctly, so each node ends up with a different time.

Comment 5 Dan Sneddon 2015-07-14 18:13:25 UTC
The latest on this is that Dan Prince is working on a patch which would allow the ctlplane to be used with static IPs and an external gateway. That would provide a way for nodes to reach NTP without having to rely on the undercloud as a SPOF.

Comment 6 Dan Sneddon 2015-07-14 19:29:01 UTC
(In reply to Dan Sneddon from comment #5)
> The latest on this is that Dan Prince is working on a patch which would
> allow the ctlplane to be used with static IPs and an external gateway. That
> would provide a way for nodes to reach NTP without having to rely on the
> undercloud as a SPOF.

Should read "reach DNS", but the same approach will also provide access to NTP.

Comment 7 Dan Prince 2015-07-15 13:48:33 UTC
When using network isolation we have the default gateway (which is likely used to access the DNS server) on different networks. For compute/ceph roles the default gateway is going to be on the ctlplane network. For the controller node it would be on the external network (public traffic).

So long as your router (either the undercloud, or a real router) can route traffic from the ctlplane to the external network (where the external DNS server resides), I think this should work fine.

If however you put your DNS server on one of the isolated networks (internal_api, storage, etc) then it would only be accessible by select roles, unless again you've gone and put routes in place on your ctlplane router to handle this.

Could we just treat this as a missing-route issue? As in, there needs to be a route added somewhere (either the undercloud or the gateway router) to handle this traffic?
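
The reachability argument in this comment can be sketched roughly as follows. This is only an illustration, not code from the deployment: the CIDRs, the role-to-network attachments, and the function name dns_path are all assumptions made up for the example. Only the per-role default gateway placement (ctlplane for compute/ceph, external for controllers) comes from the comment itself.

```python
import ipaddress

# Hypothetical CIDRs, chosen for illustration only.
NETWORKS = {
    "ctlplane": ipaddress.ip_network("192.0.2.0/24"),
    "external": ipaddress.ip_network("198.51.100.0/24"),
    "internal_api": ipaddress.ip_network("172.16.2.0/24"),
    "storage": ipaddress.ip_network("172.16.1.0/24"),
}

# Assumed attachments per role; a real deployment may differ.
ROLE_NETWORKS = {
    "controller": ["ctlplane", "external", "internal_api", "storage"],
    "compute": ["ctlplane", "internal_api", "storage"],
    "ceph": ["ctlplane", "storage"],
}

# Network carrying each role's default gateway, as described in the comment.
DEFAULT_GATEWAY_NETWORK = {
    "controller": "external",
    "compute": "ctlplane",
    "ceph": "ctlplane",
}


def dns_path(role: str, dns_ip: str) -> str:
    """Describe how a node of the given role would try to reach dns_ip."""
    addr = ipaddress.ip_address(dns_ip)
    for name in ROLE_NETWORKS[role]:
        if addr in NETWORKS[name]:
            # The node has an interface on the DNS server's own subnet.
            return f"direct via {name}"
    # Otherwise traffic goes to the default gateway, which must have a route.
    return f"routed via default gateway on {DEFAULT_GATEWAY_NETWORK[role]}"


if __name__ == "__main__":
    for role in ROLE_NETWORKS:
        print(role, "->", dns_path(role, "172.16.2.10"))
```

With a DNS server placed on the assumed internal_api subnet, the ceph role falls through to its ctlplane gateway, which is the case where a missing route on that gateway would make the server unreachable.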

Comment 8 Hugh Brock 2015-07-15 14:10:42 UTC
So the problem with comment 7 is:

* it requires a router, of some kind

* the router needs to not serve DHCP because neutron is going to do that

* there needs to be nothing else on the subnet behind the router, because neutron is going to be serving DHCP on it

I'm having a hard time imagining how we will get our *own* IT to let us set that up in the lab, much less explaining to a customer that that's what we need.

I think a better solution is probably to get all nodes onto a network which has external connectivity, but where we do not serve DHCP. What's wrong with just using the external API network for this? The controllers already use that network as a gateway if I'm not mistaken.

Alternatively we could define a gateway for the internal API network, but that seems a little screwy.

Comment 9 Hugh Brock 2015-07-15 14:29:42 UTC
OK, having said comment 8 ...

Is the least invasive solution here not simply setting the undercloud up as the external gateway for everything but the controllers? It's already routing traffic on that subnet and serving dhcp... I don't think we should leave it this way but I think we could ship with it.

Comment 10 James Slagle 2015-07-15 16:37:39 UTC
OK, I believe the only patch needed to set this up is to enable IP forwarding on the undercloud. I've linked Ben's patches for that.
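
As a rough way to confirm that IP forwarding is on for a Linux host such as the undercloud, a check along these lines could be used. It is a minimal sketch and not the linked patch: it assumes the standard Linux procfs entry for the net.ipv4.ip_forward setting, and the function name ip_forwarding_enabled is hypothetical.

```python
from pathlib import Path

# Standard Linux procfs location of the net.ipv4.ip_forward setting.
IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def ip_forwarding_enabled(path: Path = IP_FORWARD_PATH) -> bool:
    """Return True if the file at `path` contains "1" (forwarding enabled)."""
    return path.read_text().strip() == "1"


if __name__ == "__main__":
    state = "enabled" if ip_forwarding_enabled() else "disabled"
    print(f"IPv4 forwarding is {state}")
```

The path is a parameter so that the check can be exercised against a temporary file instead of the live system.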

Comment 14 errata-xmlrpc 2015-08-05 13:58:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.